First-class JSONC manipulation in Rust
In Deno and dprint (two Rust projects I maintain), there are certain cases where a JSON with comments (JSONC) configuration file needs to be programmatically updated.
For example, running the following in Deno...
> deno add jsr:@david/dax
Add jsr:@david/dax@0.42.0
...adds the @david/dax JSR package as a dependency to the configuration file.
Current approach (not good)
Our current approach involves parsing a JSONC file with jsonc-parser to an AST, then using that to build up a collection of "text changes" and finally applying the text changes to the original text.
For example, say we have the following dprint.jsonc file and want to add a new URL to the plugins array:
{
"plugins": [
"https://plugins.dprint.dev/json-0.19.1.wasm"
]
}
To do that, we'd examine this code, then construct a collection of text changes like the following, and have some other code manipulate the original string to apply these changes.
[{
"range": [66, 66],
"newText": ",\n \"https://plugins.dprint.dev/toml-0.6.3.wasm\""
}]
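To make the "apply the text changes" step concrete, here is a minimal Rust sketch. The `TextChange` type and `apply_text_changes` function are hypothetical names made up for illustration, not the code Deno or dprint actually use, and the sketch assumes the ranges are byte offsets that do not overlap.

```rust
use std::cmp::Reverse;
use std::ops::Range;

/// A single edit: replace the bytes in `range` with `new_text`.
/// (Hypothetical type for illustration only.)
#[derive(Debug, Clone)]
pub struct TextChange {
    pub range: Range<usize>,
    pub new_text: String,
}

/// Applies non-overlapping changes to `text`.
///
/// Panics if a range is out of bounds or does not fall on a
/// UTF-8 character boundary.
pub fn apply_text_changes(text: &str, mut changes: Vec<TextChange>) -> String {
    // Apply from the end of the text backwards so that the offsets of
    // the changes that have not been applied yet stay valid.
    changes.sort_by_key(|c| Reverse(c.range.start));
    let mut result = text.to_string();
    for change in changes {
        result.replace_range(change.range, &change.new_text);
    }
    result
}
```

Even this small helper has to care about ordering and offsets, which hints at how much bookkeeping the text-change approach pushes onto the caller.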
Example non-Rust pseudocode
/// Adds a plugin url to the dprint config file's plugins array.
/// ```jsonc
/// {
/// "plugins": [
/// "https://plugins.dprint.dev/toml-0.6.3.wasm",
/// "<new url goes here>"
/// ]
/// }
/// ```
function addPluginToJson(jsonText, url) {
const changes = [];
// parse to an ast
const ast = parseJson(jsonText);
// if the root is not an object, just replace it with one
if (ast.value?.kind !== "object") {
return `{
"plugins": [
"${url}"
]
}
`;
}
// find the plugins property
const pluginsProp = ast.value.properties
.find(p => p.name === "plugins");
if (pluginsProp?.value?.kind !== "array") {
// doesn't exist, so add it to the root object
const lastProperty = ast.value.properties.at(-1);
const insertIndex = lastProperty?.end ?? ast.value.start + 1;
const maybeComma = lastProperty == null ? "" : ",";
changes.push({
range: [insertIndex, insertIndex],
text: `${maybeComma}\n "plugins": [\n "${url}"\n ]`,
});
} else {
// add the url to the existing plugins array
const lastPlugin = pluginsProp.value.elements.at(-1);
const insertIndex = lastPlugin?.end ?? pluginsProp.value.start + 1;
const maybeComma = lastPlugin == null ? "" : ",";
changes.push({
range: [insertIndex, insertIndex],
text: `${maybeComma}\n "${url}"`,
});
}
// apply the text changes to the json text
return applyTextChanges(jsonText, changes);
}
This is very complex. To do the high level task of adding an array element, we need to do a lot of low level work. A proper implementation of this would need to deal with indentation, understand what newline kind the file uses, handle comments, and understand if the file uses trailing commas.
We could address these concerns in the code, but doing so would significantly increase its complexity and hurt maintainability. It would mean similarly complex solutions throughout the codebase, making new features, changes, and bug fixes time-consuming.
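As a rough illustration of just two of those low-level concerns, here is a small Rust sketch that guesses the newline kind and the indentation text of a file. The names `NewlineKind`, `detect_newline_kind`, and `detect_indent` are hypothetical, and the heuristics are deliberately naive: they only look at the first line break and the first indented line, and they ignore comments and trailing commas entirely.

```rust
/// Newline style of a file (hypothetical helper type).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum NewlineKind {
    Lf,
    CrLf,
}

/// Guesses the newline kind from the first line break, defaulting to LF
/// when the text has no line break at all.
pub fn detect_newline_kind(text: &str) -> NewlineKind {
    match text.find('\n') {
        Some(i) if i > 0 && text.as_bytes()[i - 1] == b'\r' => NewlineKind::CrLf,
        _ => NewlineKind::Lf,
    }
}

/// Returns the leading whitespace of the first indented, non-blank line.
pub fn detect_indent(text: &str) -> Option<&str> {
    text.lines().find_map(|line| {
        let trimmed = line.trim_start_matches(|c: char| c == ' ' || c == '\t');
        let indent_len = line.len() - trimmed.len();
        // Skip blank lines and lines that are not indented.
        if indent_len > 0 && !trimmed.is_empty() {
            Some(&line[..indent_len])
        } else {
            None
        }
    })
}
```

Every function that edits the file would need to thread results like these through its string building, which is exactly the kind of duplication I wanted to avoid.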
Discarded Solution: Better text change API
Some solutions in the wild look like this:
const editResult = modify(jsonText, ["plugins"], newPluginUrl, {
isArrayInsertion: true,
});
const newText = applyEdits(jsonText, editResult);
While this solution works for many cases, I don't believe it provides the flexibility I want for more complex JSONC modifications, such as manipulating comments. I also wanted a solution where subsets of the JSONC data can be focused on and manipulated in place.
Goal
The API I envisioned was one where the code reads like the following list, with everything described at a high level:
- Parse the text.
- Get and ensure the root value is an object.
- Get and ensure that object has a plugins array value property.
- Append the url to the plugins array.
- Get the final text.
Solution
The newly released 0.26 version of jsonc-parser now includes a "cst" feature that can be enabled in your Cargo.toml file:
jsonc-parser = { version = "0.26", features = ["cst"] }
This exposes the jsonc_parser::cst module.
Now, let's rewrite the above example code using this new API:
use jsonc_parser::cst::CstRootNode;
use jsonc_parser::cst::CstInputValue;
use jsonc_parser::errors::ParseError;
use jsonc_parser::json;
/// Add a plugin url to the dprint config file's plugins array.
///
/// ```jsonc
/// {
/// "plugins": [
/// "https://plugins.dprint.dev/toml-0.6.3.wasm",
/// "<new url goes here>"
/// ]
/// }
/// ```
pub fn add_to_plugins_array(
file_text: &str,
url: &str,
) -> Result<String, ParseError> {
let root_node = CstRootNode::parse(file_text, &Default::default())?;
let root_obj = root_node.object_value_or_set();
let plugins = root_obj.array_value_or_set("plugins");
plugins.ensure_multiline();
plugins.append(json!(url));
Ok(root_node.to_string())
}
The complexity is abstracted away, and low level concerns are automatically handled.
- Comments in the file are maintained and not shifted around when making changes.
- Proper indentation and newlines are handled for us.
- If the data currently uses trailing commas, that will be respected.
- Trailing commas can be forced by calling root_obj.set_trailing_commas(...)
There's a lot more you can do with this. I'd recommend reading the documentation to see what's possible, and please consider contributing if you see room for improvement. Also, please open issues for any bugs or scenarios you think it could be smarter about.
Implementation
This implementation uses a concrete syntax tree (CST) which is like an abstract syntax tree (AST), but also stores the whitespace, tokens, and comments in the tree. This allows for easily manipulating the tree in place taking into account everything found in the file, then printing it out when done.
For parsing, I didn't want to implement a new parser for the CST, so I just reused the existing AST parser in jsonc-parser, then converted that to a CST. The parser already had an option for collecting tokens & comments, and if you have the AST, tokens, comments, & original text, you can easily construct a CST.
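To show why having the token positions and the original text is enough to recover everything else, here is a simplified Rust sketch. It is not how jsonc-parser does the conversion: `Piece` and `pieces_from_ranges` are hypothetical names, the result is a flat list rather than a tree, and comments are lumped in with whitespace as "trivia" instead of being tracked separately. It assumes the token ranges are sorted, non-overlapping byte offsets into the text.

```rust
use std::ops::Range;

/// One piece of the source text (hypothetical, flattened stand-in for CST nodes).
#[derive(Debug, PartialEq, Eq)]
pub enum Piece<'a> {
    /// A token reported by the parser.
    Token(&'a str),
    /// Whatever sits between tokens: whitespace, newlines, and comments.
    Trivia(&'a str),
}

/// Splits `text` into tokens and the trivia between them, given the
/// sorted, non-overlapping byte ranges of the tokens.
pub fn pieces_from_ranges<'a>(text: &'a str, token_ranges: &[Range<usize>]) -> Vec<Piece<'a>> {
    let mut pieces = Vec::new();
    let mut pos = 0;
    for range in token_ranges {
        // Anything between the previous token and this one is trivia.
        if range.start > pos {
            pieces.push(Piece::Trivia(&text[pos..range.start]));
        }
        pieces.push(Piece::Token(&text[range.clone()]));
        pos = range.end;
    }
    // Trailing trivia after the last token.
    if pos < text.len() {
        pieces.push(Piece::Trivia(&text[pos..]));
    }
    pieces
}
```

Concatenating the pieces in order reproduces the original text exactly, which is the property that lets a CST print a file back out without losing formatting.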
On the internal structure of the CST, I didn't want to include any dependencies to help with this (by default, jsonc-parser has zero dependencies), so I rolled my own solution. Internally, each node in the tree contains an Rc<RefCell<T>> where T is its data and parent. The parent is referenced via a weak reference so that the memory used gets cleaned up when you're done. This means you must not drop the root node while still working with its descendants; otherwise certain operations may panic, which is a deliberate guard against bugs. I'm unsure if this is the best solution here, but it seems to work fine, and generally the root node is kept around to get the final text anyway.
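For readers unfamiliar with this pattern, here is a minimal standalone sketch of a tree built from Rc<RefCell<T>> with weak parent references. The `Node` and `NodeData` types are hypothetical and far simpler than the real nodes in jsonc-parser. One difference to note: in this sketch, asking for the parent after the root is dropped returns None, whereas the library may panic in that situation.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

/// Data stored for each node (hypothetical; not jsonc-parser's actual types).
struct NodeData {
    text: String,
    /// Weak so that a child does not keep its parent alive (no Rc cycle).
    parent: Option<Weak<RefCell<NodeData>>>,
    children: Vec<Node>,
}

/// Cheaply cloneable handle to a node in the tree.
#[derive(Clone)]
pub struct Node(Rc<RefCell<NodeData>>);

impl Node {
    pub fn new(text: &str) -> Node {
        Node(Rc::new(RefCell::new(NodeData {
            text: text.to_string(),
            parent: None,
            children: Vec::new(),
        })))
    }

    /// Appends `child`, giving it a weak back-reference to `self`.
    pub fn append(&self, child: Node) {
        child.0.borrow_mut().parent = Some(Rc::downgrade(&self.0));
        self.0.borrow_mut().children.push(child);
    }

    /// Returns the parent if it is still alive.
    pub fn parent(&self) -> Option<Node> {
        let data = self.0.borrow();
        let parent = data.parent.as_ref()?.upgrade()?;
        Some(Node(parent))
    }

    /// Prints the subtree by concatenating node text depth-first.
    pub fn to_text(&self) -> String {
        let data = self.0.borrow();
        let mut out = data.text.clone();
        for child in &data.children {
            out.push_str(&child.to_text());
        }
        out
    }
}
```

Because parents own their children strongly and children only point back weakly, dropping the last handle to the root frees the whole tree, and any child handle kept around afterwards can no longer reach its parent.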