I'm not extremely familiar with the details of incremental parsing, but I have used Cursorless [0], a VSCode extension based on tree-sitter for voice-controlled structured editing, and it is pretty powerful. You can use the structured editing when you want and also normal editing in between. Occasionally the parser will get things wrong and only change/take/select part of a function or what have you, but in general it's very useful, and I tend to miss it now that I am no longer voice coding much. I seem to remember there was a similar extension for emacs (sans voice control). treemacs, or something? Anyone used that?
Trying to use any kind of syntax highlighter with TeX is a pain in the butt. I didn't mean LaTeX there. I mean TeX, which can rewrite its own lexer, and a lot of libraries work by doing so. I move in and out of Texinfo syntax and it basically just causes most editors to sit there screaming that everything is broken.
I think simpler is better when it comes to structured editing. Recursive teXt has the advantage that it proposes a simple block structure built into the text itself [1]. Of course, you will need to design your language to take advantage of this block structure.
No doubt, brackets of course also convey structure. But I think indentation is better for visualising block structure. Inside these blocks, you can still use brackets, and errors like missing opening or closing brackets will not spill over into other blocks.
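For illustration, here's a toy Python sketch of that isolation property (all names are mine, not from any real tool): split the file into top-level indentation blocks, then bracket-check each block on its own, so an unbalanced bracket only flags the block it appears in.

    def split_blocks(text):
        # A new block starts at each non-empty, non-indented line.
        blocks, current = [], []
        for line in text.splitlines():
            if line and not line[0].isspace() and current:
                blocks.append(current)
                current = []
            current.append(line)
        if current:
            blocks.append(current)
        return blocks

    def brackets_balanced(lines):
        # Naive check; does not special-case strings or comments.
        pairs = {')': '(', ']': '[', '}': '{'}
        stack = []
        for line in lines:
            for ch in line:
                if ch in '([{':
                    stack.append(ch)
                elif ch in pairs:
                    if not stack or stack.pop() != pairs[ch]:
                        return False
        return not stack

    src = "block_a:\n    f(g(x\nblock_b:\n    h([1, 2])\n"
    for i, block in enumerate(split_blocks(src)):
        print(i, brackets_balanced(block))  # 0 False, 1 True

The broken bracket in block_a never invalidates block_b, which is the spill-over containment the indentation structure buys you.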
> it is now clear to me that there is ongoing work on structured editing which either doesn’t know about incremental parsing in general, or Tim’s algorithms specifically. I hope this post serves as a useful advert to such folk
I'm curious about this unnamed ongoing work (that is unaware of incremental parsing).
I don't know specifically - but even now, I still end up having to explain to people that incremental parsing/lexing (particularly without error recovery) is not hard, it is not really complicated, and as the author here said, Tim (et al) have made beautiful algorithms that make this stuff easy.
Heck, incremental lexing is even easy to explain.
For each token, track where the lexer actually looked in the input stream to make its decisions. Any time that part of the input stream changes, every token that actually looked at the changed portion of the input stream is re-lexed, and if the result changes, keep re-lexing until the before/after token streams sync up again or you run out of input. That's it.
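To make that concrete, here's a minimal Python sketch of exactly that scheme, under two simplifying assumptions: lex_one is a hypothetical single-token lexer, and the edit replaced a byte range with same-length text so old token positions still line up (a real implementation shifts positions by the edit delta).

    from dataclasses import dataclass

    @dataclass
    class Token:
        kind: str
        start: int      # first byte of the token's text
        end: int        # one past its last byte
        looked_to: int  # one past the last byte the lexer examined
                        # (>= end when lookahead was used)

    def relex(lex_one, text, old, edit_lo, edit_hi):
        # lex_one(text, pos) -> Token is an assumed single-token lexer.
        # 1. Keep every leading token that never looked at the edit.
        i = 0
        while i < len(old) and old[i].looked_to <= edit_lo:
            i += 1
        kept = old[:i]
        pos = kept[-1].end if kept else 0
        # 2. Re-lex until the before/after token streams sync up again.
        old_keys = {(t.start, t.end, t.kind) for t in old[i:]}
        fresh = []
        while pos < len(text):
            tok = lex_one(text, pos)
            if pos >= edit_hi and (tok.start, tok.end, tok.kind) in old_keys:
                # Synced: the rest of the old stream is still valid as-is.
                suffix = [t for t in old[i:] if t.start >= tok.start]
                return kept + fresh + suffix
            fresh.append(tok)
            pos = tok.end
        return kept + fresh  # ran out of input before re-syncing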
You can also make a dumber version that statically calculates the maximum lookahead (and lookbehind, if you support that too) of the entire grammar, or the maximum possible lookahead per token, and uses that instead of tracking the actual lookahead used.
In practice, this is often harder than just tracking the actual lookahead used.
In an LL system like ANTLR, incremental parsing is very similar - since it generates top-down parsers, it's the same basic theory - track what token ranges were looked at as you parse.
During incremental update, only descend into portions of the parse tree where the token ranges looked at contain modified tokens.
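Here's the same idea one level up, as a hedged Python sketch (these names are made up, not ANTLR's API): each parse-tree node records the token range it consumed and the range it examined, and the incremental update reuses any subtree that never examined a changed token.

    from dataclasses import dataclass, field

    @dataclass
    class Node:
        rule: str
        lo: int          # index of the first token this node consumed
        hi: int          # one past the last token it consumed
        looked_hi: int   # one past the last token it examined (lookahead)
        children: list = field(default_factory=list)

    def reparse(parse_rule, tokens, node, dirty_lo, dirty_hi):
        # parse_rule(rule, tokens, pos) -> Node is an assumed ordinary
        # (non-incremental) recursive-descent parser, used as the fallback.
        if node.looked_hi <= dirty_lo or node.lo >= dirty_hi:
            return node  # never examined a changed token: reuse verbatim
        if not node.children:
            return parse_rule(node.rule, tokens, node.lo)
        # Dirty interior node: descend, reusing the clean subtrees. (The
        # sketch assumes each reparse consumes the same token span; when
        # it does not, a real implementation reparses the parent instead.)
        node.children = [reparse(parse_rule, tokens, c, dirty_lo, dirty_hi)
                         for c in node.children]
        return node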
Bottom up is trickier.
Error recovery is the meaningfully tricky part in all of this.
Before tree-sitter, I was constantly explaining this stuff to people (I followed the projects that these algorithms came out of - ENSEMBLE, HARMONIA, etc).
Now more people get that there are ways of doing this, but you still run into people who are re-creating things we solved in pretty great ways many years ago.
[0] https://www.cursorless.org/