Incremental parsing support #102

chrisjsewell · 2022-11-14T18:00:49Z

Hey @jgm, I see in #100 that you are doing some updates to the parser.
I'm curious: have you considered supporting incremental parsing, or how it might be possible with djot?
(For things like syntax highlighting, LSPs, etc., where you update an existing AST given an incremental change to the source text.)

I know of limited support for this in Markdown (e.g. https://toastui.medium.com/the-need-for-a-new-markdown-parser-and-why-e6a7f1826137), but it feels like it should be easier with djot?

jgm · 2022-11-14T18:13:53Z

Maintainer

The tokenizer will be able to operate incrementally, in the sense that you can push some new input text and it will give you some match objects. For syntax highlighting that may be enough. (You don't need an AST for that.)

However, the match objects specify offsets for the elements, which means that if you modify an earlier part of the document, everything after that has to be recomputed.
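
To make the offset problem concrete, here is a minimal sketch in Python. The `Match` record and its fields are hypothetical and are not djot's actual match objects; the sketch only shows that an edit early in the document invalidates every absolute offset that follows it.

```python
from typing import NamedTuple

class Match(NamedTuple):
    # Hypothetical match record; djot's real match objects may differ.
    start: int  # absolute offset of the first character
    end: int    # absolute offset one past the last character
    kind: str

def stale_after_edit(matches: list[Match], edit_pos: int) -> list[Match]:
    """Return the matches whose absolute offsets can no longer be trusted
    after text is inserted or deleted at edit_pos."""
    # Conservative: a match ending exactly at edit_pos may also be affected.
    return [m for m in matches if m.end >= edit_pos]

matches = [
    Match(0, 5, "str"),
    Match(5, 6, "softbreak"),
    Match(6, 11, "str"),
]

# An edit at offset 2 touches the first match and shifts every later one.
stale = stale_after_edit(matches, 2)
```

With an edit at offset 8 only the last match is stale, which is why edits near the end of a document are cheap and edits near the start are not.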

matklad · 2022-11-14T18:27:40Z

My gut feeling is that an incremental parser isn't strictly needed, and that just a fast non-incremental one would be enough. That definitely works for programming languages. For markup, that might not hold in the limit (text documents tend to be longer more frequently), but I am fairly certain that should be OK most of the time.

In any case, I think a rather "dumb" strategy can work for djot -- as the block structure is fairly robust, reparsing just damaged blocks should be OK.
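
A rough sketch of that "dumb" strategy, in Python. It makes two loud assumptions: blocks are taken to be separated by blank lines (real djot block structure is richer, with nested containers and code blocks that may contain blank lines), and `parse_block` is a hypothetical stand-in for a real per-block parser. Unchanged blocks are looked up by their text and reused; only damaged blocks are reparsed.

```python
import re

def split_blocks(text: str) -> list[str]:
    # Simplifying assumption: top-level blocks are separated by blank lines.
    return [b for b in re.split(r"\n\s*\n", text) if b.strip()]

class BlockCache:
    def __init__(self, parse_block):
        self.parse_block = parse_block  # hypothetical per-block parser
        self.cache = {}                 # block text -> parsed tree
        self.parse_calls = 0            # counts real parses, for illustration

    def parse(self, text: str) -> list:
        fresh = {}
        result = []
        for block in split_blocks(text):
            if block in fresh:
                tree = fresh[block]
            elif block in self.cache:
                tree = self.cache[block]  # undamaged block: reuse old tree
            else:
                tree = self.parse_block(block)
                self.parse_calls += 1
            fresh[block] = tree
            result.append(tree)
        self.cache = fresh  # forget blocks that no longer exist
        return result

# Placeholder parser: a "paragraph" node holding the block's words.
cache = BlockCache(lambda block: ("para", block.split()))

cache.parse("one\n\ntwo\n\nthree")
calls_after_first = cache.parse_calls

cache.parse("one\n\ntwo edited\n\nthree")
calls_after_edit = cache.parse_calls
```

The first call parses three blocks and the second parses only the edited one. Reusing a cached tree in a new position is only safe when the tree carries no absolute offsets.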

The hard part isn't the parser per se, but rather incremental modification of the AST. In rust-analyzer, our strategy is that the equivalent of match events doesn't include offsets, only the lengths (that is, composite nodes don't carry any offset/length info; tokens carry their length). This makes the operation of splicing a sequence of matches correct. It's up to the caller's code to arrange the sequence of matches into a cheaply spliceable data structure, and build a tree with range sums on top of that.
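
The length-only idea can be sketched as follows, in Python. The `Token` type and the helper names are made up for illustration and are not taken from rust-analyzer or djot. Tokens store only their length, start offsets are derived on demand as prefix sums, and a splice leaves all untouched tokens valid as they are.

```python
from dataclasses import dataclass
from itertools import accumulate

@dataclass(frozen=True)
class Token:
    kind: str
    length: int  # tokens store only their length, never an offset

def offsets(tokens: list[Token]) -> list[int]:
    """Start offset of each token, derived from lengths via prefix sums."""
    return [0, *accumulate(t.length for t in tokens)][:-1]

def splice(tokens: list[Token], start: int, stop: int,
           replacement: list[Token]) -> list[Token]:
    """Replace tokens[start:stop]; tokens outside the range are reused as-is."""
    return tokens[:start] + replacement + tokens[stop:]

tokens = [Token("str", 5), Token("softbreak", 1), Token("str", 5)]

# Replace the first 5-character token with a 12-character one.
edited = splice(tokens, 0, 1, [Token("str", 12)])

before = offsets(tokens)
after = offsets(edited)
```

Here the later tokens are the very same objects before and after the edit; only the derived offsets differ. The flat list makes each offset query O(n). A balanced tree that stores the total length of each subtree would bring both splicing and offset lookup down to O(log n), which is the "tree with range sums" mentioned above.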

A nice resource for incremental trees is https://github.com/apple/swift/tree/main/lib/Syntax#syntax (or maybe pinging @matklad on http://rust-lang.zulipchat.com).

jgm · 2022-11-14T18:53:49Z

Maintainer

> For markup, that might not hold in the limit (text documents tend to be longer more frequently), but I am fairly certain that should be OK most of the time.

I've tried using the playground (djot Lua code compiled to wasm) to modify a 270K source file (a 150-page user manual). It's a bit laggy but still usable with the 400 ms debounce I'm using.

jgm · 2022-11-14T18:55:54Z

Maintainer

> In rust-analyzer, our strategy is that the equivalent of match events doesn't include offsets, only the lengths (that is, composite nodes don't carry any offset/length info; tokens carry their length). This makes the operation of splicing a sequence of matches correct.

That's a neat approach. We should think about whether that could work for djot.
