feat: unbounded lookahead using adaptive predict #1714
Conversation
force-pushed from 51f7fbe to b51a35d
Hello @msujew

Feature Questions

Code Organization / Beginning Code Review
So, basically the …
Good question. I haven't tested it so far. I plan to evaluate the performance more in-depth (mostly CPU/memory consumption) for my thesis, but that will take some time. I would be grateful if you were to do some basic performance testing for this feature. In theory, the performance of this should be quite good, especially after it completes its "warmup" phase. Compared to the usual LL(k) approach, the lookahead strategy in adaptive LL(*) is built during parsing, not during the self-analysis. When it parses an alternative for the first time, it will run the …

Thanks for the initial review/feedback, I will get to that soon.
Hybrid Approach

Generally this is the optimal approach imho.
I've tried running some benchmarks, but I am getting parsing errors. If I change it to use `maxLookahead=0` and `adaptivePredict=true`, it fails the very simple unit test.
Interesting, it seems like none of the example grammars work with that config. Weird that my tests couldn't catch that. Thanks for the info, I'll investigate it.
force-pushed from 736c12e to 1310160
@bd82 I fixed most grammars for now. The …
I was able to test my approach for the JSON and CSS grammars using the benchmark. However, I am not happy with the current results. I will rethink my approach and take a closer look at what Antlr does.
Yes.
Sure, I planned on correcting the issues first. Nevertheless, I think my current approach to the issue is too naive. I looked at Antlr's implementation of LL(*) (which is actually interpreted; there is no code generation done for the lookahead automaton), and I'm missing some crucial elements of the algorithm, which should be computed during the self-analysis phase. For example, Antlr actually builds two separate automata. The lookahead automaton is built during parsing, while the other is built at startup and is used to compute the actual lookahead automaton.
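The two-automata split mentioned above can be sketched roughly like this (names and structure are mine, heavily simplified for illustration): a static, ATN-like automaton is computed once at startup from the grammar, while the DFA used for lookahead is built lazily, state by state, as real input is parsed.

```typescript
type TokenType = number;

// A (heavily simplified) stand-in for the static automaton built at
// startup: given a token, which alternatives remain viable.
interface AtnDecision {
  viableAlts: (token: TokenType) => number[];
}

// A DFA state discovered during parsing: outgoing edges keyed by token type.
interface DfaState {
  edges: Map<TokenType, DfaState>;
  prediction?: number; // set once a unique alternative is known
}

class Decision {
  private dfaStart: DfaState = { edges: new Map() };
  public atnSimulations = 0;

  constructor(private atn: AtnDecision) {}

  adaptivePredict(tokens: TokenType[]): number {
    let state = this.dfaStart;
    for (const token of tokens) {
      let next = state.edges.get(token);
      if (next === undefined) {
        // DFA miss: fall back to the static automaton and record a new
        // DFA edge, so the next parse of this prefix never re-simulates.
        this.atnSimulations++;
        const alts = this.atn.viableAlts(token);
        next = { edges: new Map() };
        if (alts.length === 1) next.prediction = alts[0];
        state.edges.set(token, next);
      }
      state = next;
      if (state.prediction !== undefined) return state.prediction;
    }
    throw new Error("ambiguous or exhausted input");
  }
}

// Usage: the first prediction simulates and records a DFA edge,
// the second is a pure DFA walk.
const decision = new Decision({ viableAlts: (t) => (t === 10 ? [0] : [1]) });
decision.adaptivePredict([10]); // simulation + DFA edge recorded
decision.adaptivePredict([10]); // DFA hit, no simulation
```

In ANTLR the static automaton encodes full grammar positions and closures rather than a per-token viability function, but the division of labor is the same: the startup-time automaton answers the hard questions, the parse-time DFA memoizes the answers.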
That's true, however my main incentive for building this feature - besides writing about it in my master's thesis - is using it in Langium for every lookahead k > 1. That would heavily improve the usability of the framework for more complex alternatives. Therefore I aim to keep the performance impact to a minimum.
If we can achieve good enough performance, I'd like to evaluate using adaptive predict like this in Chevrotain.
BTW @msujew WDYT about implementing the new lookahead logic as a separate package?
I am trying to slowly transition Chevrotain to a mono-repo structure. Naturally, this would only be possible if the new lookahead logic has no dependencies on …
Sounds reasonable. That would probably even help me quite a lot, just to basically have a "clean slate" to start the feature off. I've limited myself a bit by reusing existing interfaces as much as possible. It probably also makes testing a bit easier.
@bd82 I have spent the last few weekends working on a new implementation based on the original ANTLR implementation. The results for JSON, CSS and ECMA5, processed purely through LL(*), are quite promising, even without the optimization of using LL(1) where appropriate (that should improve performance noticeably). I assume I'll have something ready for a basic review by next weekend. :)
I will try to have a look at this on the weekend |
force-pushed from 1310160 to efa4bdd
@bd82 Alright, pushed everything to the upstream branch connected to this PR. 14 tests are currently failing, but that's because they either rely on old behavior (I completely removed LL(k) lookahead for now) or override the related APIs, which aren't called anymore. Take your time with reviewing; this isn't time-sensitive for Langium, and my thesis doesn't depend on getting this merged.
force-pushed from bf767ab to d280830
@bd82 Great news, I just added a small optimization, replacing the DFA cache (previously a …). It is now way faster than LL(k) for more complex grammars (at least in my benchmarks). Additionally, I was able to fix a few more tests, which were failing due to a bad implementation of LL(1) on my part. Only 6 tests are still failing; they rely on overriding the LL(k) lookahead method. Also, please ignore the changed configs and debug code. I plan on cleaning this up sometime.
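The comment above is cut off before naming the data structures involved, so purely as an illustration of this kind of cache optimization (not the PR's actual change): when token types are small dense integers, a `Map` keyed by token type can be replaced by a plain array indexed by token type, turning a hashed lookup into a direct index access.

```typescript
type TokenType = number; // assumed to be small, dense integers

// Baseline: edges keyed by token type in a Map.
class MapEdgeCache<T> {
  private edges = new Map<TokenType, T>();
  get(t: TokenType): T | undefined {
    return this.edges.get(t);
  }
  set(t: TokenType, v: T): void {
    this.edges.set(t, v);
  }
}

// Alternative: a flat array indexed directly by token type. Requires
// knowing the maximum token type up front, which a parser does after
// its self-analysis phase.
class ArrayEdgeCache<T> {
  private edges: (T | undefined)[];
  constructor(maxTokenType: number) {
    this.edges = new Array(maxTokenType + 1).fill(undefined);
  }
  get(t: TokenType): T | undefined {
    return this.edges[t];
  }
  set(t: TokenType, v: T): void {
    this.edges[t] = v;
  }
}
```

Both expose the same `get`/`set` surface, so swapping one for the other is a local change; the array variant trades a little memory for cheaper lookups on the hot prediction path.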
@msujew
Okay, I'm finally starting to dive into this topic. Also ran the benchmark on: …

It shows slightly different results than what you have posted, but they are still very good (non-blocking), and there can be differences between different engines/CPUs/... so that does not matter.
Hi @msujew, I am merging the current work into a new branch …
Implements an LL(*) lookahead algorithm and removes the existing LL(k) lookahead. The design is heavily inspired by Antlr's existing LL(*) solution titled "Adaptive LL(*)". In fact, the general structure of the Antlr lookahead code has been adapted to provide this feature.
There is a caveat that differentiates this approach from the LL(k) lookahead algorithm: as we don't simply iterate over a sorted list of alternatives, we can't decide to take the "first" alternative in the alternation. Instead, the LL(*) algorithm will always return the longest matching alternative, just like Antlr does.
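A toy sketch of this caveat (my own illustration, not Chevrotain's API): when two alternatives both match a prefix of the input, an ordered strategy picks the first one listed, while the longest-match strategy described above picks the one consuming the most tokens.

```typescript
type TokenType = number;

// Each alternative is modeled as the token sequence it would consume.

// Ordered strategy: return the first listed alternative whose tokens
// match the start of the input.
function firstMatch(alts: TokenType[][], input: TokenType[]): number {
  return alts.findIndex((alt) => alt.every((t, i) => input[i] === t));
}

// Longest-match strategy: among all matching alternatives, return the
// one that consumes the most tokens.
function longestMatch(alts: TokenType[][], input: TokenType[]): number {
  let best = -1;
  let bestLen = -1;
  alts.forEach((alt, i) => {
    const matches = alt.every((t, j) => input[j] === t);
    if (matches && alt.length > bestLen) {
      best = i;
      bestLen = alt.length;
    }
  });
  return best;
}

// With alternatives [A] and [A, B] and input "A B", the ordered strategy
// stops at the first alternative while longest-match prefers the second:
const alts = [[1], [1, 2]];
firstMatch(alts, [1, 2]); // → 0
longestMatch(alts, [1, 2]); // → 1
```

This is why grammars that relied on alternative ordering to break ties can behave differently under the new strategy.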
While this PR is mostly feature-complete, there are still some things that have yet to be addressed. Nevertheless, I wanted to create the PR at this stage to get some feedback on the design of the feature. The documentation for this feature is still missing, and validation …