# RoadMap / Future Plans #1270
I thought I'd check back and see how Chevrotain was going, and I saw this "future plan". I'm still of the opinion you could simplify the main use cases and add standard streaming. I think it could be something like this:

```javascript
const chevrotain = require('chevrotain')
// or, import chevrotain from 'chevrotain'

// then, to use only the lexer, build it with the tokens like:
const lexer = chevrotain.lexer({
  // this is an options object for whatever settings you allow.
  // provide the tokens in an array imported from another file:
  tokens: require('./my-tokens.js'),
})

// then, use the lexer...
const tokens = lexer.lex(string) // or tokenize(string), of course.
```
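To make the idea concrete, here is a minimal sketch of what such a `lexer()` factory might do internally. Everything here is hypothetical: the factory name, the token shape (`{ name, pattern }`), and the toy sticky-regex tokenizer are stand-ins for illustration, not Chevrotain's real `Lexer` API.

```javascript
// Hypothetical factory: builds a lexer from an array of token
// definitions ({ name, pattern }) and returns a lex(text) method.
// A toy regex tokenizer stands in for Chevrotain's real Lexer.
function buildLexer({ tokens }) {
  return {
    lex(text) {
      const out = []
      let pos = 0
      outer: while (pos < text.length) {
        for (const tok of tokens) {
          // sticky flag anchors the match exactly at `pos`
          const re = new RegExp(tok.pattern.source, 'y')
          re.lastIndex = pos
          const m = re.exec(text)
          if (m) {
            out.push({ type: tok.name, image: m[0], offset: pos })
            pos += m[0].length
            continue outer
          }
        }
        throw new Error(`unexpected character at ${pos}: ${text[pos]}`)
      }
      return out
    },
  }
}

// usage: token order matters, earlier patterns win
const lexer = buildLexer({
  tokens: [
    { name: 'ws', pattern: /\s+/ },
    { name: 'number', pattern: /\d+/ },
    { name: 'ident', pattern: /[a-z]+/ },
  ],
})
const toks = lexer.lex('abc 42')
```

The point is only the API shape: the dev hands over a token array once and gets back a ready-to-use `lex` function, with no separate construction step.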
For both lexing and parsing, use the main exported function to build the parser:

```javascript
// Note, this handles building the lexer internally, and
// also does the performSelfAnalysis() before returning the parser.
// That way, all that is handled internally instead of the dev writing it out.
const parser = chevrotain({
  // once again, the options object for settings.
  // again, provide the tokens:
  tokens: require('./my-tokens.js'),
  // provide the grammar's rules in an array so they're ordered:
  rules: require('./my-grammar.js'),
})

// now, parsing strings is straightforward.
// Note, this avoids making the dev set the input string on
// the parser before calling a rule. It's done internally by the
// library. It could start with the first rule in the provided array
// and go thru the rules until it finds one which matches the
// string and starts the parsing, or allow them to set which
// rule to run first as an option to the `chevrotain()` function,
// or make the parse() function accept an options object
// which contains the `string` property and, optionally,
// the name of the rule to call to start.
const result = parser.parse(string)

// or,
const result2 = parser.parse({
  string, // the input string
  start: 'ruleName', // the rule to start with
})

// for streaming, get a writable stream for them to
// write chunks of strings to:
const writer = parser.writer({ /* any options */ })
// then, stream the input to the parser's writer:
someInputStream.pipe(writer)
```

To get the streaming to work, the parsing needs to use some kind of "runner" or "executor" which knows how to work the parser for a string input. It holds the input string, so that's yet another reason not to have the dev set the string on the parser object itself. When it runs out of string content, or has some at the end which it isn't able to match yet, it can hold onto the last bit of unused string, call the "next/done" callback, and return, waiting for another chunk of data to be provided. When a chunk comes in, it'll start with the unused bit of string from before, possibly combining it with the new chunk first, and continue on. Either parsing reaches a rule which is a terminal for all parsing, or the parsing could go on indefinitely as more chunks come in.

Also, whether the parsing produces an AST or a result object is still up to the dev. They could do what they want in the rules to produce output. You might allow them to provide a function in the options object which returns an object, an array, a class instance, or whatever they want as the "result". Then, provide that to the rule functions when they're called so they can load stuff into the result. The streaming version would then need to provide a callback function to receive the "result" when one is finished.
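The "hold the unused tail" behavior described above could be sketched like this. Everything here is invented for illustration (the `StreamRunner` name, and using whitespace-delimited words as a stand-in for tokens); it is a minimal sketch of the buffering idea, not anything Chevrotain provides.

```javascript
// Hypothetical runner: buffers incoming chunks and only consumes
// input it can fully match, keeping the unmatched tail for later.
// Here a "token" is a whitespace-delimited word; a trailing word
// with no delimiter after it may be incomplete, so it is held back.
class StreamRunner {
  constructor() {
    this.leftover = ''
    this.tokens = []
  }

  write(chunk) {
    const text = this.leftover + chunk
    // index where the final (possibly incomplete) word begins;
    // everything before it is safe to consume now
    const safeEnd = text.search(/\S*$/)
    const safe = text.slice(0, safeEnd)
    this.leftover = text.slice(safeEnd)
    for (const word of safe.split(/\s+/)) {
      if (word) this.tokens.push(word)
    }
  }

  end() {
    // end of input: the held-back tail is now known to be complete
    if (this.leftover) {
      this.tokens.push(this.leftover)
      this.leftover = ''
    }
    return this.tokens
  }
}

// usage: 'sel' + 'ect na' reassembles across chunk boundaries
const runner = new StreamRunner()
runner.write('sel')
runner.write('ect na')
runner.write('me from t')
const result = runner.end()
```

A real runner would hold parser state (rule stack, partial CST) rather than raw words, but the chunk-boundary bookkeeping would look much the same.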
So, add a callback to streaming:

```javascript
const writer = parser.writer(myResultCallbackFn)
inputStream.pipe(writer)

// or, for the options object:
const writer2 = parser.writer({
  done: myResultCallbackFn,
})
inputStream.pipe(writer2)
```

I seem to remember there was a visitor exported, too. To do the visitor thing, provide the visitor as a function to the parse call or writer options.
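As a rough sketch of how such a visitor could work: below, a visitor is a plain map of functions applied bottom-up over a tree, dispatching on node name. The node shape (`{ name, children }`) and the dispatch convention are invented for this sketch; they are not Chevrotain's CST visitor API.

```javascript
// Hypothetical: walk a parse tree bottom-up, calling the visitor
// function registered for each node's name with the already-visited
// children. Leaves are plain strings (token images) in this sketch.
function visit(node, visitor) {
  if (typeof node === 'string') return node // a leaf token image
  const children = node.children.map((c) => visit(c, visitor))
  const fn = visitor[node.name]
  return fn ? fn(children) : children
}

// usage: evaluate a tiny arithmetic tree
const tree = {
  name: 'sum',
  children: [
    { name: 'num', children: ['2'] },
    { name: 'num', children: ['3'] },
  ],
}
const myVisitor = {
  num: (children) => Number(children[0]),
  sum: (children) => children.reduce((a, b) => a + b, 0),
}
const total = visit(tree, myVisitor)
```

Passing such a function as a `visitor` option would let the library run it over each finished result before handing it to the caller.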
In the proposed API, that would look like:

```javascript
// synchronous:
const result = parser.parse({
  string: inputString,
  visitor: myVisitorFn,
})

// streaming:
const writer = parser.writer({
  visitor: myVisitorFn,
  done: myResultCallbackFn, // if still needed...
})
inputStream.pipe(writer)
```

This style seems more JavaScript-land to me than the current Chevrotain API. I'm not sure which features, if any, this doesn't allow for, so if there's something glaringly missing, I'm not omitting it intentionally, I'm just not aware of it.

Oh, and for the tokens definition, the chevrotain package's export can still contain the Token class for them to extend or instantiate when creating tokens. And the rules can still be defined like they currently are by calling the RULE function, but the builder function does that for them; the dev only provides the name and the function. So, like this, the way it is now, with a reference to the parser as a dollar sign and all tokens available in the outer scope:
```javascript
$.RULE("selectClause", () => {
  $.CONSUME(Select)
  $.AT_LEAST_ONE_SEP({
    SEP: Comma,
    DEF: () => {
      $.CONSUME(Identifier)
    }
  })
})
```

And the way to define the rule as an element in an array which the parser will call to make the rule:

```javascript
// file 'my-grammar.js':
module.exports = [ // the rules, in order:
  // Note:
  // 1. a named function means the parser can get 'selectClause' via fn.name.
  // 2. the other rules are provided via the `rules` arg.
  // 3. the tokens are all available in the `tokens` arg.
  // 4. the result, if provided as an option to chevrotain(), is the third arg.
  function selectClause(rules, tokens, result) {
    // `this` is the parser/runner/recorder thing.
    this.consume(tokens.select)
    this.atLeastOneSep({
      sep: tokens.comma,
      def: () => {
        this.consume(tokens.identifier)
      }
    })
  },
]
```
Then, the builder/parser calls the RULE function using the name of each function provided in the rules array, with the function itself as the second arg. It handles this work itself, not the dev. So, for each rule in the array (a rule being a function):

```javascript
theParser.RULE(fn.name, fn)

// so, something like:
rulesArray.forEach(fn => { parser.RULE(fn.name, fn) })
```

Anyway, just my opinion, my thoughts. I think this looks more JavaScript-y and allows concurrent parsing of different strings at the same time. You could ask for a writer and pipe input to it, then ask for another writer and pipe input to it, and let the streaming go, with each runner/executor holding its internal state so they're separate from each other even though they're working their way through their own input at the same time, asynchronously, as they get more input.
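The concurrency claim rests on each writer owning its own state. A minimal sketch of that (the `makeWriter` helper and its word-splitting stand-in for parsing are invented for illustration, not Chevrotain API):

```javascript
// Hypothetical: each call to makeWriter() returns an independent
// writer with its own buffered state, so two inputs can stream
// through side by side without sharing anything.
// The "parsing" here is just word collection, for illustration.
function makeWriter() {
  let buffer = '' // per-writer state, captured in the closure
  return {
    write(chunk) {
      buffer += chunk
    },
    end() {
      return buffer.split(/\s+/).filter(Boolean)
    },
  }
}

// usage: interleaved writes to two writers stay independent
const a = makeWriter()
const b = makeWriter()
a.write('select name')
b.write('insert into')
a.write(' from users')
b.write(' logs')
const resultA = a.end()
const resultB = b.end()
```

Because the state lives in each writer's closure rather than on a shared parser object, the interleaved writes never interfere, which is exactly what setting the input string on the parser itself would prevent.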
Hello @elidoran and thanks for providing this in-depth feedback 👍

**More JavaScript-y style APIs**

When I created the original project which eventually became Chevrotain […] Unfortunately, implementing stylistic / subjective API changes is outside the scope at this time. Also note that many (most?) of the stylistic changes you recommend should be possible.

**Streaming APIs**

This may arrive at some point in the future, particularly if I implement some lexer adapter API (#528). My own use case for Chevrotain parsers is often around Editors / IDEs.

Cheers.
closing this in favor of #1739
The main focus at this point is simplification of Chevrotain and reducing the API surface area. The goal here is to reduce the TCO (total cost of ownership) of maintaining Chevrotain. This means deprecating and removing some features that are:

A secondary topic is modernization of the tooling and the code base.

**Potential features/components for deprecation evaluation**

**Modernization Topics**