Reconfigured chumsky example to be more idiomatically geared toward performance #94
Conversation
Actually, I ended up adding support for equivalent codepoint parsing back in (and found it to have little meaningful difference for performance regardless): the parser should be the same as the others now.
Yes, I would like to avoid getting into performance games. My focus for doing so is on what is idiomatic for the parser, and I am mostly determining that by what maintainers are putting in their end-user documentation. Some of the changes (e.g. switching to …).

However, as chumsky's example is also showing other features, like error recovery, I think I'm fine with dropping those. This is all fuzzy because these can't really be straight comparisons across them.
Certainly. As I've seen myself, it's all too easy to over-fit for a specific benchmark at the cost of anything useful. I've adjusted the example to use the idiomatic text parsers provided: the effect on performance is fairly negligible anyway.

My interpretation of this repository is that it's showing the results a user can expect if they flick the library to whatever the 'maximum performance' settings are but otherwise stick to idiomatic constructs: I think the state of the example is now as close to that as I can get.
As I am not an expert in any of these individual libraries, I would feel much more comfortable having these more structurally match the idiomatic examples. This will make it easier for me to update between breaking changes. For example, …
No, it is purely about what is idiomatic. There are changes I could make to make the winnow code faster that I don't do. When I update to …
Something else I would say is that it would be nice to have some more diverse sample data.
Then that's fine: chumsky defaults to the faster configuration (hence this PR's diff having a negative size), so it's hard to see anything unidiomatic here.
I am open to changing the test data so long as it's hefty enough that most forms of jitter will have less of an impact. My preference is to stick to one piece of test data.
I can understand if you feel this is idiomatic, but I would still like to stick to something based on an existing example, for maintenance's sake.
I'm not really sure how to respond to this, since I'm now not sure what the intention here is. If the goal is to just copy-and-paste the first obvious example from any given library, then the benchmarks are going to be near-enough useless to any potential users. Most of the libraries, and chumsky in particular, are designed from the start to be extensible and to allow users to choose where they want to sit on the performance vs error quality vs recovery spectrum. Chumsky's JSON example is set up specifically to show off the error and recovery system (since that's the novel thing the library brings to the table), deliberately at the expense of performance. A 'first impressions' example like this doesn't communicate anything useful to users that are interested specifically in performance at all.

As-is, the changes I've made in this PR are both a simplification (in terms of code size, and in terms of the crate's idiomatic abstractions), as well as a move toward functionality that's more representative of what other parsers being compared are actually doing. I don't think I can justify things any better than that.

If you're concerned about maintainability: don't be. Everything being used here is sufficiently fundamental to the API that it's not going to change. The code should keep working just fine far into the future. I'm always happy to be pinged if you want it to be updated and you're not sure how, of course.
While I appreciate your offer of help, in the end I am responsible for maintaining this repo and have to be willing to commit to maintaining what is here. Only accepting external examples, with minor adjustments, is a way for me to make sure I can commit to continuing to offer this.

While you mention the accuracy of the results, that goes both ways. I either delegate responsibility for what is "idiomatic" to each library's documentation, or I take on the responsibility for auditing PRs for how "idiomatic" they are without being familiar enough with each library to judge. I offered to allow the error recovery calls to be removed. I can understand that the structure of the parser might need to also change to better support that.

Not as a workaround to this problem, but in case it's a helpful suggestion: what I do with winnow is keep multiple JSON examples that are meant to highlight different features / approaches, and the diff between examples is meant to be kept to only those differences. I've even considered setting it up so the diffs get rendered in the docs.
I'm not really sure what you're asking me to do in order to satisfy your requirements, sorry. My issue is that, as-is, the benchmark paints an unrepresentative and unfair view of the performance users can expect, on account of the fact that the library is being configured to do a number of things not even provided as options by the other libraries. I'm not interested in gaming the benchmarks - I just want users to not jump to unjustified conclusions, given that most folks are going to be taking these numbers as authoritative.

I understand that you can't delegate to the point that this becomes a 'Benchmarks Game' scenario, but that does necessarily mean you taking the time to understand what counts as 'idiomatic' in each library rather than just using the first example that you happen to chance upon. That's just part-and-parcel of responsibly maintaining a benchmark like this that users are going to be drawing conclusions from, in my view.

Please, tell me what you're looking for from this MR in actionable terms. I'm not opening this MR in bad faith. Right now, I'm not seeing much alternative than to advise users that they ignore the numbers in this repository because they do not usefully represent reality. I really don't want to be doing that because it feels unnecessarily adversarial.

If it's interesting to you, I recently updated the benchmarks in my own repository, which are similar but are performed on more diverse sample data.
I'm sorry, I thought my request was clear. I will accept a trivial reduction of an existing example.
My apologies, I think I misunderstood what you were saying. I've changed the PR to be a strict reduction of the existing example.

In addition, and to help ease any possible maintenance burden, I've added this as an example in the chumsky repo (i.e: if you ever want to update things, you can just copy it from there). I'll do my best to keep it as idiomatic/fair as I can.
0ce7b82 updated the numbers. This also included a Rust 1.85.1 -> 1.86 upgrade.
Is most of that overhead from the rich error? I would be surprised if the no-error parser case gets slowed down by …

Error reporting is one of those areas that make this comparison hard.
(I didn't even bother to compare the others since it would take a bit more work to figure out.)
I wonder if I should add an example of error output for each one, with the caveat that libraries may support customizing errors to balance performance vs quality. That way, regardless of defaults, it's clearer what trade-off an individual implementation has made.
Most likely, yes. It's probably the case that recovery alone adds relatively little to execution time.
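To make the trade-off concrete, here's a minimal sketch, assuming chumsky 1.x's API (the grammar here is just a stand-in, not the benchmark's parser): the error type is selected through the parser's `extra` type parameter, so the same grammar can be instantiated with `Rich` diagnostics or with the cheap default error.

```rust
use chumsky::prelude::*;

// Stand-in grammar with `Rich` errors: more diagnostic detail, more overhead.
fn hello_rich<'a>() -> impl Parser<'a, &'a str, (), extra::Err<Rich<'a, char>>> {
    just("hello").ignored()
}

// The same grammar with the default (cheap) error type.
fn hello_cheap<'a>() -> impl Parser<'a, &'a str, ()> {
    just("hello").ignored()
}

fn main() {
    assert!(!hello_rich().parse("hello").has_errors());
    assert!(hello_cheap().parse("goodbye").has_errors());
}
```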
Certainly. I imagine that there isn't any common denominator to be found between the libraries, let alone one that can provide useful expectations. I don't envy your position in trying to balance these things. That said, the fact that a parser library can't be configured for speed is still a meaningful data point for users. I think benchmarks should seek to represent the 'worst of the best' (i.e: the worse-performing aspects of a best-practice written-for-speed implementation), and so 'best' should naturally include pushing whatever 'go fast' button the library has.
When you say 'location', what do you mean? Chumsky also always tracks location (i.e: the position in the input that an error occurred at) since it's important information for error prioritisation & merging logic, but spans specifically are a layer on top of that which can be opted into according to which combination of input and error types you decide to use.
As I say, I think the story is a bit more complicated than it might appear for chumsky. There's a lot going on 'under the hood' that isn't obvious at first glance. Even if …
Ah, good to know! I had assumed it didn't, like …
I've not looked into span tracking yet for winnow, in part assuming (likely incorrectly) that supporting it is the point where people would switch from parsing directly to an AST to separate lexing and parsing (and then the span is stored inside your Token). If you have more background on how users specify spans for parsing directly to an AST and why it's important for that use case, I'd love to learn it!
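For what it's worth, here's a rough sketch of what I mean by that (plain Rust, no particular library; all names are made up for illustration): the span is recorded once at lexing time and then simply travels along with each token into the parser.

```rust
use std::ops::Range;

/// A lexed token paired with the region of source it came from.
#[derive(Debug, Clone, PartialEq)]
struct Token {
    kind: TokenKind,
    span: Range<usize>, // byte offsets into the original source
}

#[derive(Debug, Clone, PartialEq)]
enum TokenKind {
    Number(i64),
    LBrace,
    RBrace,
}

/// A deliberately tiny lexer: the point is only that spans are captured here,
/// so the parser downstream never has to reconstruct them.
fn lex(src: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut chars = src.char_indices().peekable();
    while let Some(&(start, c)) = chars.peek() {
        match c {
            '{' => {
                chars.next();
                tokens.push(Token { kind: TokenKind::LBrace, span: start..start + 1 });
            }
            '}' => {
                chars.next();
                tokens.push(Token { kind: TokenKind::RBrace, span: start..start + 1 });
            }
            c if c.is_ascii_digit() => {
                let mut end = start;
                while let Some(&(i, d)) = chars.peek() {
                    if !d.is_ascii_digit() {
                        break;
                    }
                    end = i + d.len_utf8();
                    chars.next();
                }
                let value = src[start..end].parse().unwrap();
                tokens.push(Token { kind: TokenKind::Number(value), span: start..end });
            }
            _ => {
                chars.next(); // skip whitespace and anything unrecognised in this sketch
            }
        }
    }
    tokens
}

fn main() {
    for token in lex("{ 42 }") {
        println!("{:?} @ {:?}", token.kind, token.span);
    }
}
```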
So a thing I've found that is really important is the ability to generate fresh spans that correspond to a region of the input. For example, your input might be a list of token/span tuples, and you want the parser for a struct to also hand you the span of the whole struct:

```rust
let struct_parser: impl Parser<_, StructNode> = ...;

struct_parser.map_with(|st, e| (st, e.span()))
```

And now your output type is a `(StructNode, Span)` pair, where the span covers the region of input the struct was parsed from.

The logic for doing this gets a bit convoluted if you don't design for it from the get-go (I think I'm on my 5th from-scratch rewrite and only now do I feel like it's truly integrated in a nice way), but if you know roughly what shape your span types are, you can merge them together to do this without too much effort. For types like …

Very happy to discuss this in more detail if you're interested in doing so! This is one of those things I've done way too much thinking about over the past few years.
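To make the merging idea concrete, here's a minimal sketch assuming spans are plain byte ranges (the names are illustrative, not chumsky's actual types):

```rust
use std::ops::Range;

type Span = Range<usize>;

/// Merge two spans into a fresh span covering both, e.g. to give a parent AST
/// node a span derived from its first and last children.
fn merge(a: &Span, b: &Span) -> Span {
    a.start.min(b.start)..a.end.max(b.end)
}

fn main() {
    // Hypothetical child spans: a struct's name and its single field.
    let name_span: Span = 7..10;
    let field_span: Span = 13..16;

    // The parent node's span covers everything from the name to the field.
    assert_eq!(merge(&name_span, &field_span), 7..16);
}
```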
Ok, so this isn't about taking an error location when parsing a …
Nice! I'll take a read through that.
Hopefully you'll agree that nothing here is leaning too aggressively into the benchmark in question: just removing the 'frilly extras' that chumsky provides that favour error quality over performance (the implementation is copied almost verbatim from chumsky's benchmarks, albeit adjusted to take `&str` as input instead of `&[u8]`, and with zero-copy features removed to be consistent with the existing benchmarks here).

I've not updated the benchmark results; I'll leave you to do that so as to make sure they're consistent with your setup.
Edit: I realise that the change isn't quite equivalent regarding Unicode codepoint parsing: if you think that it's likely to have sufficient impact on the benchmarks and would like me to adjust it, I can do so.