
Reconfigured chumsky example to be more idiomatically geared toward performance #94


Merged · 4 commits · Apr 14, 2025

Conversation

@zesterer (Contributor) commented Apr 1, 2025

Hopefully you'll agree that nothing here is leaning too aggressively into the benchmark in question: just removing the 'frilly extras' that chumsky provides that favour error quality over performance (the implementation is copied almost verbatim from chumsky's benchmarks, albeit adjusted to take &str as input instead of &[u8] and with zero-copy features removed to be consistent with the existing benchmarks here).

I've not updated the benchmark results, I'll leave you to do that so as to make sure they're consistent with your setup.

Edit: I realise that the change isn't quite equivalent regarding unicode codepoint parsing: if you think that it's likely to have sufficient impact on the benchmarks and would like me to adjust it, I can do so.

@zesterer (Contributor, Author) commented Apr 1, 2025

Actually, I ended up adding support for equivalent codepoint parsing back in (and found it to have little meaningful difference for performance regardless): the parser should be the same as the others now.

@epage (Collaborator) commented Apr 1, 2025

> Hopefully you'll agree that nothing here is leaning too aggressively into the benchmark in question: just removing the 'frilly extras' that chumsky provides that favour error quality over performance (the implementation is copied almost verbatim from chumsky's benchmarks, albeit adjusted to take &str as input instead of &[u8] and with zero-copy features removed to be consistent with the existing benchmarks here).

Yes, I would like to avoid getting into performance games. My focus is on what is idiomatic for each parser, and I am mostly determining that by what maintainers put in their end-user documentation.

Some of the changes (e.g. switching to `one_of` in different cases, hand-implementing logic that was coming from chumsky) feel like they're moving away from that spirit, if you didn't consider them reasonable to show in your own examples. In saying that, I recognize there can be bias in the results based on which APIs any individual parser naturally exposes (e.g. nom and winnow have `float`, which avoids needing to choose between `int(10)` and `one_of(...)`).
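
As a rough illustration of that bias, here is a minimal sketch of the two approaches in winnow (assuming winnow 0.7's `float`, `digit1`, `one_of`, `take`, and `parse_to`; the hand-rolled version is illustrative, not code from any benchmark here):

```rust
use winnow::ascii::{digit1, float};
use winnow::combinator::opt;
use winnow::prelude::*;
use winnow::token::one_of;

// Built-in: lean on the library's own float recognizer.
fn num_builtin(input: &mut &str) -> ModalResult<f64> {
    float.parse_next(input)
}

// Hand-rolled sketch: match sign/digits/fraction/exponent,
// then convert the recognized slice via FromStr.
fn num_manual(input: &mut &str) -> ModalResult<f64> {
    (
        opt(one_of(['+', '-'])),
        digit1,
        opt(('.', digit1)),
        opt((one_of(['e', 'E']), opt(one_of(['+', '-'])), digit1)),
    )
        .take()     // keep the matched &str slice
        .parse_to() // parse it with f64::from_str
        .parse_next(input)
}
```

The two parse the same grammar; the question is only which form each library's documentation presents as idiomatic.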

However, as chumsky's example is also showing other features, like error recovery, I think I'm fine with dropping those.

This is all fuzzy because these can't really be straight comparisons across libraries.

@zesterer (Contributor, Author) commented Apr 1, 2025

> Yes, I would like to avoid getting into performance games.

Certainly. As I've seen myself, it's all too easy to over-fit for a specific benchmark at the cost of anything useful. I've adjusted the example to use the idiomatic text parsers provided: the effect on performance is fairly negligible anyway.

My interpretation of this repository is that it's showing the results a user can expect if they flick the library to whatever the 'maximum performance' settings are but otherwise stick to idiomatic constructs: I think the state of the example is now as close to that as I can get.

@epage (Collaborator) commented Apr 1, 2025

As I am not an expert in any of these individual libraries, I would feel much more comfortable having these more structurally match the idiomatic examples. This will make it easier for me to update between breaking changes. For example, combine is stuck at v3 because I only ever found a json example for that version. Dealing with small deviations, like removing error recovery, is much easier for me to manage.

> My interpretation of this repository is that it's showing the results a user can expect if they flick the library to whatever the 'maximum performance' settings are…

No, it is purely on idiomatic. There are changes I could make to make the winnow code faster that I don't do. When I updated to nom v8, I kept to their example using the idiomatic parser functions, even though those act as a barrier that stops the propagation of the GAT information that could make things faster.

@zesterer (Contributor, Author) commented Apr 1, 2025

Something else I would say is that it would be nice to have some more diverse sample data. canada.json is relatively flat and includes little variety in the constructs it contains: I suspect that it's more likely a test of the ability of the CPU to perform branch prediction than it is of the parsers' ability to handle arbitrary input. I have some more diverse JSON in chumsky's benchmark, if it's interesting to you, although I imagine there is a better corpus behind a quick search.

> No, it is purely on idiomatic.

Then that's fine: chumsky defaults to faster configuration (hence this PR's diff having a negative size), so it's hard to see anything unidiomatic here.

@epage (Collaborator) commented Apr 1, 2025

> I suspect that it's more likely a test of the ability of the CPU to perform branch prediction than it is of the parsers' ability to handle arbitrary input. I have some more diverse JSON in chumsky's benchmark, if it's interesting to you, although I imagine there is a better corpus behind a quick search.

I am open to changing the test data so long as it's hefty enough that most forms of jitter will have less of an impact. My preference is to stick to one piece of test data.

> Then that's fine: chumsky defaults to faster configuration (hence this PR's diff having a negative size), so it's hard to see anything unidiomatic here.

I can understand if you feel this is idiomatic but I would still like to stick to something based off an example for maintenance's sake.

@zesterer (Contributor, Author) commented Apr 1, 2025

> I can understand if you feel this is idiomatic but I would still like to stick to something based off an example for maintenance's sake.

I'm not really sure how to respond to this, since I'm now not sure what the intention here is. If the goal is to just copy-and-paste the first obvious example from any given library, then the benchmarks are going to be near-enough useless to any potential users.

Most of the libraries, and chumsky in particular, are designed from the start to be extensible and to allow users to choose where they want to sit on the performance vs error quality vs recovery spectrum. Chumsky's JSON example is set up specifically to show off the error and recovery system (since that's the novel thing the library brings to the table), deliberately at the expense of performance. A 'first impressions' example like this doesn't communicate anything useful to users that are interested specifically in performance at all.

As-is, the changes I've made in this PR are both a simplification (in terms of code size and of the crate's idiomatic abstractions) and a move toward functionality that's more representative of what the other parsers being compared are actually doing. I don't think I can justify things any better than that.

If you're concerned about maintainability: don't be. Everything being used here is sufficiently fundamental to the API that it's not going to change. The code should keep working just fine far into the future. I'm always happy to be pinged if you want it to be updated and you're not sure how, of course.

@epage (Collaborator) commented Apr 11, 2025

While I appreciate your offer of help, in the end I am responsible for maintaining this repo and have to be willing to commit to maintaining what is here. Only accepting external examples, with minor adjustments, is a way for me to make sure I can commit to continuing to offer this.

While you mention the accuracy of the results, that goes both ways: I either delegate responsibility for what is "idiomatic" to each library's documentation, or I take on the responsibility of auditing PRs for how "idiomatic" they are without being familiar enough with the libraries to judge. I offered to allow the error recovery calls to be removed, and I can understand that the structure of the parser might need to change to better support that.

Not as a workaround to this problem, but in case it's a helpful suggestion: what I do with winnow is keep multiple json examples that are meant to highlight different features / approaches, with the diff between examples kept to only those differences. I've even considered setting it up so the diffs get rendered in the docs.

@zesterer (Contributor, Author) commented Apr 13, 2025

I'm not really sure what you're asking me to do in order to satisfy your requirements, sorry.

My issue is that, as-is, the benchmark paints an unrepresentative and unfair view of the performance users can expect, because the library is being configured to do a number of things not even provided as options by the other libraries. I'm not interested in gaming the benchmarks - I just want users not to jump to unjustified conclusions, given that most folks are going to take these numbers as authoritative.

I understand that you can't delegate to the point that this becomes a 'Benchmarks Game' scenario, but that does necessarily mean taking the time to understand what counts as 'idiomatic' in each library rather than just using the first example you happen to chance upon. That's just part-and-parcel of responsibly maintaining a benchmark like this that users are going to be drawing conclusions from, in my view.

Please, tell me what you're looking for from this MR in actionable terms. I'm not opening this MR in bad faith. Right now, I'm not seeing much alternative but to advise users to ignore the numbers in this repository, because they do not usefully represent reality. I really don't want to be doing that, because it feels unnecessarily adversarial.

If it's interesting to you, I recently updated the benchmarks in my own repository, which are similar but are performed on more diverse sample data.

@epage (Collaborator) commented Apr 14, 2025

> Please, tell me what you're looking for from this MR in actionable terms.

I'm sorry, I thought my request was clear. I will accept a trivial reduction of an existing example.

@zesterer (Contributor, Author) commented

> I'm sorry, I thought my request was clear. I will accept a trivial reduction of an existing example.

My apologies, I think I misunderstood what you were saying.

I've changed the PR to be a strict reduction of the `json.rs` example present in chumsky - nothing has been added or changed beyond the defaults provided by chumsky.

In addition, and to help ease any possible maintenance burden, I've added this as an example in the chumsky repo (i.e: if you ever want to update things, you can just copy it from there). I'll do my best to keep it as idiomatic/fair as I can.

@epage merged commit a0030cd into rosetta-rs:main on Apr 14, 2025 (4 of 5 checks passed)
@epage (Collaborator) commented Apr 14, 2025

0ce7b82 updated the numbers. This also included a Rust 1.85.1 -> 1.86 upgrade:

  • Binary size: 297K to 135K
  • Parse time: 83ms to 31ms

Is most of that overhead from the rich error? I would be surprised if the no-error parser case gets slowed down by `recover_with`.

Error reporting is one of those areas that make this comparison hard.

  • nom is using `VerboseError`, which tracks the error location and parser-specific context
    • used in the original example
    • default error type is `Error`, which tracks error location and an `ErrorKind`
  • winnow is using `ContextError`, which tracks parser-specific context (winnow always tracks error location independent of error type)
    • used in the original example
    • is the default error type
  • chumsky is using `EmptyErr`, which is more like `()` in nom (which is less featureful than `()` in winnow)
    • used in the original example
    • is the default error type

(I've not even bothered to compare the others since it would take a bit more work to figure out.)
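
For the chumsky side of that list, selecting the error type is a single type parameter; a minimal sketch (assuming the chumsky 0.10/1.0 `extra` API, with placeholder parsers):

```rust
use chumsky::prelude::*;

// Default `extra` type: errors are `EmptyErr`, so only an error
// position is tracked and the failure path stays cheap.
fn digits_fast<'a>() -> impl Parser<'a, &'a str, &'a str> {
    text::int(10)
}

// Opting into `Rich` errors: expected tokens, contexts, and reasons
// are tracked, trading speed for diagnostic quality.
fn digits_rich<'a>() -> impl Parser<'a, &'a str, &'a str, extra::Err<Rich<'a, char>>> {
    text::int(10)
}
```

The combinator pipeline is unchanged between the two; only the error type in the signature differs.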

@epage (Collaborator) commented Apr 14, 2025

I wonder if I should add an example of error output for each one, with the caveat that libraries may support customizing errors to balance performance vs quality. That way, regardless of defaults, it's clearer what trade-off an individual implementation has made.

@zesterer (Contributor, Author) commented Apr 14, 2025

> Is most of that overhead from the rich error? I would be surprised if the no-error parser case gets slowed down by `recover_with`.

Most likely, yes. `Rich` does a lot of internal bookkeeping to track what possible set of tokens or patterns might be expected at each position, the contexts in which errors occurred, the reason for the error, and more. I've done a lot of work to make it at least reasonably performant, but choosing to use it is an active decision to favour user ergonomics over speed.

It's probably the case that recovery alone adds relatively little to execution time.

> Error reporting is one of those areas that make this comparison hard.

Certainly. I imagine that there isn't any common denominator to be found between the libraries, let alone one that can provide useful expectations. I don't envy your position in trying to balance these things.

That said, the fact that a parser library can't be configured for speed is still a meaningful data point for users. I think benchmarks should seek to represent the 'worst of the best' (i.e: the worse-performing aspects of a best-practice written-for-speed implementation), and so 'best' should naturally include pushing whatever 'go fast' button the library has.

> * winnow is using `ContextError` which tracks parser-specific context (winnow always tracks error location independent of error type)

When you say 'location', what do you mean? Chumsky also always tracks location (i.e: the position in the input that an error occurred at), since it's important information for error prioritisation & merging logic, but spans specifically are a layer on top of that which can be opted into according to which combination of input and error types you decide to use.

> * chumsky is using `EmptyErr` which is more like `()` in nom (which is less featureful than `()` in winnow)

As I say, I think the story is a bit more complicated than it might appear for chumsky. There's a lot going on 'under the hood' that isn't obvious at first glance. Even `EmptyErr` is still tracking both an error location and also backtracking information. Obviously, it's a bit difficult to say how much of this actually ends up appearing in the final executable once it's been ravaged by LLVM, as I'm sure is true of winnow also.

If winnow is also tracking the full span (i.e: not just the location) then I'd be happy to switch the error type out to `Cheap`, which would be closer in principle to what winnow is doing.

@epage (Collaborator) commented Apr 14, 2025

> Chumsky also always tracks location (i.e: the position in the input that an error occurred at), since it's important information for error prioritisation & merging logic…

Ah, good to know! I had assumed it didn't, like nom.

> If winnow is also tracking the full span (i.e: not just the location) then I'd be happy to switch the error type out to `Cheap`, which would be closer in principle to what winnow is doing.

I've not looked into span tracking yet for winnow, in part because I assumed (likely incorrectly) that supporting it is the point where people would switch from parsing directly to an AST to separate lexing and parsing (and then the span is stored inside your `Token`). If you have more background on how users specify spans when parsing directly to an AST and why it's important for that use case, I'd love to learn it!

@zesterer (Contributor, Author) commented Apr 14, 2025

> If you have more background on how users specify spans when parsing directly to an AST and why it's important for that use case, I'd love to learn it!

So a thing I've found to be really important is the ability to generate fresh spans that correspond to a region of the input. For example, your input might be a list of token/span tuples like `&[(Token, Span)]`. But then your parser parses, say, a struct definition like `struct Foo { a: String }`, and now you want to be able to come up with a span that covers the whole struct (since that's quite useful for nice error diagnostics). Chumsky does this by manufacturing a new span based on those that appear within the input. So you can do:

```rust
let struct_parser: impl Parser<_, StructNode> = ... ;

struct_parser.map_with(|st, e| (st, e.span()))
```

And now your output type is `(StructNode, Span)`, where the second element is a span that's been fabricated in such a way that it covers whatever tokens were parsed by `struct_parser` (or, more accurately, the span is the union of all those that contributed to the struct definition).

The logic for doing this gets a bit convoluted if you don't design for it from the get-go (I think I'm on my 5th from-scratch rewrite and only now do I feel like it's truly integrated in a nice way), but if you know roughly what shape your span types are, you can merge them together to do this without too much effort.

For types like `&str`, this isn't too bad: your span is your location, since it's just a byte offset. But for token slices or token trees, you might have a span-per-token or even nested spans going on, and you've got to figure out how to pipe all of that through so that you can always generate a useful and coherent span on the fly whenever the user needs to label an AST node with one.
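
A runnable version of that sketch for plain `&str` input might look like the following (a minimal sketch assuming the chumsky 0.10/1.0 `map_with`/`SimpleSpan` API; `Field` is a made-up node type, not from the thread):

```rust
use chumsky::prelude::*;

// Hypothetical AST node, purely for illustration.
#[derive(Debug)]
struct Field(String);

// `map_with` exposes parse-time state `e`; `e.span()` is a span
// covering exactly the input this parser consumed.
fn field<'a>() -> impl Parser<'a, &'a str, (Field, SimpleSpan)> {
    text::ident()
        .map(|s: &str| Field(s.to_string()))
        .map_with(|f, e| (f, e.span()))
}

fn main() {
    let out = field().parse("hello").into_result().unwrap();
    println!("{out:?}"); // (Field("hello"), 0..5)
}
```

For `&str` input the span is just a byte range, so the 'union' described above degenerates to start-of-first-token through end-of-last-token.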

Very happy to discuss this in more detail if you're interested in doing so! This is one of those things I've done way too much thinking about over the past few years.

@epage (Collaborator) commented Apr 15, 2025

Ok, so this isn't about taking an error location when parsing a `&str` and turning it into a meaningful `Span`, but about aggregating spans when parsing `&[(Token, Span)]`. We added support for this in Winnow 0.7 when using the `TokenSlice<T>` input wrapper (though any custom input stream can support this as well). As we aren't opinionated on the structure of the token, it does require implementing a trait on `T` (and because I reused a trait, it can be a little confusing).

EDIT: Initial winnow discussion

@zesterer (Contributor, Author) commented

Nice! I'll take a read through that.
