-
Notifications
You must be signed in to change notification settings - Fork 228
internal: upgrade chumsky to 0.10 in lexer #5223
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Add feature flag chumsky-10 that when active, makes the lexer use a different implementation. For now, that implementation is a stub that throws an error, but this sets up the structure to allow incrementally building the chumsky 0.10 lexer while keeping the 0.9 one working.
This commit completes Phase II of the chumsky 0.10 migration plan by: 1. Implementing a minimal lexer interface that compiles 2. Providing stub implementations for test compatibility 3. Setting up conditional test execution based on feature flags Note that this is a minimal implementation that provides the API structure but doesn't yet implement the actual lexer functionality. Full implementation will be done in Phase III. 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
Create the basic combinator infrastructure that will be used in Phase III. - Define Parser trait and essential combinators like map, then, etc. - Set up basic parser combinators (just, any, end, etc.) - Create token-specific combinators for lexing - Maintain fallback imperative implementation to ensure tests pass
- Updated lexer implementation to work with Chumsky 0.10 API - Modified token parsers to use Stream instead of raw strings - Added proper test setup for the new Chumsky version - Fixed issues with mapped values and error handling - Implemented basic string parsing (more advanced features to come later) 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
Enhance error messages to include position information and better formatted errors. This maintains compatibility with the external API while improving the user experience.
🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
- Extract common components in comment, time, and number parsers - Refactor triple quoted string parser to avoid duplication - Create helper functions to improve readability and reduce code repetition 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
Added historical comments that provide context about implementation decisions: - More details about doc comment error handling - Documentation of how multi-level quoted strings worked in 0.9 - Additional explanation of timezone handling for SQL compatibility 🤖 Generated with [Claude Code](https://claude.ai/code) Co-Authored-By: Claude <noreply@anthropic.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
A PR to upgrade the lexer to use chumsky 0.10 while maintaining the existing parser with chumsky 0.9. The changes include:
- Adding conditional compilation and feature flags to separate chumsky 0.9 (for the parser) and chumsky 0.10 (for the lexer)
- Refactoring error handling in error.rs to support both chumsky versions
- Updating Cargo.toml to conditionally include the chumsky_0_10 dependency
Reviewed Changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated no comments.
File | Description |
---|---|
prqlc/prqlc-parser/src/test.rs | Added conditional compilation and a TODO comment for unicode error tests |
prqlc/prqlc-parser/src/parser/mod.rs | Added a comment explaining the dual API usage |
prqlc/prqlc-parser/src/error.rs | Updated ErrorSource enum definitions for chumsky 0.9 and 0.10 |
prqlc/prqlc-parser/Cargo.toml | Introduced new feature flag and dependency for chumsky 0.10 |
Comments suppressed due to low confidence (1)
prqlc/prqlc-parser/src/test.rs:37
- [nitpick] Consider adding equivalent tests for chumsky 0.10 behavior instead of limiting the test to the configuration when the chumsky-10 feature is disabled.
// TODO: fix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hopefully the suggestions help :)
Additionally, I've written up a rough migration guide for chumsky 0.9 -> 0.10 that you might find useful.
@zesterer thank you very much for your review! greatly appreciated |
chumsky 0.10 is a total rewrite from 0.9, and much of our code also needs rewriting if we're to upgrade
this creates a feature that uses chumsky 0.10 only for the lexer, as a way of making progress. claude code helped a lot, which benefits from smaller problems.
before merging, we would keep only the chumsky 0.10 code for the lexer (and possibly upgrade the parser, I'm not sure)
so far, it mostly works; only a couple of failing tests. I'm not sure how to get our "any number of odd quotes" lexing correctly, and there's a small issue with the end of input. I'll ask about the first over at chumsky
(this isn't the most important thing for PRQL, but I'm trying to get back into it, and wanted to spend some time trying a moderately difficult project with LLMs, so this seemed like a reasonable case. though also I thought it would be much easier! very open to feedback on whether there's a way to more incrementally make the changes, rather than basically rewriting from scratch. chumsky's type errors don't make it easy to play whack-a-mole)