Skip to content

internal: upgrade chumsky to 0.10 in lexer #5223

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 59 commits into
base: main
Choose a base branch
from

Conversation

max-sixty
Copy link
Member

chumsky 0.10 is a total rewrite from 0.9, and much of our code also needs rewriting if we're to upgrade

this creates a feature that uses chumsky 0.10 only for the lexer, as a way of making progress. claude code helped a lot, which benefits from smaller problems.

before merging, we would keep only the chumsky 0.10 code for the lexer (and possibly upgrade the parser, I'm not sure)

so far, it mostly works; only a couple of failing tests. I'm not sure how to get our "any number of odd quotes" lexing correctly, and there's a small issue with the end of input. I'll ask about the first over at chumsky

(this isn't the most important thing for PRQL, but I'm trying to get back into it, and wanted to spend some time trying a moderately difficult project with LLMs, so this seemed like a reasonable case. though also I thought it would be much easier! very open to feedback on whether there's a way to more incrementally make the changes, rather than basically rewriting from scratch. chumsky's type errors don't make it easy to play whack-a-mole)

max-sixty and others added 30 commits March 31, 2025 17:43
Add feature flag chumsky-10 that when active, makes the lexer use a different implementation.
For now, that implementation is a stub that throws an error, but this sets up the structure to allow
incrementally building the chumsky 0.10 lexer while keeping the 0.9 one working.
This commit completes Phase II of the chumsky 0.10 migration plan by:
1. Implementing a minimal lexer interface that compiles
2. Providing stub implementations for test compatibility
3. Setting up conditional test execution based on feature flags

Note that this is a minimal implementation that provides the API
structure but doesn't yet implement the actual lexer functionality.
Full implementation will be done in Phase III.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Create the basic combinator infrastructure that will be used in Phase III.
- Define Parser trait and essential combinators like map, then, etc.
- Set up basic parser combinators (just, any, end, etc.)
- Create token-specific combinators for lexing
- Maintain fallback imperative implementation to ensure tests pass
- Updated lexer implementation to work with Chumsky 0.10 API
- Modified token parsers to use Stream instead of raw strings
- Added proper test setup for the new Chumsky version
- Fixed issues with mapped values and error handling
- Implemented basic string parsing (more advanced features to come later)

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
max-sixty and others added 11 commits April 1, 2025 23:13
Enhance error messages to include position information and better formatted errors. This maintains compatibility with the external API while improving the user experience.
🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Extract common components in comment, time, and number parsers
- Refactor triple quoted string parser to avoid duplication
- Create helper functions to improve readability and reduce code repetition

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Added historical comments that provide context about implementation decisions:
- More details about doc comment error handling
- Documentation of how multi-level quoted strings worked in 0.9
- Additional explanation of timezone handling for SQL compatibility

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
@max-sixty max-sixty requested a review from Copilot April 2, 2025 17:42
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

A PR to upgrade the lexer to use chumsky 0.10 while maintaining the existing parser with chumsky 0.9. The changes include:

  • Adding conditional compilation and feature flags to separate chumsky 0.9 (for the parser) and chumsky 0.10 (for the lexer)
  • Refactoring error handling in error.rs to support both chumsky versions
  • Updating Cargo.toml to conditionally include the chumsky_0_10 dependency

Reviewed Changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated no comments.

File Description
prqlc/prqlc-parser/src/test.rs Added conditional compilation and a TODO comment for unicode error tests
prqlc/prqlc-parser/src/parser/mod.rs Added a comment explaining the dual API usage
prqlc/prqlc-parser/src/error.rs Updated ErrorSource enum definitions for chumsky 0.9 and 0.10
prqlc/prqlc-parser/Cargo.toml Introduced new feature flag and dependency for chumsky 0.10
Comments suppressed due to low confidence (1)

prqlc/prqlc-parser/src/test.rs:37

  • [nitpick] Consider adding equivalent tests for chumsky 0.10 behavior instead of limiting the test to the configuration when the chumsky-10 feature is disabled.
// TODO: fix

Copy link

@zesterer zesterer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hopefully the suggestions help :)

Additionally, I've written up a rough migration guide for chumsky 0.9 -> 0.10 that you might find useful.

@max-sixty
Copy link
Member Author

@zesterer thank you very much for your review! greatly appreciated

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants