perf(ast/estree): faster UTF-8 to UTF-16 span conversion #9349
CodSpeed Performance Report: Merging #9349 will not alter performance.
The latest https://crates.io/crates/oxc_ast/0.52.0 crate is 184kb, I think it's time to split all the visitors to speed up compilation?
Yes. The visitor in this PR is not compiled unless I also wonder if we should move Unfortunately the enormous |
Speed up UTF-8 to UTF-16 `Span` conversion by processing span offsets in ascending order.

This involves, for each node, processing `span.start`, then visiting all children and processing their `Span`s, and then finally processing `span.end`.

When the AST has come directly from the parser, this means all offsets are processed 100% in ascending order (and the visitor used here has manually-written visitation methods for the few types where that's not always the case otherwise, e.g. `export {x}`).

If the AST *has* been modified, code may have moved around, so ascending order won't necessarily be preserved, but it *mostly* will.

Optimize for this visitation order. Instead of doing a binary search through the whole UTF8-UTF16 translation table on every single span offset, assume the current offset is in the same region as the last. Only resort to binary search as a de-opt when that's not the case. Most JS/TS files don't contain many non-ASCII characters, so these cases will be rare.
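The approach described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `Translation`, `build_table`, `Converter`, `Node` and `convert_spans` are hypothetical names, the table is built with a plain scalar loop, and a `fallbacks` counter is included only to make the de-opt path visible.

```rust
/// Hypothetical table entry: one per non-ASCII character in the source.
#[derive(Clone, Copy)]
struct Translation {
    /// UTF-8 byte offset of the first byte *after* the character.
    utf8_offset: u32,
    /// Amount to subtract from a UTF-8 offset at or after `utf8_offset`.
    utf16_difference: u32,
}

/// Build the table. Entry 0 is a dummy covering the start of the file.
fn build_table(source: &str) -> Vec<Translation> {
    let mut table = vec![Translation { utf8_offset: 0, utf16_difference: 0 }];
    let mut difference = 0u32;
    for (index, ch) in source.char_indices() {
        let (len8, len16) = (ch.len_utf8(), ch.len_utf16());
        if len8 != len16 {
            // Only non-ASCII characters differ in length between encodings.
            difference += (len8 - len16) as u32;
            let utf8_offset = (index + len8) as u32;
            table.push(Translation { utf8_offset, utf16_difference: difference });
        }
    }
    table
}

/// Converter that remembers the table region used by the previous offset.
struct Converter<'t> {
    table: &'t [Translation],
    range_start: u32,
    range_end: u32,
    difference: u32,
    /// Number of binary searches performed (illustration only).
    fallbacks: u32,
}

impl<'t> Converter<'t> {
    fn new(table: &'t [Translation]) -> Self {
        let mut converter =
            Self { table, range_start: 0, range_end: 0, difference: 0, fallbacks: 0 };
        converter.select(0);
        converter
    }

    /// Cache the region `table[index].utf8_offset..table[index + 1].utf8_offset`.
    fn select(&mut self, index: usize) {
        let entry = self.table[index];
        self.range_start = entry.utf8_offset;
        self.range_end = self.table.get(index + 1).map_or(u32::MAX, |next| next.utf8_offset);
        self.difference = entry.utf16_difference;
    }

    fn convert(&mut self, utf8_offset: u32) -> u32 {
        if utf8_offset < self.range_start || utf8_offset >= self.range_end {
            // De-opt: the offset left the cached region, so search the table.
            self.fallbacks += 1;
            // Entry 0 has offset 0, so at least one entry always matches.
            let index = self.table.partition_point(|t| t.utf8_offset <= utf8_offset) - 1;
            self.select(index);
        }
        utf8_offset - self.difference
    }
}

/// Hypothetical stand-in for an AST node with a span and children.
struct Node {
    start: u32,
    end: u32,
    children: Vec<Node>,
}

/// Process `start`, then all children, then `end`, so that offsets reach
/// the converter in ascending order for an unmodified AST.
fn convert_spans(node: &mut Node, converter: &mut Converter) {
    node.start = converter.convert(node.start);
    for child in &mut node.children {
        convert_spans(child, converter);
    }
    node.end = converter.convert(node.end);
}
```

With offsets arriving in ascending order, this sketch only searches when an offset crosses a non-ASCII character, so an all-ASCII file never leaves the fast path. An out-of-order offset (for example from a moved node) still converts correctly, at the cost of one extra binary search.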
## [0.53.0] - 2025-02-26

- #9289

- 4a5a7cf napi/parser: [**BREAKING**] Remove magic string; enable utf16 span converter by default (#9291) (Boshen)

### Features

- 5c775ea ast/estree: Enable serialization without TS fields (#9285) (overlookmotel)
- f21740e data_structures: Add `CodeBuffer::print_bytes_iter_unchecked` method (#9337) (overlookmotel)
- e10fb97 ecmascript: Improve may_have_side_effects for `.length` (#9366) (sapphi-red)
- 35e5ca9 ecmascript: Improve may_have_side_effects for `instanceof` (#9365) (sapphi-red)
- 11012c6 ecmascript: Improve ValueType for coalesce operator (#9354) (sapphi-red)
- b7998fd ecmascript: To_number for object without toString (#9353) (sapphi-red)
- e51d563 minifier: Concatenate strings with template literals on right side (#9356) (sapphi-red)
- 9d7db54 minifier: Concatenate strings with template literals (#9355) (sapphi-red)
- 835ee95 wasm: Return estree with utf16 span offsets (#9376) (Boshen)

### Bug Fixes

- 6a8f53f ast/estree: Visit `JSXOpeningFragment` and `JSXClosingFragment` (#9342) (overlookmotel)
- e303767 ast/estree: Fix ESTree AST for imports and exports (#9282) (overlookmotel)
- 54d59f1 data_structures: Stack types correctly report allocation size if allocation failure during grow (#9317) (overlookmotel)
- f5c8698 ecmascript: Correct may_have_side_effects for classes (#9367) (sapphi-red)
- d3ed128 minifier: Do not remove `=== 0` if the lhs can be NaN (#9352) (sapphi-red)

### Performance

- 82adab9 ast/estree: Speed up building UTF8-UTF16 translation table with SIMD (#9359) (overlookmotel)
- 61939ca ast/estree: Faster UTF-8 to UTF-16 span conversion (#9349) (overlookmotel)
- 1bfc459 ast/estree: Pre-allocate `CodeBuffer` for JSON output (#9340) (overlookmotel)
- 018c523 ast/estree: `ESTree` serializer use `CodeBuffer` (#9331) (overlookmotel)
- 35ee399 codegen: Use `iter::repeat_n` in `CodeBuffer` (#9325) (overlookmotel)

### Documentation

- 8bd3e39 data_structures: Uppercase SAFETY comments (#9330) (overlookmotel)

### Refactor

- d94fc15 allocator: Reduce scope of `unsafe` blocks (#9319) (overlookmotel)
- 7427900 ast: Re-order `ExportDefaultDeclaration` fields (#9348) (overlookmotel)
- b09249c ast/estree: Rename serializers and serialization methods (#9284) (overlookmotel)
- 55ed1df ast/estree: Shorten `ESTree` impls for enums (#9275) (overlookmotel)
- 9d98444 codegen, data_structures: Move `CodeBuffer` into `oxc_data_structures` crate (#9326) (overlookmotel)
- 6a4e892 data_structures: Add debug assertion to `CodeBuffer::peek_nth_char_back` and improve safety docs (#9336) (overlookmotel)
- fc46218 data_structures: `CodeBuffer::print_str` use `Vec::extend_from_slice` (#9332) (overlookmotel)
- 690bae5 data_structures: Stack types const assert `T` is not zero-size type (#9318) (overlookmotel)
- 10ba2ea data_structures: Reduce scope of `unsafe` blocks (#9316) (overlookmotel)
- beb8382 data_structures: `CodeBuffer::print_bytes_unchecked` take a byte slice (#9327) (overlookmotel)
- faf966f ecmascript: Don't check side effects in constant_evaluation (#9122) (sapphi-red)
- 2faabe1 estree: Make `itoa` dependency optional (#9338) (overlookmotel)
- 4e9e8cf lexer: Reduce scope of `unsafe` blocks (#9320) (overlookmotel)
- c31b53f mangler: Reduce scope of `unsafe` blocks (#9321) (overlookmotel)
- f10a6da mangler: Move base54 into seperate mod (#9278) (Cameron)
- 12e89e0 syntax: Reduce scope of `unsafe` blocks (#9322) (overlookmotel)
- f39be5f traverse: Reduce scope of `unsafe` blocks (#9323) (overlookmotel)

Co-authored-by: Boshen <1430279+Boshen@users.noreply.github.com>