perf(ast/estree): faster UTF-8 to UTF-16 span conversion #9349

@overlookmotel overlookmotel commented Feb 25, 2025

Speed up UTF-8 to UTF-16 Span conversion by processing span offsets in ascending order.

This involves, for each node, processing span.start, then visiting all children and processing their Spans, and then finally processing span.end.

When the AST has come directly from the parser, this means all offsets are processed 100% in ascending order. The visitor used here has manually written visitation methods for the few types where that would otherwise not always be the case, e.g. export {x}.

If the AST has been modified, code may have moved around, so ascending order won't necessarily be preserved, but still it mostly will.
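
To illustrate the visitation order described above, here is a minimal sketch. It is not the visitor from this PR: `Node`, `visit_offsets` and `sample_tree` are hypothetical stand-ins for the real AST types and generated visitor, and only `Span` with its `start` and `end` fields mirrors names used in the description.

```rust
/// Byte offsets of a node in the source text.
#[derive(Clone, Copy)]
struct Span {
    start: u32,
    end: u32,
}

/// Hypothetical toy AST node: a span plus child nodes in source order.
struct Node {
    span: Span,
    children: Vec<Node>,
}

/// Report offsets in the order described above:
/// `span.start`, then every child (recursively), then `span.end`.
fn visit_offsets(node: &Node, out: &mut Vec<u32>) {
    out.push(node.span.start);
    for child in &node.children {
        visit_offsets(child, out);
    }
    out.push(node.span.end);
}

/// Hypothetical tree shaped like parser output: children are nested inside
/// their parent's span and appear in source order.
fn sample_tree() -> Node {
    let leaf = |start, end| Node { span: Span { start, end }, children: Vec::new() };
    Node {
        span: Span { start: 0, end: 20 },
        children: vec![
            leaf(0, 5),
            Node { span: Span { start: 6, end: 20 }, children: vec![leaf(10, 15)] },
        ],
    }
}
```

Calling `visit_offsets(&sample_tree(), &mut offsets)` on an empty `Vec` yields offsets that never decrease, which is the property the conversion relies on. A tree whose children were re-ordered after parsing would break that property only at the points where code moved.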

Optimize for this visitation order. Instead of doing a binary search through the whole UTF8-UTF16 translation table on every single span offset, assume the current offset is in the same region as the last one. Only resort to binary search as a de-opt when that's not the case. Most JS/TS files don't contain many non-ASCII characters, so these cases will be rare.
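
The idea can be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: the table layout, `Translation`, `Converter`, `build_table` and the `searches` counter are all hypothetical names chosen for the example. The sketch assumes a table sorted by UTF-8 offset, where each entry records how much to subtract from any UTF-8 offset at or after it.

```rust
/// Hypothetical table entry: for UTF-8 offsets at or after `utf8_offset`
/// (and before the next entry), UTF-16 offset = UTF-8 offset - `utf16_difference`.
struct Translation {
    utf8_offset: u32,
    utf16_difference: u32,
}

/// Build the table with one entry after each character whose UTF-8 and
/// UTF-16 lengths differ. The first entry always covers offset 0.
fn build_table(source: &str) -> Vec<Translation> {
    let mut table = vec![Translation { utf8_offset: 0, utf16_difference: 0 }];
    let mut difference = 0u32;
    for (index, ch) in source.char_indices() {
        let (len8, len16) = (ch.len_utf8() as u32, ch.len_utf16() as u32);
        if len8 != len16 {
            difference += len8 - len16;
            table.push(Translation { utf8_offset: index as u32 + len8, utf16_difference: difference });
        }
    }
    table
}

/// Converter that remembers the region used by the previous conversion.
struct Converter {
    table: Vec<Translation>,
    range_start: u32, // inclusive start of the cached region
    range_end: u32,   // exclusive end of the cached region
    difference: u32,  // amount to subtract inside the cached region
    searches: usize,  // number of binary-search fallbacks (for demonstration only)
}

impl Converter {
    fn new(source: &str) -> Self {
        let table = build_table(source);
        let range_end = table.get(1).map_or(u32::MAX, |t| t.utf8_offset);
        Self { table, range_start: 0, range_end, difference: 0, searches: 0 }
    }

    /// Fast path: the offset is in the same region as the last one.
    /// Slow path: binary search for the right region, then cache it.
    fn convert(&mut self, utf8_offset: u32) -> u32 {
        if utf8_offset < self.range_start || utf8_offset >= self.range_end {
            self.searches += 1;
            // Index of the first entry that starts after `utf8_offset`.
            // It is at least 1 because the first entry starts at offset 0.
            let next = self.table.partition_point(|t| t.utf8_offset <= utf8_offset);
            let current = &self.table[next - 1];
            self.range_start = current.utf8_offset;
            self.difference = current.utf16_difference;
            self.range_end = self.table.get(next).map_or(u32::MAX, |t| t.utf8_offset);
        }
        utf8_offset - self.difference
    }
}
```

For a source that is entirely ASCII, the table holds a single entry and `convert` never leaves the fast path. With ascending offsets, the fallback runs only when an offset crosses into a new region, so it costs one search per multi-byte character rather than one per span offset.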

@github-actions github-actions bot added A-parser Area - Parser A-ast Area - AST A-ast-tools Area - AST tools labels Feb 25, 2025
@github-actions github-actions bot added the C-performance Category - Solution not expected to change functional behavior, only performance label Feb 25, 2025

codspeed-hq bot commented Feb 25, 2025

CodSpeed Performance Report

Merging #9349 will not alter performance

Comparing 02-25-perf_ast_estree_faster_utf-8_to_utf-16_span_conversion (61939ca) with main (4f2fc39)

Summary

✅ 33 untouched benchmarks


Boshen commented Feb 25, 2025

The latest https://crates.io/crates/oxc_ast/0.52.0 crate is 184kb, I think it's time to split all the visitors to speed up compilation?

@overlookmotel overlookmotel force-pushed the 02-25-perf_ast_estree_faster_utf-8_to_utf-16_span_conversion branch 2 times, most recently from 70c1d90 to 129b74d Compare February 25, 2025 04:48
@overlookmotel overlookmotel force-pushed the 02-25-refactor_ast_re-order_exportdefaultdeclaration_fields branch from 4f43e48 to 05e3799 Compare February 25, 2025 04:48
@Boshen Boshen changed the base branch from 02-25-refactor_ast_re-order_exportdefaultdeclaration_fields to graphite-base/9349 February 25, 2025 05:06
@Boshen Boshen force-pushed the 02-25-perf_ast_estree_faster_utf-8_to_utf-16_span_conversion branch from 129b74d to 3b80d73 Compare February 25, 2025 05:17
@Boshen Boshen force-pushed the graphite-base/9349 branch from 05e3799 to 7427900 Compare February 25, 2025 05:17
@Boshen Boshen changed the base branch from graphite-base/9349 to main February 25, 2025 05:18
@Boshen Boshen force-pushed the 02-25-perf_ast_estree_faster_utf-8_to_utf-16_span_conversion branch from 3b80d73 to 9d71dc7 Compare February 25, 2025 05:18
@overlookmotel overlookmotel force-pushed the 02-25-perf_ast_estree_faster_utf-8_to_utf-16_span_conversion branch 2 times, most recently from ee04d1e to 898ab78 Compare February 25, 2025 18:35
@overlookmotel overlookmotel marked this pull request as ready for review February 25, 2025 18:47

overlookmotel commented Feb 25, 2025

> The latest https://crates.io/crates/oxc_ast/0.52.0 crate is 184kb, I think it's time to split all the visitors to speed up compilation?

Yes. The visitor in this PR is not compiled unless the serialize feature is enabled, though.

I also wonder if we should move Visit and VisitMut traits into a new oxc_visit crate. Some use cases only need the parser and AST itself, so having to compile these huge files is unhelpful in those cases.

Unfortunately, I think the enormous AstBuilder needs to stay in the oxc_ast crate, because we want to eventually force all node creation to go through AstBuilder. But we could put it behind a feature. I don't think the linter uses it, for example.

@graphite-app graphite-app bot added the 0-merge Merge with Graphite Merge Queue label Feb 26, 2025

graphite-app bot commented Feb 26, 2025

Merge activity

@graphite-app graphite-app bot force-pushed the 02-25-perf_ast_estree_faster_utf-8_to_utf-16_span_conversion branch from 898ab78 to 61939ca Compare February 26, 2025 02:11
@graphite-app graphite-app bot merged commit 61939ca into main Feb 26, 2025
26 checks passed
@graphite-app graphite-app bot deleted the 02-25-perf_ast_estree_faster_utf-8_to_utf-16_span_conversion branch February 26, 2025 02:18
Boshen added a commit that referenced this pull request Feb 26, 2025
## [0.53.0] - 2025-02-26

- #9289
- 4a5a7cf napi/parser: [**BREAKING**] Remove magic string; enable utf16
span converter by default (#9291) (Boshen)

### Features

- 5c775ea ast/estree: Enable serialization without TS fields (#9285)
(overlookmotel)
- f21740e data_structures: Add `CodeBuffer::print_bytes_iter_unchecked`
method (#9337) (overlookmotel)
- e10fb97 ecmascript: Improve may_have_side_effects for `.length`
(#9366) (sapphi-red)
- 35e5ca9 ecmascript: Improve may_have_side_effects for `instanceof`
(#9365) (sapphi-red)
- 11012c6 ecmascript: Improve ValueType for coalesce operator (#9354)
(sapphi-red)
- b7998fd ecmascript: To_number for object without toString (#9353)
(sapphi-red)
- e51d563 minifier: Concatenate strings with template literals on right
side (#9356) (sapphi-red)
- 9d7db54 minifier: Concatenate strings with template literals (#9355)
(sapphi-red)
- 835ee95 wasm: Return estree with utf16 span offsets (#9376) (Boshen)

### Bug Fixes

- 6a8f53f ast/estree: Visit `JSXOpeningFragment` and
`JSXClosingFragment` (#9342) (overlookmotel)
- e303767 ast/estree: Fix ESTree AST for imports and exports (#9282)
(overlookmotel)
- 54d59f1 data_structures: Stack types correctly report allocation size
if allocation failure during grow (#9317) (overlookmotel)
- f5c8698 ecmascript: Correct may_have_side_effects for classes (#9367)
(sapphi-red)
- d3ed128 minifier: Do not remove `=== 0` if the lhs can be NaN (#9352)
(sapphi-red)

### Performance

- 82adab9 ast/estree: Speed up building UTF8-UTF16 translation table
with SIMD (#9359) (overlookmotel)
- 61939ca ast/estree: Faster UTF-8 to UTF-16 span conversion (#9349)
(overlookmotel)
- 1bfc459 ast/estree: Pre-allocate `CodeBuffer` for JSON output (#9340)
(overlookmotel)
- 018c523 ast/estree: `ESTree` serializer use `CodeBuffer` (#9331)
(overlookmotel)
- 35ee399 codegen: Use `iter::repeat_n` in `CodeBuffer` (#9325)
(overlookmotel)

### Documentation

- 8bd3e39 data_structures: Uppercase SAFETY comments (#9330)
(overlookmotel)

### Refactor

- d94fc15 allocator: Reduce scope of `unsafe` blocks (#9319)
(overlookmotel)
- 7427900 ast: Re-order `ExportDefaultDeclaration` fields (#9348)
(overlookmotel)
- b09249c ast/estree: Rename serializers and serialization methods
(#9284) (overlookmotel)
- 55ed1df ast/estree: Shorten `ESTree` impls for enums (#9275)
(overlookmotel)
- 9d98444 codegen, data_structures: Move `CodeBuffer` into
`oxc_data_structures` crate (#9326) (overlookmotel)
- 6a4e892 data_structures: Add debug assertion to
`CodeBuffer::peek_nth_char_back` and improve safety docs (#9336)
(overlookmotel)
- fc46218 data_structures: `CodeBuffer::print_str` use
`Vec::extend_from_slice` (#9332) (overlookmotel)
- 690bae5 data_structures: Stack types const assert `T` is not zero-size
type (#9318) (overlookmotel)
- 10ba2ea data_structures: Reduce scope of `unsafe` blocks (#9316)
(overlookmotel)
- beb8382 data_structures: `CodeBuffer::print_bytes_unchecked` take a
byte slice (#9327) (overlookmotel)
- faf966f ecmascript: Don't check side effects in constant_evaluation
(#9122) (sapphi-red)
- 2faabe1 estree: Make `itoa` dependency optional (#9338)
(overlookmotel)
- 4e9e8cf lexer: Reduce scope of `unsafe` blocks (#9320) (overlookmotel)
- c31b53f mangler: Reduce scope of `unsafe` blocks (#9321)
(overlookmotel)
- f10a6da mangler: Move base54 into seperate mod (#9278) (Cameron)
- 12e89e0 syntax: Reduce scope of `unsafe` blocks (#9322)
(overlookmotel)
- f39be5f traverse: Reduce scope of `unsafe` blocks (#9323)
(overlookmotel)

Co-authored-by: Boshen <1430279+Boshen@users.noreply.github.com>