perf(ast/estree): speed up building UTF8-UTF16 translation table with SIMD #9359

@overlookmotel overlookmotel commented Feb 25, 2025

Use SIMD operations to speed up building the table of UTF8-UTF16 offset translations.
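For readers unfamiliar with the table, here is a minimal sketch of what a UTF8-UTF16 offset translation table can look like. It is an illustration only, not the code in this PR: the `Translation` struct, `build_table`, and `utf8_to_utf16` are hypothetical names, and the table is built with a plain `char_indices` loop rather than the optimized scan described here.

```rust
/// Hypothetical table entry (an assumption for illustration): for any UTF-8
/// offset at or after `utf8_offset`, subtract `utf16_difference` to get the
/// corresponding UTF-16 offset.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Translation {
    utf8_offset: u32,
    utf16_difference: u32,
}

/// Build the table with a simple scan over chars. One entry is added after
/// each non-ASCII character, recording the accumulated difference so far.
fn build_table(source: &str) -> Vec<Translation> {
    let mut table = Vec::new();
    let mut difference = 0u32;
    for (offset, ch) in source.char_indices() {
        let utf8_len = ch.len_utf8();
        if utf8_len > 1 {
            // 2-byte char: 1 UTF-16 unit; 3-byte: 1 unit; 4-byte: 2 units.
            difference += (utf8_len - ch.len_utf16()) as u32;
            table.push(Translation {
                utf8_offset: (offset + utf8_len) as u32,
                utf16_difference: difference,
            });
        }
    }
    table
}

/// Convert a UTF-8 offset (which must lie on a char boundary) to UTF-16.
fn utf8_to_utf16(table: &[Translation], utf8_offset: u32) -> u32 {
    // Number of entries whose `utf8_offset` is <= the query offset.
    let idx = table.partition_point(|t| t.utf8_offset <= utf8_offset);
    if idx == 0 {
        // No non-ASCII char before this offset, so the offsets are equal.
        utf8_offset
    } else {
        utf8_offset - table[idx - 1].utf16_difference
    }
}
```

For pure-ASCII source the table stays empty and every offset maps to itself, which is why a fast "is this block all ASCII?" check pays off when building it.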

Previously, we looped through the source byte by byte, checking whether each byte is ASCII. Instead, use SIMD to check blocks of 32 bytes at a time, and only drop down to byte-by-byte processing if the quick check finds a non-ASCII byte in that block.
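The block-then-fallback idea can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the PR's implementation: `CHUNK_SIZE`, `chunk_is_ascii`, `scan_bytes`, and `non_ascii_char_offsets` are hypothetical names, and the sketch only collects the offsets of non-ASCII characters instead of building the full table. Whether the quick check actually compiles to SIMD instructions depends on the compiler, target, and optimization level.

```rust
/// Block size for the quick check (32 bytes, as described above).
const CHUNK_SIZE: usize = 32;

/// Quick check: is every byte in the block ASCII?
/// OR-ing all bytes and testing the top bit once avoids an early exit inside
/// the loop, which is a shape compilers can often auto-vectorize.
fn chunk_is_ascii(chunk: &[u8]) -> bool {
    let mut acc = 0u8;
    for &byte in chunk {
        acc |= byte;
    }
    acc < 0x80
}

/// Slow path: scan bytes one by one, recording the offset of the lead byte
/// of each non-ASCII character.
fn scan_bytes(bytes: &[u8], base: usize, offsets: &mut Vec<usize>) {
    for (i, &byte) in bytes.iter().enumerate() {
        // In valid UTF-8, lead bytes of multi-byte chars are >= 0xC0;
        // continuation bytes are in 0x80..=0xBF and are skipped here.
        if byte >= 0xC0 {
            offsets.push(base + i);
        }
    }
}

/// Return the UTF-8 byte offset of every non-ASCII character in `source`.
fn non_ascii_char_offsets(source: &str) -> Vec<usize> {
    let bytes = source.as_bytes();
    let mut offsets = Vec::new();
    let mut chunks = bytes.chunks_exact(CHUNK_SIZE);
    let mut base = 0;
    for chunk in &mut chunks {
        // Only drop to the byte-by-byte path if the block is not all ASCII.
        if !chunk_is_ascii(chunk) {
            scan_bytes(chunk, base, &mut offsets);
        }
        base += CHUNK_SIZE;
    }
    // The tail shorter than one block is always scanned byte by byte.
    scan_bytes(chunks.remainder(), base, &mut offsets);
    offsets
}
```

Since most source code is largely ASCII, most blocks take only the quick path, which is where a speed-up of this kind would come from.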

Building the table is about 7x faster in our benchmark after this PR.

Note: The code contains no explicit SIMD operations. It is just written so that the compiler can auto-vectorize it.

We could get even better performance if we used explicit SIMD: on x86_64 with AVX-512 or Apple M4, we could process blocks of 64 bytes in one go, with fewer instructions. But that involves a lot of complications, so I think this is a decent compromise for now.

codspeed-hq bot commented Feb 25, 2025

CodSpeed Performance Report

Merging #9359 will improve performance by 6.16%

Comparing 02-25-perf_ast_estree_speed_up_building_utf8-utf16_translation_table_with_simd (82adab9) with main (f5c8698)

Summary

⚡ 1 improvement
✅ 32 untouched benchmarks

Benchmarks breakdown

| Benchmark | BASE | HEAD | Change |
|---|---|---|---|
| estree[checker.ts] | 119.6 ms | 112.6 ms | +6.16% |

@overlookmotel overlookmotel marked this pull request as ready for review February 25, 2025 19:31
@overlookmotel overlookmotel force-pushed the 02-25-perf_ast_estree_speed_up_building_utf8-utf16_translation_table_with_simd branch from ba4be9f to 6c2dd44 Compare February 25, 2025 19:43
@Boshen Boshen added the 0-merge Merge with Graphite Merge Queue label Feb 26, 2025
Boshen commented Feb 26, 2025

Merge activity

  • Feb 25, 9:08 PM EST: The merge label '0-merge' was detected. This PR will be added to the Graphite merge queue once it meets the requirements.
  • Feb 25, 9:08 PM EST: A user added this pull request to the Graphite merge queue.
  • Feb 25, 9:19 PM EST: The Graphite merge queue couldn't merge this PR because it was not satisfying all requirements (Failed CI: 'Clippy').
  • Feb 26, 5:16 AM EST: The merge label '0-merge' was detected. This PR will be added to the Graphite merge queue once it meets the requirements.
  • Feb 26, 5:16 AM EST: A user added this pull request to the Graphite merge queue.
  • Feb 26, 5:17 AM EST: A user merged this pull request with the Graphite merge queue.

@graphite-app graphite-app bot force-pushed the 02-25-perf_ast_estree_faster_utf-8_to_utf-16_span_conversion branch from 898ab78 to 61939ca Compare February 26, 2025 02:11
graphite-app bot pushed a commit that referenced this pull request Feb 26, 2025
@graphite-app graphite-app bot force-pushed the 02-25-perf_ast_estree_speed_up_building_utf8-utf16_translation_table_with_simd branch from 6c2dd44 to 28a79f2 Compare February 26, 2025 02:11
Base automatically changed from 02-25-perf_ast_estree_faster_utf-8_to_utf-16_span_conversion to main February 26, 2025 02:18
@graphite-app graphite-app bot removed the 0-merge Merge with Graphite Merge Queue label Feb 26, 2025
@overlookmotel overlookmotel force-pushed the 02-25-perf_ast_estree_speed_up_building_utf8-utf16_translation_table_with_simd branch from 28a79f2 to 82adab9 Compare February 26, 2025 09:49
@overlookmotel overlookmotel marked this pull request as draft February 26, 2025 09:51
@overlookmotel

Miri passes (see #9370). I think this is good to go.

@overlookmotel overlookmotel marked this pull request as ready for review February 26, 2025 10:05
@Boshen Boshen added the 0-merge Merge with Graphite Merge Queue label Feb 26, 2025
@graphite-app graphite-app bot merged commit 82adab9 into main Feb 26, 2025
27 checks passed
@graphite-app graphite-app bot deleted the 02-25-perf_ast_estree_speed_up_building_utf8-utf16_translation_table_with_simd branch February 26, 2025 10:17
overlookmotel added a commit that referenced this pull request Feb 26, 2025
#9359 introduced unsafe code to the `oxc_ast` crate. Run Miri on the `oxc_ast`
crate, but only when the files containing that unsafe code are altered.
Boshen added a commit that referenced this pull request Feb 26, 2025
## [0.53.0] - 2025-02-26

- #9289
- 4a5a7cf napi/parser: [**BREAKING**] Remove magic string; enable utf16
span converter by default (#9291) (Boshen)

### Features

- 5c775ea ast/estree: Enable serialization without TS fields (#9285)
(overlookmotel)
- f21740e data_structures: Add `CodeBuffer::print_bytes_iter_unchecked`
method (#9337) (overlookmotel)
- e10fb97 ecmascript: Improve may_have_side_effects for `.length`
(#9366) (sapphi-red)
- 35e5ca9 ecmascript: Improve may_have_side_effects for `instanceof`
(#9365) (sapphi-red)
- 11012c6 ecmascript: Improve ValueType for coalesce operator (#9354)
(sapphi-red)
- b7998fd ecmascript: To_number for object without toString (#9353)
(sapphi-red)
- e51d563 minifier: Concatenate strings with template literals on right
side (#9356) (sapphi-red)
- 9d7db54 minifier: Concatenate strings with template literals (#9355)
(sapphi-red)
- 835ee95 wasm: Return estree with utf16 span offsets (#9376) (Boshen)

### Bug Fixes

- 6a8f53f ast/estree: Visit `JSXOpeningFragment` and
`JSXClosingFragment` (#9342) (overlookmotel)
- e303767 ast/estree: Fix ESTree AST for imports and exports (#9282)
(overlookmotel)
- 54d59f1 data_structures: Stack types correctly report allocation size
if allocation failure during grow (#9317) (overlookmotel)
- f5c8698 ecmascript: Correct may_have_side_effects for classes (#9367)
(sapphi-red)
- d3ed128 minifier: Do not remove `=== 0` if the lhs can be NaN (#9352)
(sapphi-red)

### Performance

- 82adab9 ast/estree: Speed up building UTF8-UTF16 translation table
with SIMD (#9359) (overlookmotel)
- 61939ca ast/estree: Faster UTF-8 to UTF-16 span conversion (#9349)
(overlookmotel)
- 1bfc459 ast/estree: Pre-allocate `CodeBuffer` for JSON output (#9340)
(overlookmotel)
- 018c523 ast/estree: `ESTree` serializer use `CodeBuffer` (#9331)
(overlookmotel)
- 35ee399 codegen: Use `iter::repeat_n` in `CodeBuffer` (#9325)
(overlookmotel)

### Documentation

- 8bd3e39 data_structures: Uppercase SAFETY comments (#9330)
(overlookmotel)

### Refactor

- d94fc15 allocator: Reduce scope of `unsafe` blocks (#9319)
(overlookmotel)
- 7427900 ast: Re-order `ExportDefaultDeclaration` fields (#9348)
(overlookmotel)
- b09249c ast/estree: Rename serializers and serialization methods
(#9284) (overlookmotel)
- 55ed1df ast/estree: Shorten `ESTree` impls for enums (#9275)
(overlookmotel)
- 9d98444 codegen, data_structures: Move `CodeBuffer` into
`oxc_data_structures` crate (#9326) (overlookmotel)
- 6a4e892 data_structures: Add debug assertion to
`CodeBuffer::peek_nth_char_back` and improve safety docs (#9336)
(overlookmotel)
- fc46218 data_structures: `CodeBuffer::print_str` use
`Vec::extend_from_slice` (#9332) (overlookmotel)
- 690bae5 data_structures: Stack types const assert `T` is not zero-size
type (#9318) (overlookmotel)
- 10ba2ea data_structures: Reduce scope of `unsafe` blocks (#9316)
(overlookmotel)
- beb8382 data_structures: `CodeBuffer::print_bytes_unchecked` take a
byte slice (#9327) (overlookmotel)
- faf966f ecmascript: Don't check side effects in constant_evaluation
(#9122) (sapphi-red)
- 2faabe1 estree: Make `itoa` dependency optional (#9338)
(overlookmotel)
- 4e9e8cf lexer: Reduce scope of `unsafe` blocks (#9320) (overlookmotel)
- c31b53f mangler: Reduce scope of `unsafe` blocks (#9321)
(overlookmotel)
- f10a6da mangler: Move base54 into seperate mod (#9278) (Cameron)
- 12e89e0 syntax: Reduce scope of `unsafe` blocks (#9322)
(overlookmotel)
- f39be5f traverse: Reduce scope of `unsafe` blocks (#9323)
(overlookmotel)

Co-authored-by: Boshen <1430279+Boshen@users.noreply.github.com>
Labels
0-merge Merge with Graphite Merge Queue A-ast Area - AST C-performance Category - Solution not expected to change functional behavior, only performance