Commit

fix detail
Signed-off-by: martinvuyk <martin.vuyklop@gmail.com>
martinvuyk committed Feb 5, 2025
1 parent 57ac861 commit 03847e1
Showing 1 changed file with 3 additions and 3 deletions.
6 changes: 3 additions & 3 deletions proposals/string-ascii-unicode-utf-grapheme.md
@@ -21,15 +21,15 @@ with supporting only ASCII is that other encodings can get corrupted.

- UTF-8 is 1-4 sets of 8 bits long
- UTF-16 is 1-2 sets of 16 bits long
-- UTF-32 is 1 set 32 bits long
+- UTF-32 is 1 set of 32 bits long

#### When slicing by unicode codepoint e.g. "🔥🔥🔥" (\U0001f525\U0001f525\U0001f525)

-- UTF-8: 12 bytes long. The first byte of each fire can be used to know the
+- UTF-8: 12 sets (12 bytes) long. The first byte of each fire can be used to know the
length, and the next bytes are what is known as a continuation byte. There are
several approaches to achieve the slicing, they can be explored with
benchmarking later on.
-- UTF-16: 6 bytes long. It's very similar in procedure to UTF-8.
+- UTF-16: 6 sets long. It's very similar in procedure to UTF-8.
- UTF-32: It is fastest since it's direct index access. This is not the case
when supporting graphemes.
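
A quick way to check the counts in this hunk, and to see how the first byte of a UTF-8 sequence gives its length, is the Python sketch below. It is only an illustration, not the proposal's implementation: `utf8_seq_len` and `slice_codepoints` are hypothetical helper names, the input is assumed to be valid UTF-8, and "sets" is read as code units (8, 16, or 32 bits wide depending on the encoding).

```python
def utf8_seq_len(lead: int) -> int:
    """Length in bytes of a UTF-8 sequence, read from its first byte."""
    if lead < 0x80:            # 0xxxxxxx: single-byte (ASCII)
        return 1
    if lead >> 5 == 0b110:     # 110xxxxx: 2-byte sequence
        return 2
    if lead >> 4 == 0b1110:    # 1110xxxx: 3-byte sequence
        return 3
    if lead >> 3 == 0b11110:   # 11110xxx: 4-byte sequence
        return 4
    raise ValueError("continuation byte or invalid lead byte")


def slice_codepoints(data: bytes, start: int, end: int) -> bytes:
    """Return codepoints [start, end) of `data`, assumed to be valid UTF-8."""
    offsets = []               # byte offset at which each codepoint begins
    i = 0
    while i < len(data):
        offsets.append(i)
        i += utf8_seq_len(data[i])
    offsets.append(len(data))  # sentinel so `end` may equal the codepoint count
    last = len(offsets) - 1
    start = min(start, last)
    end = min(max(end, start), last)
    return data[offsets[start]:offsets[end]]


fire = "\U0001f525" * 3
utf8 = fire.encode("utf-8")
utf8_units = len(utf8)                             # 8-bit code units
utf16_units = len(fire.encode("utf-16-le")) // 2   # 16-bit code units
utf32_units = len(fire.encode("utf-32-le")) // 4   # 32-bit code units
middle = slice_codepoints(utf8, 1, 2).decode("utf-8")
```

For the three fire emoji this gives 12 UTF-8 code units, 6 UTF-16 code units (12 bytes, since each emoji is a surrogate pair), and 3 UTF-32 code units, which matches the "6 sets" wording the commit introduces. The linear scan over lead bytes is just one of the possible slicing approaches the text leaves to later benchmarking.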
