diff --git a/proposals/string-ascii-unicode-utf-grapheme.md b/proposals/string-ascii-unicode-utf-grapheme.md
index 74dccc3c56..fe4f79723b 100644
--- a/proposals/string-ascii-unicode-utf-grapheme.md
+++ b/proposals/string-ascii-unicode-utf-grapheme.md
@@ -21,15 +21,15 @@ with supporting only ASCII is that other encodings can get corrupted.
 - UTF-8 is 1-4 sets of 8 bits long
 - UTF-16 is 1-2 sets of 16 bits long
-- UTF-32 is 1 set 32 bits long
+- UTF-32 is 1 set of 32 bits long
 
 #### When slicing by unicode codepoint
 
 e.g. "🔥🔥🔥" (\U0001f525\U0001f525\U0001f525)
 
-- UTF-8: 12 bytes long. The first byte of each fire can be used to know the
+- UTF-8: 12 sets (12 bytes) long. The first byte of each fire can be used to know the
   length, and the next bytes are what is known as a continuation byte. There are
   several approaches to achieve the slicing, they can be explored with
   benchmarking later on.
-- UTF-16: 6 bytes long. It's very similar in procedure to UTF-8.
+- UTF-16: 6 sets long. It's very similar in procedure to UTF-8.
 - UTF-32: It is fastest since it's direct index access. This is not the case
   when supporting graphemes.
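The per-encoding slicing procedures described in the diff can be sketched in Python. Python is used purely for illustration, since the proposal does not prescribe an implementation. Several things are assumptions of this sketch: the helper names (`utf8_char_len`, `utf16_char_len`, `utf8_slice`, `utf16_units`, `utf16_slice`, `utf32_slice`) are hypothetical, "sets" are treated as code units, and the linear offset scan is only one of the "several approaches" the text mentions. It has not been benchmarked, and it assumes well-formed input with no validation.

```python
# Illustrative sketch only (hypothetical helper names): code-point slicing for
# UTF-8, UTF-16 and UTF-32 buffers. Assumes well-formed input; no validation.


def utf8_char_len(lead: int) -> int:
    """Bytes in a UTF-8 sequence, read from the high bits of its lead byte."""
    if lead < 0x80:            # 0xxxxxxx: ASCII, 1 byte
        return 1
    if lead >> 5 == 0b110:     # 110xxxxx: 2-byte sequence
        return 2
    if lead >> 4 == 0b1110:    # 1110xxxx: 3-byte sequence
        return 3
    if lead >> 3 == 0b11110:   # 11110xxx: 4-byte sequence
        return 4
    raise ValueError("continuation byte (10xxxxxx) or invalid lead byte")


def utf16_char_len(unit: int) -> int:
    """Code units in a UTF-16 sequence: a high surrogate starts a 2-unit pair."""
    return 2 if 0xD800 <= unit <= 0xDBFF else 1


def _slice_by_lead(units, char_len, start: int, end: int):
    """Linear scan: record where each code point starts, then slice [start, end)."""
    offsets = []
    i = 0
    while i < len(units):
        offsets.append(i)
        i += char_len(units[i])
    offsets.append(len(units))       # sentinel offset for "end of buffer"
    n = len(offsets) - 1             # number of code points found
    return units[offsets[min(start, n)]:offsets[min(end, n)]]


def utf8_slice(data: bytes, start: int, end: int) -> bytes:
    return _slice_by_lead(data, utf8_char_len, start, end)


def utf16_units(s: str) -> list[int]:
    """16-bit code units of `s` (little-endian codec, so no BOM is emitted)."""
    raw = s.encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


def utf16_slice(units: list[int], start: int, end: int) -> list[int]:
    return _slice_by_lead(units, utf16_char_len, start, end)


def utf32_slice(data: bytes, start: int, end: int) -> bytes:
    # Fixed width: the byte offset is simply index * 4, so no scan is needed.
    return data[start * 4:end * 4]


fire = "\U0001f525" * 3
print(len(fire.encode("utf-8")))             # 12
print(len(utf16_units(fire)))                # 6
print(len(fire.encode("utf-32-le")) // 4)    # 3
print(utf8_slice(fire.encode("utf-8"), 1, 2).hex())           # f09f94a5
print([hex(u) for u in utf16_slice(utf16_units(fire), 2, 3)])  # ['0xd83d', '0xdd25']
```

In this sketch, UTF-8 and UTF-16 both need an O(n) scan to locate the nth code point, while UTF-32 computes the offset directly. That shortcut stops applying once slicing is done by grapheme cluster, because a single grapheme can span several code points.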