Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unicode 16.0 #60

Merged
merged 7 commits into from
Sep 30, 2024
Merged

Unicode 16.0 #60

merged 7 commits into from
Sep 30, 2024

Conversation

Carbon225
Copy link
Contributor

@Carbon225 Carbon225 commented Sep 17, 2024

Validation tests pass.

This test needs to be updated (we need to find a character that is not assigned to a script, 771 is now Hebrew):

    # unknown script for character
    c = chr(771)
    e: CurableSequence = ens_process(f'bitcoin.bitcin.bi̇tcin.bitсin{c}').error
    assert e.general_info == 'Contains visually confusing characters from multiple scripts (Latin plus other scripts)'
    assert (
        e.sequence_info
        == 'This character is disallowed because it is visually confusing with another character from the Latin script'
    )

There is no pyunormalize for unicode 16.0.

UnicodeWarning: Unicode version mismatch: pyunormalize is using 15.1.0, but the ENS Normalization spec is for 16.0.0 (2024-09-10T20:47:54.200Z)

@djstrong
Copy link
Contributor

@Carbon225 There is pyunormalize for unicode 16.0

@Carbon225
Copy link
Contributor Author

I wrote this code:

for cp in tqdm(range(0, 0x10FFFF)):
    c = chr(cp)
    e = ens_process(f'bitcoin.bitcin.bi̇tcin.bitсin{c}').error
    try:
        assert e.general_info == 'Contains visually confusing characters from multiple scripts (Latin plus other scripts)'
        assert (
            e.sequence_info
            == 'This character is disallowed because it is visually confusing with another character from the Latin script'
        )
        print(f'{cp} - OK')
        break
    except AssertionError:
        pass

It should find a new example for this test.
But it did not find any characters that cause this message.

I will make this test xfail, most likely all codepoints are now included in ens-normalize-js.

@Carbon225
Copy link
Contributor Author

I changed this in the readme:

Based on JavaScript implementation version 1.9.0

Original implementation based on JavaScript implementation version 1.9.0

I did not want people thinking we are using some old reference implementation.

Copy link
Member

@lightwalker-eth lightwalker-eth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Carbon225 @djstrong Looks good!

I shared 3 comments with feedback. Suggest we action the comment I shared on the README.md file in this PR. For the other comments, appreciate your advice and good to action those separately in a different PR.

README.md Outdated
@@ -10,7 +10,7 @@
* This library is being maintained by the team at [NameHash Labs](https://namehashlabs.org) as part of the greater [NameGuard](https://nameguard.io) solution to help protect the ENS community.
* Passes **100%** of the [official validation tests](https://github.com/adraffy/ens-normalize.js/tree/main/validate) (validated automatically with pytest on Linux, MacOS, and Windows, see below for details).
* Passes an [additional suite of further tests](/tools/updater/update-ens.js#L54) for compatibility with the official [Javascript reference implementation](https://github.com/adraffy/ens-normalize.js) and code testing coverage.
* Based on [JavaScript implementation version 1.9.0](https://github.com/adraffy/ens-normalize.js/tree/4873fbe6393e970e186ab57860cc59cbbb1fa162).
* Original implementation based on [JavaScript implementation version 1.9.0](https://github.com/adraffy/ens-normalize.js/tree/4873fbe6393e970e186ab57860cc59cbbb1fa162).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Original implementation based on [JavaScript implementation version 1.9.0](https://github.com/adraffy/ens-normalize.js/tree/4873fbe6393e970e186ab57860cc59cbbb1fa162).
* Supports updates to the ENS Normalize reference implementation for Unicode 16.0.

Spent some time thinking about this line. I think it's best we just fully remove the old idea / bullet point here. I agree it made since when we first released this library, but we already are making reference to Raffy's JS implementation up above and no benefit for anyone if we go into this detail here.

Alternatively, I think it's now useful to make a note here to identify which Unicode version we've updated this library to support.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lightwalker-eth I like the idea of mentioning unicode support. But what do you mean by "supports updates"?
Is it supposed to mean "Up-to-date with ENS Normalize reference implementation as of Unicode version 16.0"?
I'm happy to go with your version if you think it sounds ok.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Carbon225 Good catch. My suggestion could have been more clear. I like the way you proposed writing it:

"Up-to-date with ENS Normalize reference implementation as of Unicode version 16.0."

tools/updater/spec.json Outdated Show resolved Hide resolved
tests/ens-normalize-tests.json Outdated Show resolved Hide resolved
Carbon225 and others added 2 commits September 25, 2024 14:13
Signed-off-by: kwrobel.eth <djstrong@gmail.com>
@Carbon225 Carbon225 merged commit 6991e35 into main Sep 30, 2024
7 checks passed
@djstrong djstrong deleted the unicode-16.0 branch September 30, 2024 22:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants