Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Key extract #1950

Merged
merged 11 commits into from
Jun 13, 2022
Merged

Key extract #1950

merged 11 commits into from
Jun 13, 2022

Conversation

robertbastian
Copy link
Member

@robertbastian robertbastian commented May 26, 2022

Fixes #1533

@robertbastian robertbastian requested review from Manishearth, sffc and a team as code owners May 26, 2022 05:22
let keys = BufReader::new(file)
.lines()
.collect::<std::io::Result<HashSet<String>>>()?;
pub fn keys_from_bin<P: AsRef<Path>>(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this you can actually point your build.rs at your target folder and then find a fixed point in your compilation. Luckily it's just one step.

@@ -183,9 +183,3 @@ args = [
"--all-locales",
"--ignore-missing-data",
]

[tasks.icu4x-key-extract]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue: Please restore the CI job that tests this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now in a unit test so it's in ci-job-test.

@@ -1,5 +0,0 @@
datetime/lengths@1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issue: Please restore this file and ensure that work_log.wasm is unchanged across the PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What for? I've removed the key-file functionality from datagen as users can provide a list of keys as either a &[ResourceKey] or &[&str], and it's not needed as an intermediate format for key extraction anymore either.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The key file tests two things:

  1. That key extraction works
  2. That compiling the binary with the appropriate optimization options produces the same key list, and that we are aware when the key list changes (i.e., a golden data test)

(1) is tested elsewhere now, which is fine, but not (2) I think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, it looks like (2) is tested.

Why did the WASM file have a +2MB change, then?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because it's new. Before we had checked in the key file, now we're checking in the binary

@robertbastian robertbastian requested a review from sffc May 27, 2022 23:07
Arg::with_name("KEY_FILE")
.long("key-file")
Arg::with_name("BINARY")
.long("binary")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I'm still not completely onboard with this much indirection and removing the key file. But, if you include this option, please name it something such as:

  • --key-binary
  • --inspect

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I don't really think that key-extract will be everyone's preferred workflow: i still think supporting key files is useful.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should definitely support a data file, but @robertbastian wanted to add a .toml file that contains all of the configurations that can be passed on the command line: keys, locales, CLDR/ICU versions, etc. I do have a slight preference for keeping --key-file until we get to that point.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that seems ok to me

exit 0
else
echo "*****"
echo "work_log+keys.txt do not match! Actual generated output:"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Please restore helpful comments like this one into test_keys_from_bin if that test is replacing this test

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The assert_eq will already give helpful comments.

assert_eq!(
keys_from_file(file).unwrap(),
keys_from_bin(PathBuf::from(env!("CARGO_MANIFEST_DIR")).join("tests/data/work_log.wasm"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: It looks like work_log.wasm is a one-off file that does not need an automated job to re-generate it. Please:

  1. Re-build it with empty data so that it isn't 2.47 MB
  2. Leave some instructions on how to manually re-create the file if we ever want to do so in the future

Copy link
Member

@Manishearth Manishearth May 31, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably add a regenerate-work-log makefile task or something

@robertbastian robertbastian requested a review from sffc June 7, 2022 22:35
);

assert_eq!(
ResourceKey::construct_internal(tagged!("hello/world@")),
Err(("[0-9]", 26))
Err(("[0-9]", concat!(leading_tag!(), "hello/world@").len()))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can go back to the NUL thing if you want

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I still am more comfy with having something besides just a NUL deliminating the strings. We are introducing these identifiers into the lexicon, and it is not at all unreasonable to think that they might end up in legitimate programs. We should have some ICU4X-specific prefix and/or suffix to label them.

sffc
sffc previously approved these changes Jun 8, 2022
sffc
sffc previously approved these changes Jun 10, 2022
Manishearth
Manishearth previously approved these changes Jun 10, 2022
@robertbastian robertbastian requested a review from sffc June 13, 2022 14:29
@robertbastian robertbastian dismissed stale reviews from Manishearth and sffc via 011332c June 13, 2022 14:30
@robertbastian robertbastian merged commit 0a12d60 into unicode-org:main Jun 13, 2022
@robertbastian robertbastian deleted the keyex branch June 17, 2022 07:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improve icu4x_key_extract
3 participants