-
Notifications
You must be signed in to change notification settings - Fork 185
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Key extract #1950
Key extract #1950
Conversation
provider/datagen/src/lib.rs
Outdated
let keys = BufReader::new(file) | ||
.lines() | ||
.collect::<std::io::Result<HashSet<String>>>()?; | ||
pub fn keys_from_bin<P: AsRef<Path>>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With this you can actually point your build.rs
at your target folder and then find a fixed point in your compilation. Luckily it's just one step.
@@ -183,9 +183,3 @@ args = [ | |||
"--all-locales", | |||
"--ignore-missing-data", | |||
] | |||
|
|||
[tasks.icu4x-key-extract] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Issue: Please restore the CI job that tests this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is now in a unit test so it's in ci-job-test
.
@@ -1,5 +0,0 @@ | |||
datetime/lengths@1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Issue: Please restore this file and ensure that work_log.wasm
is unchanged across the PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What for? I've removed the key-file functionality from datagen as users can provide a list of keys as either a &[ResourceKey]
or &[&str]
, and it's not needed as an intermediate format for key extraction anymore either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The key file tests two things:
- That key extraction works
- That compiling the binary with the appropriate optimization options produces the same key list, and that we are aware when the key list changes (i.e., a golden data test)
(1) is tested elsewhere now, which is fine, but not (2) I think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, it looks like (2) is tested.
Why did the WASM file have a +2MB change, then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because it's new. Before we had checked in the key file, now we're checking in the binary
provider/datagen/src/bin/datagen.rs
Outdated
Arg::with_name("KEY_FILE") | ||
.long("key-file") | ||
Arg::with_name("BINARY") | ||
.long("binary") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I'm still not completely onboard with this much indirection and removing the key file. But, if you include this option, please name it something such as:
--key-binary
--inspect
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I don't really think that key-extract will be everyone's preferred workflow: i still think supporting key files is useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should definitely support a data file, but @robertbastian wanted to add a .toml file that contains all of the configurations that can be passed on the command line: keys, locales, CLDR/ICU versions, etc. I do have a slight preference for keeping --key-file
until we get to that point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that seems ok to me
exit 0 | ||
else | ||
echo "*****" | ||
echo "work_log+keys.txt do not match! Actual generated output:" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Please restore helpful comments like this one into test_keys_from_bin
if that test is replacing this test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The assert_eq will already give helpful comments.
assert_eq!( | ||
keys_from_file(file).unwrap(), | ||
keys_from_bin(PathBuf::from(env!("CARGO_MANIFEST_DIR")).join("tests/data/work_log.wasm")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: It looks like work_log.wasm is a one-off file that does not need an automated job to re-generate it. Please:
- Re-build it with empty data so that it isn't 2.47 MB
- Leave some instructions on how to manually re-create the file if we ever want to do so in the future
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably add a regenerate-work-log makefile task or something
); | ||
|
||
assert_eq!( | ||
ResourceKey::construct_internal(tagged!("hello/world@")), | ||
Err(("[0-9]", 26)) | ||
Err(("[0-9]", concat!(leading_tag!(), "hello/world@").len())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can go back to the NUL thing if you want
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I still am more comfy with having something besides just a NUL deliminating the strings. We are introducing these identifiers into the lexicon, and it is not at all unreasonable to think that they might end up in legitimate programs. We should have some ICU4X-specific prefix and/or suffix to label them.
Fixes #1533