-
Notifications
You must be signed in to change notification settings - Fork 185
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement icu4x-key-extract by string tagging #1480
Conversation
Hooray! The files in the branch are the same across the force-push. 😃 ~ Your Friendly Jira-GitHub PR Checker Bot |
This is a clever approach! However, there are two issues:
|
The idea for 2 was something like struct ResourceKey {
tag_and_human: &'static str,
pub machine: Something,
}
impl ResourceKey {
fn human(&self) -> &str {
&self.tag_and_human[6..]
}
} But yes, this requires a macro, but why is that undesirable? |
We cannot get around the requirement to have a public const function constructor, because macros are codegen at the call site, and they need to be able to call a function in order to construct the ResourceKey. We could still choose to have a macro if we feel that it increases ergonomics, but we need to enforce invariants within the const function. So, basically, we could potentially make a ResourceKey constructor that looks like impl ResourceKey {
/// Returns an error if path_with_tag does not begin with "\nicu4x_key_tag" and does not end with "\n"
pub const fn try_new(path_with_tag: &str) -> Result<Self, ()> {
// validate that path_with_tag is in the correct syntax
// ...
}
} And then the macro could use If we go with the approach I've proposed in #1511, the const function takes a human-readable path with no requirement that it also include the tag, and the tag is purely internal to the code with the I do have some worry that if we make this revolve around the human-readable tag, we could end up with false positives. I think false positives are less likely to happen with the |
Notice: the branch changed across the force-push!
~ Your Friendly Jira-GitHub PR Checker Bot |
Notice: the branch changed across the force-push!
~ Your Friendly Jira-GitHub PR Checker Bot |
Notice: the branch changed across the force-push!
~ Your Friendly Jira-GitHub PR Checker Bot |
Notice: the branch changed across the force-push!
~ Your Friendly Jira-GitHub PR Checker Bot |
Notice: the branch changed across the force-push!
~ Your Friendly Jira-GitHub PR Checker Bot |
provider/core/src/resource.rs
Outdated
// unsafe { | ||
// // Safe due to invariant that self.path is tagged correctly | ||
// core::str::from_utf8_unchecked(core::slice::from_raw_parts( | ||
// self.path.as_ptr().add(leading_tag!().len()), | ||
// self.path.len() - trailing_tag!().len() - leading_tag!().len(), | ||
// )) | ||
// } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah let's go with this
example::ResourceKey::get_path:
mov rax, qword ptr [rdi]
mov rdx, qword ptr [rdi + 8]
add rax, 14
add rdx, -15
ret
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For completeness: self.path
example::ResourceKey::get_path:
mov rax, qword ptr [rdi]
mov rdx, qword ptr [rdi + 8]
ret
I think we can live with two extra adds.
provider/core/src/resource.rs
Outdated
// self.path.len() - trailing_tag!().len() - leading_tag!().len(), | ||
// )) | ||
// } | ||
&self.path[leading_tag!().len()..self.path.len() - trailing_tag!().len()] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is indeed very panicky
example::ResourceKey::get_path:
push rax
mov rax, qword ptr [rdi]
mov rsi, qword ptr [rdi + 8]
lea rcx, [rsi - 1]
cmp rcx, 14
jb .LBB0_4
cmp rsi, 15
jb .LBB0_4
cmp byte ptr [rax + 14], -64
jl .LBB0_4
cmp byte ptr [rax + rcx], -65
jle .LBB0_4
add rax, 14
add rsi, -15
mov rdx, rsi
pop rcx
ret
.LBB0_4:
lea r8, [rip + .L__unnamed_1]
mov edx, 14
mov rdi, rax
call qword ptr [rip + core::str::slice_error_fail@GOTPCREL]
ud2
.L__unnamed_2:
.ascii "/app/example.rs"
.L__unnamed_1:
.quad .L__unnamed_2
.asciz "\017\000\000\000\000\000\000\000\"\000\000\000\n\000\000"
Notice: the branch changed across the force-push!
~ Your Friendly Jira-GitHub PR Checker Bot |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR is missing the actual icu4x-keyextract binary. Do you plan to add it?
Notice: the branch changed across the force-push!
~ Your Friendly Jira-GitHub PR Checker Bot |
The key extract 'binary' is here |
Notice: the branch changed across the force-push!
~ Your Friendly Jira-GitHub PR Checker Bot |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work, and thank you for addressing all my concerns!
Notice: the branch changed across the force-push!
~ Your Friendly Jira-GitHub PR Checker Bot |
Notice: the branch changed across the force-push!
~ Your Friendly Jira-GitHub PR Checker Bot |
echo "datetime/symbols@1" >> ${2} | ||
echo "plurals/ordinal@1" >> ${2} | ||
""" | ||
script = '''strings ${1} | grep -e "icu4x_key_tag\w" | sed s/icu4x_key_tag// | sort > ${2}''' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thought: Although using strings
is handy, we should still offer a cross-platform solution I think. Opinions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know how complex strings
is, i.e. it can probably handle many different binary formats etc. Could be hard to match its sophistication.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you verify manually that it works for at least x86_64 (elf) and wasm files?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you verify manually that it works for at least x86_64 (elf) and wasm files?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I manually verified ELF, and WASM I believe as well, but that's covered by the CI test anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's this https://docs.microsoft.com/en-us/sysinternals/downloads/strings. The piping, grep
, and sed
probably won't work on windows though...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It makes me uneasy to think that a pivotal part of the ICU4X data pipeline rests on piping strings
to sed
. I'd rather have a tool that takes an executable and returns the key list.
(My approval stands; please file a follow up to discuss further)
#948
#1106