Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add tinystr-neo to experimental/ #1508

Merged
merged 12 commits into from
Jan 18, 2022
Merged

Conversation

Manishearth
Copy link
Member

This is a clean attempt at tinystr, using const fn and fixed size arrays.

This should address zbraniecki/tinystr#45 / #881

The general idea is, instead of encoding strings as digits, we encode them as byte arrays. This has many advantages:

  • No mucking about with endianness
  • We can support things like TinyAsciiStr<5> or TinyAsciiStr<100>, no restrictions on N
  • ULE gets cleaner

A potential downside is that these are less aligned (we can write TinyStr4/etc wrappers that are better aligned), and .deref()/deserialize() may not be as efficient (we might need to benchmark this).

This is still kinda WIP, but I'm not sure how much time I'll have to work on this so I'm landing what I have so far so that others can help out. In particular I'd love help with tests, benches, and docs.

@Manishearth Manishearth requested a review from a team as a code owner January 14, 2022 21:42

let mut out = [0; N];
let mut i = 0;
while i < bytes.len() {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: can't be a for loop in const context

}

pub fn len(&self) -> usize {
self.bytes.iter().position(|x| *x == 0).unwrap_or(N)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: previously we were able to use .trailing_zeroes()/leading_zeroes() which are a single instruction. If rustc unrolls this loop for small N we should also be fine here, but I'm not sure if that's happening, we should bench this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah; this is one of the things I was wondering how we would implement efficiently. We should definitely test. I think this would be a good thing for IAI as opposed to Clippy.

}

#[inline]
fn visit_seq<A>(self, mut seq: A) -> Result<Self::Value, A::Error>
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: this is likely not as efficient as the deserialization from the tinystr crate.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could add TinyStr4/etc as aligned wrappers that serialize to little-endian numbers.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not too worried about impl Deserialize on TinyAsciiStr itself, because that one is only used when TinyAsciiStr is at the top level of a data struct. I anticipate the much more common case to be when it is in a ZeroVec, where the ULE impl will be used instead of the Deserialize impl.

// (online at: https://github.com/unicode-org/icu4x/blob/main/LICENSE ).

#[macro_export]
macro_rules! tinystr {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thought: We maybe don't need the macro any more. People can just use the const function constructors directly. Fewer imports and friendlier to the compiler.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So as mentioned in #1511 (comment) , I think it's important to give users a convenient way to call this in a const context, given that it's rare and unusual for Rust programmers to const locals wherever possible. Unlike that PR, there will be no exported consts for people to use this, so I'd prefer to keep the macro around.

MBEs (non proc macros, like this one) are pretty friendly to the compiler; and this is going to be even smaller once I rebase over the rust update. I don't find that argument that compelling.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, macro_rules is definitely an improvement over a proc macro. Okay, I'm convinced.

sffc
sffc previously approved these changes Jan 16, 2022
Copy link
Member

@sffc sffc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM and please update the PR to Rust 1.58 so we can get rid of the runtime unwrap

@zbraniecki
Copy link
Member

Manish - do you plan to measure perf before merging o before switching callsites to it?

@Manishearth
Copy link
Member Author

Manishearth commented Jan 16, 2022

@zbraniecki before switching callsites. The crate is WIP (and experimental/), I just want to land what I have so far so that others (you?? 🥺) can help with docs, benches, API surface, etc.

@jira-pull-request-webhook
Copy link

Hooray! The files in the branch are the same across the force-push. 😃

~ Your Friendly Jira-GitHub PR Checker Bot

sffc
sffc previously approved these changes Jan 17, 2022
Comment on lines 9 to 10
const TINYSTR_MACRO_CONST: $crate::TinyAsciiStr<$n> =
$crate::TinyAsciiStr::from_str_panicky($s);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: does it work if you do this?

Suggested change
const TINYSTR_MACRO_CONST: $crate::TinyAsciiStr<$n> =
$crate::TinyAsciiStr::from_str_panicky($s);
const TINYSTR_MACRO_CONST: $crate::TinyAsciiStr<$n> =
$crate::TinyAsciiStr::from_str($s).unwrap();

or perhaps

Suggested change
const TINYSTR_MACRO_CONST: $crate::TinyAsciiStr<$n> =
$crate::TinyAsciiStr::from_str_panicky($s);
const TINYSTR_MACRO_CONST: $crate::TinyAsciiStr<$n> =
$crate::TinyAsciiStr::from_str($s).ok().expect("Invalid tinystr");

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(with the unwrap inside the const context)

Copy link
Member Author

@Manishearth Manishearth Jan 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, unwrap isn't stably const

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not even Option::unwrap? oh well... how about

const TINYSTR_MACRO_CONST: $crate::TinyAsciiStr<$n> =
            match $crate::TinyAsciiStr::from_str($s) {
    Ok(s) => s,
    Err(_) => panic!("{}", $s)  // Invalid tinystr
}

Or, keep it the way it is. It's just that it might be nice to not have a panicky library function.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I didn't like that function either, moved

sffc
sffc previously approved these changes Jan 18, 2022
match $crate::TinyAsciiStr::from_bytes($s.as_bytes()) {
Ok(s) => s,
// Cannot format the error since formatting isn't const yet
Err(_) => panic!(concat!("Failed to construct tinystr from ", $s)),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Praise: Good idea to use concat! to format a nicer message

@Manishearth Manishearth merged commit 30a3f07 into unicode-org:main Jan 18, 2022
@Manishearth Manishearth deleted the tinystr-neo branch January 18, 2022 18:06
@Manishearth Manishearth restored the tinystr-neo branch January 19, 2022 21:20
@Manishearth Manishearth deleted the tinystr-neo branch February 10, 2022 17:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants