Efficient integer formatting into fixed-size buffer #546
Comments
After more careful thought, I’m not sure whether
You can add jiff to the motivating examples. I am in general in favor of something like this, but I definitely have some concerns. One is the concern you bring up, although I wonder if it's that bad of an idea to just specify that behavior (that digits are written to the end of the buffer); it doesn't seem that bad. The other concern I have is that this API doesn't account for formatting options like padding and whatnot. I suspect such things can be implemented on top of this by callers, but I think we'd want to actually try it with the proposed API to make sure it's reasonable (along with other common configurations of decimal formatting). Also curious to hear @dtolnay's thoughts on this!
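A minimal sketch of the kind of caller-side padding mentioned above, assuming a hypothetical format_u64_into that writes digits to the end of a fixed 20-byte MaybeUninit<u8> array and returns them as &str (the digit loop is a plain divide-by-10 loop, not the standard library's optimized algorithm). push_padded is likewise a hypothetical helper, not part of any proposed API.

```rust
use std::mem::MaybeUninit;

/// Hypothetical stand-in for a format-into-buffer API: writes the decimal
/// digits of `n` to the END of `buf` (20 bytes fits u64::MAX) and returns
/// the initialized tail as &str.
fn format_u64_into(mut n: u64, buf: &mut [MaybeUninit<u8>; 20]) -> &str {
    let mut start = buf.len();
    loop {
        start -= 1;
        buf[start] = MaybeUninit::new(b'0' + (n % 10) as u8);
        n /= 10;
        if n == 0 {
            break;
        }
    }
    let tail = &buf[start..];
    // SAFETY: every byte of `tail` was just written with an ASCII digit,
    // and MaybeUninit<u8> has the same layout as u8.
    unsafe {
        let bytes = std::slice::from_raw_parts(tail.as_ptr().cast::<u8>(), tail.len());
        std::str::from_utf8_unchecked(bytes)
    }
}

/// Hypothetical caller-side helper: right-aligns `n` in a field of `width`
/// characters by pushing `fill` before the digits. The digits are produced
/// first, so the padding amount is simply width - digits.len().
fn push_padded(out: &mut String, n: u64, width: usize, fill: char) {
    let mut buf = [MaybeUninit::<u8>::uninit(); 20];
    let digits = format_u64_into(n, &mut buf);
    for _ in digits.len()..width {
        out.push(fill);
    }
    out.push_str(digits);
}
```

For example, push_padded(&mut s, 42, 6, '0') appends "000042". Zero padding for negative numbers (sign before the zeros) would need one more step on top of this.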
This seems like a good use for the (currently unstable)
On second thought, taking a step back: the current implementations in the standard library already use essentially the proposed API (write digits into a small
what about having the destination buffer type be maybe:
/// could be sealed
pub trait BorrowedBufIntoStr<'a>: 'a {
    fn cursor<'this>(&'this mut self) -> BorrowedCursor<'this>;
    fn borrowed_buf_into_str(self) -> &'a mut str;
}
impl<'a> BorrowedBufIntoStr<'a> for BorrowedBuf<'a> { ... }
impl<'a, 'b> BorrowedBufIntoStr<'a> for &'a mut BorrowedBuf<'b> { ... }
impl <int> {
    fn fmt_into<'a>(self, base: u32, options: FormatterOptions, buf: impl BorrowedBufIntoStr<'a>) -> Result<&'a mut str, NotEnoughSpace> { ... }
}
As long as the implementation wants to write to the end of the buffer, that seems to clash with the
Would a
I think it would work if it was
If we do want to have a dedicated type for the buffer, instead of a raw "array of MaybeUninit", I think most of the ergonomic gains can be had with the signature
if you're formatting in
use core::{mem, mem::MaybeUninit, ops::Deref, slice, str};

// repr(C) so that e.g. Buffer<i32> and Buffer<u32> are guaranteed to share a
// layout (the signed format below casts between them); T: Copy keeps the
// MaybeUninit field a valid union field.
#[repr(C)]
union BufferData<T: Copy> {
    ints: MaybeUninit<[T; 3]>,
    i8_bytes: MaybeUninit<[u8; 4]>, // enough space for "-128"
}

#[repr(C)]
pub struct Buffer<T: Copy> {
    data: BufferData<T>,
    start: usize, // formatting likes to write from the end to the start
}

impl<T: Copy> Buffer<T> {
    // size of the digit storage only; size_of::<Self>() would also count `start`
    const CAP: usize = mem::size_of::<BufferData<T>>();

    pub const fn new() -> Self {
        Self {
            data: BufferData { ints: MaybeUninit::uninit() },
            start: Self::CAP,
        }
    }

    pub const fn clear(&mut self) {
        self.start = Self::CAP;
    }

    pub const fn as_str(&self) -> &str {
        // only bytes in start..CAP are initialized, so never view the rest as u8
        unsafe {
            let p = ((&raw const self.data) as *const u8).add(self.start);
            str::from_utf8_unchecked(slice::from_raw_parts(p, Self::CAP - self.start))
        }
    }

    const unsafe fn data_mut(&mut self) -> &mut [MaybeUninit<u8>] {
        unsafe {
            slice::from_raw_parts_mut((&raw mut self.data) as *mut MaybeUninit<u8>, Self::CAP)
        }
    }

    // caller must ensure start > 0
    const unsafe fn push_front(&mut self, v: u8) {
        self.start -= 1;
        let i = self.start; // read before data_mut() borrows self mutably
        unsafe {
            self.data_mut()[i] = MaybeUninit::new(v);
        }
    }
}

impl<T: Copy> Deref for Buffer<T> {
    type Target = str;
    fn deref(&self) -> &Self::Target {
        self.as_str()
    }
}

impl <signed-int> {
    pub const fn format(self, buf: &mut Buffer<Self>) -> &str {
        self.unsigned_abs().format(unsafe { &mut *(buf as *mut Buffer<Self> as *mut Buffer<<unsigned-int>>) });
        if self < 0 {
            unsafe { buf.push_front(b'-') };
        }
        buf.as_str()
    }
}

impl <unsigned-int> {
    pub const fn format(mut self, buf: &mut Buffer<Self>) -> &str {
        buf.clear();
        // not the most efficient algorithm...
        unsafe {
            loop {
                buf.push_front(b'0' + (self % 10) as u8);
                self /= 10;
                if self == 0 {
                    break;
                }
            }
        }
        buf.as_str()
    }
}
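The [T; 3] trick above relies on three bytes per byte of integer always being enough for the longest decimal rendering, with i8 (4 bytes for "-128") the only exception, which is why i8_bytes exists. A quick stand-alone check of that claim, comparing against the Display output of each type's MIN and MAX; longest_decimal and sizing_table are hypothetical helpers, not part of the sketch above.

```rust
use std::mem::size_of;

/// Hypothetical helper: length of the longest decimal rendering of a type,
/// taken as the longer of its MIN and MAX values formatted via Display.
fn longest_decimal<T: ToString>(min: T, max: T) -> usize {
    min.to_string().len().max(max.to_string().len())
}

/// Returns (type name, bytes needed, 3 * size_of::<T>()) for every primitive integer.
fn sizing_table() -> Vec<(&'static str, usize, usize)> {
    macro_rules! row {
        ($t:ty) => {
            (stringify!($t), longest_decimal(<$t>::MIN, <$t>::MAX), 3 * size_of::<$t>())
        };
    }
    vec![
        row!(u8), row!(i8), row!(u16), row!(i16), row!(u32),
        row!(i32), row!(u64), row!(i64), row!(u128), row!(i128),
    ]
}
```

For example, the i128 row is ("i128", 40, 48), and i8 is the only row where the bytes needed (4) exceed 3 * size_of (3).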
One little issue here is that you can't pass the
Proposal
Problem statement
The standard library provides highly optimized implementations of integer to decimal string conversions, but these are only accessible via the core::fmt machinery, which forces 1-2 layers of dynamic dispatch between user code and the actual formatting logic. Benchmarks in the itoa crate demonstrate that side-stepping fmt makes formatting much more efficient. Currently, any Rust user who wants that performance has to use third-party crates like itoa or lexical(-core) which essentially duplicate standard library functionality.
Motivating examples or use cases
- serde-json
- http: integer to HTTP header values
- maud: rendering integers in HTML templates
- arrow-json: converting data from the Arrow format to JSON
The lexical* crates also include float formatting and integer/float parsing, so their list of dependents is less illustrative.
compact_str uses itoa for 128-bit integers, but for smaller integers they vendor the standard library code and modify it to write directly into their custom string type's buffer. However, in this case the buffer size is based on the magnitude of the integer, not the worst-case size.
Solution sketch
This can be used from safe code, though it's a little noisier than the itoa API since MaybeUninit::uninit_array is slated for removal:
Alternatively, unsafe code can write directly into the buffer it wants; e.g., where http currently uses itoa, it could write directly into the spare_capacity_mut() of the BytesMut it creates (edit: not so simple, see first reply). I believe it could also replace the homebrew integer formatting in rustix::DecInt.
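A rough sketch of the "write directly into a buffer you own" usage, assuming hypothetical names U32_MAX_STR_LEN, format_u32_into and append_u32 (none of these are the proposal's actual names), and a Vec<u8> instead of a BytesMut to stay std-only. Because the digits land at the end of the fixed-size window, this caller has to shift them to the front with copy_within before committing them with set_len.

```rust
use std::mem::MaybeUninit;

/// Hypothetical stand-in for a per-type constant such as u32::MAX_STR_LEN.
pub const U32_MAX_STR_LEN: usize = 10; // "4294967295"

/// Hypothetical stand-in for a format-into-buffer method: writes the digits
/// of `n` to the END of `buf` and returns the initialized tail as &str.
pub fn format_u32_into(mut n: u32, buf: &mut [MaybeUninit<u8>; U32_MAX_STR_LEN]) -> &str {
    let mut start = U32_MAX_STR_LEN;
    loop {
        start -= 1;
        buf[start] = MaybeUninit::new(b'0' + (n % 10) as u8);
        n /= 10;
        if n == 0 {
            break;
        }
    }
    let tail = &buf[start..];
    // SAFETY: every byte of `tail` is an ASCII digit written above.
    unsafe {
        std::str::from_utf8_unchecked(std::slice::from_raw_parts(
            tail.as_ptr().cast::<u8>(),
            tail.len(),
        ))
    }
}

/// Appends the decimal form of `n` to `out`, formatting straight into the
/// Vec's spare capacity instead of a separate stack buffer.
pub fn append_u32(out: &mut Vec<u8>, n: u32) {
    out.reserve(U32_MAX_STR_LEN);
    let old_len = out.len();
    let spare = out.spare_capacity_mut();
    // view exactly the first U32_MAX_STR_LEN spare slots as a fixed-size array
    let window: &mut [MaybeUninit<u8>; U32_MAX_STR_LEN] =
        (&mut spare[..U32_MAX_STR_LEN]).try_into().unwrap();
    let len = format_u32_into(n, window).len();
    // digits sit at the END of the window; move them to its front
    window.copy_within(U32_MAX_STR_LEN - len.., 0);
    // SAFETY: the first `len` spare bytes are now initialized digits.
    unsafe { out.set_len(old_len + len) };
}
```

With a plain stack buffer the caller needs no unsafe at all: [MaybeUninit::<u8>::uninit(); U32_MAX_STR_LEN] works because MaybeUninit<u8> is Copy. The unsafe above comes only from committing bytes into the Vec's spare capacity.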
Alternatives
The obvious option would be to import the API of itoa directly (a single Buffer type with fn format(&mut self, n: impl SealedTrait) -> &str), since it's already widely used. However:
- Not being able to format directly into part of a buffer you own is insufficient for some users, who end up vendoring their own integer formatting code (e.g., rustix (edit: not so simple, see first reply) and compact_str, as mentioned under motivation).
- … u<N> and i<N> types with generous limits on N (e.g., Generic Integers V2: It's Time rfcs#3686) then a one-size-fits-all buffer may become excessively large. Even with a limit of N <= 4096, it would be well over a kilobyte. Even though the buffer doesn't have to be initialized, it'll still increase the stack frame size, which can have undesirable side effects on code generation (stack probes are generally inserted for frames larger than one page; SP-relative loads and stores need larger offsets that may no longer fit into an immediate operand).
- … MAX_STR_LEN), the general trend in the standard library is to have inherent associated functions and constants on every integer type, not traits implemented for every integer type.
Other alternatives:
- fn write(n: impl SomeTrait, buf: &mut [u8]) -> &[u8], but this requires unsafe to get a string out of it, and even if the return type is changed to &str, it can panic if the buffer is too small for the given n and requires a fully initialized buffer.
- fn to_string(n: impl SomeTrait) -> String, but this requires alloc (not just core) and does an unnecessary heap allocation when the result is immediately copied into another buffer.
Links and related work
- … write! faster dtolnay/itoa#1
What happens now?
This issue contains an API change proposal (or ACP) and is part of the libs-api team feature lifecycle. Once this issue is filed, the libs-api team will review open proposals as capability becomes available. Current response times do not have a clear estimate, but may be up to several months.
Possible responses
The libs team may respond in various different ways. First, the team will consider the problem (this doesn't require any concrete solution or alternatives to have been proposed):
Second, if there's a concrete solution: