-
Notifications
You must be signed in to change notification settings - Fork 13.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add as_ascii_unchecked() methods to char, u8, and str #137432
base: master
Are you sure you want to change the base?
Add as_ascii_unchecked() methods to char, u8, and str #137432
Conversation
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @cuviper (or someone else) some time within the next two weeks. Please see the contribution instructions for more information. Namely, in order to ensure the minimum review times lag, PR authors and assigned reviewers should ensure that the review label (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is probably worth adding assert_unsafe_precondition
checks to these functions.
rust/library/core/src/ub_checks.rs
Lines 6 to 79 in 0769736
/// Checks that the preconditions of an unsafe function are followed. | |
/// | |
/// The check is enabled at runtime if debug assertions are enabled when the | |
/// caller is monomorphized. In const-eval/Miri checks implemented with this | |
/// macro for language UB are always ignored. | |
/// | |
/// This macro should be called as | |
/// `assert_unsafe_precondition!(check_{library,language}_ub, "message", (ident: type = expr, ident: type = expr) => check_expr)` | |
/// where each `expr` will be evaluated and passed in as function argument `ident: type`. Then all | |
/// those arguments are passed to a function with the body `check_expr`. | |
/// Pick `check_language_ub` when this is guarding a violation of language UB, i.e., immediate UB | |
/// according to the Rust Abstract Machine. Pick `check_library_ub` when this is guarding a violation | |
/// of a documented library precondition that does not *immediately* lead to language UB. | |
/// | |
/// If `check_library_ub` is used but the check is actually guarding language UB, the check will | |
/// slow down const-eval/Miri and we'll get the panic message instead of the interpreter's nice | |
/// diagnostic, but our ability to detect UB is unchanged. | |
/// But if `check_language_ub` is used when the check is actually for library UB, the check is | |
/// omitted in const-eval/Miri and thus if we eventually execute language UB which relies on the | |
/// library UB, the backtrace Miri reports may be far removed from original cause. | |
/// | |
/// These checks are behind a condition which is evaluated at codegen time, not expansion time like | |
/// [`debug_assert`]. This means that a standard library built with optimizations and debug | |
/// assertions disabled will have these checks optimized out of its monomorphizations, but if a | |
/// caller of the standard library has debug assertions enabled and monomorphizes an expansion of | |
/// this macro, that monomorphization will contain the check. | |
/// | |
/// Since these checks cannot be optimized out in MIR, some care must be taken in both call and | |
/// implementation to mitigate their compile-time overhead. Calls to this macro always expand to | |
/// this structure: | |
/// ```ignore (pseudocode) | |
/// if ::core::intrinsics::check_language_ub() { | |
/// precondition_check(args) | |
/// } | |
/// ``` | |
/// where `precondition_check` is monomorphic with the attributes `#[rustc_nounwind]`, `#[inline]` and | |
/// `#[rustc_no_mir_inline]`. This combination of attributes ensures that the actual check logic is | |
/// compiled only once and generates a minimal amount of IR because the check cannot be inlined in | |
/// MIR, but *can* be inlined and fully optimized by a codegen backend. | |
/// | |
/// Callers should avoid introducing any other `let` bindings or any code outside this macro in | |
/// order to call it. Since the precompiled standard library is built with full debuginfo and these | |
/// variables cannot be optimized out in MIR, an innocent-looking `let` can produce enough | |
/// debuginfo to have a measurable compile-time impact on debug builds. | |
#[macro_export] | |
#[unstable(feature = "ub_checks", issue = "none")] | |
macro_rules! assert_unsafe_precondition { | |
($kind:ident, $message:expr, ($($name:ident:$ty:ty = $arg:expr),*$(,)?) => $e:expr $(,)?) => { | |
{ | |
// This check is inlineable, but not by the MIR inliner. | |
// The reason for this is that the MIR inliner is in an exceptionally bad position | |
// to think about whether or not to inline this. In MIR, this call is gated behind `debug_assertions`, | |
// which will codegen to `false` in release builds. Inlining the check would be wasted work in that case and | |
// would be bad for compile times. | |
// | |
// LLVM on the other hand sees the constant branch, so if it's `false`, it can immediately delete it without | |
// inlining the check. If it's `true`, it can inline it and get significantly better performance. | |
#[rustc_no_mir_inline] | |
#[inline] | |
#[rustc_nounwind] | |
const fn precondition_check($($name:$ty),*) { | |
if !$e { | |
::core::panicking::panic_nounwind(concat!("unsafe precondition(s) violated: ", $message, | |
"\n\nThis indicates a bug in the program. \ | |
This Undefined Behavior check is optional, and cannot be relied on for safety.")); | |
} | |
} | |
if ::core::ub_checks::$kind() { | |
precondition_check($($arg,)*); | |
} | |
} | |
}; | |
} |
#[inline] | ||
pub const unsafe fn as_ascii_unchecked(&self) -> ascii::Char { | ||
// SAFETY: the caller promised that this char is ASCII. | ||
unsafe { ascii::Char::from_u8_unchecked(*self as u8) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, the as
here will mean that, as far as Miri and the backend can tell, '\u{1234}'.as_ascii_unchecked()
is completely legal and fine, which seems wrong.
Probably should do something to avoid that and let Miri catch mistakes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately I am not particularly familiar with Miri so I'm unsure which kinds of UB it can and cannot catch. Would you be able to give me some pointers as to how to implement this feaure in a way that Miri can diagnose?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add some assert_unsafe_precondition!
s using check_library_ub
. You want to ensure the input char/byte/string is within range (ie. run the validation that the checked counterparts do). I linked the assert_unsafe_precondition
docs above, and you can also search around std for examples of it being used.
This PR adds the
as_ascii_unchecked()
method tochar
,u8
, andstr
, allowing users to convert these types toascii::Char
s (see #110998) in anunsafe
context without first checking for validity. This method was already available for[u8]
, so this PR makes the API more consistent across other types.