Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add as_ascii_unchecked() methods to char, u8, and str #137432

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

djscythe
Copy link

This PR adds the as_ascii_unchecked() method to char, u8, and str, allowing users to convert these types to ascii::Chars (see #110998) in an unsafe context without first checking for validity. This method was already available for [u8], so this PR makes the API more consistent across other types.

@rustbot
Copy link
Collaborator

rustbot commented Feb 22, 2025

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @cuviper (or someone else) some time within the next two weeks.

Please see the contribution instructions for more information. Namely, in order to ensure the minimum review times lag, PR authors and assigned reviewers should ensure that the review label (S-waiting-on-review and S-waiting-on-author) stays updated, invoking these commands when appropriate:

  • @rustbot author: the review is finished, PR author should check the comments and take action accordingly
  • @rustbot review: the author is ready for a review, this PR will be queued again in the reviewer's queue

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Feb 22, 2025
Copy link
Contributor

@Sky9x Sky9x left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is probably worth adding assert_unsafe_precondition checks to these functions.

/// Checks that the preconditions of an unsafe function are followed.
///
/// The check is enabled at runtime if debug assertions are enabled when the
/// caller is monomorphized. In const-eval/Miri checks implemented with this
/// macro for language UB are always ignored.
///
/// This macro should be called as
/// `assert_unsafe_precondition!(check_{library,language}_ub, "message", (ident: type = expr, ident: type = expr) => check_expr)`
/// where each `expr` will be evaluated and passed in as function argument `ident: type`. Then all
/// those arguments are passed to a function with the body `check_expr`.
/// Pick `check_language_ub` when this is guarding a violation of language UB, i.e., immediate UB
/// according to the Rust Abstract Machine. Pick `check_library_ub` when this is guarding a violation
/// of a documented library precondition that does not *immediately* lead to language UB.
///
/// If `check_library_ub` is used but the check is actually guarding language UB, the check will
/// slow down const-eval/Miri and we'll get the panic message instead of the interpreter's nice
/// diagnostic, but our ability to detect UB is unchanged.
/// But if `check_language_ub` is used when the check is actually for library UB, the check is
/// omitted in const-eval/Miri and thus if we eventually execute language UB which relies on the
/// library UB, the backtrace Miri reports may be far removed from original cause.
///
/// These checks are behind a condition which is evaluated at codegen time, not expansion time like
/// [`debug_assert`]. This means that a standard library built with optimizations and debug
/// assertions disabled will have these checks optimized out of its monomorphizations, but if a
/// caller of the standard library has debug assertions enabled and monomorphizes an expansion of
/// this macro, that monomorphization will contain the check.
///
/// Since these checks cannot be optimized out in MIR, some care must be taken in both call and
/// implementation to mitigate their compile-time overhead. Calls to this macro always expand to
/// this structure:
/// ```ignore (pseudocode)
/// if ::core::intrinsics::check_language_ub() {
/// precondition_check(args)
/// }
/// ```
/// where `precondition_check` is monomorphic with the attributes `#[rustc_nounwind]`, `#[inline]` and
/// `#[rustc_no_mir_inline]`. This combination of attributes ensures that the actual check logic is
/// compiled only once and generates a minimal amount of IR because the check cannot be inlined in
/// MIR, but *can* be inlined and fully optimized by a codegen backend.
///
/// Callers should avoid introducing any other `let` bindings or any code outside this macro in
/// order to call it. Since the precompiled standard library is built with full debuginfo and these
/// variables cannot be optimized out in MIR, an innocent-looking `let` can produce enough
/// debuginfo to have a measurable compile-time impact on debug builds.
#[macro_export]
#[unstable(feature = "ub_checks", issue = "none")]
macro_rules! assert_unsafe_precondition {
($kind:ident, $message:expr, ($($name:ident:$ty:ty = $arg:expr),*$(,)?) => $e:expr $(,)?) => {
{
// This check is inlineable, but not by the MIR inliner.
// The reason for this is that the MIR inliner is in an exceptionally bad position
// to think about whether or not to inline this. In MIR, this call is gated behind `debug_assertions`,
// which will codegen to `false` in release builds. Inlining the check would be wasted work in that case and
// would be bad for compile times.
//
// LLVM on the other hand sees the constant branch, so if it's `false`, it can immediately delete it without
// inlining the check. If it's `true`, it can inline it and get significantly better performance.
#[rustc_no_mir_inline]
#[inline]
#[rustc_nounwind]
const fn precondition_check($($name:$ty),*) {
if !$e {
::core::panicking::panic_nounwind(concat!("unsafe precondition(s) violated: ", $message,
"\n\nThis indicates a bug in the program. \
This Undefined Behavior check is optional, and cannot be relied on for safety."));
}
}
if ::core::ub_checks::$kind() {
precondition_check($($arg,)*);
}
}
};
}

#[inline]
pub const unsafe fn as_ascii_unchecked(&self) -> ascii::Char {
// SAFETY: the caller promised that this char is ASCII.
unsafe { ascii::Char::from_u8_unchecked(*self as u8) }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, the as here will mean that, as far as Miri and the backend can tell, '\u{1234}'.as_ascii_unchecked() is completely legal and fine, which seems wrong.

Probably should do something to avoid that and let Miri catch mistakes.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately I am not particularly familiar with Miri so I'm unsure which kinds of UB it can and cannot catch. Would you be able to give me some pointers as to how to implement this feaure in a way that Miri can diagnose?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add some assert_unsafe_precondition!s using check_library_ub. You want to ensure the input char/byte/string is within range (ie. run the validation that the checked counterparts do). I linked the assert_unsafe_precondition docs above, and you can also search around std for examples of it being used.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants