Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Is read_volatile on uninitialized memory really undefined behavior? #2807

Closed
surban opened this issue Mar 8, 2023 · 21 comments
Closed

Is read_volatile on uninitialized memory really undefined behavior? #2807

surban opened this issue Mar 8, 2023 · 21 comments

Comments

@surban
Copy link

surban commented Mar 8, 2023

Hi!

I must read memory that is considered uninitialized by Rust. For that purpose I am using the core::ptr::read_volatile function. However, when running the program under miri it complains with the following error:

$ cargo +nightly miri run
Preparing a sysroot for Miri (target: x86_64-unknown-linux-gnu)... done
   Compiling defmt-ringbuf-miri v0.2.0 (/data/surban/dev/defmt-ringbuf-miri)
    Finished dev [unoptimized + debuginfo] target(s) in 0.01s
     Running `/home/surban/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/bin/cargo-miri runner target/miri/x86_64-unknown-linux-gnu/debug/defmt-ringbuf-miri`
error: Undefined Behavior: using uninitialized data, but this operation requires initialized memory
  --> src/ring_buffer.rs:44:29
   |
44 |             let signature = (addr_of!((*ptr).signature) as *const u32).read_volatile();
   |                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ using uninitialized data, but this operation requires initialized memory
   |
   = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior
   = help: see https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html for further information
   = note: BACKTRACE:
   = note: inside `ring_buffer::RingBuffer::<8192>::init` at src/ring_buffer.rs:44:29: 44:87
note: inside `main`
  --> src/main.rs:13:27
   |
13 |     let buffer = unsafe { RingBuffer::init(&mut BUFFER) };
   |                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

note: some details are omitted, run with `MIRIFLAGS=-Zmiri-backtrace=full` for a verbose backtrace

error: aborting due to previous error

The code is available at https://github.com/surban/defmt-ringbuf-miri.

See also this discussion thread for more background information. The conclusion is that core::ptr::read_volatile on uninitialized memory is okay because the compiler cannot make any assumptions about it, i.e. it could be IO memory that changes with every read.

Thus, to be consistent with the compiler's behavior, miri should allow reading of "uninitialized" memory through core::ptr::read_volatile.

@KamilaBorowska
Copy link

KamilaBorowska commented Mar 8, 2023

Rust cannot guarantee uninitialized read_volatile to not be UB. On platforms that mark uninitialized memory specifically (such as Itanium, albeit that one only does that for registers) it may still be UB.

On platforms without such constraints this may be safe, but the rules aren't clear.

See also rust-lang/unsafe-code-guidelines#152.

@RalfJung
Copy link
Member

RalfJung commented Mar 8, 2023

We also specifically asked LLVM if they could guarantee that volatile reads are frozen, and they said no. (Not to mention that an RFC would be required to expose freezing semantics in Rust.)

he conclusion is that core::ptr::read_volatile on uninitialized memory is okay because the compiler cannot make any assumptions about it, i.e. it could be IO memory that changes with every read.

I'm afraid that's not how this works. Just because the compiler currently happens to work like this, doesn't mean anything. The only thing that matters is what is explicitly stably documented.

Right now, whether a read is volatile or not makes no difference for whether a program has UB, ever.

I must read memory that is considered uninitialized by Rust.

Then you must use MaybeUninit. That's what that type is for. :)
(And you should only use volatile if this is indeed accessing memory that is shared with other devices.)

@VorfeedCanal
Copy link

More context (copy-pasted from URLO thread): I am writing a logger for an embedded system (STM32, but the details don't really matter here). The logger writes the log messages to a ring buffer in system memory (RAM), which can then be read at a later time.

That's most definitely about accessing memory that is shared with other devices and the trouble here is how to safely access memory which Rust (and Miri) believes to be unitialized, but which is not, in reality, uninitialized.

You can not do much of an embedded work without such facility and it would be nice to have some support on Miri side for these needs.

@workingjubilee
Copy link
Member

Hello, drive-by comment from a libs contributor / person who does an unfortunate amount of FFI:

We have discussed adding support for "allocating" things (really, declaring that a certain location is known to be shared memory) in Rust, and we do intend to add it to the language/standard library. It's mostly a matter of interface design. Yes, it's true that Miri doesn't offer much support for it right now.

If it has a known fixed address at program link time, you can probably use an extern { static } and symbol relocation. But that must still be initialized to the appropriate type at the program start. I don't know if Miri supports that, but if Miri does, it's important that it's an extern static. Defining the static as a Rust static makes it look like the Rust code is responsible for the static to Miri, when it's functionally a definition from "outside" Rust.

@surban
Copy link
Author

surban commented Mar 9, 2023

Then you must use MaybeUninit. That's what that type is for. :) (And you should only use volatile if this is indeed accessing memory that is shared with other devices.)

I am using MaybeUninit and still getting the error from miri. Thus I don't know how to write my init function so that I can safely read, verify and keep the memory at the location where my struct is placed.

That's most definitely about accessing memory that is shared with other devices and the trouble here is how to safely access memory which Rust (and Miri) believes to be unitialized, but which is not, in reality, uninitialized.

In fact the memory is not shared with any other device. It is "normal" memory, placed into a linker section that prevents the initialization code from clearing it at program startup. You might think about it as memory written by another device before the program was started if you wish. However, once the Rust code is running it has exclusive access to the memory location and no special handling is required.

What I need is a way to tell the compiler that the memory has been pre-initialized before the program was started and the compiler must not make any assumptions when first reading the memory. If neither ptr::read_volatile nor ptr::read are valid ways to do this, then what is?

If it has a known fixed address at program link time, you can probably use an extern { static } and symbol relocation. But that must still be initialized to the appropriate type at the program start. I don't know if Miri supports that, but if Miri does, it's important that it's an extern static. Defining the static as a Rust static makes it look like the Rust code is responsible for the static to Miri, when it's functionally a definition from "outside" Rust.

From the docs: "An immutable [extern] static must be initialized before any Rust code is executed". I cannot guarantee that the contents of the memory location are valid before Rust code start, i.e. I have a bool in my struct and after power-on the value of that memory location might be a random bit pattern, so neither 0 nor 1.

So maybe I need an extern static of MaybeUninit<_> type. Would that make sense?

@surban
Copy link
Author

surban commented Mar 9, 2023

I've run a few experiments (with the compiler, not miri) to check whether ptr::read_volatile vs ptr::read makes any difference and in fact it is essential to use ptr::read_volatile.

First I modified Ralf's example from https://www.ralfj.de/blog/2019/07/14/uninit.html and introduced a ptr::read before assuming the variable do be initialized.

use std::mem;

fn always_returns_true(x: u8) -> bool {
    x < 120 || x == 120 || x > 120
}

fn main() {
    let xu: mem::MaybeUninit<u8> = mem::MaybeUninit::uninit();
    unsafe { xu.as_ptr().read() };
    let x = unsafe { xu.assume_init() };
    assert!(always_returns_true(x));
}

This still results in undefined behavior: thread 'main' panicked at 'assertion failed: always_returns_true(x)', src/main.rs:11:5

Then I changed the read to use ptr::read_volatile.

use std::mem;

fn always_returns_true(x: u8) -> bool {
    x < 120 || x == 120 || x > 120
}

fn main() {
    let xu: mem::MaybeUninit<u8> = mem::MaybeUninit::uninit();
    unsafe { xu.as_ptr().read_volatile() };
    let x = unsafe { xu.assume_init() };
    assert!(always_returns_true(x));
}

This executes correctly.

https://play.rust-lang.org/?version=stable&mode=release&edition=2018&gist=49db6983bb7871162883a64fe77f3f23

@oli-obk
Copy link
Contributor

oli-obk commented Mar 9, 2023

This executes correctly.

You are relying on the result of undefined behaviour, LLVM just accidentally produces the assembly you desire in the second case

What I need is a way to tell the compiler that the memory has been pre-initialized before the program was started and the compiler must not make any assumptions when first reading the memory. If neither ptr::read_volatile nor ptr::read are valid ways to do this, then what is?

You should initialize your memory before reading from it. With miri you can use #[cfg(miri)] to only run the initialization logic in case you run miri.

@bjorn3
Copy link
Member

bjorn3 commented Mar 9, 2023

For your use case I did expect the following to work:

extern "C" {
    #[link_section = ".uninit"]
    static mut BUFFER: MaybeUninit<RingBuffer<8192>>;
}

That is rather than defining the BUFFER to have initialized memory, import it from the environment such that the compiler has to assume something else (like the previous invocation of the program) wrote to it before you read it.

This is not something miri can emulate without modifying miri to handle the BUFFER import by for example writing dummy data or zeroing it.

@surban
Copy link
Author

surban commented Mar 9, 2023

For your use case I did expect the following to work:

extern "C" {
    #[link_section = ".uninit"]
    static mut BUFFER: MaybeUninit<RingBuffer<8192>>;
}

This doesn't compile because link_section is not allowed on extern items. It makes kind of sense because extern doesn't define a symbol but expects it to be defined in another module the program is linked against. However, this isn't the case here. I really need to define the symbol in the Rust file.

@saethlin
Copy link
Member

saethlin commented Mar 9, 2023

I feel like this issue started off going in the wrong direction. The answer to the initial question is definitely "yes", but I'm not sure that it is relevant to the data structure you are trying to implement. It sounds to me like the memory for a RingBuffer is foreign memory, and since it is not a stack or heap allocation, it is not obvious to me that when it is created it is actually uninitialized in the same sense as a new stack or heap allocation is.

@bjorn3
Copy link
Member

bjorn3 commented Mar 9, 2023

This doesn't compile because link_section is not allowed on extern items.

Right

It makes kind of sense because extern doesn't define a symbol but expects it to be defined in another module the program is linked against. However, this isn't the case here. I really need to define the symbol in the Rust file.

You could define it in a linker script. Or maybe just #[no_mangle] on the static mut defined in rust is enough? #[no_mangle] allows writing from external code.

@RalfJung
Copy link
Member

RalfJung commented Mar 9, 2023

This executes correctly.

The generated program is a random sequence of bytes that just happens to take the shape of a seemingly working program by accident. Such is the joy of code that causes UB. You cannot deduce anything from what happens when you execute a program with UB, since that act is by itself meaningless. You need to establish that your program has no UB before making any inference based on what you see the program do after it came out of the compiler.

Thus I don't know how to write my init function so that I can safely read, verify and keep the memory at the location where my struct is placed.

Where is the ring buffer created and initialized?

I found this comment

// On the embedded system this would be placed in a linker section
// that is not initialized at start.
// #[link_section = ".uninit"]
static mut BUFFER: MaybeUninit<RingBuffer<8192>> = MaybeUninit::uninit();

but it says that the section is not initialized so this can't be enough yet. So where is this pre-initialization you are talking about happening?

What I need is a way to tell the compiler that the memory has been pre-initialized before the program was started and the compiler must not make any assumptions when first reading the memory. If neither ptr::read_volatile nor ptr::read are valid ways to do this, then what is?

That sounds like an extern static. If you declare the static in Rust, Miri will assume that it has the value you give it in Rust -- there is no indication to the opposite, and Miri does not attempt to support linker script shenanigans. What you are doing is the equivalent of adjusting the binary after it was produced by the Rust compiler -- that's not something Miri can help with since you left the world of Rust at this point. If you had used an extern static, Miri would have shown a very clear error message saying that this program is not supported since Miri can't know what the extern static does. But since you somehow managed to 'emulate' an extern static using a regular Rust static (?!?), Miri assumes you are writing normal Rust code, and executes the program accordingly.

So using read_volatile makes no difference here. And it seems like your memory is actually initialized, so you don't even need MaybeUninit. But since you are modifying the binary after it leaves the land of Rust, there is no way Miri could know that things are initialized -- Miri executes the program as if you would run it as a regular Rust binary without any special linker treatment.

If you want to execute such code in Miri, you have to write a suitable test harness that emulates the effects of whatever outside-of-Rust magic would usually happen. Note that why anyone would initialize their statics in such a complicated way is way beyond me so I can't really help with any of the details here, but as @saethlin noticed there's a definite XY problem here. Your question isn't about read_volatile or uninit memory, it is about linker script tricks that are outside of the realm of the Rust language.

That is all based on the assumption that the static is indeed pre-initialized somehow. But I haven't found a trace of that yet. As far as Rust is concerned, if you declare a static to be initialized with MaybeUninit::uninit() as you did, then that static is 'initialized' with uninit memory, and any attempt at reading an integer from it is UB. Something has to put actually sensible values there. Which component is doing that where?

@bjorn3
Copy link
Member

bjorn3 commented Mar 9, 2023

but it says that the section is not initialized so this can't be enough yet. So where is this pre-initialization you are talking about happening?

As I understand it the intention is that whatever logs left over from before a reboot show up. This can be modeled in the AM as the contents getting saved right before a reboot and being written again after the reboot before main. In case of a cold start the contents would be all zeros as ram gets initialized to all zeros.

@surban
Copy link
Author

surban commented Mar 9, 2023

Where is the ring buffer created and initialized?

The ring buffer is initialized in the init function and nowhere else. It is not accessed by any external code, hardware peripheral, memory chip with special behavior or anything else. The only special thing about it is the link_section = ".uninit" attribute on the static that contains it to prevent the startup code from overwriting it when the system is reset.

It works like this:

When the embedded system is first powered on, the memory contains a random bit pattern. The processor starts at the entry point and eventually RingBuffer::init is called. The signature validation will notice that the signature is invalid and initialize all variables in the struct RingBuffer to default values using ptr::write_volatile. Then assume_init is called since at this points all fields contain valid values.

Now, and here comes the interesting part, when the embedded system is reset (without power off, for example by its watchdog) the processor restarts execution at the entry point with its registers cleared. However, memory (RAM) is not cleared. This is absolutely normal behavior for memory. The RingBuffer::init function is called during startup and this time it reads the signature of the RingBuffer (which was written by its previous execution before the system was reset) and sees that it is valid. It also verifies that the read_pos and write_pos fields are within bounds and that the overwritten field contains a valid Rust boolean value. It then keeps the contents of the RingBuffer and calls assume_init because it has verified that all fields contain valid values.

Note: currently I am writing the same values back into memory if the validation has passed. This should not be necessary, but I am doing it in the hope that it avoids UB. Ideally I would just read the actual memory contents, verify it and then, if verification passes, call assume_init without performing a single write.

Why am I doing this?

The ring buffer contains log messages that should not be lost during a system reset.

Do I need MaybeUninit?

I think so, because the bit pattern at power up might be invalid for the boolean field overwritten. Also I want to prevent the user from calling other functions on RingBuffer before RingBuffer::init was called to verify its state and invariants.

Is it crazy?

No, not in the embedded world. There a many cases where it is necessary to preserve data during reset. For example, a stepper motor controller might want to preserve the physical position of an axis during reset (or even firmware update) because it has no sensors to determine the position and knows it only by counting how many steps it has done.

@KamilaBorowska
Copy link

KamilaBorowska commented Mar 9, 2023

I think a reasonable approach in your situation would be to initialize the memory on #[cfg(miri)], and use extern { static } otherwise. A static declared within Rust will follow Rust language semantics which don't allow not initializing memory (if you initialize memory with MaybeUninit::uninit() then you are initializing memory with uninitialized bytes), so it needs to be declared outside of Rust - a short C or assembly file declaring that static linked with Rust will work for that.

Then you won't really need volatile as your memory will be initialized from Rust's point of view. For state synchronization you may consider using core::sync::atomic::compiler_fence to make sure writes are properly ordered.

@saethlin
Copy link
Member

saethlin commented Mar 9, 2023

When the embedded system is first powered on, the memory contains a random bit pattern.

Even taking this as a given, calling MaybeUninit::uninit() does not model this property, and that is the problem with your code that Miri is identifying. If you read a u8 from something returned by MaybeUninit::uninit(), you do not get a "random bit pattern".

It sounds like what you need here is a way to get frozen memory or to freeze existing uninit memory, which Ralf suggested far above.

@surban
Copy link
Author

surban commented Mar 9, 2023

... and use extern { static } otherwise. A static declared within Rust will follow Rust language semantics which don't allow not initializing memory (if you initialize memory with MaybeUninit::uninit() then you are initializing memory with uninitialized bytes), so it needs to be declared outside of Rust - a short C or assembly file declaring that static linked with Rust will work for that.

I see. Thank you!

I assume this will work but is kind of unsatisfactory. It will require a working C compiler for every target, quite complicated build steps for cross compiling and the struct must be translated to C so that its size matches exactly with the Rust counterpart.

This might be becoming off-topic, but if this is possible in C and LLVM implements a C frontend, why can't we do the same directly from Rust, for example by using a special attribute?

@workingjubilee
Copy link
Member

Now, and here comes the interesting part, when the embedded system is reset (without power off, for example by its watchdog) the processor restarts execution at the entry point with its registers cleared. However, memory (RAM) is not cleared. This is absolutely normal behavior for memory.

But isn't a lot of memory quite volatile? DRAM requires continuous refreshing. You are talking about some kind of nonvolatile memory, or else SRAM that has the minimal power flowing it to retain full data remanence. Rust's semantics can't assume the presence of nonvolatile memory, which is part of why this seems so complicated to express despite being "simple".

I assume this will work but is kind of unsatisfactory. It will require a working C compiler for every target, quite complicated build steps for cross compiling and the struct must be translated to C so that its size matches exactly with the Rust counterpart.

Hmm... you could describe the section in pure asm! (I guess global_asm!?) and it would work, I think? Such is a declaration "outside of Rust" for the "abstract machine" parts of this conversation, though I am not sure how the linker resolves it, there.

If that doesn't work, probably we should support

extern {
    #[link_section = ".used_section"]
    static BUF: Buffer;
}

or have another way to bless this pattern, somehow.

@bjorn3
Copy link
Member

bjorn3 commented Mar 11, 2023

But isn't a lot of memory quite volatile? DRAM requires continuous refreshing.

If you reboot rather than shutdown and start up again later, the memory controller likely isn't offline long enough (if it is offline at all) for the DRAM to lose it's contents, right?

@RalfJung
Copy link
Member

RalfJung commented Mar 12, 2023

Okay I think we first and foremost have a Rust Abstract Machine / @rust-lang/opsem question here, not a Miri question:

If a mutable (or interior mutable) static is initialized in Rust to a certain value, then can the Rust AM assume that it does have that value when the program starts? Or is it okay to have some "before main setup" actually change the value of that static?

I would say for an immutable static, this is clearly UB -- using linker script tricks like what has been described here is just not allowed for those statics. But for statics that can be mutated, we already can't in general optimize assuming their values did not change, so... it seems reasonable to allow this?

With this way of thinking about the question, obviously Miri has no way of knowing that you are modifying the value of the static before Rust code starts running. On the Rust side you are saying this static is filled with uninit memory, but then you are making some outside-of-Rust assumptions to justify that actually when the program starts the static has a different value, in particular that the bytes now are initialized. So I don't think there is anything actionable on the Miri side here, and the discussion (with the clarified question) should move to https://github.com/rust-lang/unsafe-code-guidelines/ .

@RalfJung
Copy link
Member

Moved to rust-lang/unsafe-code-guidelines#397.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants