What about: seqlocks, load-release/store-acquire? #323
As further justification for my claims that load-release and store-acquire are (a) operations worth providing and (b) implementable the way I said: I did a bit of research into architecture-level memory ordering guarantees. In the past, architecture manuals tended to talk more explicitly about instruction reordering, whereas these days they tend to be worded more in terms of abstract guarantees, similar to the C11 memory model. Even so, both the ARM and RISC-V specifications still provide considerably stronger guarantees than the C11 model: they both require a single global ordering of all accesses in terms of, as I phrased it in my last post, "when they actually occur" (a more accurate phrase would be "when they complete"). There are then various ways to ensure one access comes before another in that ordering, one of which is to put a suitable fence/barrier instruction between them.
I haven't looked at x86, but I'm pretty sure its memory model is strictly stronger. Thus, going from C11's total modification order to a 'total access order' would still leave us with a conservative subset of the guarantees that most architectures provide. If some other architecture does not naturally provide the needed guarantees, at worst, load-release and store-acquire could be implemented as read-dont-modify-write, e.g. as ... Of course, adding new memory orderings would not be easy; at minimum it would require significant work on LLVM's side. But it would be nice to do. In the meantime, I contend that inline assembly – or FFI to out-of-line assembly – is in fact a sound way to take advantage of the stronger architectural guarantees. |
I'm not 100% sure, but I think you might be falling into the same trap that the author talks about in section 7:
Can you tell whether the proposed counterexample also applies to your version? |
Can't hardware memory models afford to be stricter because they can rely on data dependencies, while software optimizations can easily break those dependencies? |
Sigh… I did fall into that trap, sort of. It's not that my proposed load-release is unsound or not equivalent to read-dont-modify-write. But it implies a stronger barrier than is needed for seqlocks.

A release operation – C++11's release stores and release fences, as well as release loads under my proposed definition – waits for both prior loads and prior stores to complete. Formally speaking, if you have …

In RISC-V terms, a store-release is … But a seqlock reader performs only loads, so it only needs … In Linux terms, this is roughly the difference between …

I'm not sure how much performance difference the stronger fence would make in practice. It's still just a fence; the current CPU core doesn't need to take exclusive ownership of the cache line containing the atomic object or anything like that. (On x86, you need a …) However, if we're talking about adding new types of atomic accesses, it's definitely questionable to have it be stronger than needed for the motivating use case. Hmm.

* English translation: one thread does … |
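To make the fence placement concrete, here is a minimal seqlock sketch of my own (names and structure are mine, not code from this thread), using the standard C++11 fence-based pattern that works on stable Rust today. The acquire fence in the reader sits exactly where a dedicated load-release of the sequence counter would go, and only load→load ordering is needed there:

```rust
use std::sync::atomic::{fence, AtomicUsize, Ordering};

// Minimal seqlock sketch. Data fields are atomics so the relaxed
// reads below are race-free even while a writer is active.
struct SeqLock {
    seq: AtomicUsize,
    a: AtomicUsize,
    b: AtomicUsize,
}

impl SeqLock {
    const fn new() -> Self {
        SeqLock {
            seq: AtomicUsize::new(0),
            a: AtomicUsize::new(0),
            b: AtomicUsize::new(0),
        }
    }

    // Single-writer update; an odd sequence number means "write in progress".
    fn write(&self, a: usize, b: usize) {
        let s = self.seq.load(Ordering::Relaxed);
        self.seq.store(s + 1, Ordering::Relaxed);
        fence(Ordering::Release); // order the seq bump before the data stores
        self.a.store(a, Ordering::Relaxed);
        self.b.store(b, Ordering::Relaxed);
        self.seq.store(s + 2, Ordering::Release); // publish
    }

    fn read(&self) -> (usize, usize) {
        loop {
            let s1 = self.seq.load(Ordering::Acquire);
            if s1 % 2 == 1 {
                continue; // writer active; retry
            }
            let a = self.a.load(Ordering::Relaxed);
            let b = self.b.load(Ordering::Relaxed);
            // The acquire fence keeps the data loads above from being
            // reordered past the re-check below. Only load->load ordering
            // is needed here, which is why a full release-style barrier
            // (as implied by a release RMW) is stronger than required.
            fence(Ordering::Acquire);
            if self.seq.load(Ordering::Relaxed) == s1 {
                return (a, b);
            }
        }
    }
}

fn main() {
    let lock = SeqLock::new();
    lock.write(1, 2);
    assert_eq!(lock.read(), (1, 2));
    lock.write(10, 20);
    assert_eq!(lock.read(), (10, 20));
    println!("ok");
}
```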
Data dependencies are another case where the Linux kernel tries to use cheaper and subtler synchronization primitives than what C++11 atomics expose, but they're not relevant to seqlocks as far as I know. |
Worth noting that LLVM already has optimizations in place for this:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

pub fn load_release(x: &AtomicUsize) -> usize {
    x.fetch_add(0, Ordering::Release)
}
```

which compiles on x86-64 to:

```asm
example::load_release:
        mfence
        mov     rax, qword ptr [rdi]
        ret
```
|
C++'s P1478 proposes to add … We should just add the same functions to Rust. LLVM will need to support them for C++ anyway, and they fit nicely into the current memory model. |
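For context, P1478 concerns byte-wise atomic copies of seqlock-protected data. The following naive stand-in is my own illustration (the function name and signature are hypothetical, not the proposed API): copying each byte with a relaxed atomic load turns a concurrent write into a torn-but-race-free read, which the seqlock's sequence re-check can then detect and reject.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

// Hypothetical stand-in for a byte-wise atomic load: each byte is read
// with a relaxed atomic load, so racing with a writer yields a torn
// value rather than undefined behavior.
fn atomic_load_bytes(src: &[AtomicU8], dst: &mut [u8]) {
    for (d, s) in dst.iter_mut().zip(src.iter()) {
        *d = s.load(Ordering::Relaxed);
    }
}

fn main() {
    let src: Vec<AtomicU8> = (0u8..4).map(AtomicU8::new).collect();
    let mut dst = [0u8; 4];
    atomic_load_bytes(&src, &mut dst);
    assert_eq!(dst, [0, 1, 2, 3]);
    println!("ok");
}
```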
I agree, those operations make a lot of sense. |
I wrote an RFC for this today: rust-lang/rfcs#3301 |
Continuing a discussion in #321 that starts around here and ends here.
I agree it's equivalent to read-dont-modify-write. But phrasing it the way I did provides insight into how it can be efficiently implemented. I claim that a load-release can be implemented on most architectures as a fence instruction followed by a load instruction, and store-acquire can be implemented as a store instruction followed by a fence instruction. Notably: …
This matches how Linux actually implements seqlocks ("load-release", "store-acquire"). I could have made a mistake, but this is intended to match seqlocks' implementation, and not to require exclusive cache-line access on typical CPUs – unlike a naive implementation of read-dont-modify-write in terms of compare_exchange or fetch_add, which would. The idea is that on those typical CPUs, memory accesses actually are implemented in a way that provides full sequential consistency, in terms of when the accesses actually occur. It's just that the order of "when the accesses actually occur" differs from program order thanks to reordering. (But this is just a motivation, not an attempt to prove correctness. The proof of correctness would come from the memory models in each architecture manual.)
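On current stable Rust these shapes can only be approximated. The following sketch is mine and deliberately over-strong: it mirrors the fence-then-load and store-then-fence sequences using a full SeqCst fence, since stable Rust has no way to express the weaker barriers described above, and it does not formally add new orderings to the C11 model.

```rust
use std::sync::atomic::{fence, AtomicUsize, Ordering};

// Over-strong stand-in: a SeqCst fence is stronger than the barrier
// intended here, but it shows the "fence; load" shape.
fn load_release_approx(x: &AtomicUsize) -> usize {
    fence(Ordering::SeqCst); // fence first ...
    x.load(Ordering::Relaxed) // ... then the load
}

// Likewise the "store; fence" shape for store-acquire.
fn store_acquire_approx(x: &AtomicUsize, v: usize) {
    x.store(v, Ordering::Relaxed); // store first ...
    fence(Ordering::SeqCst); // ... then the fence
}

fn main() {
    let x = AtomicUsize::new(0);
    store_acquire_approx(&x, 5);
    assert_eq!(load_release_approx(&x), 5);
    println!("ok");
}
```

Note that, unlike the fetch_add fallback, neither function performs a read-modify-write, so no exclusive cache-line access is required.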