Atomic Ordering: Non-atomic vs Unordered

Hi everyone!

My question is about the memory ordering of atomic operations in LLVM IR, in particular the difference between NonAtomic and Unordered.

According to the docs, NonAtomic is designed to match the semantics of C/C++ non-atomic accesses: any race on these accesses is treated as UB. Unordered is designed to match the semantics of Java plain accesses (and plain accesses in other “safe” languages), so racy accesses should have “somewhat defined” semantics.

I have several questions with respect to this.

  1. I’ve checked the source code of several compilers for “safe” languages that emit LLVM IR, and it looks like none of them actually uses Unordered (instead, they compile loads and stores to regular NonAtomic loads/stores). Here are links to the source code for GraalVM [1], GHC [1], Swift [1].
    So my questions are: should these compilers actually use the Unordered memory order? Should the use of NonAtomic in the examples above be considered a bug? Should it be reported to or discussed with the developers of those compilers? Is there any compiler in the wild that actually uses Unordered?

  2. What exactly are the semantic guarantees of Unordered compared to NonAtomic? It is said that racy Unordered accesses should have defined semantics, but what would those semantics look like? Am I right that the desirable high-level guarantee here is that Unordered should preserve “type-system soundness” for “safe” languages, in the sense that an Unordered load should read a value that actually belongs to the type of the variable (its type in the type system of the compiled language)?

  3. How does the optimizer treat Unordered compared to NonAtomic? Which optimizations are applicable to NonAtomic accesses but not to Unordered ones? The LLVM documentation mentions some of these, for example “load rematerialization”. Are there other interesting examples?

I am currently doing some research on the LLVM memory model, so answers to the questions above would be very helpful. In particular, I want to understand what exactly the LLVM specification guarantees about “Unordered” accesses and which optimizations LLVM performs on them. I want to try to validate whether these optimizations actually preserve the guarantees provided by the spec.

The fundamental requirement for Unordered is that it should be impossible to observe any “out of thin air” values. An Unordered atomic load should always produce some previously stored value, and not, say, a mixture of half of one value and half of another.
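
To make the “half of one value, half of another” case concrete, here is a minimal single-threaded C++ sketch. It is only a model, not real concurrency and not LLVM output: `Cell`, `split_store`, and `observe_torn_value` are hypothetical names, and the racing reader is simulated by a callback that runs between the two halves of a split store.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a 64-bit location stored as two 32-bit halves.
struct Cell {
    std::uint32_t lo;
    std::uint32_t hi;
};

// Reads both halves and reassembles the 64-bit value.
std::uint64_t read_cell(const Cell& c) {
    return (static_cast<std::uint64_t>(c.hi) << 32) | c.lo;
}

// A 64-bit store that has been split into two 32-bit stores.
// `between_halves` stands in for a racing reader that runs after
// the low half has been written but before the high half.
template <typename Observer>
void split_store(Cell& c, std::uint64_t v, Observer between_halves) {
    c.lo = static_cast<std::uint32_t>(v);
    between_halves();
    c.hi = static_cast<std::uint32_t>(v >> 32);
}

// Returns the value the simulated reader sees in the middle of the store.
std::uint64_t observe_torn_value() {
    const std::uint64_t old_value = 0x1111111122222222ULL;
    const std::uint64_t new_value = 0xAAAAAAAABBBBBBBBULL;
    Cell c{static_cast<std::uint32_t>(old_value),
           static_cast<std::uint32_t>(old_value >> 32)};
    std::uint64_t seen = 0;
    split_store(c, new_value, [&] { seen = read_cell(c); });
    // The observed value was never stored by anyone.
    assert(seen != old_value && seen != new_value);
    return seen;
}
```

In this model the reader sees the high half of the old value combined with the low half of the new one. Splitting a store like this is the kind of transformation that is fine for a NonAtomic store but must not be applied to an Unordered one.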

As the “Unordered” section in the doc you reference states, “this prohibits any transformation that transforms a single load into multiple loads, transforms a store into multiple stores, narrows a store, or stores a value which would not be stored otherwise”. All of those transformations are valid on non-atomic loads and stores, but are not valid on Unordered atomics.
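
As a rough illustration of why turning a single load into multiple loads matters for “safe” languages, the following C++ sketch models a bounds check. Everything here is an assumption made for illustration: `RacyLocation` is a hypothetical stand-in for a shared location whose successive loads return scripted values (simulating a concurrent writer), and the two functions model the code before and after the load is duplicated.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a racy shared location: each load returns the
// next scripted value, as if another thread wrote between the loads.
class RacyLocation {
public:
    explicit RacyLocation(std::vector<int> script) : script_(std::move(script)) {
        assert(!script_.empty());
    }
    int load() {
        int v = script_[pos_];
        if (pos_ + 1 < script_.size()) ++pos_;  // stay on the last value
        return v;
    }
private:
    std::vector<int> script_;
    std::size_t pos_ = 0;
};

// Single load: the value that is checked is the value that is used.
// Returns the index that would be used, or -1 if the check fails.
int checked_index_single_load(RacyLocation& loc, int limit) {
    int idx = loc.load();
    return idx < limit ? idx : -1;
}

// The same code after the load has been duplicated: the check and the
// use read the location separately and may disagree.
int checked_index_reloaded(RacyLocation& loc, int limit) {
    if (loc.load() < limit) {
        return loc.load();  // second load, may see a different value
    }
    return -1;
}
```

With the script `{3, 100}` and a limit of 10, the single-load version yields index 3, while the reloaded version passes the check on 3 and then uses 100. That is the sort of outcome a language with bounds-checked accesses needs to rule out even for racy programs.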