Question about register spilling, rematerialization, and racy accesses

Hi all!

I was trying to reproduce some examples from paper [1] (see Example 1 on page 2) and paper [2] (the example in Section 2.2 on page 2). These examples aim to justify treating races on non-atomic variables as UB.

In essence, both examples rely on the compiler's potential ability to rematerialize a load from a shared variable. In the presence of a racy write, the rematerialized load can observe a different value, thus leading to an unexpected program outcome.

I tried to reproduce this behavior with the following example (simplified a bit):

int a;

void* reader_thread(void* p) {
    int t = a;
    for (int i = 0; i < DUMP_N; ++i) {
        // some code to induce pressure on the register allocator
        ...
        // can the compiler rematerialize this load and read from a again?
        dump[i] = t;
    }
    // print dump 
    ...
    return NULL;
}

void* writer_thread(void* p)
{
    a = 42;
    return NULL;
}

Based on my experiments with the compiler, I was able to reproduce one of the following two scenarios.

(1) If there is not much pressure on the register allocator, then
the compiler loads from a into a register and reuses that register on every use of t,
so no stack slot for t is allocated.

(2) If there is a lot of pressure on the register allocator, then
a stack slot for t is allocated: the load from a stores its result to this stack slot,
and all subsequent uses of t load from this stack slot.

Neither of these scenarios leads to interference with the racy write:
the reader thread either reads the initial value 0, or the value 42 written by writer_thread.

My question is as follows: is LLVM even capable of rematerializing loads from shared memory? Under what conditions, in general, can the compiler spill and rematerialize a register without allocating a stack slot?
Do you think the compiler should perform such an optimization, given how it can interfere with racy writes, as the examples mentioned above demonstrate?

And, as a somewhat more general question: what other optimizations (besides load rematerialization) do you think can be problematic in the presence of races (and can actually be reproduced with the current compiler implementation)?

References:

[1] Dolan, S., Sivaramakrishnan, K. C., & Madhavapeddy, A. (2018). Bounding data races in space and time. ACM SIGPLAN Notices, 53(4), 242-255.

[2] Boehm, H. J. (2011, May). How to Miscompile Programs with Benign Data Races. In HotPar (pp. 3-3).

Only trivially rematerializable instructions are rematerialized. See the function isTriviallyReMaterializable for what exactly this means (llvm-project/TargetInstrInfo.h at 079c488c66056f7bf22c30ee280489f6c43b7658 · llvm/llvm-project · GitHub), but instructions that access memory are never trivially rematerializable.

There are two things here: what should be permitted in LLVM IR, and what IR is generated for a given language (C in this case). The LLVM IR semantics permit this kind of optimization for non-volatile non-atomic loads (as far as I’m aware), so we could in principle do this kind of optimisation. With regards to C/C++ the standard allows this kind of thing, so it makes sense to lower to ordinary LLVM loads. But adding a command-line option so that memory is by default more restrictive (e.g. making everything atomic with ordering monotonic or maybe unordered), which would then prevent this kind of thing, would possibly be a way of restricting compiler behaviour in this way.

Hi @john-brawn-arm , thank you for the response!

But adding a command-line option so that memory is by default more restrictive (e.g. making everything atomic with ordering monotonic or maybe unordered), which would then prevent this kind of thing, would possibly be a way of restricting compiler behaviour in this way.

Yeah, I suppose the purpose of Unordered is exactly to forbid this kind of optimization (according to the docs, rematerialization is among the list of prohibited optimizations).

However, what I noticed is that existing compilers for “safe” languages do not use Unordered at all, and just emit plain NonAtomic accesses for non-atomics in the source language. I’ve already asked a similar question on the forum. I’ve checked several compilers that use LLVM, including Java/GraalVM, Kotlin/Native, GHC, and Swift (see the links in my original question), and all of them seem to use NonAtomic accesses.

This clearly goes against the LLVM spec, since the LLVM docs recommend using Unordered for exactly this scenario (compilation from “safe” languages that cannot treat races on non-atomics as UB).
However, because the current implementation of the compiler is over-conservative with respect to rematerialization and other potentially problematic optimizations, the decision to use NonAtomic ordering does not really lead to bugs in practice.

What do you think regarding this problem? Can the use of NonAtomic in the aforementioned scenario be classified as a bug?

I’m not following this thread in detail, but this part caught my eye.

For at least Java/GraalVM, if they’re not using unordered to model normal java memory access, that is wrong. And observable in practice.

I can’t speak to the other languages’ memory models, but for Java, I am quite sure you need unordered to avoid observable memory model violations. Quite sure. :slight_smile: