LLVM introduces racy read - Unsafe transformation?

Hi,

I am looking for thoughts on the following LLVM transformation.

Consider the following transformation, which replaces a conditional load(a)
with a load(a) followed by a select instruction.

Source

loadrace.zip (27.7 KB)

It's not clear to me that this is correct. Neither variable is atomic and so loads do not establish happens-before edges. The load of a is not observable and so is safe to hoist. According to the C[++]11 model the transformed version appears correct. There is no guarantee of serialisation of the loads of flag and a, because neither is atomic.

It's not actually clear to me that the original is race-free. If flag ever transitions from false to true without something else that establishes a happens-before relationship between these two threads then this is racy. If flag is always false, then this is not racy and the LLVM output is not *observably* racy (the IR does not permit this load to have observable side effects, and its result is never used). If flag is always true then this is racy. If flag transitions from true to false without a happens-before edge, then this is also racy.
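To make "something else that establishes a happens-before relationship" concrete, here is a minimal sketch. It assumes the flag is turned into a std::atomic<bool> with release/acquire ordering; the names writer, reader and run_once are hypothetical and are not taken from the attached example. With this ordering, the read of a inside the branch is ordered after the write, so the source program is race-free.

```cpp
#include <atomic>
#include <thread>

int a = 0;                      // plain, non-atomic shared data
std::atomic<bool> flag{false};  // assumption: the flag is made atomic

void writer() {
    a = 42;                                       // ordinary store
    flag.store(true, std::memory_order_release);  // publishes the store to a
}

int reader() {
    int r = 0;
    if (flag.load(std::memory_order_acquire)) {   // synchronizes with the release store
        r = a;                                    // ordered after a = 42, so no race
    }
    return r;
}

// Runs one writer and one reader; the result is 0 or 42 depending on timing.
int run_once() {
    a = 0;
    flag.store(false);
    int result = -1;
    std::thread t1(writer);
    std::thread t2([&result] { result = reader(); });
    t1.join();
    t2.join();
    return result;
}
```

If the reader sees flag as false it never touches a; if it sees true, the acquire load guarantees it also sees a == 42.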

David

Hi,

I agree that the earlier example was already racy because of the shared
flag variable. Sorry for the wrong example; please ignore it.

I have modified the program as follows. The only shared variable is now 'a',
and because flag is always false the program is not racy.

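#include <thread>

using std::thread;

int a;  // the only variable shared between the two threads

int readA(bool flag) {
  int r = 0;
  if (flag) {
    r = a;  // executed only when flag is true
  }
  return r;
}

void writeA() {
  a = 42;
}

int main() {
  bool flag = false;  // local, so readA never reads a in this program
  thread first(writeA);
  thread second(readA, flag);

  first.join();
  second.join();

  return 0;
}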

The generated LLVM IR is:

; Function Attrs: nounwind readonly uwtable
define i32 @_Z5readAb(i1 zeroext %flag) #3 {
entry:
  %0 = load i32* @a, align 4
  %. = select i1 %flag, i32 %0, i32 0
  ret i32 %.
}

; Function Attrs: nounwind uwtable
define void @_Z6writeAv() #4 {
entry:
  store i32 42, i32* @a, align 4
  ret void
}

:

In the generated IR, load(a) no longer depends on the value of flag, which
is not the case in the source program. Hence a race is introduced between
the load(a) and store(a) operations when readA() and writeA() run
concurrently.

Regards,
soham

I believe that this is still not observably racy. The load is not volatile, which in LLVM IR means that it is not allowed to have observable side effects. The address of the load is a constant, so a side-effect-free load whose result is not used does not appear to be racy within the semantics of LLVM IR - it's just an expensive way of doing a nop. It should materialise an undef value (if someone were to write a formal model of LLVM IR with full concurrency semantics), but a select where the unused value is undef is still well defined in LLVM IR.
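As an illustration only, the two shapes can be written side by side in C++. The function names readA_branch and readA_select are hypothetical. This is a sketch of the control flow, not a claim about C++ semantics: if the second form were written by hand in C++ and run concurrently with a writer, it would be a data race at the source level, whereas the argument above is about the semantics of the IR.

```cpp
int a = 0;  // stand-in for the global in the example

// Shape of the original source: a is read only when flag is true.
int readA_branch(bool flag) {
    int r = 0;
    if (flag) {
        r = a;
    }
    return r;
}

// Shape of the optimised IR: unconditional load, then a select.
int readA_select(bool flag) {
    int loaded = a;            // corresponds to: %0 = load i32* @a
    return flag ? loaded : 0;  // corresponds to: %. = select i1 %flag, i32 %0, i32 0
}
```

In a single thread the two functions return the same value for every flag. When flag is false, the loaded value is discarded by the select, which is the "unused result" case described above.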

There was a related discussion about a year ago regarding this exact transformation because, although it is not necessarily invalid, it is not necessarily a good idea. Reading the value of a will generate cache coherency traffic and the extra cache-line ping-pong may be slower than a conditional branch.

David

FWIW, the C and C++ memory models were designed specifically to allow load
speculation, and the LLVM memory model was based heavily on them and
specifically allows load speculation.

As with any speculation, a number of invariants must hold that prove it to
be safe, and there are plenty of circumstances where it isn't profitable,
but the memory model is designed to allow it.

Note that race detection tools typically turn these parts of the optimizer
off. You can see this in several places where we disable load speculation
or load widening when running under TSan specifically so that it can
enforce a more conservative model.

Unlike C++, LLVM IR has defined semantics in the case of data races. Your load may return undef (http://llvm.org/docs/Atomics.html#optimization-outside-atomic), but the program doesn’t have undefined behavior.
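For code that really does need to read a concurrently with a writer at the C++ level, one option is to make the variable atomic. This is a minimal sketch under that assumption; shared_a, read_relaxed, write_relaxed and run_relaxed are hypothetical names, and relaxed ordering is chosen only because the example needs no ordering with other data.

```cpp
#include <atomic>
#include <thread>

std::atomic<int> shared_a{0};  // hypothetical atomic replacement for `int a`

int read_relaxed(bool flag) {
    int r = 0;
    if (flag) {
        r = shared_a.load(std::memory_order_relaxed);  // atomic load: not a data race
    }
    return r;
}

void write_relaxed() {
    shared_a.store(42, std::memory_order_relaxed);     // atomic store
}

// Runs the writer and the reader concurrently and returns what the reader saw.
int run_relaxed(bool flag) {
    shared_a.store(0, std::memory_order_relaxed);
    int result = -1;
    std::thread w(write_relaxed);
    std::thread r([&result, flag] { result = read_relaxed(flag); });
    w.join();
    r.join();
    return result;
}
```

With flag set to false the reader returns 0; with flag set to true it returns either 0 or 42, and neither outcome involves undefined behaviour.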