atomic ops are optimized with incorrect semantics .

Hi All,

With the “https://gcc.godbolt.org/z/yBYTrd” case .

the atomic is converted to non atomic ops for x86 like

from
xchg dword ptr [100], eax
to
mov dword ptr [100], 1

the pass is responsible for this tranformation was instCombine i.e InstCombiner::visitAtomicRMWInst

which converts the IR like
%0 = atomicrmw xchg i32* inttoptr (i64 100 to i32*), i32 1 monotonic
to
store atomic i32 1, i32* inttoptr (i64 100 to i32*) monotonic, align 4

which is valid for relax(monotonic) and release ordering as per the code out there.

we think that,its the inst lowering issue, where the atomic store was lowered to non-atomic store here.

to work around we changed our case to strong ordering .

and we are debugging the case with x86 and the same goes with an arm too.

Any suggestions or thoughts on the transformation? , will be helpful to proceed further.

FYI, the problem persists with LLVM-9, not with the LLVM-8.

Thank you
~Umesh

Hi,

It looks as if there are at least two kinds of at least borderline undefined behaviour here, which makes it difficult to evaluate this test case.

1. The atomic operation is in an infinite loop.
2. The address is a cast from a provenance-free integer.

That said, I don't believe that this optimisation is taking advantage of either of those. The atomic compare and exchange is relaxed and so does not establish a happens before edge with anything (including itself) and there is no guarantee under the C++11 memory model that there is any ordering of the loads and stores between threads. Given that the load here is unused, a compare and exchange reduces to a store. If it were sequentially consistent, for example, then this transform would not be valid and, indeed, LLVM does not perform the transformation in this case.

David

Hi Umesh,

Hi All,

With the "https://gcc.godbolt.org/z/yBYTrd" case .

the atomic is converted to non atomic ops for x86 like

from
xchg dword ptr [100], eax
to
mov dword ptr [100], 1

the pass is responsible for this tranformation was instCombine i.e InstCombiner::visitAtomicRMWInst

which converts the IR like
%0 = atomicrmw xchg i32* inttoptr (i64 100 to i32*), i32 1 monotonic
to
store atomic i32 1, i32* inttoptr (i64 100 to i32*) monotonic, align 4

Can you explain why you think this code generation is incorrect? A
simple mov is an atomic store on x86 from what I understand - the
godbolt example seems perfectly correct to me, what am I missing?

Cheers,
Nicolai

Thank you All and Nicolai,

This is what we get with ARM like

(CLANG 9.0.1.1)
1834 0000000080001ccc :
1835 80001ccc:¦ 52800028 ¦ mov¦w8, #0x1 ¦// #1
1836 80001cd0:¦ b9000008 ¦ str¦w8, [x0]
1837 80001cd4:¦ 17ffffff ¦ b¦ 80001cd0 <bm+0x4>

(CLANG 8.0.0.2)
1833 0000000080001c90 :
1834 80001c90:> 320003e8 > orr>w8, wzr, #0x1
1835 80001c94:> 885f7c1f > ldxr> wzr, [x0]
1836 80001c98:> 881f7c08 > stxr> wzr, w8, [x0]
1837 80001c9c:> 17fffffe > b> 80001c94 <bm+0x4>

may be in x86 ,you are right mov is atomic, but if we need to sync the operation with the previous insn’s then just having atomic doesn’t help right?

~Umesh

You need a sync to establish a happens-before edge. This is a relaxed-consistency store and so does not establish any happens-before relationship.

Atomic just means indivisible: any other thread is guaranteed that it will see not see any partial stores, it will see either the value from before this store or the value after. Because this does not establish any happens-before edges, the notion of 'before' is somewhat fuzzy here and there does not have to be any sequential ordering of threads that gives the same observations as their interleaved run.

David