[atomics][AArch64] Possible bug in cmpxchg lowering

Currently the AtomicExpandPass will lower the following IR:

define i1 @foo(i32* %obj, i32 %old, i32 %new) {
entry:
%v0 = cmpxchg weak volatile i32* %obj, i32 %old, i32 %new release acquire
%v1 = extractvalue { i32, i1 } %v0, 1
ret i1 %v1
}

to the equivalent of the following on AArch64:

**ldxr w8, [x0]**
cmp w8, w1
b.ne .LBB0_3
// BB#1: // %cmpxchg.trystore
stlxr w8, w2, [x0]
cbz w8, .LBB0_4
// BB#2: // %cmpxchg.failure
mov w0, wzr
ret
.LBB0_3: // %cmpxchg.nostore
clrex
mov w0, wzr
ret
.LBB0_4:
orr w0, wzr, #0x1
ret

GCC instead generates a ldaxr for the initial load, which seems more correct to me since it is honoring the requested failure case acquire ordering. I’d like to get other opinions on this before filing a bug.

I believe the code in AtomicExpand::expandAtomicCmpXchg() is responsible for this discrepancy, since it only uses the failure case memory order for targets that use fences (i.e. when TLI->shouldInsertFencesForAtomic(CI) is true).

Hi Geoff,

GCC instead generates a ldaxr for the initial load, which seems more correct
to me since it is honoring the requested failure case acquire ordering. I'd
like to get other opinions on this before filing a bug.

It's all a bit uncertain at the moment, as the C++ standard is at the
very least ambiguous (as is the LangRef). I think historically I
decided to interpret "no stronger than" to forbid a failure ordering
from having acquire semantics if the success ordering didn't (as in,
you can only drop requirements). I don't know if I ever convinced
anyone else of that (or even mentioned it to them) though.

But it looks like the constraints might well be relaxed in C++
(P0418r2: Fail or succeed: there is no atomic lattice),
and I know at least some Swift people are keen on what you're
describing being valid.

So changing LLVM to do the same as GCC does seem like a decent idea to
me. But remember to update the LangRef to make it clear.

Cheers.

Tim.

From LangRef: "

Oh, and if you do this you may want to check what Clang does when the
order isn't a compile-time constant. Schematically it does a couple of
switches and a nested set of cmpxchgs with all possible ordering
constraints.

Those switches may or may not include the cases you're interested in.

Cheers.

Tim.

I've done some more digging and it looks like:

- the code clang emits for the non-constant memory ordering parameters seems to agree (on aarch64) with the constant case in the release/acquire case, in that it ends up lowering to a cmpxchg release monotonic. On targets that use fences though the code generated could be different for the non-constant vs constant case.

- the reason cmpxchg release acquire doesn't fail validation is because AtomicOrdering.h defines isStrongerThan(Acquire, Release) to be false.

I don't have an immediate need to get this changed, so I think I'll just wait to see if there is any other input and file a bug.