[RFC] Add second "failure" AtomicOrdering to cmpxchg instruction

Hi all,

The C++11 (& C11) compare_exchange functions with explicit memory
order allow you to specify two sets of semantics, one for when the
exchange actually happens and one for when it fails. Unfortunately, at
the moment the LLVM IR "cmpxchg" instruction only has one ordering,
which means we get sub-optimal codegen.

This probably affects all architectures which use
load-linked/store-conditional instructions for atomic operations and
don't have versions with built-in acquire/release semantics (and so
need barriers). For example on ARMv7, an "acquire, relaxed" cmpxchg
currently generates:

        mov r9, #52
        mov r3, #42
LBB0_1: @ =>This Inner Loop Header: Depth=1
        ldrexb r1, [r0]
        cmp r1, r3
        bne LBB0_3
@ BB#2: @ in Loop: Header=BB0_1 Depth=1
        strexb r2, r9, [r0]
        cmp r2, #0
        bne LBB0_1
LBB0_3:
        dmb ish
        [...]

That second barrier could be skipped by the first "bne" if we knew
that the user didn't need any ordering in case of failure.

The other atomic operations are not affected, so I'd like to add a
second ordering operand to "cmpxchg" to support the full range of
possible requests. The suggested syntax would be

    cmpxchg [volatile] <ty>* <pointer>, <ty> <cmp>, <ty> <new>
[singlethread] <success ordering> <failure ordering>

That is, syntactically I propose:
  + No comma between the two
  + Both operands are required, even if "failure" is the natural pair
of "success"

Semantically, I would like to enforce the constraints in the standards:
  + failure <= success
  + failure != Release
  + failure != AcquireRelease.

Before the DAG, I suggest completely removing getOrdering from
CmpXchgInst, in favour of getSuccessOrdering and getFailureOrdering to
avoid errors. In the DAG, getOrdering would still exist and return the
success ordering, which I believe would make existing targets
conservatively correct. AtomicSDNode would add getFailureOrdering,
used only by ATOMIC_CMP_SWAP nodes.

Existing bitcode files would obviously be supported, and retain
current semantics by BitcodeReader dropping the "release" part of any
single ordering, and using the result as the "failure ordering" (as
currently documented in the LanguageRef).

I have attached initial patches for reference, but I'm not asking for
thorough review at this stage.

What do people think?

Cheers.

Tim.

clang-cmpxchg-failure.diff (3.79 KB)

llvm-cmpxchg-failure.diff (100 KB)

This sounds exactly like the right thing to do. FWIW PNaCl provisions for this at the moment.

I haven’t reviewed the patch, but your proposal sounds good.

Hi Tim,

The approach also sounds good to me. I had a look at the patch and the
huge size of it seems seems to be driven mostly by the replace of
order with success order and the addition of failure order (plus the
tests). Apart from a few whitespace changes in the beginning, it looks
good to me.

cheers,
--renato

Thanks for the responses. I've polished up the patch and put it up for
proper review in http://llvm-reviews.chandlerc.com/D3023 (CCed
llvm-commits, obviously).

Cheers.

Tim.