RFC: add "cmpxchg weak" to LLVM IR

Hi all,

I've decided the next step in atomics is the weak compare-and-exchange
operation. As with the failure order, I'm going t outline the
direction I'd like to take:

1. All cmpxchg instructions now return { iN, i1 } where the first
value is what we got before (the loaded result), the second == 1 if an
exchange took place.
1. "weak" is an optional modifier to the cmpxchg instructions. If
anyone wants a bikeshed to paint, this would be a good one. I wasn't
sure myself.

Theoretically, we could only modify the return type of the weak
variant, but I think that making it a global change helps keep the IR
uniform. Additionally, this extra bit actually exists on most
platforms even for the strong cmpxchg: on LL/SC ones by virtue of
control flow, and x86 (for example) sets ZF based on it.

Assuming that's OK, onto the DAG level. What I'd like to do is a
little odd, but makes sense I think:

1. Keep ATOMIC_CMP_SWAP with its current semantics: strong cmpxchg,
returning just the loaded value.
2. Add ATOMIC_CMP_SWAP_WITH_SUCCESS (better name suggestions welcome)
that has a second i1 value (initially) as with the IR instruction.
It's still a strong cmpxchg.
3. Expanding ATOMIC_CMP_SWAP_WITH_SUCCESS will yield an
ATOMIC_CMP_SWAP and a SETCC.

The advantage of this scheme is that existing targets can remain
mostly unchanged if they're not interested in the extra work required.

LL/SC architectures will be assumed to move over to something like the
IR level pass already used by ARM if they want weak support (or they
can add weak DAG nodes if they're masochists). x86 can switch its
Custom handling over to the new ATOMIC_CMP_SWAP_WITH_SUCCESS to get
rid of the vast majority of redundant comparisons coming out of C++
code with relative ease.

I've attached two LLVM patches here, in case anyone wants something to
actually play with. They're not quite ready yet, and Clang doesn't
produce the right code obviously, but they give an idea of where I'd
like to go and what the IR might look like:

1. The first patch adds "weak" in a fairly minimal set of places.
2. The second makes the x86 switch, which will get rid of the
redundant comparisons.
3. A hypothetical 3rd patch (very simple) would actually make use of
the "weak" information in the atomic expansion pass.

What do people think of the idea? (I'm sending to llvmdev too; out of
tree front-ends would need to be updated, and probably at least
backend tests for cmpxchg).

Cheers.

Tim.

0001-IR-implement-read-write-of-cmpxchg-weak.patch (101 KB)

0002-Fix-X86-cmpxchg-codegen.patch (5.67 KB)

Having to do extractvalue to get stuff out of these kinds of aggregates
is... somewhat horrible... have you thought of any alternatives?

1. All cmpxchg instructions now return { iN, i1 } where the first
value is what we got before (the loaded result), the second == 1 if an
exchange took place.

Having to do extractvalue to get stuff out of these kinds of aggregates
is... somewhat horrible... have you thought of any alternatives?

Well, sort of. There was that LL/SC thing from the other week, which
could be made the canonical form of a weak cmpxchg (perhaps with an
IRBuilder helper or something). But David's concerns about the
"cmpxchg weak" -> LL/SC transformation being much easier than the
reverse seem quite valid. I'm not sure even I want that any more.

But within a single instruction, we've got two values to produce. If
they're both part of the return then this is about the only option
(that I'm aware of, anyway). If one's part of an input then it'd have
to be a pointer like in the C++ interface; that would be horrific for
CodeGen, it's just not what actually happens in any architecture I'm
aware of.

Did you have anything in mind?

Cheers.

Tim.

Having thought about this a bit, I don't see anything better.

If we lived in a world w/o the select instruction, there might be a better
design for this (and for the overflow observing arithmetic intrinsics).

It is a bit sad, especially because I can't see a good reason to make the
weak variant an intrinsic and force this through the call path...

going back to the meat of the proposal...

Ok, I've mulled over all of the IR-level memory model changes. While I
didn't *really* doubt that this was the right way to go, I'm increasingly
happy with it.

I specifically increasingly happy with the { iN, i1 } result. I do wish we
had a better way in the IR to model this entire concept (which we see in
the overflow arithmetic intrinsics as well) but we don't. With the current
IR, this is clearly the best design (and I feel bad that I argued against
it in the original design discussion years ago... ah, hindsight). If we
ever figure out a better way to model this kind of operation, then we can
reconsider all of them in light of that, but I'm not sure when or if that
day will even arrive.

OK, DAG...

Hi Chandler,

So, I see where you're going here, but I'm curious -- why not just switch
ATOMIC_CMP_SWAP to have a second i1 value, and still be strong? Is this
*just* to support expanding? I wonder if that's really useful rather than
just lowering it directly on the various targets....

I tried that originally, but quickly got into murky waters with all
the targets I've got basically no clue about.

It still could have been the best option (I started poking at Mips),
but for one particularly nasty facet: you can select ATOMIC_CMP_SWAP
in TableGen, but not ATOMIC_CMP_SWAP_WITH_SUCCESS (multiple results
and all that).

That means even getting selection back to the status quo would have
been a lot of duplicated C++ code in multiple targets, all for what's
basically a dead-end in most cases. It's far easier to make use of the
i1 before the DAG level if possible.

Anyway, that was my reasoning. Possibly more pragmatism than principle.

Cheers.

Tim.

No, it makes a certain amount of sense.

I'd personally much rather we had *just* the multiple result variant in the
DAG, but I can see it being really annoying to get there directly. Perhaps
eventually targets will have adapted far enough that we can kill off the
expanded form, or invert them.

Certainly, I agree with only exposing the strong variant once you hit the
DAG.

So, I say carry on unless others on the list see serious problems here.
-Chandler

So, I say carry on unless others on the list see serious problems here.

Will do; thanks very much for giving this some thought.

Tim.