[RFC] Injecting new element atomic memory intrinsics into MemIntrinsic class hierarchy

Hi all,
We somewhat recently created/updated element-wise unordered-atomic versions of the memcpy, memmove, and memset memory intrinsics:
Memcpy: https://reviews.llvm.org/rL305558
Memmove: https://reviews.llvm.org/rL307796
Memset: https://reviews.llvm.org/rL307854

These intrinsics are semantically similar to the regular versions. The main difference is that the underlying operation is performed entirely with unordered atomic loads & stores of a given size.

Our ultimate purpose is to enable exploitation of the memory intrinsic idioms and optimizations in languages like Java that make extensive use of unordered-atomic loads & stores. To this end, we will eventually have to add these new intrinsics to the instruction class hierarchy, and teach passes how to deal with them; talking about how to do this is the purpose of this RFC.

We have started adding canary tests to some passes, and will continue to do so in preparation for adding the element atomic intrinsics to the instruction class hierarchy. These canary tests are simply placeholders that show that the pass in question currently does nothing with the new element atomic memory instrinics, and could/should generally start failing once the unordered-atomic memory intrinsics are added to the instruction class hierarchy — telling us which passes need to be updated. For example: https://reviews.llvm.org/rL308247

For adding the new intrinsics to the instruction class hierarchy, the plan will be to add them one at a time — add the intrinsic to the hierarchy, and flush out all problems before adding the next. We’re thinking that it would be best to start with memset, then memcpy, then memmove; memset is kind of it’s own thing, and there are some passes that change memcpy/memmove into a memset so it would be beneficial to have that in place before memcpy/memmove, and there are also passes that change memmove into memcpy but I haven’t found any that go the other way (I can’t imagine why you’d want to) so it would be good to have memcpy squared away before doing memmove.

There are a few options that we have thought of for adding the new memory intrinsics to the instruction class hierarchy. We have a preference for the first option, but would appreciate thoughts/opinions/discussion on the best way to insert the element atomic intrinsics into the hierarchy. For background, the relevant portion of the current hierarchy looks like:

MemIntrinsic

  • MemSetInst
  • MemTransferInst
    ** MemCpyInst
    ** MemMoveInst

Option 1)

I’ll comment that Daniel and I have discussed this pretty extensively offline. I am now convinced that option 1 is our best choice out of a set of not entirely great choices. If no one has any comments, that’s the approach we’ll end up pursuing, but we wanted to give the community a chance to chime in here. Maybe someone can come up with a better API design than we’ve managed. :slight_smile:

Philip

Hi Daniel,

Cons:
One more attribute to check when implementing a pass that handles a memory
intrinsic; could lead to some unexpected bugs where we change an
unordered-atomic intrinsic into a regular intrinsic.

The con here is pretty concerning. We risk making it easy for a developer
to forget about the element-atomic variants when introducing an optimization
that converts a memmove/memcpy/memset. Such an optimization could,
conceivably, result in an unordered-atomic intrinsic being erroneously
converted to drop its atomicity; such a bug would only exhibit as a race
condition in the resulting program, and could be an absolute beast to debug.
The only way to prevent such bugs would be very careful code reviews.

The con is compounded by the fact that since clang will not (?)
generate atomic memX instructions, it will be possible to validate a
change to LLVM across an arbitrarily large corpus of C and C++
programs and not notice this sort of bug. In theory this isn't a
problem (since code reviews can catch bugs like this), but IMO there
is non-trivial value in avoiding a situation that allows bug in the
Java frontends that don't manifest in clang.

Option 2)
-------------
Add separate classes to the hierarchy for each intrinsic. There are a few
different ways that this could be organized (note: UA* = unordered-atomic
version of intrinsic, to save typing):

a) Completely disjoint hierarchy
UAMemIntrinsic
* UAMemSetInst
* UAMemTransferInst
** UAMemCpyInst
** UAMemMoveInst
MemIntrinsic
* MemSetInst
* MemTransferInst
** MemCpyInst
** MemMoveInst

e) Multiple inheritance

This is a variant on options 2b to 2d. To make it possible to easily use
isa<> to query for whether or not a memory intrinsic is unordered-atomic or
not we could add an interface-like class that all unordered-atomic memory
intrinsic classes would multiply inherit from. I’m not sure if there is
precedence for this sort of thing in the instruction class hierarchy,
though; I haven’t reviewed the entire hierarchy, but all of the parts that I
have looked at appear to be single inheritance.

There is precedent for something like this: OverflowingBinaryOperator
will return true for both isa<Instruction> and isa<ConstantExpr>. I
also think this combined with (a) is probably the cleanest path
forward:

AtomicMemIntrinsic
* AtomicMemSetInst
* AtomicMemTransferInst
** AtomicMemCpyInst
** AtomicMemMoveInst
PlainMemIntrinsic
* PlainMemSetInst
* PlainMemTransferInst
** PlainMemCpyInst
** PlainMemMoveInst

MemIntrinsic : PlainMemIntrinsic, AtomicMemIntrinsic
MemSetInst : PlainMemSetInst, AtomicMemSetInst
MemTransferInst : PlainMemTransferInst, AtomicMemTransferInst
MemCpyInst : PlainMemCpyInst, AtomicMemCpyInst
MemMoveInst : PlainMemMoveInst, AtomicMemMoveInst

with the existing uses of MemIntrinsic changed to PlainMemIntrinsic etc.

What do you think?

-- Sanjoy

Hi Sanjoy,
Response/thoughts below…

Hi Daniel,

Cons:
One more attribute to check when implementing a pass that handles a memory
intrinsic; could lead to some unexpected bugs where we change an
unordered-atomic intrinsic into a regular intrinsic.

The con here is pretty concerning. We risk making it easy for a developer
to forget about the element-atomic variants when introducing an optimization
that converts a memmove/memcpy/memset. Such an optimization could,
conceivably, result in an unordered-atomic intrinsic being erroneously
converted to drop its atomicity; such a bug would only exhibit as a race
condition in the resulting program, and could be an absolute beast to debug.
The only way to prevent such bugs would be very careful code reviews.

The con is compounded by the fact that since clang will not (?)
generate atomic memX instructions, it will be possible to validate a
change to LLVM across an arbitrarily large corpus of C and C++
programs and not notice this sort of bug. In theory this isn’t a
problem (since code reviews can catch bugs like this), but IMO there
is non-trivial value in avoiding a situation that allows bug in the
Java frontends that don’t manifest in clang.

Option 2)

Add separate classes to the hierarchy for each intrinsic. There are a few
different ways that this could be organized (note: UA* = unordered-atomic
version of intrinsic, to save typing):

a) Completely disjoint hierarchy
UAMemIntrinsic

  • UAMemSetInst
  • UAMemTransferInst
    ** UAMemCpyInst
    ** UAMemMoveInst
    MemIntrinsic
  • MemSetInst
  • MemTransferInst
    ** MemCpyInst
    ** MemMoveInst

e) Multiple inheritance

This is a variant on options 2b to 2d. To make it possible to easily use
isa<> to query for whether or not a memory intrinsic is unordered-atomic or
not we could add an interface-like class that all unordered-atomic memory
intrinsic classes would multiply inherit from. I’m not sure if there is
precedence for this sort of thing in the instruction class hierarchy,
though; I haven’t reviewed the entire hierarchy, but all of the parts that I
have looked at appear to be single inheritance.

There is precedent for something like this: OverflowingBinaryOperator
will return true for both isa and isa. I
also think this combined with (a) is probably the cleanest path
forward:

AtomicMemIntrinsic

  • AtomicMemSetInst
  • AtomicMemTransferInst
    ** AtomicMemCpyInst
    ** AtomicMemMoveInst
    PlainMemIntrinsic
  • PlainMemSetInst
  • PlainMemTransferInst
    ** PlainMemCpyInst
    ** PlainMemMoveInst

MemIntrinsic : PlainMemIntrinsic, AtomicMemIntrinsic
MemSetInst : PlainMemSetInst, AtomicMemSetInst
MemTransferInst : PlainMemTransferInst, AtomicMemTransferInst
MemCpyInst : PlainMemCpyInst, AtomicMemCpyInst
MemMoveInst : PlainMemMoveInst, AtomicMemMoveInst

with the existing uses of MemIntrinsic changed to PlainMemIntrinsic etc.

What do you think?

It’s an interesting idea. I kind of like it — it’s an interesting mix of options 1, 2a, & 2e. An advantage over option 1 is that this also exposes an easy way for a pass to not even have to think about the atomic memory intrinsics — just key off of the PlainMemIntrinsic classes instead of the MemIntrinsic classes.

We still have the same con as option 1, though I could virtually eliminate that con by starting the work by submitting an NFC that renames the current MemIntrinsic & derived class to PlainMemIntrinsic, etc. That would make the path forward something like:

  1. NFC to rename MemIntrinsic classes to PlainMemIntrinsic
  2. Patch to add AtomicMemIntrinsic classes & MemIntrinsic classes derived from Plain* & Atomic*
  3. Individual patches to update passes to handle Atomic + Plain as desired.

The problem that I see with that, though, is any third party stuff that uses the MemIntrinsic classes as they are today will break between steps (1) and (2), and then they will start doing the wrong things with the unordered-atomic memory intrinsics if they’re presented with one — though, that last bit is no different from option 1.

Thinking out loud… If we go the multiple inheritance route, then what about something like:
MemIntrinsic

  • MemSetInst
  • MemTransferInst
    ** MemCpyInst
    ** MemMoveInst
    AtomicMemIntrinsic
  • AtomicMemSetInst
  • AtomicMemTransferInst
    ** AtomicMemCpyInst
    ** AtomicMemMoveInst

AnyMemIntrinsic : MemIntrinsic, AtomicMemIntrinsic
AnyMemSetInst : MemSetInst, AtomicMemSetInst
AnyMemTransferInst : MemTranferInst, AtomicMemTransferInst
… etc.

That would leave the meanings of the current MemIntrinsic classes completely unchanged; which should be good for 3rd parties. Updating a pass to work with unordered-atomic memory intrinsics would just mean having it use the AnyMem* classes instead of the Mem* classes, and then adding some isa<> checks to do the right thing with the atomic variants. We don’t get all of the memory intrinsic optimizing passes for free on the atomic variants, but we get the ones we want at a reasonably low cost, and there’s no risk to the behaviour of the existing non-atomic memory intrinsics…

-Daniel

Hi Daniel,

It’s an interesting idea. I kind of like it — it’s an interesting mix of
options 1, 2a, & 2e. An advantage over option 1 is that this also exposes an
easy way for a pass to not even have to think about the atomic memory
intrinsics — just key off of the PlainMemIntrinsic classes instead of the
MemIntrinsic classes.

We still have the same con as option 1, though I could virtually eliminate
that con by starting the work by submitting an NFC that renames the current
MemIntrinsic & derived class to PlainMemIntrinsic, etc. That would make the
path forward something like:
1) NFC to rename MemIntrinsic classes to PlainMemIntrinsic
2) Patch to add AtomicMemIntrinsic classes & MemIntrinsic classes derived
from Plain* & Atomic*
3) Individual patches to update passes to handle Atomic + Plain as desired.

The problem that I see with that, though, is any third party stuff that
uses the MemIntrinsic classes as they are today will break between steps (1)
and (2), and then they will start doing the wrong things with the
unordered-atomic memory intrinsics if they’re presented with one — though,
that last bit is no different from option 1.

My current understanding is that we don't have any actual here beyond
publicizing the changes on llvm-dev and not changing the world
overnight. I think as long as you make change (1), send a heads-up
email to llvm-dev and make sure there is a gap of a day or two between
(1) and (2), we're fine. LLVM does not have a stable C++ API.

Having said that:

Thinking out loud… If we go the multiple inheritance route, then what about
something like:
MemIntrinsic
* MemSetInst
* MemTransferInst
** MemCpyInst
** MemMoveInst
AtomicMemIntrinsic
* AtomicMemSetInst
* AtomicMemTransferInst
** AtomicMemCpyInst
** AtomicMemMoveInst

AnyMemIntrinsic : MemIntrinsic, AtomicMemIntrinsic
AnyMemSetInst : MemSetInst, AtomicMemSetInst
AnyMemTransferInst : MemTranferInst, AtomicMemTransferInst
… etc.

I like the Any* prefix. But how about:

AnyMemIntrinsic : PlainMemIntrinsic, AtomicMemIntrinsic
AnyMemSetInst : PlainMemSetInst, AtomicMemSetInst

etc. ? This also will prevent silent failures in downstream projects.

-- Sanjoy

Hi Sanjoy,
Sounds like a good plan to me. Unless someone wants to chime in with a better idea then I’ll try this approach, and see how it goes.

-Daniel

In case anyone would like to chime in. Review of this is ongoing in: https://reviews.llvm.org/D38419