RFC: Extending atomic loads and stores to floating point and vector types

Currently, we limit atomic loads and stores to either pointer or integer types. I would like to propose that we extend this to allow both floating point and vector types which meet the other requirements (i.e. a size which is a power-of-two multiple of 8 bits, and suitable alignment).

This will enable a few follow-on changes:
1) Teaching the vectorizer how to vectorize unordered atomic loads and stores
2) Removing special casing around type canonicalization of loads in various passes
3) Removing complexity from language frontends which need to support atomic operations on floating point types.

My initial implementation plan will not require any changes from the backends. I plan to add a lowering step to the existing AtomicExpandPass which will convert atomic operations on floats and vectors to their equivalently sized integer counterparts. Over time, individual backends will be able to opt in - via a TTI hook - to have the new types of atomics handled by the normal isel machinery.
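As a rough illustration of the lowering step described above, here is a minimal C sketch (the function names are mine, not LLVM's; `memcpy` plays the role of the IR bitcast, and `memory_order_relaxed` stands in loosely for an unordered atomic):

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Sketch of the proposed lowering: an atomic store of a float is done by
   bitcasting the value to a same-width integer and issuing an ordinary
   atomic integer store; the load path reverses the bitcast. */
static void atomic_store_float(_Atomic uint32_t *slot, float v) {
    uint32_t bits;
    memcpy(&bits, &v, sizeof bits);          /* "bitcast float to i32" */
    atomic_store_explicit(slot, bits, memory_order_relaxed);
}

static float atomic_load_float(const _Atomic uint32_t *slot) {
    uint32_t bits = atomic_load_explicit(slot, memory_order_relaxed);
    float v;
    memcpy(&v, &bits, sizeof v);             /* "bitcast i32 to float" */
    return v;
}
```

This is only the shape of the transformation; AtomicExpandPass would do the equivalent rewrite on the IR itself.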

I've prototyped this approach with the x86 backend and get what looks like correct and even fairly efficient instruction selection taking place. I haven't studied it too extensively, so it might not work out in the end, but the approach appears generally feasible.

One open question I don't know the answer to: Are there any special semantics required from floating point stores which aren't met by simply bitcasting their result to i32 (float) or i64 (double) and storing the result? In particular, I'm unsure of the semantics around canonicalization here. Are there any? Same for loads?

Philip

Currently, we limit atomic loads and stores to either pointer or integer
types. I would like to propose that we extend this to allow both floating
point and vector types which meet the other requirements. (i.e.
power-of-two multiple of 8 bits, and aligned)

I support this.

This will enable a couple of follow on changes:

1) Teaching the vectorizer how to vectorize unordered atomic loads and
stores
2) Removing special casing around type canonicalization of loads in
various passes
3) Removing complexity from language frontends which need to support
atomic operations on floating point types.

This may become relevant for C++, see http://wg21.link/p0020r0

My initial implementation plan will not require any changes from the
backends. I plan to add a lowering step to the existing AtomicExpandPass
which will convert atomic operations on floats and vectors to their
equivalently sized integer counterparts. Over time, individual backends
will be able to opt in - via a TTI hook - to have the new types of atomics
handled by the normal isel machinery.

I've prototyped this approach with the x86 backend and get what looks like
correct and even fairly efficient instruction selection taking place. I
haven't studied it too extensively, so it might not work out in the end,
but the approach appears generally feasible.

Simpler path than D11382 <http://reviews.llvm.org/D11382>, generating the
same code?

One open question I don't know the answer to: Are there any special
semantics required from floating point stores which aren't met by simply
bitcasting their result to i32 (float) or i64 (double) and storing the
result? In particular, I'm unsure of the semantics around canonicalization
here. Are there any? Same for loads?

I'd go a bit further: should you also support basic FP operations
atomically? The above C++ paper adds add/sub, and we've discussed adding
FMA as well.

This raises similar issues around FP exceptions (are they impl-defined, UB,
unspecified? Do we care?).

Hi Philip,

I think this makes sense.

-Hal

Great. I'll prepare a patch and send it for review.

Actually, I think I was relying on exactly that change. :) My initial implementation just converts "store atomic float 0.0, float* %p unordered, align 4" to

  %p.i = bitcast float* %p to i32*
  %v = bitcast float 0.0 to i32
  store atomic i32 %v, i32* %p.i unordered, align 4

(i.e. it just does what the frontend previously did in the AtomicExpand.) The net result is that the backend still sees the bitcast idioms you added pattern matching for. Once that part's in, I'll start changing the canonical form the backend expects, but given that's backend specific, I wanted a migration mechanism which worked for all backends.

p.s. There's still lots of work to do to improve isel for atomic loads and stores on these types. I plan to eventually get there, but my first focus is on a) functional correctness, and b) getting the middle-end optimizers fixed up.

Just to be clear, I assume you mean extending the atomicrmw and cmpxchg instructions, right? I agree this is worth doing, but I'm purposely separating that into a possible later proposal. I don't need this right now, and keeping the work scoped reasonably is key to making progress.

In practice, we seem not to care.

This is much broader than my proposal, and I have no opinions on what the right solutions here are.

Philip

It seems unfortunate, and potentially problematic, if LLVM doesn't
support at least "atomicrmw xchg" and "cmpxchg". If you support just
those two in addition to load/store, it'll cover everything the C frontend
actually needs to be able to do *now* with floating point atomics.

Atomic floating point math isn't actually a supported operation in C now,
so it seems reasonable to leave the rest of the atomicrmw ops for a future
design.

I'm really not sure what you're trying to say here. Your two paragraphs seem to contradict each other?

For the record, I actually don't care about the C frontend at all. If my work helps clang, great, but that's not my goal. Others can step up to push the changes through clang if they want to see the benefit there.

Philip

I apologize for being unclear.

I only meant that in C and C++, “atomic_load”, “atomic_store”, “atomic_exchange”, and “atomic_compare_exchange_{weak,strong}” are supported for all types, including integers, floats, vectors, aggregates, and pointers. On the other hand, the atomic_fetch_{add,sub,and,or,…} operations are not – they only work on integers and, for “add” and “sub”, pointers.

I think that similarly makes sense in llvm: the instructions “atomic_load”, “atomic_store”, “atomicrmw xchg”, and “cmpxchg” should all support float/vector types of the target-allowed sizes: the mechanism you proposed of adding bitcasts in the AtomicExpandPass should work for all four of those.

But supporting actual arithmetic operations on other types, e.g. “atomicrmw add”, “atomicrmw sub”, or any of the others, is a completely separate issue, and I don’t think it makes any sense to consider that as a part of what you’re doing.
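That C-level split is visible directly with C11's generic atomics; a small sketch (the `demo_float_atomics` name is illustrative): exchange and compare-exchange accept an atomic float, while the fetch operations do not.

```c
#include <stdatomic.h>

/* C11's generic atomic operations (load, store, exchange,
   compare-exchange) work on any atomic type, including floats, while
   atomic_fetch_add/sub are restricted to integer and pointer types --
   mirroring the split proposed for the IR. */
static float demo_float_atomics(void) {
    _Atomic float f = 1.0f;

    float old = atomic_exchange(&f, 2.5f);   /* exchange: fine on floats */
    (void)old;

    float expected = 2.5f;
    atomic_compare_exchange_strong(&f, &expected, 4.0f);  /* also fine */

    /* atomic_fetch_add(&f, 1.0f);  -- not valid C11: fetch ops are
       integer/pointer only, hence left to a future design. */
    return atomic_load(&f);
}
```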

I agree, though, unless reviewers really push for it, I only plan on implementing the first two for now. I may get to the latter at a future point, but they're not on my must-have list right now. (Actually, I might get to at least cmpxchg sooner rather than later. I just noticed we had a bit of code in our frontend hacking around that too.)

Again, I'm not sure what you're getting at. We already support "atomicrmw add". Did you possibly mean "atomicrmw fadd"? Or were you using "add" in place of "some_other_op_we_dont_yet_support"?

Philip

Patch posted for review: http://reviews.llvm.org/D15471

Philip

Looking at the patch, I think we should do FP only for now as vectors have
extra complexities which IMO warrant more discussion.

Hi Philip,

Currently, we limit atomic loads and stores to either pointer or integer types. I would like to propose that we extend this to allow both floating point and vector types which meet the other requirements. (i.e. power-of-two multiple of 8 bits, and aligned)

If you’re adding support for floating point atomics, would it be possible for you to add support for atomic operations on pointers too? Currently, front ends need to bitcast pointers to an integer type which assumes that a corresponding integer type exists (it doesn’t for us) and loses pointerness of the operand in the middle, making life annoying for garbage collection.

Currently, our support for atomics on pointers is mixed. Atomic loads and stores of pointer types are supported; atomicrmw and cmpxchg operations on pointers are not. I know of no reason for the distinction.

I agree that making the change you're proposing is valuable. If you want to file a bug and assign it to me, I'll likely get to it in the next couple of weeks. Alternatively, I'd be happy to review patches. :)
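For reference, C11 already permits the exchange and compare-exchange shapes on atomic pointers without any integer round-trip, which is exactly the capability the IR's atomicrmw/cmpxchg currently lack for pointer types. A hedged sketch (the `demo_pointer_atomics` name is mine):

```c
#include <stdatomic.h>

/* C11 generic atomics on a pointer: no intptr casts needed. The two
   operations below correspond to the "atomicrmw xchg" and "cmpxchg"
   shapes being discussed for pointer types in the IR. */
static int demo_pointer_atomics(void) {
    static int a = 1, b = 2;
    _Atomic(int *) p = &a;

    int *old = atomic_exchange(&p, &b);      /* the "atomicrmw xchg" shape */
    if (old != &a)
        return 0;

    int *expected = &b;
    atomic_compare_exchange_strong(&p, &expected, &a);  /* the "cmpxchg" shape */
    return atomic_load(&p) == &a;
}
```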

I'm fine with splitting the patch up to make progress. I'll post a split patch shortly.

Would you mind explaining what complexities you see for vectors? As per my direct email, the set of vectors which can practically be made atomic may be smaller than we'd like, but the existing atomic semantics seem to map cleanly. What am I missing?

Philip

Patch posted for review: http://reviews.llvm.org/D15471

Looking at the patch, I think we should do FP only for now as vectors have
extra complexities which IMO warrant more discussion.

I'm fine with splitting the patch up to make progress. I'll post a split
patch shortly.

Thanks. FWIW I think we'll want to do vectors, but I think it's trickier
than just FP.

Would you mind explaining what complexities you see for vectors? As per my
direct email, the set of vectors which can practically be made atomic may
be smaller than we'd like, but the existing atomic semantics seem to map
cleanly. What am I missing?

I'm also concerned about:

   - Alignment is the big one, I think we'll want to discuss having
   entirely atomic vectors as well as vectors whose elements are atomic only.
   - Having vectors of pointers without fully supporting atomic pointer.
   - Vectors of unusual sizes or integer types being atomic, and how they
   get legalized. e.g. <3 x i32> or <256 x i1>.
   - Once we add vector, should we consider adding other composite types in
   general, including structs? C++ allows this, but has substantial issues
   w.r.t. padding.

Agreed. :)

Not sure how the second part relates to the first part. I agree that we probably want a notion of element-wise atomicity for vector (and possibly struct/array) types, but I think that should come later. Specifically, I think we'd need an alternate spelling for an element-wise form of atomic, so I see no harm in having the support for fully atomic vectors added now.

We support atomic pointer operations in loads and stores. Again, what we support in load/store is currently separate from what we support in atomicrmw/cmpxchg. We should unify the latter with the former, but that's a separate issue.

Well, the former isn't legal. It isn't a power-of-two size. I'm not suggesting we relax that requirement. If it has a power-of-two store size, then it should get the equivalent handling to an iX of the same size. I don't see the issue here. How is the second example any different from an atomic i256?

Actually, this brings up a related issue. We seem to have no documentation in the LangRef about how vector types are represented in memory. The actual implementation appears to assume packed bits, but the docs don't say. Depending on what the semantics here are, my assumptions in the last two paragraphs may not be valid.

I'd say possibly, but FCAs are poorly supported in the IR in general. I am not willing to block changes for vectors on changes for FCAs. This has never been our policy in the past and should not become so now.

Philip

p.s. Thanks for asking clarifying questions. Getting this all hammered out is definitely a good idea. I just want to make sure we close the loop quickly so that this doesn't get stalled.

Would you mind explaining what complexities you see for vectors? As per
my direct email, the set of vectors which can practically be made atomic
may be smaller than we'd like, but the existing atomic semantics seem to
map cleanly. What am I missing?

I'm also concerned about:

   - Alignment is the big one, I think we'll want to discuss having
   entirely atomic vectors as well as vectors whose elements are atomic only.

Not sure how the second part relates to the first part. I agree that we
probably want a notion of element-wise atomicity for vector (and possibly
struct/array) types, but I think that should come later. Specifically, I
think we'd need an alternate spelling for an element-wise form of atomic, so
I see no harm in having the support for fully atomic vectors added now.

   - Having vectors of pointers without fully supporting atomic pointer.

We support atomic pointer operations in loads and stores. Again, what we
support in load/store is currently separate from what we support in
atomicrmw/cmpxchg. We should unify the latter with the former, but that's a
separate issue.

Ha, I didn't realize we did; I thought it had to go through intptr casts
just like FP does, but r162146 added it back in 2012. The documentation and
verifier look slightly wrong, here's a proposed fix
<http://reviews.llvm.org/D15512>.

   - Vectors of unusual sizes or integer types being atomic, and how they
   get legalized. e.g. <3 x i32> or <256 x i1>.

Well, the former isn't legal. It isn't a power of two size. I'm not
suggesting we relax that requirement.

Indeed, I'd like to make sure we don't and have some pretty explicit
testing for it.

If it has a power of two store size, then it should get the equivalent
handling to an iX of the same size. I don't see the issue here. How is
the second example any different from an atomic i256?

i1 isn't storable as-is :)

Actually, this brings up a related issue. We seem to have no documentation
in the LangRef about how vector types are represented in memory. The
actual implementation appears to assume packed bits, but the docs don't
say. Depending on what the semantics here are, my assumptions in the last
two paragraphs may not be valid.

Indeed!

   - Once we add vector, should we consider adding other composite types
   in general, including structs? C++ allows this, but has substantial issues
   w.r.t. padding.

I'd say possibly, but FCAs are poorly supported in the IR in general. I
am not willing to block changes for vectors on changes for FCAs. This has
never been our policy in the past and should not become so now.

Oh yeah I don't think we should block it. FWIW I expect that some of these
will generate calls to the runtime's global atomic lock shard, which is
horrible.

p.s. Thanks for asking clarifying questions. Getting this all hammered out
is definitely a good idea. I just want to make sure we close the loop
quickly so that this doesn't get stalled.

You mean, more stalled than the holidays will already stall it? ;)

Commented on your review.

Reasonable. It looks like the backend scalarizes a non-atomic <256 x i1> store into 256 one-byte stores. So no, that can't be atomic. :) The particular lowering is also terrible. We should be emitting a small loop in this case, not unrolling the entire thing.

Any clue how to start specifying this? It would seem to be ABI dependent.

"Global atomic lock shard"? I have no idea what you're referring to. Is that something in libc?

True. :)

Philip

Actually, this brings up a related issue. We seem to have no
documentation in the LangRef about how vector types are represented in
memory. The actual implementation appears to assume packed bits, but the
docs don't say. Depending on what the semantics here are, my assumptions
in the last two paragraphs may not be valid.

Indeed!

Any clue how to start specifying this? It would seem to be ABI
dependent.

Maybe start a new thread here describing the problem? I don't really know
otherwise.

   - Once we add vector, should we consider adding other composite types
   in general, including structs? C++ allows this, but has substantial issues
   w.r.t. padding.

I'd say possibly, but FCAs are poorly supported in the IR in general. I
am not willing to block changes for vectors on changes for FCAs. This has
never been our policy in the past and should not become so now.

Oh yeah I don't think we should block it. FWIW I expect that some of these
will generate calls to the runtime's global atomic lock shard, which is
horrible.

"global atomic lock shard"? I have no idea what you're referring to. Is
that something in libc?

compiler-rt:
https://github.com/llvm-mirror/compiler-rt/blob/master/lib/builtins/atomic.c

Will do.

Hm, I think this raises an interesting semantic question. We could use the global lock shard scheme to make loads atomic w.r.t. other llvm-emitted writes, but not writes emitted by other compilers. This would mean that linking object files with atomics might break their atomicity. I'm not sure we want to allow that. Maybe we can do that only if the synchronization scope allows it or something?

Philip
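For readers unfamiliar with the fallback being discussed, the lock-shard scheme can be sketched roughly as below. All names here (SHARD_COUNT, lock_for, locked_load) are illustrative, not compiler-rt's actual identifiers; the real implementation lives in compiler-rt's atomic.c.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* A small table of spinlocks; an object's address is hashed to pick a
   lock, and oversized "atomic" accesses are plain copies done under it. */
#define SHARD_COUNT 16

static _Atomic int shards[SHARD_COUNT];     /* 0 = unlocked */

static _Atomic int *lock_for(const void *addr) {
    return &shards[((uintptr_t)addr >> 4) % SHARD_COUNT];
}

/* Lock-based "atomic" load of an arbitrarily sized object. */
static void locked_load(const void *src, void *dst, size_t size) {
    _Atomic int *l = lock_for(src);
    while (atomic_exchange_explicit(l, 1, memory_order_acquire))
        ;                                   /* spin until lock acquired */
    memcpy(dst, src, size);                 /* plain copy under the lock */
    atomic_store_explicit(l, 0, memory_order_release);
}
```

This also makes the semantic worry above concrete: atomicity holds only among accesses that go through the same lock table, so code built against a different runtime (or none) does not synchronize with these loads at all.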

   - Once we add vector, should we consider adding other composite
   types in general, including structs? C++ allows this, but has substantial
   issues w.r.t. padding.

I'd say possibly, but FCAs are poorly supported in the IR in general. I
am not willing to block changes for vectors on changes for FCAs. This has
never been our policy in the past and should not become so now.

Oh yeah I don't think we should block it. FWIW I expect that some of
these will generate calls to the runtime's global atomic lock shard, which
is horrible.

"global atomic lock shard"? I have no idea what you're referring to. Is
that something in libc?

compiler-rt:
https://github.com/llvm-mirror/compiler-rt/blob/master/lib/builtins/atomic.c

Hm, I think this raises an interesting semantic question. We could use
the global lock shard scheme to make loads atomic w.r.t. other llvm emitted
writes, but not writes emitted by other compilers. This would mean that
linking object files with atomics might break their atomicity. I'm not
sure we want to allow that. Maybe we can do that only if the
synchronization scope allows it or something?

GCC does it in libatomic:
https://github.com/gcc-mirror/gcc/tree/master/libatomic

They agree on the function name, so AFAIK either runtime allows this to
work properly: the compiler just emits a call to the function, and one of
the runtimes has to be present.