RFC: Extending atomic loads and stores to floating point and vector types

Currently, we limit atomic loads and stores to either pointer or integer types. I would like to propose that we extend this to allow both floating point and vector types which meet the other requirements (i.e. a size which is a power-of-two multiple of 8 bits, and suitable alignment).

This will enable a few follow-on changes:
1) Teaching the vectorizer how to vectorize unordered atomic loads and stores
2) Removing special casing around type canonicalization of loads in various passes
3) Removing complexity from language frontends which need to support atomic operations on floating point types.

My initial implementation plan will not require any changes from the backends. I plan to add a lowering step to the existing AtomicExpandPass which will convert atomic operations on floats and vectors to their equivalently sized integer counterparts. Over time, individual backends will be able to opt in - via a TTI hook - to have the new types of atomics handled by the normal isel machinery.
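As a rough illustration of the lowering step described above, here is a minimal C sketch (the function names are mine, not LLVM's; `memcpy` plays the role of the IR bitcast, and `memory_order_relaxed` stands in loosely for an unordered atomic):

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* Sketch of the proposed lowering: an atomic store of a float is done by
   bitcasting the value to a same-width integer and issuing an ordinary
   atomic integer store; the load path reverses the bitcast. */
static void atomic_store_float(_Atomic uint32_t *slot, float v) {
    uint32_t bits;
    memcpy(&bits, &v, sizeof bits);          /* "bitcast float to i32" */
    atomic_store_explicit(slot, bits, memory_order_relaxed);
}

static float atomic_load_float(const _Atomic uint32_t *slot) {
    uint32_t bits = atomic_load_explicit(slot, memory_order_relaxed);
    float v;
    memcpy(&v, &bits, sizeof v);             /* "bitcast i32 to float" */
    return v;
}
```

This is only the shape of the transformation; AtomicExpandPass would do the equivalent rewrite on the IR itself.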

I've prototyped this approach with the x86 backend and get what looks like correct and even fairly efficient instruction selection taking place. I haven't studied it too extensively, so it might not work out in the end, but the approach appears generally feasible.

One open question I don't know the answer to: Are there any special semantics required from floating point stores which aren't met by simply bitcasting their result to i32 (float) or i64 (double) and storing the result? In particular, I'm unsure of the semantics around canonicalization here. Are there any? Same for loads?

Philip

Currently, we limit atomic loads and stores to either pointer or integer
types. I would like to propose that we extend this to allow both floating
point and vector types which meet the other requirements. (i.e.
power-of-two multiple of 8 bits, and aligned)

I support this.

This will enable a couple of follow on changes:

1) Teaching the vectorizer how to vectorize unordered atomic loads and
stores
2) Removing special casing around type canonicalization of loads in
various passes
3) Removing complexity from language frontends which need to support
atomic operations on floating point types.

This may become relevant for C++, see http://wg21.link/p0020r0

My initial implementation plan will not require any changes from the
backends. I plan to add a lowering step to the existing AtomicExpandPass
which will convert atomic operations on floats and vectors to their
equivalently sized integer counterparts. Over time, individual backends
will be able to opt in - via a TTI hook - to have the new types of atomics
handled by the normal isel machinery.

I've prototyped this approach with the x86 backend and get what looks like
correct and even fairly efficient instruction selection taking place. I
haven't studied it too extensively, so it might not work out in the end,
but the approach appears generally feasible.

Simpler path than D11382 <http://reviews.llvm.org/D11382>, generating the
same code?

One open question I don't know the answer to: Are there any special
semantics required from floating point stores which aren't met by simply
bitcasting their result to i32 (float) or i64 (double) and storing the
result? In particular, I'm unsure of the semantics around canonicalization
here. Are there any? Same for loads?

I'd go a bit further: should you also support basic FP operations
atomically? The above C++ paper adds add/sub, and we've discussed adding
FMA as well.

This raises similar issues around FP exceptions (are they impl-defined, UB,
unspecified? Do we care?).

Hi Philip,

I think this makes sense.

-Hal

Great. I'll prepare a patch and send it for review.

Actually, I think I was relying on exactly that change. :) My initial implementation just converts "store atomic float 0.0, float* %p unordered, align 4" to

  %p.i = bitcast float* %p to i32*
  %v = bitcast float 0.0 to i32
  store atomic i32 %v, i32* %p.i unordered, align 4

(i.e. it just does what the frontend previously did in the AtomicExpand.) The net result is that the backend still sees the bitcast idioms you added pattern matching for. Once that part's in, I'll start changing the canonical form the backend expects, but given that's backend specific, I wanted a migration mechanism which worked for all backends.

p.s. There's still lots of work to do to improve isel for atomic loads and stores on these types. I plan to eventually get there, but my first focus is on a) functional correctness, and b) getting the middle-end optimizers fixed up.

Just to be clear, I assume you mean extending the atomicrmw and cmpxchg instructions, right? I agree this is worth doing, but I'm purposely separating that into a possible later proposal. I don't need this right now, and keeping the work scoped reasonably is key to making progress.

In practice, we seem not to care.

This is much broader than my proposal, and I have no opinions on what the right solutions here are.

Philip

It seems unfortunate, and potentially problematic, if LLVM doesn't
support at least "atomicrmw xchg" and "cmpxchg". If you support just
those two in addition to load/store, it'll cover everything the C frontend
actually needs to be able to do *now* with floating point atomics.

Atomic floating point math isn't actually a supported operation in C now,
so it seems reasonable to leave the rest of the atomicrmw ops for a future
design.

I'm really not sure what you're trying to say here. Your two paragraphs seem to contradict each other?

For the record, I actually don't care about the C frontend at all. If my work helps clang, great, but that's not my goal. Others can step up to push the changes through clang if they want to see the benefit there.

Philip

I apologize for being unclear.

I only meant that in C and C++, “atomic_load”, “atomic_store”, “atomic_exchange”, and “atomic_compare_exchange_{weak,strong}” are supported for all types, including integers, floats, vectors, aggregates, and pointers. On the other hand, the atomic_fetch_{add,sub,and,or,…} operations are not – they only work on integers and, for “add” and “sub”, pointers.

I think that similarly makes sense in llvm: the instructions “atomic_load”, “atomic_store”, “atomicrmw xchg”, and “cmpxchg” should all support float/vector types of the target-allowed sizes: the mechanism you proposed of adding bitcasts in the AtomicExpandPass should work for all four of those.

But supporting actual arithmetic operations on other types, e.g. “atomicrmw add”, “atomicrmw sub”, or any of the others, is a completely separate issue, and I don’t think it makes any sense to consider that as a part of what you’re doing.
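That C-level split is visible directly with C11's generic atomics; a small sketch (the `demo_float_atomics` name is illustrative): exchange and compare-exchange accept an atomic float, while the fetch operations do not.

```c
#include <stdatomic.h>

/* C11's generic atomic operations (load, store, exchange,
   compare-exchange) work on any atomic type, including floats, while
   atomic_fetch_add/sub are restricted to integer and pointer types --
   mirroring the split proposed for the IR. */
static float demo_float_atomics(void) {
    _Atomic float f = 1.0f;

    float old = atomic_exchange(&f, 2.5f);   /* exchange: fine on floats */
    (void)old;

    float expected = 2.5f;
    atomic_compare_exchange_strong(&f, &expected, 4.0f);  /* also fine */

    /* atomic_fetch_add(&f, 1.0f);  -- not valid C11: fetch ops are
       integer/pointer only, hence left to a future design. */
    return atomic_load(&f);
}
```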

I agree, though, unless reviewers really push for it, I only plan on implementing the first two for now. I may get to the latter at a future point, but they're not on my must-have list right now. (Actually, I might get to at least cmpxchg sooner rather than later. I just noticed we had a bit of code in our frontend hacking around that too.)

Again, I'm not sure what you're getting at. We already support "atomicrmw add". Did you possibly mean "atomicrmw fadd"? Or were you using "add" in place of "some_other_op_we_dont_yet_support"?

Philip

Patch posted for review: http://reviews.llvm.org/D15471

Philip

Looking at the patch, I think we should do FP only for now as vectors have
extra complexities which IMO warrant more discussion.

Hi Philip,

Currently, we limit atomic loads and stores to either pointer or integer types. I would like to propose that we extend this to allow both floating point and vector types which meet the other requirements. (i.e. power-of-two multiple of 8 bits, and aligned)

If you’re adding support for floating point atomics, would it be possible for you to add support for atomic operations on pointers too? Currently, front ends need to bitcast pointers to an integer type which assumes that a corresponding integer type exists (it doesn’t for us) and loses pointerness of the operand in the middle, making life annoying for garbage collection.

Currently, our support for atomics on pointers is mixed. Atomic loads and stores of pointer types are supported; atomicrmw and cmpxchg operations on pointers are not. I know of no reason for the distinction.

I agree that making the change you're proposing is valuable. If you want to file a bug and assign it to me, I'll likely get to it in the next couple of weeks. Alternatively, I'd be happy to review patches. :)
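For reference, C11 already permits the exchange and compare-exchange shapes on atomic pointers without any integer round-trip, which is exactly the capability the IR's atomicrmw/cmpxchg currently lack for pointer types. A hedged sketch (the `demo_pointer_atomics` name is mine):

```c
#include <stdatomic.h>

/* C11 generic atomics on a pointer: no intptr casts needed. The two
   operations below correspond to the "atomicrmw xchg" and "cmpxchg"
   shapes being discussed for pointer types in the IR. */
static int demo_pointer_atomics(void) {
    static int a = 1, b = 2;
    _Atomic(int *) p = &a;

    int *old = atomic_exchange(&p, &b);      /* the "atomicrmw xchg" shape */
    if (old != &a)
        return 0;

    int *expected = &b;
    atomic_compare_exchange_strong(&p, &expected, &a);  /* the "cmpxchg" shape */
    return atomic_load(&p) == &a;
}
```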

I'm fine with splitting the patch up to make progress. I'll post a split patch shortly.

Would you mind explaining what complexities you see for vectors? As per my direct email, the set of vectors which can practically be made atomic may be smaller than we'd like, but the existing atomic semantics seem to map cleanly. What am I missing?

Philip

Patch posted for review: http://reviews.llvm.org/D15471

Looking at the patch, I think we should do FP only for now as vectors have
extra complexities which IMO warrant more discussion.

I'm fine with splitting the patch up to make progress. I'll post a split
patch shortly.

Thanks. FWIW I think we'll want to do vectors, but I think it's trickier
than just FP.

Would you mind explaining what complexities you see for vectors? As per my
direct email, the set of vectors which can practically be made atomic may
be smaller than we'd like, but the existing atomic semantics seem to map
cleanly. What am I missing?

I'm also concerned about:

   - Alignment is the big one, I think we'll want to discuss having
   entirely atomic vectors as well as vectors whose elements are atomic only.
   - Having vectors of pointers without fully supporting atomic pointer.
   - Vectors of unusual sizes or integer types being atomic, and how they
   get legalized. e.g. <3 x i32> or <256 x i1>.
   - Once we add vector, should we consider adding other composite types in
   general, including structs? C++ allows this, but has substantial issues
   w.r.t. padding.

Agreed. :)

Not sure how the second part relates to the first part. I agree that we probably want a notion of element-wise atomicity for vector (and possibly struct/array) types, but I think that should come later. Specifically, I think we'd need an alternate spelling for an element-wise form of atomic, so I see no harm in having the support for fully atomic vectors added now.

We support atomic pointer operations in loads and stores. Again, what we support in load/store is currently separate from what we support in atomicrmw/cmpxchg. We should unify the latter with the former, but that's a separate issue.

Well, the former isn't legal. It isn't a power-of-two size. I'm not suggesting we relax that requirement. If it has a power-of-two store size, then it should get the equivalent handling to an iX of the same size. I don't see the issue here. How is the second example any different from an atomic i256?

Actually, this brings up a related issue. We seem to have no documentation in the LangRef about how vector types are represented in memory. The actual implementation appears to assume packed bits, but the docs don't say. Depending on what the semantics here are, my assumptions in the last two paragraphs may not be valid.

I'd say possibly, but FCAs are poorly supported in the IR in general. I am not willing to block changes for vectors on changes for FCAs. This has never been our policy in the past and should not become so now.

Philip

p.s. Thanks for asking clarifying questions. Getting this all hammered out is definitely a good idea. I just want to make sure we close the loop quickly so that this doesn't get stalled.

Would you mind explaining what complexities you see for vectors? As per
my direct email, the set of vectors which can practically be made atomic
may be smaller than we'd like, but the existing atomic semantics seem to
map cleanly. What am I missing?

I'm also concerned about:

   - Alignment is the big one, I think we'll want to discuss having
   entirely atomic vectors as well as vectors whose elements are atomic only.

Not sure how the second part relates to the first part. I agree that we
probably want a notion of element-wise atomicity for vector (and possibly
struct/array) types, but I think that should come later. Specifically, I
think we'd need an alternate spelling for an element-wise form of atomic, so
I see no harm in having the support for fully atomic vectors added now.

   - Having vectors of pointers without fully supporting atomic pointer.

We support atomic pointer operations in loads and stores. Again, what we
support in load/store is currently separate from what we support in
atomicrmw/cmpxchg. We should unify the latter with the former, but that's a
separate issue.

Ha, I didn't realize we did; I thought it had to go through intptr casts
just like FP does, but r162146 added it back in 2012. The documentation and
verifier look slightly wrong, here's a proposed fix
<http://reviews.llvm.org/D15512>.

   - Vectors of unusual sizes or integer types being atomic, and how they
   get legalized. e.g. <3 x i32> or <256 x i1>.

Well, the former isn't legal. It isn't a power of two size. I'm not
suggesting we relax that requirement.

Indeed, I'd like to make sure we don't and have some pretty explicit
testing for it.

If it has a power of two store size, then it should get the equivalent
handling to an iX of the same size. I don't see the issue here. How is
the second example any different from an atomic i256?

i1 isn't storable as-is :)

Actually, this brings up a related issue. We seem to have no documentation
in the LangRef about how vector types are represented in memory. The
actual implementation appears to assume packed bits, but the docs don't
say. Depending on what the semantics here are, my assumptions in the last
two paragraphs may not be valid.

Indeed!

   - Once we add vector, should we consider adding other composite types
   in general, including structs? C++ allows this, but has substantial issues
   w.r.t. padding.

I'd say possibly, but FCAs are poorly supported in the IR in general. I
am not willing to block changes for vectors on changes for FCAs. This has
never been our policy in the past and should not become so now.

Oh yeah I don't think we should block it. FWIW I expect that some of these
will generate calls to the runtime's global atomic lock shard, which is
horrible.

p.s. Thanks for asking clarifying questions. Getting this all hammered out
is definitely a good idea. I just want to make sure we close the loop
quickly so that this doesn't get stalled.

You mean, more stalled than the holidays will already stall it? ;)

Commented on your review.

Reasonable. It looks like the backend scalarizes a non-atomic <256 x i1> store into 256 one-byte stores. So no, that can't be atomic. :) The particular lowering is also terrible. We should be emitting a small loop in this case, not unrolling the entire thing.

Any clue how to start specifying this? It would seem to be ABI dependent.

"Global atomic lock shard"? I have no idea what you're referring to. Is that something in libc?

True. :)

Philip

Actually, this brings up a related issue. We seem to have no
documentation in the LangRef about how vector types are represented in
memory. The actual implementation appears to assume packed bits, but the
docs don't say. Depending on what the semantics here are, my assumptions
in the last two paragraphs may not be valid.

Indeed!

Any clue how to start specifying this? It would seem to be ABI
dependent.

Maybe start a new thread here describing the problem? I don't really know
otherwise.

   - Once we add vector, should we consider adding other composite types
   in general, including structs? C++ allows this, but has substantial issues
   w.r.t. padding.

I'd say possibly, but FCAs are poorly supported in the IR in general. I
am not willing to block changes for vectors on changes for FCAs. This has
never been our policy in the past and should not become so now.

Oh yeah I don't think we should block it. FWIW I expect that some of these
will generate calls to the runtime's global atomic lock shard, which is
horrible.

"global atomic lock shard"? I have no idea what you're referring to. Is
that something in libc?

compiler-rt:
https://github.com/llvm-mirror/compiler-rt/blob/master/lib/builtins/atomic.c

Will do.

Hm, I think this raises an interesting semantic question. We could use the global lock shard scheme to make loads atomic w.r.t. other llvm-emitted writes, but not writes emitted by other compilers. This would mean that linking object files with atomics might break their atomicity. I'm not sure we want to allow that. Maybe we can do that only if the synchronization scope allows it or something?

Philip
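For readers unfamiliar with the fallback being discussed, the lock-shard scheme can be sketched roughly as below. All names here (SHARD_COUNT, lock_for, locked_load) are illustrative, not compiler-rt's actual identifiers; the real implementation lives in compiler-rt's atomic.c.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

/* A small table of spinlocks; an object's address is hashed to pick a
   lock, and oversized "atomic" accesses are plain copies done under it. */
#define SHARD_COUNT 16

static _Atomic int shards[SHARD_COUNT];     /* 0 = unlocked */

static _Atomic int *lock_for(const void *addr) {
    return &shards[((uintptr_t)addr >> 4) % SHARD_COUNT];
}

/* Lock-based "atomic" load of an arbitrarily sized object. */
static void locked_load(const void *src, void *dst, size_t size) {
    _Atomic int *l = lock_for(src);
    while (atomic_exchange_explicit(l, 1, memory_order_acquire))
        ;                                   /* spin until lock acquired */
    memcpy(dst, src, size);                 /* plain copy under the lock */
    atomic_store_explicit(l, 0, memory_order_release);
}
```

This also makes the semantic worry above concrete: atomicity holds only among accesses that go through the same lock table, so code built against a different runtime (or none) does not synchronize with these loads at all.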

   - Once we add vector, should we consider adding other composite
   types in general, including structs? C++ allows this, but has substantial
   issues w.r.t. padding.

I'd say possibly, but FCAs are poorly supported in the IR in general. I
am not willing to block changes for vectors on changes for FCAs. This has
never been our policy in the past and should not become so now.

Oh yeah I don't think we should block it. FWIW I expect that some of
these will generate calls to the runtime's global atomic lock shard, which
is horrible.

"global atomic lock shard"? I have no idea what you're referring to. Is
that something in libc?

compiler-rt:
https://github.com/llvm-mirror/compiler-rt/blob/master/lib/builtins/atomic.c

Hm, I think this raises an interesting semantic question. We could use
the global lock shard scheme to make loads atomic w.r.t. other llvm emitted
writes, but not writes emitted by other compilers. This would mean that
linking object files with atomics might break their atomicity. I'm not
sure we want to allow that. Maybe we can do that only if the
synchronization scope allows it or something?

GCC does it in libatomic:
https://github.com/gcc-mirror/gcc/tree/master/libatomic

They agree on the function name, so AFAIK either runtime allows this to
work properly: the compiler just emits a call to the function, and one of
the runtimes has to be present.