Memory scope proposal

Hi all,

Currently, the LLVM IR uses a binary value (SingleThread/CrossThread) to represent synchronization scope on atomic instructions. We would like to enhance the representation of memory scopes in LLVM IR to allow more values than just the current two. The intention of this email is to invite comments on our proposal. There are some discussion before and it can be found here:

https://groups.google.com/forum/#!searchin/llvm-dev/hsail/llvm-dev/46eEpS5h0E4/i3T9xw-DNVYJ

Here is our new proposal:

Ping. Do we have any advice on this proposal? Thanks!

Ping! We need to close on whether we can use integers and global metadata for interpreting all the non-standard integers, to represent memory scopes.

​​
Dear all,

Here is the plain text version of the proposal:

Currently, the LLVM IR uses a binary value (SingleThread/CrossThread) to represent synchronization scope on atomic instructions. We would like to enhance the representation of memory scopes in LLVM IR to allow more values than just the current two. The intention of this email is to invite comments on our proposal. There are some discussion before and it can be found here:
https://groups.google.com/forum/#!searchin/llvm-dev/hsail/llvm-dev/46eEpS5h0E4/i3T9xw-DNVYJ

Here is our new proposal:

Ke,

I’ll be the bearer of bad news here. The radio silence this proposal has gotten probably means there is not enough interest in the community in this proposal to see it land.

One concern I have with the current proposal is that the optimization value of these scopes is not clear to me. Is it only the backend which is expected to support optimizations over these scopes? Or are you expecting the middle end optimizer to understand them? If so, I’d suspect we’d need a refined definition which allows us to discuss relative strengths of memory scopes.

More fundamentally, it’s not clear to me that “scope” is even the right model for this. I could see a case where we’d want something along the lines of “acquire semantics on memory space 1, release semantics on memory space 2, cst_seq semantics on address space 3”.

Also, unless I’m misreading on my skim of your proposal, the current definition of scope is slightly off from what you’ve specified. A “seq_cst singlethread” fence is a much weaker fence than a “seq_cst crossthread”. It’s probably easiest to reason about the current scheme as having the cross product of {singlethread, crossthread} x {orderings…} distinct orderings rather than a set of orderings with two overlapping scopes.

Philip

Ke,

I’ll be the bearer of bad news here. The radio silence this proposal has gotten probably means there is not enough interest in the community in this proposal to see it land.

FWIW, I’m very interested in seeing it go in, but haven’t had a lot of time to write a response.

One concern I have with the current proposal is that the optimization value of these scopes is not clear to me. Is it only the backend which is expected to support optimizations over these scopes? Or are you expecting the middle end optimizer to understand them? If so, I’d suspect we’d need a refined definition which allows us to discuss relative strengths of memory scopes.

I don’t know about Ke’s use cases, but I at least am not very concerned with having any portion of LLVM optimize them. Right now LLVM has no way to represent the information encoded here at all.

More fundamentally, it’s not clear to me that “scope” is even the right model for this. I could see a case where we’d want something along the lines of "acquire semantics on memory space 1, release semantics on memory space 2, cst_seq semantics on address space 3”.

Scopes are orthogonal to ordering constraints. Scopes are about memory operation visibility, primarily in the context of a machine with non-coherent caches. Imagine an accelerator with:

  • Per HW thread load/store buffers
  • Per core L1
  • Accelerator-wide L2
  • Whole-system DRAM

… and at any level of the hierarchy, the caching for one thread/core/accelerator may not be coherent with caches for other threads/cores/accelerators.

Scopes allow the program author to express the requisite visibility for a memory option; an that needs to be visible to other cores within the accelerator may need to bypass or flush the per-core L1. Communication to the host CPU or other accelerators may similarly need to bypass the the L2.

—Owen

​​
Dear all,

Here is the plain text version of the proposal:

Currently, the LLVM IR uses a binary value (SingleThread/CrossThread) to
represent synchronization scope on atomic instructions. We would like to
enhance the representation of memory scopes in LLVM IR to allow more values
than just the current two. The intention of this email is to invite
comments on our proposal. There are some discussion before and it can be
found here:
Redirecting to Google Groups

Here is our new proposal:

=================================================================
We still let the bitcode store memory scopes as "unsigned integers", since
that is the easiest way to maintain compatibility. The values 0 and 1 are
special. All other values are meaningful only within that bc file. In
addition, "a global metadata in the file" will provide a map from unsigned
integers to string symbols which should be used to interpret all the
non-standard integers. If the global metadata is empty or non-existent,
then all non-zero values will be mapped to "system", which is the current
behavior.

The proposed syntax for synchronization scope is as follows:
* Synchronization scopes are of arbitrary width, but implemented as
unsigned in the bitcode, just like address spaces.
* Cross-thread is default.
* Keyword "singlethread" is unchanged
* New syntax "synchscope(n)" for other target-specific scopes.
* There is no keyword for cross-thread, but it can be specified as
"synchscope(0)".

The proposed new integer implementation expanded synchronization scopes are
as follows:
****************************************************************
Format
​ ​
Single Thread
​ ​
System (renamed)
​ ​
Intermediate
Bitcode

zero

one

unsigned n
Assembly
​ ​
singlethread,
​ ​
empty (default),
​ ​
synchscope(n-1)

​ ​
synchscope(~0U)
​ ​
synchscope(0)
In-memory

~0U
​ z
ero

unsigned n-1
SelectionDAG

~0U
​ ​
zero

unsigned n-1

This part of the proposal is formatted strangely and is a little confusing.
Was this supposed to be a table? Can you re-format so it is more clear what
is being proposed.

Thanks,
Tom

Here is the initial proposal with some formatting fixed:

Currently, the LLVM IR uses a binary value (SingleThread/CrossThread) to
represent synchronization scope on atomic instructions. We would like to
enhance the representation of memory scopes in LLVM IR to allow more values
than just the current two. The intention of this email is to invite
comments on our proposal. There are some discussion before and it can be
found here:
https://groups.google.com/forum/#!searchin/llvm-dev/hsail/llvm-dev/46eEpS5h0E4/i3T9xw-DNVYJ

Here is our new proposal:

Why not going with a metadata attachment directly and kill the "singlethread" keyword? Something like:
Something like:

cmpxchg i32* %addr, i32 42, i32 0 monotonic monotonic, 3, !memory.scope{!42}
cmpxchg i32* %addr, i32 42, i32 0 monotonic monotonic, 3, !memory.scope{!43}

...

!42 = !{"singlethread"}
!43 = !{"L2"}

I also avoids manipulating/pruning the global map, and target won't depend on integer to keep bitcode compatibility.

+1. I like this idea. Haven't given it serious thought, but on the surface this sounds nice. We'd need to establish a naming scheme so that targets could add meta tags without preventing upstream from adding generic ones going forward.

Philip

>
> Here is the initial proposal with some formatting fixed:
>
> Currently, the LLVM IR uses a binary value (SingleThread/CrossThread) to
> represent synchronization scope on atomic instructions. We would like to
> enhance the representation of memory scopes in LLVM IR to allow more values
> than just the current two. The intention of this email is to invite
> comments on our proposal. There are some discussion before and it can be
> found here:
> Redirecting to Google Groups
>
> Here is our new proposal:
>
> =================================================================
> We still let the bitcode store memory scopes as "unsigned integers", since
> that is the easiest way to maintain compatibility. The values 0 and 1 are
> special. All other values are meaningful only within that bc file. In
> addition, "a global metadata in the file" will provide a map from unsigned
> integers to string symbols which should be used to interpret all the
> non-standard integers. If the global metadata is empty or non-existent,
> then all non-zero values will be mapped to "system", which is the current
> behavior.
>
> The proposed syntax for synchronization scope is as follows:
> * Synchronization scopes are of arbitrary width, but implemented as
> unsigned in the bitcode, just like address spaces.
> * Cross-thread is default.
> * Keyword "singlethread" is unchanged
> * New syntax "synchscope(n)" for other target-specific scopes.
> * There is no keyword for cross-thread, but it can be specified as
> "synchscope(0)".

Why not going with a metadata attachment directly and kill the "singlethread" keyword? Something like:
Something like:

cmpxchg i32* %addr, i32 42, i32 0 monotonic monotonic, 3, !memory.scope{!42}
cmpxchg i32* %addr, i32 42, i32 0 monotonic monotonic, 3, !memory.scope{!43}

...

!42 = !{"singlethread"}
!43 = !{"L2"}

I also avoids manipulating/pruning the global map, and target won't depend on integer to keep bitcode compatibility.

This seems like it will work assuming that promoting something like "L2" to the
most conservative memory scope is legal (this is what would have to happen
if the metadata was dropped). Are there any targets where this type of'
promotion would be illegal?

-Tom

Hi all,

Why not going with a metadata attachment directly and kill the "singlethread" keyword? Something like:
>Something like:
>
> cmpxchg i32* %addr, i32 42, i32 0 monotonic monotonic, 3, !memory.scope{!42}
> cmpxchg i32* %addr, i32 42, i32 0 monotonic monotonic, 3, !memory.scope{!43}
>
>...
>
>!42 = !{"singlethread"}
>!43 = !{"L2"}
>
>I also avoids manipulating/pruning the global map, and target won't depend on integer to keep bitcode compatibility.
>

This seems like it will work assuming that promoting something like "L2" to the
most conservative memory scope is legal (this is what would have to happen
if the metadata was dropped). Are there any targets where this type of'
promotion would be illegal?

+1

Sorry to enter this discussion so late, but in my opinion this
is a very good non-intrusive starting point solution.

Implementing it as a MD with the assumption of there being a safe
conservative memory scope to fall back to (in case the MD gets
stripped off for some reason) converts it to a mere
optimization hint without the need to touch LLVM IR instruction semantics.

Also, as it's only a MD, if we encounter a need to extend it later towards a more integrated solution (or a mechanism to support
targets where this scheme is not feasible), we can more easily do so.

BR,

+ Brian and Konstantin

We believe that it would be best that this is added to the LLVM IR atomic memory instruction as fields on atomic instructions rather than using meta data.

The reasoning is that this information is similar to other information that is represented as instruction fields. For example, the indication that memory operations are atomic rather than non-atomic, the memory ordering of atomics, and whether per-thread or system scope. In all these cases this information has a semantic meaning for the instructions that can be exploited by optimizations. Representing it as meta data would mean this information could be dropped making the optimizations impossible with very significant performance penalty.

For example, if memory operations were not marked as being atomic, all memory operations would have to be generated as sequential consistent atomics at system scope. Although this "default" behavior is correct, it would not be very performant. Similarly, the memory ordering could use a "default" of sequentially consistent, which again is much less efficient than the weaker orderings. By analogy, the memory scope could also have a "default" of system scope, but that is also not performant when the scope is narrower.

In all these cases this information changes the semantics of the instructions. It affects whether a program is undefined behavior. Using a "default" value leads to those same programs being treated as having defined behavior (for example by eliminating data races).

Currently the atomic memory instructions have the own-thread/system recorded on the instruction which is a limited form of memory scope. The proposal is to replace this with a more general field that can have more than 2 values. Languages that do not use memory scopes can simply use the value corresponding to system scope.

We understand that it is good to avoid adding information to LLVM instructions that is not primary, but in this case it seems that the atomicity, memory ordering and memory scope are all equally primary information that characterize the semantics of memory instructions.

We have posted reviews that implement the proposal and invite everyone to discuss it:
http://reviews.llvm.org/D21723
http://reviews.llvm.org/D21724

Thank you,
Konstantin

We believe that it would be best that this is added to the LLVM IR atomic memory instruction as fields on atomic instructions rather than using meta data.

The reasoning is that this information is similar to other information that is represented as instruction fields. For example, the indication that memory operations are atomic rather than non-atomic, the memory ordering of atomics, and whether per-thread or system scope. In all these cases this information has a semantic meaning for the instructions that can be exploited by optimizations. Representing it as meta data would mean this information could be dropped making the optimizations impossible with very significant performance penalty.

For example, if memory operations were not marked as being atomic, all memory operations would have to be generated as sequential consistent atomics at system scope. Although this “default” behavior is correct, it would not be very performant. Similarly, the memory ordering could use a “default” of sequentially consistent, which again is much less efficient than the weaker orderings. By analogy, the memory scope could also have a “default” of system scope, but that is also not performant when the scope is narrower.

In all these cases this information changes the semantics of the instructions. It affects whether a program is undefined behavior. Using a “default” value leads to those same programs being treated as having defined behavior (for example by eliminating data races).

It is not clear to me if there is any correctness issues to dropping metadata?

Currently the atomic memory instructions have the own-thread/system recorded on the instruction which is a limited form of memory scope. The proposal is to replace this with a more general field that can have more than 2 values. Languages that do not use memory scopes can simply use the value corresponding to system scope.

We understand that it is good to avoid adding information to LLVM instructions that is not primary, but in this case it seems that the atomicity, memory ordering and memory scope are all equally primary information that characterize the semantics of memory instructions.

We have posted reviews that implement the proposal and invite everyone to discuss it:
http://reviews.llvm.org/D21723
http://reviews.llvm.org/D21724

It seems you’re going back to integer, which I don’t really like for reasons mentioned earlier in this thread, and that I don’t feel you addressed here.

I will comment - as one of the few people actually working on llvm’s atomic implementation with any regularity - that I am opposed to extending the instructions without a strong motivating case. I don’t care anywhere near as much about metadata based schemes, but extending the instruction semantics imposes a much larger burden on the rest of the community. That burden has to be well justified and supported.

Philip

In OpenCL 2.x, two atomic operations on the same atomic object need to have
the same scope to prevent a data race. This derives from the definition of
"inclusive scope" in OpenCL 2.x. Encoding OpenCL 2.x scope as metadata in
LLVM IR would be a problem because there cannot be a "safe default value"
to be used when the metadata is dropped. If the "largest" scope is used as
the default, then the optimizer must guarantee that the metadata is dropped
from every atomic operation in the whole program, or not dropped at all.

Hence the original attempt to extend LLVM atomic instructions with a
broader scope field.

Sameer.

Hi,

I have updated the review here:

https://reviews.llvm.org/D21723

As Sameer pointed out, the motivation is:

In OpenCL 2.x, two atomic operations on the same atomic object need to have the same scope to prevent a data race. This derives from the definition of “inclusive scope” in OpenCL 2.x. Encoding OpenCL 2.x scope as metadata in LLVM IR would be a problem because there cannot be a “safe default value” to be used when the metadata is dropped. If the “largest” scope is used as the default, then the optimizer must guarantee that the metadata is dropped from every atomic operation in the whole program, or not dropped at all.

Thanks,

Konstantin

Why not going with a metadata attachment directly and kill the “singlethread” keyword? Something like:

Right, I saw Sameer's explanation for that earlier, and we shouldn’t move forward (without Philip’s opinion on the topic as he expressed concerns).

But you stripped out the second part of my email where I wrote "It seems you’re going back to integer, which I don’t really like for reasons mentioned earlier in this thread, and that I don’t feel you addressed here”. Why can’t `synchscope` take a string literal?