[RFC][PATCH][OPENCL] synchronization scopes redux

Hi all,

Attached is a sequence of patches that changes the IR to support more than two synchronization scopes. This is still a work in progress, and these patches are only meant to start a more detailed discussion on the way forward.

One big issue is the absence of any backend that actually makes use of intermediate synchronization scopes. This work is meant to be just one part of the groundwork required for landing the much-anticipated HSAIL backend. Also, more work might be needed for emitting atomic instructions via Clang.

The proposed syntax for synchronization scope is as follows:

  1. Synchronization scopes are of arbitrary width, but implemented as unsigned in the bitcode, just like address spaces.

  2. Cross-thread is the default, but is now encoded as 0.

  3. Keyword ‘singlethread’ is unchanged, but now encoded as the largest integer (which happens to be ~0U in bitcode).

  4. New syntax “synchscope(n)” for other scopes.

  5. There is no keyword for cross-thread, but it can be specified as “synchscope(0)”.

This change breaks forward compatibility for the bitcode, since the meanings of the zero and one values have changed.

enum SynchronizationScope {
- SingleThread = 0,
- CrossThread = 1
+ CrossThread = 0,
+ SingleThread = ~0U
};

The change passes almost all lit tests including one new test (see patch 0005). The failing tests are specifically checking for forward compatibility:

Failing Tests (3):
LLVM :: Bitcode/cmpxchg-upgrade.ll
LLVM :: Bitcode/memInstructions.3.2.ll
LLVM :: Bitcode/weak-cmpxchg-upgrade.ll

This breakage remains even if we reverse the order of synchronization scopes. One simple way to preserve compatibility is to retain 0 and 1 with their current meanings, and specify that intermediate scopes are represented in an ordered way with numbers greater than one. But this is pretty ugly to work with. Would appreciate inputs on how to fix this!

Sameer.

0001-selection-DAG.patch (12.6 KB)

0002-instructions.patch (21.4 KB)

0003-bitcode.patch (7.98 KB)

0004-clients.patch (5.23 KB)

0005-assembly.patch (8.23 KB)

Ping! Found a simple way to preserve forward compatibility. See below.

The issue here is purely in the bitcode: we need an encoding that can represent new intermediate scopes while preserving the two known values of zero and one. Note that the earlier zero is now ~0U in the in-memory representation, and the earlier one is now zero. This mapping can be easily accomplished with a simple increment/decrement by one, ignoring overflow. So the bitreader now subtracts one when decoding the synch scope, and the bitwriter adds one when encoding it.

The attached change number 0006 is meant to replace changes 0003 and 0005 in the previous list, since the assembly and the bitcode need to be updated simultaneously for this to work. The new change passes all tests, including the ones checking for forward compatibility.

Sameer.
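As a minimal illustration of the mapping described above (sketched here in Python for brevity; the actual change lives in LLVM's C++ bitcode reader and writer), the increment/decrement trick lines the new in-memory values up with the historical on-disk ones:

```python
# In-memory values per the proposal: CrossThread = 0, SingleThread = ~0U.
# On-disk values keep their historical meanings: 0 = singlethread, 1 = crossthread.

MASK = 0xFFFFFFFF  # model 32-bit unsigned wraparound

CROSS_THREAD = 0
SINGLE_THREAD = MASK  # ~0U

def encode_scope(scope):
    """Bitwriter: add one, ignoring overflow."""
    return (scope + 1) & MASK

def decode_scope(encoded):
    """Bitreader: subtract one, ignoring underflow."""
    return (encoded - 1) & MASK
```

With this, cross-thread (in-memory 0) is written as the old value 1, and singlethread (in-memory ~0U) wraps around to the old value 0, so existing bitcode reads back correctly.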

0006-assembly-bitcode.patch (13 KB)

I’ve not had a good chance to look at the patches in detail, but just to clarify one point:

I don’t really care whether we number things going up or down from single threaded to “every thread”. I just think it makes sense to expose them in the in-memory IR interface as an enum with a particular ordering so that code can use the obvious sorts of tests for comparing two orderings and not have to worry (overly much) about edge cases. This doesn’t really need to be reflected in the bitcode encoding though, so I’m fine with whatever steps are needed to keep the bitcode compatible and sane.

I also agree with having the text format use a symbolic thing for both extremes. It doesn’t seem super important, but it seems nice.

Regarding the bitcode encoding, I would consider whether one encoding is more space efficient than another. I don’t recall whether we default to zero or whether we use a varint encoding in the bitcode here, but if we do, it would make sense to optimize the encoding around cross thread being the most common. I’m not really a bitcode expert, so I’d rather defer to someone who has hacked on this part of LLVM more recently there.

I can try to take a look at the higher level patches soon though.

Right. The second version of my patches fixes the bitcode encoding. But now I see another potential problem with future bitcode if we require an ordering on the scopes. What happens when a backend later introduces a new scope that goes into the middle of the order? If they renumber the scopes to accommodate this, then existing bitcode for that backend will no longer work. The bitcode reader/writer cannot compensate for this since the values are backend-specific. If we agree that this problem is real, then we cannot force an ordering on the scope numbers.

So far, I have refrained from proposing a keyword for cross-thread scope in the text format, because (a) there never was one and (b) it is not strictly needed since it is the default anyway. I am fine either way, but we will first have to decide what the new keyword should be. I find “allthreads” to be a decent counterpart for “singlethread” … “crossthread” is not good enough since intermediate scopes have multiple threads too.

Indeed, the text format is defined around cross-thread being the most common case, but strangely it was not encoded as zero, even in bitcode. So the most common case turns out to be a one stored as a uint32 in the bitcode. The new scopes fit into that existing space, while the most common case changes from one to ~0U. Maintaining forward compatibility for older bitcode would mean that we can’t optimize by changing the common case to zero.

Great! I intend to clean up and submit the in-memory patches first. These simply upgrade the representation from a single bit to an unsigned integer, without affecting any “end points” of the compiler.

Sameer.

Hi Sameer,

Right. The second version of my patches fixes the bitcode encoding. But now I see another potential problem with future bitcode if we require an ordering on the scopes. What happens when a backend later introduces a new scope that goes into the middle of the order? If they renumber the scopes to accommodate this, then existing bitcode for that backend will no longer work. The bitcode reader/writer cannot compensate for this since the values are backend-specific. If we agree that this problem is real, then we cannot force an ordering on the scope numbers.

That’s an interesting consideration, and something I hadn’t thought of. I’m unsure offhand of how much it matters in practice. The alternative, I suppose, is having something like string-named scopes, but then we can’t do much with them at the IR level.

So far, I have refrained from proposing a keyword for cross thread scope in the text format, because (a) there never was one and (b) it is not strictly needed since it is the default anyway. I am fine either way, but we will first have to decide what the new keyword should be. I find "allthreads" to be a decent counterpart for "singlethread" ... "crossthread" is not good enough since intermediate scopes have multiple threads too.

This actually raises another question. In principle, the “most visible” scope ought to be something like “system” or “device”, meaning a completely uncached memory access that is visible to all peripherals in a heterogeneous system. However, this is almost certainly not what we want to have for typical memory accesses.

To summarize, a prototypical scope nest, from most to least visible (aka least to most cacheable) might look like:

System —> AllThreads —> Various target-specific local scopes —> SingleThread

If we wanted to go really gonzo, there could be a Network scope at the beginning for large-scale HPC systems, but I’m not sure how important that is to anyone.

As a related question, do we actually need the local scopes to be target specific? Are there systems, real or planned, that *aren’t* captured by:

[Network —> ] System —> AllThreads —> ThreadGroup —> SingleThread ?

—Owen

Hi Sameer,

>
> Right. The second version of my patches fixes the bitcode encoding. But
now I see another potential problem with future bitcode if we require an
ordering on the scopes. What happens when a backend later introduces a new
scope that goes into the middle of the order? If they renumber the scopes
to accommodate this, then existing bitcode for that backend will no longer
work. The bitcode reader/writer cannot compensate for this since the values
are backend-specific. If we agree that this problem is real, then we cannot
force an ordering on the scope numbers.

That’s an interesting consideration, and something I hadn’t thought of.
I’m unsure offhand of how much it matters in practice. The alternative, I
suppose, is having something like string-named scopes, but then we can’t do
much with them at the IR level.

This has me somewhat non-plussed as well.

> So far, I have refrained from proposing a keyword for cross thread scope
in the text format, because (a) there never was one and (b) it is not
strictly needed since it is the default anyway. I am fine either way, but
we will first have to decide what the new keyword should be. I find
"allthreads" to be a decent counterpart for "singlethread" ...
"crossthread" is not good enough since intermediate scopes have multiple
threads too.

This actually raises another question. In principle, the “most visible”
scope ought to be something like “system” or “device”, meaning a completely
uncached memory access that is visible to all peripherals in a
heterogeneous system. However, this is almost certainly not what we want
to have for typical memory accesses.

To summarize, a prototypical scope nest, from most to least visible (aka
least to most cacheable) might look like:

System —> AllThreads —> Various target-specific local scopes —>
SingleThread

If we wanted to go really gonzo, there could be a Network scope at the
beginning for large-scale HPC systems, but I’m not sure how important that
is to anyone.

I probably *should* be in a position to be very interested in such a
concept.... but honestly, I'm not. If I ever wanted to do something like
this, I would just define the large-scale HPC system as the "system" and a
single machine/node as some "local" scope.

As a related question, do we actually need the local scopes to be target
specific? Are there systems, real or planned, that *aren’t* captured by:

[Network —> ] System —> AllThreads —> ThreadGroup —> SingleThread ?

Sadly, I don't think this will work. In particular, there are real-world
accelerators with multiple tiers of thread groups that are visible in the
cache hierarchy subsystem.

I'm starting to think we might actually need to let the target define
acceptable strings for memory scopes and a strict weak ordering over
them.... That's really complex and heavyweight, but I'm not really
confident that we're safe committing to something more limited. The good
side is that we can add the SWO-stuff lazily as needed...

Dunno, thoughts?

That really depends on what we want to do at the IR level. Scopes do not affect transformations that move non-atomic accesses around atomic accesses. The scope on the atomic access should not matter to the non-atomic accesses. The interesting case is when the compiler tries to optimize atomic accesses with respect to each other, and their scopes do not match. But it might be sufficient to leave such transformations to the target, since quite possibly, other target-specific information might be necessary to make them work or even to say whether they are beneficial.

I agree. The most accurate description of the highest scope is “address space scope”, i.e., all threads that can access the address space being accessed. From that view, it does not matter whether the threads are local, remote, or situated on different devices. It makes sense not to specify any keyword for this scope, and just say that “synchscope(0)” is the default and need not be specified. Any other scope is an explicit optimization over a narrower set of threads.

The HSAIL 1.0 provisional spec has the following scopes: workitem, wavefront, workgroup, component, system. A component is anything that supports the HSAIL instruction set and can execute commands dispatched to it. I am not an authority on this, but to me it is conceivable that there could be other scopes later, analogous to things such as one “die”, one “chip”, one “board”, or one node in a cloud.

Just the thought of using strings in the IR smells like over-design to me. Going back to the original point, are target-independent optimizations on scoped atomic operations really so attractive?

But while the topic is wide open, here’s another possibly whacky approach: we let the scopes be integers, and add a “scope layout” string similar to data-layout. The string encodes the ordering of the integers. If it is empty, then simple numerical comparisons are sufficient. Otherwise the string spells out the exact ordering to be used. Any known current target will be happy with the first option. If some target inserts an intermediate scope in the future, then that version switches from empty to a fully specified string. The best part is that we don’t even need to do this right now, and only come up with a “scope layout” spec when we really hit the problem for some future target.

Sameer.
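For illustration, here is a minimal sketch (in Python) of how such a "scope layout" string might be queried. The edge-list format used here ("wider>narrower" pairs separated by semicolons) is purely hypothetical; nothing like it exists in LLVM or its data-layout strings:

```python
def scope_includes(layout, a, b):
    """Return True if scope `a` includes scope `b`.

    With an empty layout string, fall back to plain numeric comparison
    (assuming lower numbers denote wider scopes). Otherwise the layout
    spells out direct "wider>narrower" edges, e.g. "0>1;1>2", and
    inclusion is their reflexive-transitive closure.
    """
    if a == b:
        return True
    if not layout:
        return a < b
    edges = {}
    for edge in layout.split(";"):
        wider, narrower = (int(x) for x in edge.split(">"))
        edges.setdefault(wider, set()).add(narrower)
    # Depth-first search from `a` looking for `b`.
    stack, seen = [a], set()
    while stack:
        cur = stack.pop()
        if cur == b:
            return True
        if cur not in seen:
            seen.add(cur)
            stack.extend(edges.get(cur, ()))
    return False
```

Note how a layout like "0>1;0>2" naturally expresses sibling scopes that do not include each other, which plain numeric comparison cannot.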

I'm starting to think we might actually need to let the target define
acceptable strings for memory scopes and a strict weak ordering over
them.... That's really complex and heavyweight, but I'm not really
confident that we're safe committing to something more limited. The good
side is that we can add the SWO-stuff lazily as needed...

Dunno, thoughts?

Just the thought of using strings in the IR smells like over-design to me.
Going back to the original point, are target-independent optimizations on
scoped atomic operations really so attractive?

Essentially, I think target-independent optimizations are still attractive,
but we might want to just force them to go through an actual
target-implemented API to interpret the scopes rather than making the
interpretation work from first principles. I just worry that the targets
are going to be too different and we may fail to accurately predict future
targets' needs.

I think the "strings" can be made relatively clean.

What I'm imagining is something very much like the target-specific
attributes which are just strings and left to the target to interpret, but
are cleanly factored so that the strings are wrapped up in a nice opaque
attribute that is used as the sigil everywhere in the IR. We could do this
with metadata, and technically this fits the model of metadata if we make
the interpretation of the absence of metadata be "system". However, I'm
quite hesitant to rely on metadata here as it hasn't always ended up
working so well for us. ;]

I'd be interested in your thoughts and others' thoughts on how we might
encode an opaque string-based scope effectively. If we can find a
reasonably clean way of doing it, it seems like the best approach at this
point:

- It ensures we have no bitcode stability problems.
- It makes it easy to define a small number of IR-specified values like
system/crossthread/allthreads/whatever and singlethread, and doing so isn't
ever awkward due to any kind of baked-in ordering.
- In practice in the real world, every target is probably going to just
take this and map it to an enum that clearly spells out the rank for their
target, so I suspect it won't actually increase the complexity of things
much.

But while the topic is wide open, here's another possibly whacky approach:
we let the scopes be integers, and add a "scope layout" string similar to
data-layout. The string encodes the ordering of the integers. If it is
empty, then simple numerical comparisons are sufficient. Else the string
spells out the exact ordering to be used. Any known current target will be
happy with the first option. If some target inserts an intermediate scope
in the future, then that version switches from empty to a fully specified
string. The best part is that we don't even need to do this right now, and
only come up with a "scope layout" spec when we really hit the problem for
some future target.

This isn't a bad approach, but it seems even more complex. I think I'd
rather go with the fairly boring one where the IR just encodes enough data
for the target to answer queries about the relationship between scopes.

So, my current leaning is to try to figure out a reasonably clean way to
use strings, similar to the target-specific attributes.

If we have a target-implemented API, then just opaque numbers should also be sufficient, right? For the API, all we care about is queries that interesting optimizations will want answered by the target. This could be at the instruction level: “is it okay to remove this atomic store with scope n1 that is immediately followed by an atomic store with scope n2?”. Or it could be at the scope level: “does scope n2 include scope n1?”

Metadata was the first thing to be considered internally at AMD. But it was quickly shot down because the Research guys were unwilling to accept the possibility of scope being lost and replaced by a default “system” scope. Current models are useful only when all atomic accesses for a given location use the same scope throughout the application, i.e., across all threads running on all agents. So it is not okay for the compiler to “promote” the scope in just one kernel unless it has access to the entire application; the result is undefined. This is true for OpenCL source as well as the HSAIL target. This may change in the near future: “HRF-Relaxed: Adapting HRF to the complexities of industrial heterogeneous memory models”. But even then, it will be difficult to say whether the same models can be applied to heterogeneous systems that don’t resemble OpenCL or HSAIL.

I seem to be missing something here about the need for strings. If they are opaque anyway, and they are represented by sigils, then the sigils themselves are all that matter, right? Then the encoding is just a number…

I am not really championing scope layout strings over a target-implemented API, but it seems less work to me rather than more. The relationship between scopes is just a strict weak ordering (SWO), and it can be represented as a graph. A practical target will have a very small number of scopes, say not more than 16. It should be possible to encode this into a graphviz-style string. Then instead of having every target implement an API, they just have to specify the relationship as a string.

Sameer.
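A sketch of the kind of target-implemented query API being discussed, in Python for brevity. The names and shape are invented; no such interface exists in LLVM today. The point is that a target-independent optimizer would call these queries instead of interpreting scope numbers from first principles:

```python
class TargetScopeInfo:
    """Per-target knowledge about memory scopes (hypothetical)."""

    def __init__(self, inclusion):
        # `inclusion` maps each scope to the set of scopes it strictly
        # includes, already transitively closed (precomputed by the target).
        self.inclusion = inclusion

    def includes(self, outer, inner):
        """Scope-level query: does scope `outer` include scope `inner`?"""
        return outer == inner or inner in self.inclusion.get(outer, set())

    def can_remove_earlier_store(self, scope1, scope2):
        """Instruction-level query: may an atomic store with scope1 be
        removed when immediately followed by an atomic store with scope2
        to the same location? A conservative answer: only if scope2 is at
        least as visible as scope1."""
        return self.includes(scope2, scope1)

# The HSAIL-style nest mentioned earlier in the thread, as one possible
# target configuration: system > component > workgroup > wavefront > workitem.
SYSTEM, COMPONENT, WORKGROUP, WAVEFRONT, WORKITEM = range(5)
hsail_like = TargetScopeInfo({
    SYSTEM: {COMPONENT, WORKGROUP, WAVEFRONT, WORKITEM},
    COMPONENT: {WORKGROUP, WAVEFRONT, WORKITEM},
    WORKGROUP: {WAVEFRONT, WORKITEM},
    WAVEFRONT: {WORKITEM},
})
```

Whether the scopes behind this interface are opaque integers or uniqued strings is then invisible to the optimizer, which is the crux of the disagreement below.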

Essentially, I think target-independent optimizations are still
attractive, but we might want to just force them to go through an actual
target-implemented API to interpret the scopes rather than making the
interpretation work from first principles. I just worry that the targets
are going to be too different and we may fail to accurately predict future
targets' needs.

If we have a target-implemented API, then just opaque numbers should also
be sufficient, right? For the API, all we care about is queries that
interesting optimizations will want answered from the target. This could be
at the instruction level: "is it okay to remove this atomic store with
scope n1 that is immediately followed by atomic store with scope n2?". Or
it could be at the scope level: "does scope n2 include scope n1"?

I think it is significantly more friendly (and easier to debug mistakes) if
the textual IR uses human readable names. We already have a hard time due
to the totally opaque nature of address spaces -- there are magical address
spaces for segment stuff in x86.

The strings are only opaque to the target-independent optimizer. While
integers and strings are equally friendly to the code in the target,
strings are significantly more friendly to humans reading the IR.

The other advantage is that it makes it much harder to accidentally write
code that relies on the particular values for the integers. =]

  I think the "strings" can be made relatively clean.

What I'm imagining is something very much like the target-specific
attributes which are just strings and left to the target to interpret, but
are cleanly factored so that the strings are wrapped up in a nice opaque
attribute that is used as the sigil everywhere in the IR. We could do this
with metadata, and technically this fits the model of metadata if we make
the interpretation of the absence of metadata be "system". However, I'm
quite hesitant to rely on metadata here as it hasn't always ended up
working so well for us. ;]

Metadata was the first thing to be considered internally at AMD. But it
was quickly shot down because the Research guys were unwilling to accept
the possibility of scope being lost and replaced by a default "system"
scope. Current models are useful only when all atomic accesses for a given
location use the same scope throughout the application, i.e., all threads
running on all agents. So it is not okay for the compiler to "promote" the
scope in just one kernel unless it has access to the entire application;
the result is undefined. This is true for OpenCL source as well as HSAIL
target. This may change in the near future:

HRF-Relaxed: Adapting HRF to the complexities of industrial heterogeneous
memory models
http://benedictgaster.org/?page_id=278

But even then, it will be difficult to say if the same models can be
applied to heterogeneous systems that don't resemble OpenCL or HSAIL.

Yea, I'm not really surprised by this.

  I'd be interested in your thoughts and others' thoughts on how we might
encode an opaque string-based scope effectively. If we can find a
reasonably clean way of doing it, it seems like the best approach at this
point:

- It ensures we have no bitcode stability problems.
- It makes it easy to define a small number of IR-specified values like
system/crossthread/allthreads/whatever and singlethread, and doing so isn't
ever awkward due to any kind of baked-in ordering.
- In practice in the real world, every target is probably going to just
take this and map it to an enum that clearly spells out the rank for their
target, so I suspect it won't actually increase the complexity of things
much.

I seem to be missing something here about the need for strings. If they
are opaque anyway, and they are represented by sigils, then the sigils
themselves are all that matter, right? Then the encoding is just a number...

See above for why I'd prefer not to use a raw number in the IR.

But while the topic is wide open, here's another possibly whacky
approach: we let the scopes be integers, and add a "scope layout" string
similar to data-layout. The string encodes the ordering of the integers. If
it is empty, then simple numerical comparisons are sufficient. Else the
string spells out the exact ordering to be used. Any known current target
will be happy with the first option. If some target inserts an intermediate
scope in the future, then that version switches from empty to a fully
specified string. The best part is that we don't even need to do this right
now, and only come up with a "scope layout" spec when we really hit the
problem for some future target.

This isn't a bad approach, but it seems even more complex. I think I'd
rather go with the fairly boring one where the IR just encodes enough data
for the target to answer queries about the relationship between scopes.

I am not really championing scope layout strings over a target-implemented
API, but it seems less work to me rather than more. The relationship
between scopes is just an SWO, and it can be represented as a graph. A
practical target will have a very small number of scopes, say not more than
16. It should be possible to encode this into a graphviz-style string. Then
instead of having every target implement an API, they just have to specify
the relationship as a string.

I see where you're going here, and it sounds feasible, but it honestly
seems much *more* work and certainly more complex for the IR. We can always
add such a representation to communicate the relationships if it becomes
important, but I'd rather communicate via a boring target API to start with
I think.

If we have a target-implemented API, then just opaque numbers should also be sufficient, right? For the API, all we care about is queries that interesting optimizations will want answered by the target. This could be at the instruction level: “is it okay to remove this atomic store with scope n1 that is immediately followed by an atomic store with scope n2?”. Or it could be at the scope level: “does scope n2 include scope n1?”

Metadata was the first thing to be considered internally at AMD. But it was quickly shot down because the Research guys were unwilling to accept the possibility of scope being lost and replaced by a default “system” scope. Current models are useful only when all atomic accesses for a given location use the same scope throughout the application, i.e., across all threads running on all agents. So it is not okay for the compiler to “promote” the scope in just one kernel unless it has access to the entire application; the result is undefined. This is true for OpenCL source as well as the HSAIL target. This may change in the near future: “HRF-Relaxed: Adapting HRF to the complexities of industrial heterogeneous memory models”. But even then, it will be difficult to say whether the same models can be applied to heterogeneous systems that don’t resemble OpenCL or HSAIL.

I seem to be missing something here about the need for strings. If they are opaque anyway, and they are represented by sigils, then the sigils themselves are all that matter, right? Then the encoding is just a number…

Don’t the strings answer your previous concern:

But now I see another potential problem with future bitcode if we require an ordering on the scopes. What happens when a backend later introduces a new scope that goes into the middle of the order?

Note: the backend can just convert the string into integer once. The string are really useful only for serialization IIUC.

But while the topic is wide open, here’s another possibly whacky approach: we let the scopes be integers, and add a “scope layout” string similar to data-layout. The string encodes the ordering of the integers. If it is empty, then simple numerical comparisons are sufficient. Else the string spells out the exact ordering to be used. Any known current target will be happy with the first option. If some target inserts an intermediate scope in the future, then that version switches from empty to a fully specified string. The best part is that we don’t even need to do this right now, and only come up with a “scope layout” spec when we really hit the problem for some future target.

This isn’t a bad approach, but it seems even more complex. I think I’d rather go with the fairly boring one where the IR just encodes enough data for the target to answer queries about the relationship between scopes.

I am not really championing scope layout strings over a target-implemented API, but it seems less work to me rather than more. The relationship between scopes is just an SWO, and it can be represented as a graph. A practical target will have a very small number of scopes, say not more than 16. It should be possible to encode this into a graphviz-style string. Then instead of having every target implement an API, they just have to specify the relationship as a string.

So basically you are replacing an API by a custom language in a string. Isn’t such a string carrying an API by itself?

I want to point out that “address space” is not sufficient for the highest scope. It’s entirely possible to have a host and an accelerator that do not have shared address spaces, but do need to communicate, particularly in job management code where the accelerator talks directly to the host-side driver.

—Owen

Here’s what this looks like to me:

   1. LLVM text format will use string symbols for memory scopes, and not
   numbers. The set of strings is target-defined, but "singlethread" and
   "system" are reserved and have a well-known meaning.

   2. "The keyword informally known as system" represents the set of all
   threads that could possibly synchronize on the location being accessed
   by the current atomic instruction. These threads could be local, remote,
   executing on different agents, or whatever else is admissible on that
   particular platform. We still need to agree on the keyword to be used.

   3. The bitcode will store memory scopes as unsigned integers, since
   that is the easiest way to maintain compatibility. The values 0 and 1 are
   special. All other values are meaningful only within that bc file. The file
   will also provide a map from unsigned integers to string symbols, which
   should be used to interpret all the non-standard integers.
      1. The map must not include 0 and 1, since the reader will
      internally map them to "singlethread" and "system" respectively.
      2. If the map is empty or non-existent, then all non-zero values
      will be mapped to "system", which is the current behaviour.

   4. The in-memory structure for an atomic instruction will represent
   memory scope as a reference to a uniqued string. This eliminates any
   notion of performing arithmetic on the scope indicator, or of writing
   code that is sensitive to its numerical value.

   5. Behaviour is undefined if a symbolic scope used in the IR is not
   supported by the target. This is true for "singlethread" and "system" also,
   since some targets may not have those scopes.
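The decoding rule in point 3 can be sketched as follows (a minimal, hypothetical Python model; a real implementation would live in LLVM's C++ bitcode reader, and the record layout is not specified here):

```python
# Values 0 and 1 are reserved; other integers go through a per-file map;
# an empty or missing map turns every other value into "system",
# matching current behaviour.

def scope_symbol(value, scope_map=None):
    if value == 0:
        return "singlethread"
    if value == 1:
        return "system"
    if not scope_map:
        return "system"
    return scope_map[value]
```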

Is this correct?

Generally, yes.

Regarding the specific way of using strings, see the email I just sent
about metadata being used poorly, this is a place where we might use
metadata to encode the string, or we might do something more direct as you
propose. I think it would be good to do something like what I propose in
that thread to have nicely uniqued opaque string entities in the IR, and
then use them here for marking scopes.

But how does this work in the SelectionDAG? Also, what will this look like
in TableGen files?

Not sure what you mean here? I haven't looked at implementing it, but from
the DAG down you should get to collapse this rapidly toward target-specific
nodes / structures / representations?

So I see the following large chunks of work that are difficult to isolate. All this work, just to ensure readability in the LLVM text format. I am not entirely convinced that it is worth it, when opaque integers can get the job done just like address spaces … are you?

Sorry, that was just me being lazy and not reading up on the SelectionDAG. I expect that we will translate the symbols into integer TargetConstant SDNodes … do you see the need to be more flexible than that?

Sameer.

Ping! We need to close on whether everyone is convinced that symbolic memory scopes have a significant advantage over opaque numbers. Either of them will be examined by optimizations using a target-implemented API. I personally don’t think that readability in the LLVM text format is worth the effort, especially given that address spaces work well enough with opaque numbers.

Sameer.

I am much more comfortable with symbolic memory scopes. The reason I feel
this way is actually because there *is* a particular ordering of them that
the target will mandate. Having an ordering but having it *not* be the
order of the numbers used seems too actively confusing to me.

All that is true about address spaces too. On some platforms, address spaces could have a subset relationship, but it would be wrong to infer that from the numerical value. Isn’t it enough to say that the number is opaque and should not be interpreted via any comparison?

I do see your point, though. But now the task got much bigger, and I will have to reexamine the time required. I suppose it starts with a bitcode reader that can interpret existing bitcode files and translate the scopes to symbols instead.

Sameer.

Ping! We need to close on whether everyone is convinced that symbolic
memory scopes have a significant advantage over opaque numbers. Either of
them will be examined by optimizations using a target-implemented API. I
personally don't think that readability in the LLVM text format is worth
the effort, especially given that address spaces work well enough with
opaque numbers.

I am much more comfortable with symbolic memory scopes. The reason I
feel this way is actually because there *is* a particular ordering of them
that the target will mandate. Having an ordering but having it *not* be the
order of the numbers used seems too actively confusing to me.

All that is true about address spaces too. On some platforms, address
spaces could have a subset relationship, but it would be wrong to infer
that from the numerical value. Isn't it enough to say that the number is
opaque and should not be interpreted via any comparison?

My understanding is that there is much less of this. I also wasn't heavily
involved in the address space design, and that design also has to cope with
more entrenched legacy in other systems and interfaces. Not sure how much
it makes sense to base the design on that.

I do see your point, though. But now the task got much bigger and will
have to reexamine the time required. I suppose it starts with bitcode
reader that can interpret existing bitcode files and translate the scopes
to symbols instead.

I"m sympathetic here. We really should have an easy way of encoding this
kind of thing. You might be able to use "metadata" in the same way that
@llvm.read_register does, where it is not really metadata in the
traditional sense and can't be "stripped" or "discarded" in any way. As I
wrote in my email, I think this should be replaced by something which more
formally models this idea, but I don't think this work should be held
hostage waiting for that better system to arrive. However, I also don't
think this better system of synthetic constant strings is very hard to
build if you're interested, and it would serve a lot of use cases outside of
synchronization scopes.

That makes sense. I will start with what’s currently available. As far as a better string mechanism is concerned, scopes will be yet another misuse of metadata to be fixed in that separate project.

Sameer.