LLVM's optimizer combines stores to consecutive characters into a single word-sized write: for instance, if I have char A[4] and I write some static value to each element, these writes get combined into one 32-bit store. Is there a way to disable this? I found this thread <http://llvm.1065342.n5.nabble.com/disabling-combining-load-stores-in-optimizer-td37560.html> from 2009 -- it seems like it wasn't possible then. Has anything changed since?
Neil
> LLVM's optimizer combines stores to consecutive characters into a single word-sized write: for instance, if I have char A[4] and I write some static value to each element, these writes get combined into one 32-bit store. Is there a way to disable this? I found this thread <http://llvm.1065342.n5.nabble.com/disabling-combining-load-stores-in-optimizer-td37560.html> from 2009 -- it seems like it wasn't possible then. Has anything changed since?
Why do you want to disable this optimization?
-Tom
I’m writing a pass for some custom hardware — we’d like to split arrays across hardware elements; this doesn’t work if consecutive writes to characters get combined into a word.
> I’m writing a pass for some custom hardware — we’d like to split arrays across hardware elements; this doesn’t work if consecutive writes to characters get combined into a word.
Do you have an LLVM IR example where this happens when you don't want
it to?
-Tom
This won’t happen with volatile load/store
-Matt
> This won’t happen with volatile load/store
This is mostly true today, but AFAICT the LLVM memory model doesn’t actually offer this guarantee. It merely says that LLVM treats volatile like C / C++ treat volatile… which isn’t much of a guarantee, because C / C++ volatile doesn’t normatively mean anything. Specifically, we can’t really honor this when volatile bitfields are used, since no matching memory operations exist for them:
struct {
  volatile int a : 12;
  volatile int b : 4;
} s;
As things stand, we haven’t promised that we won’t combine adjacent volatile stores, and C / C++ certainly allow us to do so. I don’t think it would be a good idea to do so, but we certainly could.
The lang ref promises that, "the backend should never ... merge
target-legal volatile load/store instructions". Which I've usually assumed
meant that it was promised? Just curious if you thought this left some
wiggle room for some optimizations, since my impression was that someone
wanted to let you know that it's dependable. Likewise, there's also the
clause that "The optimizers must not change the number of volatile
operations", which doesn't seem to explicitly forbid merging, but does seem
to me like it would make it difficult to find a case where it would be
profitable.
So, volatile’s been a fine solution — the issue is that volatile pointers perform the load every time; ideally memory accesses could still be cached. This is why I’ve been leaning towards disabling the part of InstCombine that combines memory accesses instead of using volatile — that should give better performance.
> The lang ref promises that, "the backend should never ... merge target-legal volatile load/store instructions". Which I've usually assumed meant that it was promised? Just curious if you thought this left some wiggle room for some optimizations, since my impression was that someone wanted to let you know that it's dependable. Likewise, there's also the clause that "The optimizers must not change the number of volatile operations", which doesn't seem to explicitly forbid merging, but does seem to me like it would make it difficult to find a case where it would be profitable.
“Should” and “target-legal” aren’t very good promises 
The “must not” bit is interesting, I hadn’t noticed it!
In any case, I agree that LLVM should do what it can to meet expectations w.r.t. volatile, and in this case avoid merging when it can (i.e. only merge if there are no corresponding memory operations).
Why do you want this?
The goal is to share arrays between multiple tiles on a manycore architecture by splitting the arrays across tiles. With a DRF memory model it makes sense to elide multiple loads of the same memory location between barriers; IIRC the semantics of volatile don’t allow this eliding.
If there hasn’t been synchronization between two loads, then yes, it makes sense to elide loads in a DRF memory model.
Indeed volatile loads cannot be elided.
But why is it desirable to avoid combining adjacent stores? If you’ve got DRF code then the combination can’t be observed.
> But why is it desirable to avoid combining adjacent stores? If you’ve got DRF code then the combination can’t be observed.
It’s more that the consecutive stores would be going to different tiles. If multiple stores are combined in IR, I don’t think they’d be able to be decoupled later, unless there’s a way to always determine which global object an arbitrary GEP is pointing to.
> > This won’t happen with volatile load/store
> This is mostly true today, but AFAICT the LLVM memory model doesn’t actually offer this guarantee. It merely says that LLVM treats volatile like C / C++ treat volatile… which isn’t much of a guarantee, because C / C++ volatile doesn’t normatively mean anything. Specifically, we can’t really honor this when volatile bitfields are used, since no matching memory operations exist for them:
> struct {
>   volatile int a : 12;
>   volatile int b : 4;
> } s;
> As things stand, we haven’t promised that we won’t combine adjacent volatile stores, and C / C++ certainly allow us to do so. I don’t think it would be a good idea to do so, but we certainly could.
Is this really true? The writes to two volatile variables are sequenced and can’t be re-ordered since they can have side effects, and so merging adjacent volatile stores would break this.
> > > This won’t happen with volatile load/store
> > This is mostly true today, but AFAICT the LLVM memory model doesn’t actually offer this guarantee. It merely says that LLVM treats volatile like C / C++ treat volatile… which isn’t much of a guarantee, because C / C++ volatile doesn’t normatively mean anything. Specifically, we can’t really honor this when volatile bitfields are used, since no matching memory operations exist for them:
> > struct {
> >   volatile int a : 12;
> >   volatile int b : 4;
> > } s;
> > As things stand, we haven’t promised that we won’t combine adjacent volatile stores, and C / C++ certainly allow us to do so. I don’t think it would be a good idea to do so, but we certainly could.
> Is this really true? The writes to two volatile variables are sequenced and can’t be re-ordered since they can have side effects, and so merging adjacent volatile stores would break this.
There’s further discussion of what we promise downthread. It’s definitely true of C / C++, a handwavy promise in the LLVM LangRef, and the LLVM memory model just says “same as C / C++”.
That does seem like a valid use case, albeit one that C / C++ and LLVM IR can’t really help you with today. Volatile has the limitations you’ve described, and relaxed atomics could be combined as you’d like to avoid (though they probably won’t be right now).
This is really a codegen problem. You can decompose the load/store however you like in the backend. InstCombine should still combine the loads as a canonicalization.
-Matt
IIRC it’s not strictly possible to determine which array a load/store targets. I don’t believe the decomposition is always possible, as information is lost when accesses are combined.
Neil
I’m not sure what you mean by this. The type in memory doesn’t really mean anything, and no information is lost. You can still tell (for optimizations) some information about the underlying IR object from the MachineMemOperand, which is mostly used for alias analysis.
-Matt
Sorry, I meant that if I have a series of character loads that get combined into a single word load, I don’t know of a way to 1) know that the word load was originally composed of character loads and 2) decompose the word load back into character loads.
Granted, if I have (1), then (2) is just a matter of inserting the right ops. I’ve been digging into MachineMemOperand — I’m not entirely sure which methods in this class can get me this information. Any pointers would be much appreciated.
Neil