I noticed that tensor.dim and memref.dim invoke undefined behavior if the dimension index is out of bounds. I believe this means their current NoSideEffect trait (link, link) is incorrect and may cause miscompiles, e.g. by incorrectly hoisting them out of control flow.
I propose the following change to address this; please let me know if you have any concerns or see a better path forward:
We introduce a NoObservableSideEffect trait for operations that don’t have side effects except for invoking UB on certain input values. arith.divui, tensor.dim, memref.dim, and memref.load will all have this trait. NoObservableSideEffect operations can be sunk and CSEed if they don’t read memory, and can be DCEed without any further checks.
We introduce an IsPotentiallySideEffectFree OpInterface with an isSideEffectFree method. This method inspects the op’s inputs and attributes and returns true iff the op is known to have no side effects. tensor.dim and memref.dim will override this method to return true if the source has a known rank and the index is a constant less than that rank. A similar check could be written for arith.divui.
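As an illustration, here is a minimal sketch (the value names %t, %c1, and %idx are hypothetical) of inputs for which such a check would and would not succeed:

  // Source rank is statically known to be 2 and the index is the constant 1,
  // which is in bounds, so isSideEffectFree could return true here.
  %c1 = arith.constant 1 : index
  %d0 = tensor.dim %t, %c1 : tensor<4x8xf32>

  // Dynamic index: it may be out of bounds at runtime, so the check
  // would conservatively return false.
  %d1 = tensor.dim %t, %idx : tensor<4x8xf32>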
[edit: given feedback below I now suggest renaming NoObservableSideEffect to MayHaveUndefinedBehavior.]
These changes should make our optimizations sound without surrendering optimization potential, except in the cases where we really have to.
For instance, this will block us from hoisting the tensor.dim out of the scf.if below, since doing so could introduce UB into a UB-free program.
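For reference, a minimal sketch of the kind of IR in question (the original snippet is not reproduced here; all value names are hypothetical):

  %dim = scf.if %cond -> (index) {
    // Only executed when %cond is true, so any UB in tensor.dim is
    // confined to that case.
    %d = tensor.dim %tensor, %dim_idx : tensor<?x?xf32>
    scf.yield %d : index
  } else {
    %c0 = arith.constant 0 : index
    scf.yield %c0 : index
  }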
Could you explain why OOB tensor.dim and memref.dim should cause immediate UB? I’m not familiar with how these get lowered, so it’s not obvious to me why they should cause immediate UB instead of producing something like a poison value (which doesn’t exist in MLIR yet…).
Immediate UB on division / OOB load/store make sense to me.
I would think %dim_idx dynamically being more than %tensor’s rank is explicitly undefined behavior (i.e. the program is wrong). I am not following why that needs a change to the semantics of tensor.dim (or memref.dim). These should be freely hoistable around any control flow as long as they satisfy SSA use-def chains.
Suppose in the example above that %cond is false, %dim_idx is 5, and %tensor’s rank is 2. That is, the tensor.dim has undefined behavior (e.g. at runtime it segfaults). However, since it isn’t executed, there is nothing wrong with the program.
If I hoist the tensor.dim out of the scf.if like so:
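(A sketch of the hoisted form, reusing the hypothetical names from the snippet above:)

  // The tensor.dim is now executed unconditionally, even when %cond is false.
  %d = tensor.dim %tensor, %dim_idx : tensor<?x?xf32>
  %dim = scf.if %cond -> (index) {
    scf.yield %d : index
  } else {
    %c0 = arith.constant 0 : index
    scf.yield %c0 : index
  }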
then I have introduced UB in the program, and the program can now crash. We turned a non-crashing program into a crashing program; this is a miscompile.
Having UB in “dead code” like the above may seem artificial, but it can be common with multi-versioning. E.g. consider a transform that multi-versions a dynamic-rank operation into N versions with static ranks:
if (t.rank() == 1) {
  x = tensor.dim t, 0
  ...
} else if (t.rank() == 2) {
  x = tensor.dim t, 0
  y = tensor.dim t, 1
  ...
} else if (t.rank() == 3) {
  x = tensor.dim t, 0
  y = tensor.dim t, 1
  z = tensor.dim t, 2
  ...
} else {
  ...
}
In this multi-versioned IR we cannot hoist out y = tensor.dim t, 1 because at runtime the tensor rank may be 1 and thus that tensor.dim operation may have UB.
Having both NoObservableSideEffect and NoSideEffect would be a bit confusing. I was under the impression that NoSideEffect already accounts for “no side effects being observable” (since, as far as I am concerned, a side effect that I can’t observe does not exist). I suppose this also boils down to the question of whether undefined behaviour is observable.
Why is a new IsPotentiallySideEffectFree OpInterface required? One can already make ops conditionally have (no) side effects by manually implementing MemoryEffectsOpInterface instead of using the operand annotations. It’s not really proper modelling, but e.g. using both a MemWrite and a MemRead of unspecified location would make the op no longer side-effect free and serve your purpose, I believe.
Would the motivation of this RFC, hoisting NoSideEffect operations, be better served by an interface/trait that returns whether an operation may be speculatively executed/hoisted?
Yes; I’m counting UB as a non-observable side effect. Happy to pick a different name.
That’s going to cause problems & complications down the road IMO. For instance, I’d like us to be able to reorder tensor.dim with store operations without relying on alias analysis or going down the inaccessiblememonly path that LLVM took.
That’s kind of what the IsPotentiallySideEffectFree interface does; IsSpeculatable is perhaps a better name.
(1) also gives us the ability to sink, CSE, and DCE tensor.dim more easily (i.e. without proving that it is UB-free), so it is probably valuable to have.
I see your point. But tensor.dim (and memref.dim) are just simple query operations that retrieve metadata. If those have side-effecting behavior, then that seems like a huge price to pay for metadata. Also, isn’t NoObservableSideEffect the same as NoSideEffect? If the side effect isn’t observable, then how do you know there is a side effect?
In practical terms, the IsPotentiallySideEffectFree interface and the associated solution solve many issues, but it’s a red flag for me that metadata-querying instructions suddenly have side effects…
By NoObservableSideEffect I really just mean “may have undefined behavior sometimes”. Perhaps MayHaveUndefinedBehavior is a better name for it; it also matches the tensor.dim and memref.dim spec more closely.
Specifically on the topic of tensor.dim – I think it would make more sense to introduce a version of tensor.dim where the dimension number is an attribute and is statically checked against the rank of the operand (it would require ranked operands), and then leave the current tensor.dim for unranked use cases. Keeping the ranked and unranked (and unknown-dtype) worlds air-gapped makes things much clearer and simpler in my experience.
I do think that clearer modeling of these side effects is important for TCP (and linalg). Years ago I asked the same question about how to model this but I don’t think we came up with a satisfactory way to model it.
CSE of TCP/Linalg type ops is basically an extension of fusion cost modeling so not something we want to do with the regular CSE pass anyway. If we don’t already, we should definitely have a “DCEIfNoUses” equivalent for TCP/Linalg-type ops.
Clearly identifying the speculatable and non-speculatable “NoSideEffect” ops would be a great improvement to MLIR, as the current behavior (e.g. for divui) is already prone to miscompiles. “no side effect, but may have immediate UB” would be a good way to model this – not sure about the best naming (NoObservableSideEffect is a little hard to follow). Doesn’t LLVM model this with Speculatable?
+1, the existing infra already has the ability to be conditionally side-effecting, so we should use that. Potentially the ergonomics could be improved.
The exact same can be said of arith.muli today (and, as Sean mentioned, of division too). As overflow behavior is UB, hoisting an integer multiplication out of a conditional guarding it would result in introducing UB. This feels like we end up marking all ops with UB as having effects and requiring every analysis to understand the specific UB an op has in order to do this transformation (e.g., it is safe to hoist this integer multiply/dim op because the conditional it sits in is unrelated to the predicate guarding the body and the UB, so what was UB before hoisting is still UB after, versus introducing new UB). That feels difficult to reason about over an open set of ops – well, except if we go very conservative, mostly treat the UB trait as unknown side effects, and only operate on pure ops.
If the concern is wrt guarding and accidental hoisting, why not have that be explicit? E.g., (on mobile so excuse typos)
if (foo) {
  v_idx = assert_in_range(idx, range);
  d = tensor.dim val, v_idx
  ...
}
We don’t even need multiple asserts, just an “assert_may_assume” or some such which is variadic, is side-effecting, and is simply a statement from a higher level that the uses of its output here have been guarded in some way by the conditional dominating it. Potentially even better is to have a “guard” conditional which takes more operands and can ensure that the operands passed in are not also used directly inside the region (avoiding accidental peepholes straddling the verified and unverified parts). Then the UB state before hoisting equals the UB state after hoisting, with no accidental introduction. That puts more burden on frontends indeed, but it also gives them more flexibility to enable aggressive optimizations (the “trust me, I’ve formally verified this, just optimize it assuming all is correct” section).
The alternative of a speculatable trait, constraining code motion of non-speculatable ops across conditionally executed regions, also seems more concrete. It restricts specific classes of optimization rather than requiring every place to consider it. It would be more conservative than the above, but it is easier to see the domain it applies to and what it constrains than with a UB trait. With the latter, I’m not sure what I’d do more aggressively than with an op that has unknown side effects.
Update after discussing with @zero9178 and @mehdi_amini – the following solution originally proposed by Mehdi works for us:
NoSideEffect is redefined to allow UB. This means NoSideEffect is necessary but not sufficient to speculate operations.
NoSideEffect operations can be DCEed and CSEed without further checks since they don’t have any user-visible effect.
We introduce a new op interface Speculatable with an isSpeculatable function that checks if the Operation is speculatable. The default implementation for this will return true unconditionally.
With these in place, tensor.dim can be marked NoSideEffect and Speculatable, with isSpeculatable returning true if the index is known to be in bounds. This should make things correct while still allowing many code-motion optimizations.
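As a concrete illustration (a sketch only; the value names are hypothetical), under this scheme:

  // Constant in-bounds index: NoSideEffect and isSpeculatable() returns true,
  // so the op can be hoisted, sunk, CSEed, and DCEed freely.
  %c0 = arith.constant 0 : index
  %a = tensor.dim %t, %c0 : tensor<4x8xf32>

  // Dynamic index: still NoSideEffect, so CSE and DCE remain legal, but
  // isSpeculatable() returns false and the op must not be hoisted past a guard.
  %b = tensor.dim %t, %idx : tensor<4x8xf32>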
Let me know if you see any concerns going ahead with this.
Answering some other points by @_sean_silva and @jpienaar below, which seem more long term to me:
I agree that there are probably representational changes we can make to tensor.dim to simplify the compile-time-constant rank cases. But, as you noted, we need to resolve these issues independently of that.
That’s slightly different IIUC. We’re trying to model two properties:
Can the operation be hoisted? This is modeled by Speculatable.
Does the operation have side effects other than UB? If not, the operation may not be speculatable, but it can still be DCEed and CSEed.
I think the approach above (suggested by MA) addresses this, but please let me know if I’m missing some subtleties.
I think this could work, but it will require a broader design. On the other hand, the control flow + UB semantics is fairly well understood, since LLVM has the same design.
So my preference would be to address the known unsoundness using the approach above, and reconsider the guard / explicit control dependence at a later date.
(I personally also have some reservations against using control dependence, but I’d prefer to not intermingle those with this discussion.)
Question: is Speculatable a static property of the operation or a dynamic one? Let me elaborate. Another use case for Speculatable would be to remove a mask from a masked vector operation when the operation is speculatable. For example, we can turn a masked arithmetic addition into an unmasked one, since the masked-out lanes can execute speculatively. A more complex scenario is a masked vector load. On some targets, a vector load can safely read out of bounds under some circumstances. Would a memory load be dynamically speculatable? Would an integer division be speculatable if the compiler is able to prove that speculation is safe?
I have the impression that Speculatable should be a static property and we should deal with dynamic speculation on a case by case basis, by teaching the compiler to reason about the specific semantics of those ops.
I think one important definitional point coming out of this, which was not clear from earlier discussions, is that UB is not a “side effect”. That is, we should distinguish between “side effects required for correctness” (which we just call “side effects” – e.g. printing to the console, writing to a memory location) and “the effects which execution of this operation could potentially cause” (which in the case of UB could be e.g. nasal demons or erasing your hard drive).
Sanjoy, as part of committing this design to code, can we somehow codify this distinction somewhere?
This LGTM. This is roughly what we intended from the early stages, but we never drove it to completion (I thought I had some TODOs sprinkled in a few places for speculative-execution checks, but I don’t see them anymore).
One question we need to answer as part of this: what is the behavior of conceptually infinite loops like affine.for %arg0 = 0 to INT_MAX step 2?
I see three possibilities:
It loops infinitely at runtime.
It is undefined behavior.
It is defined to always be finite (i.e. in the case above, only iterate INT_MAX/2 times by generating slightly more complex code).
Based on a simple example I tried, it looks like today we have either (1) or (2). The generated LLVM IR loops infinitely in a well-defined way, so that can be interpreted as (1), but it is also consistent with (2).
This means scf.for and affine.for cannot be speculated even if their attached regions don’t have UB or side effects – we’d additionally need a trip count analysis to make sure we’re not introducing UB or infinite loops into a program that didn’t have them. I wanted to check whether this is what we want.
The trip count analysis can be simple for now. If the bounds & step are constant then it is trivial, and we also know that the loops have a finite trip count anytime the step is known to be 1.
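For example, a minimal sketch (assuming affine/scf syntax; the loop bodies are elided) of the trivially analyzable cases:

  // Constant bounds and step: trip count is ceildiv(1024 - 0, 2) = 512, trivially finite.
  affine.for %i = 0 to 1024 step 2 {
    ...
  }

  // Dynamic bounds with a step known to be 1: finite, since the induction
  // variable strictly increases by 1 until it reaches %ub.
  %c1 = arith.constant 1 : index
  scf.for %i = %lb to %ub step %c1 {
    ...
  }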
Answering some other points below:
Makes sense, will do.
I was thinking we should put simple checks in the isSpeculatable op interface hook and leave more complex things to dedicated analyses. E.g. for division, isSpeculatable could check whether the RHS is a non-zero constant, and for loads it could check whether the pointer is a global constant pointer. But it won’t e.g. do detailed pointer lifetime analysis. Following our original motivation, tensor.dim would return true if the input tensor is ranked and the index is a constant less than the rank.
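For the division case, a sketch (hypothetical value names) of what the hook would distinguish:

  // RHS is a non-zero constant: the division cannot trap, so isSpeculatable
  // for arith.divui could return true here.
  %c4 = arith.constant 4 : i32
  %q0 = arith.divui %x, %c4 : i32

  // RHS is an arbitrary SSA value that may be zero at runtime, so the hook
  // would conservatively return false.
  %q1 = arith.divui %x, %y : i32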
I think I answered this above, but please let me know if not. In other words, we’re leveraging the op interface functionality to teach the compiler to reason about op-specific semantics in a scalable way.
As was noted by others, this was often discussed but never done. We settled on a trait back then, but an interface with some lightweight dynamic decisions is much better. Thanks for pushing this forward.