Semantics modeling: Undefined Behavior and Side Effects

I’d like to raise again a topic that we haven’t really solved or addressed which is related to Undefined Behavior in MLIR, and its potential relationship to the notion of “side effects”.

Recap

As a recap, right now MLIR offers a fairly rich interface mechanism to model “side effects” through an extensible concept of “resource”, “effect”, and an OpInterface to describe what kind of “effects” an operation has on a particular “resource”.
This is used in particular to model the memory effects, which is the common thing many people think about when we refer to “side effects”.

This brings me to the notion of “NoSideEffect”, which is current defined as def NoSideEffect : MemoryEffects<[]>; – basically the absence of any MemoryEffects! This seems (unfortunately?) orthogonal to non-memory side effects.

Before we had interfaces, we were making more uses of “traits” on operation, and NoSideEffect was a trait.
Depending on the place, this was documented as:

/// This bit is set for operations that have no side effects: that means that
/// they do not read or write memory, or access any hidden state.
/// This trait indicates that an operation never has side effects.
This trait signifies that the operation is pure and has no visible side effects.

The last one in particular refers to “pure” and has been known to bring confusion with respect to how to grasp with Undefined Behavior.

We had a previous (long) discussion about how to handle operations with some preconditions, in particular if an op is considered exhibiting Undefined Behavior when these preconditions aren’t met.

How to move forward?

One possible vision is to considered that “undefined behavior” is a side-effect (in fact Sanjoy suggested in the first answer in the thread above that UB “is every possible side effect”).
On the other hand, this definition is unfortunate: it does not allow to model the side-effects of an operation in a program without undefined behavior.

I’d also bring up that if we go back to first principles, according to Wikipedia, side effect is (emphasis of mine) an “observable effect besides returning a value (the intended effect) to the invoker of the operation”. However by definition Undefined Behavior can’t be “observable”.

One practical issue (that is shared with LLLVM) is how to handle speculation:

int div(int cond, int x, int y)
  if (cond)
    return x / y;
  return 0;
}

In this function, we can’t do

int div(int cond, int x, int y)
  // x / y is undefined if y == 0 or x == INT_MIN and y == -1
  int div = x / y;
  if (cond) return div;
  return 0;
}

I suspect we don’t want something like integer division to be considered as “having side-effect” because it might exhibit undefined behavior.

In the previous thread, I referred to LLVM handling of this: there is a specific handling of the ability to speculate an operation with llvm::isSafeToSpeculativelyExecute.

I’d favor to try to look in this direction instead. One simple way would be to bring back a trait (SafeToSpeculate), but that might be a bit too much all-or-nothing of a flag. So an alternative would be to use an interface with a query that is more contextual, as a strawman it could look like:

// Return true is the current operation can be safely speculated to the given position.
// The provided iterator must dominate the current position for the operation.
bool isSafeToSpeculate(Block::iterator position);

This would allow operation to implement some more complex logic to determine when it is / isn’t safe to speculate.

I’m interested to hear about alternative modeling or other approaches that have been employed to tame UB situations in MLIR?

4 Likes

Thanks, Mehdi for bringing this up. Getting some clarity on this would help resolve some discussion loops that I’ve observed happening for quite a while, particular with respect to otherwise value semantic operations which have UB in certain cases of illegal inputs.

In these cases, when the inputs are ultimately user specified, we have various approaches to inserting guards and need to make sure when the operations are ordered for scheduling, that the guards cannot be re-ordered after the ops. I suspect that getting clarity on your simple case of integer divide-by-zero in the presence of operations with observable behavior would settle all of these cases.

1 Like

Thanks for raising this Mehdi, there are many different possible ways to tackle this. My opinion is shaped by my experiences, but there may be better ways to go, here’s are some random thoughts:

The most important thing to me is to be based on practical constraints. In practice, very few analyses will want to deal with things like detailed memory effects - the cost and complexity of working with it will mean that the passes who do are specialist. These passes are important, complicated, and useful to share across dialects. This (to me) means that effects should be optimized for how those specific clients (e.g. redundancy elimination and dead effect elimination) will end up using them.

The next level down are things like “readnone” and “readonly”. These are coarse grained effects that a lot more passes will care about, capture many useful operations in practice, and are more computationally efficient to handle as well. These have worked well in the LLVM world, are easy to describe, use, and adopt in dialects, and I don’t see any concern with supporting this. Simple things like CSE and DCE benefit from this, and making these passes trivial and cheap is important.

Your point about “having side effects in the case of UB” is the most interesting point to me. UB really can’t be an effect - operations are defined on certain inputs and have certain behavior under those inputs. Inputs that are out of range are not practical to model. If you go beyond divides to loads, then of course you want loads to be undefined on invalid inputs, because otherwise you couldn’t do alias modeling, would have to treat all of them as trapping when unaligned, etc - this wouldn’t be /useful/ even though you could do it.

Your point about speculation is also interesting - having no side effect is not actually enough to hoist an operation - you have to have a cost model, and UB relates as much to that as anything else.

In the end, the LLVM approach has a bunch of flaws, and I agree that isSafeToSpeculativelyExecute isn’t great. LLVM doesn’t have the right thing IMO for this (because of its closed design etc, it doesn’t handle intrinsics well in general), but something closer to CodeMetrics would be better here. The key is that you want an OpInterface that says “what is the COST if I speculate this” and the answer is “no” for anything that could expose UB.

In the LLVM case, you’d really want to say “what is the cost of speculating a bunch of stuff” vs “the cost of a branch”. The former is mostly target independent, the latter is target specific. LLVM also makes the mistake of doing these sorts of “optimizations” at the mid-level, when they should really only be done at codegen level.

With hindsight, the mid-level optimizer in LLVM should be focused on canonicalization, not “optimization”. Turning a block with some PHI nodes into a select is ok, but speculating something (even a single integer add) is trouble.

-Chris

I also had questions while looking into the definitions of operations in the TOSA dialect.
There were two unclear cases about the validity of tosa.add’s NoSideEffect:
(1) When its operands are dynamically shaped tensors whose sizes mismatch (probably related to the thread).
(2) One of its operands is linalg.init_tensor; if using uninitialized elements is UB, tosa.add isn’t speculatable in general.
I couldn’t find good answers for these yet.

If the primary goal of NoSideEffect was to model read-none, and supporting speculation of operations is important in MLIR as well, making ‘NoSideEffect = read-none’ explicit and introducing a new trait is necessary IMHO.

I commented on this in previous threads and just wanted to leave a note of support for adding an interface to model speculation/cost. That would solve the reordering issue in the presence of control flow.

This is yes another question. The speculation modeling would disallow moving operations with UB across control flow but it might still allow reordering such operations across operations with observable effects, as the operation itself still does not have an effect. And for operations that just affect memory, this is what I would expect. I want to be able to move meta data computations across loads, for example.

We might consider adding another effect to the effects system to model that an operation can impact control flow (e.g. it asserts or throws). That would complicate the analysis for speculation, though.

Thanks Mehdi for starting this conversation! As a frontend author that has to bridge between ops like “torch.mul” that are guaranteed to safely abort the program and linalg ops that have UB, my biggest desire from this is to have a sound system that can prevent hoisting ops with UB past “abort”-like ops that I insert to guarantee a safe exit of the program (no nasal demons).

One other random thing - the OpInterface should be passed /instances/ of an operation, not just the class name. You often can tell more about a specific instance being speculatable than the classification. For example, not all loads can be speculated, but loads from a non-weak global variable can be. This is a reason that a “speculatable” trait is not super helpful.

I like the cost model approach. It would seem to know exactly the cost/whether there would be UB, one has to know the exact interaction of ops. But I don’t think that can be avoided. A couple of things I’m wondering about

  1. What is the cost if already in UB state? The cost is the delta from the current state cost wise, we don’t want to have a pass mindlessly create UB, but what if the input to the pass already has UB. One option would be 0 cost, you can speculate freely as your input already is UB, the other extreme would be “no”, while we could just ignore the existing UB state and query (the last option seems appealing, but not sure if there is a caveat there).

  2. Would every op which hasn’t overridden the cost model be a barrier?

  3. Should generic analysis also be accessible from the interface? Basically thinking about range analysis and others that could be queried to give more information about valid states at a point (e.g., all call sites of this function has verified a value is not zero, so inside we can divide by it freely). Alternative would be specialized passes that take advantage of the info or perhaps some standard ops reified into, and then the generic interface would just be more conservative.

For hoisting/speculating, the cost should be infinite. UB operations are unreachable, and you don’t want to hoist an unreachable/UB operation into a code path that may be executable. You don’t want to turn:

  if (cond) { 
    a = div X, 0
    ...
  }

into:

  a = div X, 0
  if (cond) { 
  }

instead, you want to canonicalize div X, 0 into unreachable and have a CFG cleanup sort of pass nuke the impossible branch.

Note that this is only true for “real UB”. Operations (e.g. in the shape dialect) that are assertions which semantically trap on invalid input are not UB, they are defined to trap and therefore have side effects.

Would every op which hasn’t overridden the cost model be a barrier?

Right - Does it ever make sense to speculate something that might be incredibly expensive at runtime? Speculation is all about a tradeoff, and if you can’t measure the tradeoff, the principle of “do no harm” should come into effect.

Should generic analysis also be accessible from the interface?

I don’t know, but I’d start out with the minimal solution to the known part of the problem; get experience with it; then expand it over time. Many problems in IR design have come up because of well-meaning proactive extension to IR constructs and infra.

-Chris

The constraint operations in the shape dialect do not have undefined behavior and I don’t think anyone has argued they should have. They do not currently have side-effects because we want to be free to reorder them. They are kept alive by the uses of their produced witness value but otherwise can move around freely. They cannot be speculated on, though, because they eventually turn into asserts that are side effecting (terminate) and we do not want to see those side effects on control flow paths that do not actually trigger the guarded operation.

We could consider making them side-effecting and have moving constraints as a dedicated pass.

Ok, I don’t have any experience working with them, but lying to the infra about them not having side effects is very likely to cause problems for someone at some point :-). If the dialect is only intended to work with a closed set of passes, it is probably fine though.

Simple question: is it ok to delete one of these things when unused? Making an error go away, seems like it would could cause incorrect programs to go undiagnosed. This would lead to predictability problems.

3 Likes

Hi - I think this is related to some soundness issues in tensor.dim and memref.dim - I started a topic on that here (before I saw this) and I’d like to get your take on the solution proposed.