[RFC] Undef/uninitialized values and control flow

I know there has already been a discussion about potentially adding an undef op to MLIR.

I would like to revive the discussion, as I'm running into a case that requires representing an uninitialized value.
We added a software pipelining transformation on the scf.for op some time back. It currently always peels out the prologue and epilogue parts of the loop, meaning the pipelining converts the code from:

scf.for %iv = %lb to %ub step %step {
  %a = OpA : () -> typeA
  OpB %a : typeA -> ()
} 

to:

%a0 = OpA : () -> typeA
%a3 = scf.for %iv = %lb to %ubminus1 step %step
  iter_args(%a1 = %a0) -> (typeA) {
  %a2 = OpA : () -> typeA
  OpB %a1 : typeA -> ()
  scf.yield %a2 : typeA
}
OpB %a3 : typeA -> ()

The disadvantage of this solution is that it duplicates the operations. This may be fine, but in some cases it can generate a large amount of code for pipelines with many stages, since each additional stage peels another copy of part of the loop body into both the prologue and the epilogue.

Ideally I would like to be able to not peel either the prologue or the epilogue.
For instance, this is the IR that would be generated when the epilogue is not peeled out:

%a0 = OpA : () -> typeA
%a3 = scf.for %iv = %lb to %ub step %step
  iter_args(%a1 = %a0) -> (typeA) {
  %cond = arith.cmpi lt, %iv, %ubminus1 : index
  %a4 = scf.if %cond -> (typeA) {
    %a2 = OpA : () -> typeA
    scf.yield %a2 : typeA
  } else {
    %u = scf.uninitialized : typeA
    scf.yield %u : typeA
  }
  OpB %a1 : typeA -> ()
  scf.yield %a4 : typeA
}

In order to do this, we need the scf.if to return an uninitialized value on the last iteration of the loop.

I understand that the problem with undef values is how they are defined.
One solution would be to add an scf.uninitialized op to deal with cases like this, where the dependency cannot be represented in SSA without representing the undef/uninitialized value. The scope of such an op could be limited to representing values that can dynamically never be used.
An alternative would be to relax scf.if to allow it to return a value without having an else block, but this sounds like it would only push the problem further, as sketched below.
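For illustration, the relaxed form would look something like this (hypothetical syntax; scf.if today requires an else region whenever it produces results):

%a4 = scf.if %cond -> (typeA) {
  %a2 = OpA : () -> typeA
  scf.yield %a2 : typeA
}
// Hypothetical: what %a4 holds when %cond is false is exactly the
// undef question we started with.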

Does anybody see another way to represent such IR? Would scf.uninitialized be well-defined enough to be acceptable in some dialect?

FYI @dcaballe @cbate @mehdi_amini @nicolasvasilache

We discussed undef a bit in the context of tensors, where we could always use a %undef = tensor.zeroes() somehow, but interestingly it seems here that you’re surfacing a need to generalize it to any type, even where there isn’t a possible valid “default value”.
We could define it as a “poison” value that can never be read dynamically, as you propose, but this likely has some ramifications that need careful consideration (how does it affect the validity of speculation, for example?).
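To make the speculation concern concrete, here is a sketch (OpC is a hypothetical pure op, in the style of the OpA/OpB placeholders above): hoisting %v out of the scf.if is normally a legal speculation for a pure op, but if reading %u is UB, it would introduce UB on the %cond == false path.

%u = scf.uninitialized : typeA
%r = scf.if %cond -> (typeB) {
  // Only executed when %cond is true, so %u is never read otherwise.
  %v = OpC %u : typeA -> typeB
  scf.yield %v : typeB
} else {
  scf.yield %other : typeB
}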

We are implicitly kind of modeling undef values when we mask out some lanes in a vector load/store. It’s very likely that we need a more generic undef/poison model when we generalize masking to other ops.
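For example, masked stores already carry an implicit notion of undef (a sketch; vector.maskedstore is one such op):

// Lanes where %mask is false are never written to memory, so whatever
// %value holds in those lanes is effectively an undef that is never
// observed.
vector.maskedstore %base[%c0], %mask, %value
  : memref<16xf32>, vector<8xi1>, vector<8xf32>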

The problem with scf.uninitialized is that you would have to lower it to something in MLIR when you lower the scf dialect, so we would need something that goes beyond scf. Could you provide an example of a type that can’t be initialized to an arbitrary value? I’m trying to better understand this limitation.

I would certainly advise against adding an undef value. We are trying to get rid of undef from LLVM, so don’t make the same mistake. Undef values make a lot of optimizations unsound in surprising ways.
An undef instruction is fine though (equivalent to LLVM’s freeze poison). The disadvantage is that if it has multiple uses its value may have to be materialized. But I think it’s well worth it.
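For illustration, a minimal sketch in MLIR’s LLVM dialect (llvm.mlir.poison and llvm.freeze are the spellings I’m assuming here):

// freeze pins down one arbitrary but fixed value, so every use of %u
// observes the same bits.
%p = llvm.mlir.poison : i32
%u = llvm.freeze %p : i32
%a = llvm.add %u, %u : i32  // always even: both operands are the same value
// The consistency across uses is also why %u may need to be
// materialized in a register when it has multiple uses.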

Exactly.

Correct, I do think we would want to be able to read the value in speculatively executed code. Would that be a problem for a poison value?

Yes, good point.

This is a bit specific, but for instance the GPU dialect has some special types that I would like to be able to use in the pipelined IR.

Thanks for pointing this out. In MLIR, Values can only be operation results or block arguments (they cannot be constants, constant expressions, etc.), so an undef Value would not be possible. Are you suggesting defining an operation with poison semantics then?

Ah ok, then you are good to go. Your undef is equivalent to LLVM’s freeze poison. It cannot take a different value on each use (as LLVM’s undef value can).
You don’t seem to need poison in MLIR for now. So don’t introduce it 🙂

I agree with the general guidance, but the LLVM dialect should directly reflect the LLVM IR design (for better or worse), so having llvm.undef and llvm.poison ops seems fine.

What about higher-level dialects? Are we okay adding an undef op in a different dialect (should it be in the arith dialect?) that can be lowered to llvm.undef, or are there still concerns about the semantics?

If a new arith.undef is added to arith, I would vote for lowering it into %c = llvm.undef + llvm.freeze %c, not plain llvm.undef.
Otherwise, using arith.undef might invalidate existing transformations by introducing undefined behavior.
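Concretely, the suggested lowering would be something like this (a sketch; llvm.mlir.undef and llvm.freeze are the current LLVM-dialect spellings, using i32 for the example):

%c = llvm.mlir.undef : i32
%f = llvm.freeze %c : i32  // replace all uses of the arith.undef with %f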

Specifically, we have a transformation that would be invalidated if arith.undef were defined as llvm.undef.

Given %z = arith.select %cond, %x, %y, if evaluating %x and %y requires expensive operations, it is beneficial to explicitly turn the arith.select and relevant expressions into a conditional branch:

%z = scf.if %cond -> (...) {
  %x = ... // (expensive op 1)
  scf.yield %x
} else {
  %y = ... // (expensive op 2)
  scf.yield %y
}

However, if %cond is an undef op, this transformation introduces UB, because branching on undef in LLVM is UB.
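To spell that out, here is a minimal sketch of the lowered IR under a direct lowering (llvm.mlir.undef and llvm.cond_br are the spellings I’m assuming):

// If arith.undef lowers directly to llvm.mlir.undef, the hoisted
// condition becomes a branch on undef, which is UB as noted above.
%cond = llvm.mlir.undef : i1
llvm.cond_br %cond, ^bb1, ^bb2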

Our compiler has to perform this transformation at the MLIR level (not in LLVM): we explicitly mark expensive arith.selects with a custom attribute in MLIR during codegen, and lowering the attribute into LLVM and correctly implementing this transformation there would incur extra development cost.
Note that %x and %y can be tensor types as well, so things become complicated.

That makes sense and sounds in line with what @nlopes was suggesting.

Respectfully, I don’t think you’re looking at this the right way. You can’t define an operation in terms of its lowering: you need to define its semantics, and then decide the lowering from that. The semantics are a much bigger question than just how the op gets lowered; they impact folding and lots of other things.