[RFC] Undef/uninitialized values and control flow

I know there has already been a discussion about potentially adding an undef op to MLIR.

I would like to revive the discussion, as I'm running into a case that requires representing an uninitialized value.
We added a software pipelining transformation on the scf.for op some time back. It currently always peels out the prologue and epilogue parts of the loop, meaning the pipelining converts the code from:

scf.for %iv = %lb to %ub step %step {
  %a = OpA : () -> typeA
  OpB %a : typeA -> ()
} 

to:

%a0 = OpA : () -> typeA
%a3 = scf.for %iv = %lb to %ubminus1 step %step
  iter_args(%a1 = %a0) -> (typeA) {
  %a2 = OpA : () -> typeA
  OpB %a1 : typeA -> ()
  scf.yield %a2 : typeA
}
OpB %a3 : typeA -> ()

The disadvantage of this solution is that it duplicates the operations. This may be fine, but in some cases it can generate a large amount of code for pipelines with many stages, since each additional stage peels another copy of part of the loop body into both the prologue and the epilogue.

Ideally I would like to be able to not peel either the prologue or the epilogue.
For instance, this is the IR that would be generated when the epilogue is not peeled out:

%a0 = OpA : () -> typeA
%a3 = scf.for %iv = %lb to %ub step %step
  iter_args(%a1 = %a0) -> (typeA) {
  %cond = arith.cmpi lt, %iv, %ubminus1 : index
  %a4 = scf.if %cond -> (typeA) {
    %a2 = OpA : () -> typeA
    scf.yield %a2 : typeA
  } else {
    %u = scf.uninitialized : typeA
    scf.yield %u : typeA
  }
  OpB %a1 : typeA -> ()
  scf.yield %a4 : typeA
}

In order to do this, we need the scf.if to return an uninitialized value on the last iteration of the loop.

I understand that the problem with undef values is how they are defined.
One solution would be to add an scf.uninitialized op to deal with cases like this, where the dependency cannot be represented in SSA without representing the undef/uninitialized value. The scope of such an op could be limited to representing values that can dynamically never be used.
An alternative would be to relax scf.if to allow it to return a value without having an else block, but this sounds like it would only push the problem further, as sketched below.
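For illustration, the relaxed form would look something like this (hypothetical syntax; scf.if today requires an else region whenever it produces results):

%a4 = scf.if %cond -> (typeA) {
  %a2 = OpA : () -> typeA
  scf.yield %a2 : typeA
}
// Hypothetical: what %a4 holds when %cond is false is exactly the
// undef question we started with.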

Does anybody see another way to represent such IR? Would scf.uninitialized be well-defined enough to be acceptable in some dialect?

FYI @dcaballe @cbate @mehdi_amini @nicolasvasilache

We discussed undef a bit in the context of tensors, where we could always use a %undef = tensor.zeroes() somehow, but interestingly it seems here that you’re surfacing a need to generalize it to any type, even where there isn’t a possible valid “default value”.
We could define it as a “poison” value that can never be read dynamically, as you propose, but this likely has some ramifications that need careful consideration (how does it affect the validity of speculation, for example?).
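To make the speculation concern concrete, here is a sketch (OpC is a hypothetical pure op, in the style of the OpA/OpB placeholders above): hoisting %v out of the scf.if is normally a legal speculation for a pure op, but if reading %u is UB, it would introduce UB on the %cond == false path.

%u = scf.uninitialized : typeA
%r = scf.if %cond -> (typeB) {
  // Only executed when %cond is true, so %u is never read otherwise.
  %v = OpC %u : typeA -> typeB
  scf.yield %v : typeB
} else {
  scf.yield %other : typeB
}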

We are implicitly kind of modeling undef values when we mask out some lanes in a vector load/store. It’s very likely that we need a more generic undef/poison model when we generalize masking to other ops.
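For example, masked stores already carry an implicit notion of undef (a sketch; vector.maskedstore is one such op):

// Lanes where %mask is false are never written to memory, so whatever
// %value holds in those lanes is effectively an undef that is never
// observed.
vector.maskedstore %base[%c0], %mask, %value
  : memref<16xf32>, vector<8xi1>, vector<8xf32>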

The problem with scf.uninitialized is that you would have to lower it to something in MLIR when you lower the scf dialect, so we would need something that goes beyond scf. Could you provide an example of a type that can’t be initialized to an arbitrary value? I’m trying to better understand this limitation.

I would certainly advise against adding an undef value. We are trying to get rid of undef from LLVM, so don’t make the same mistake. Undef values make a lot of optimizations unsound in surprising ways.
An undef instruction is fine though (equivalent to LLVM’s freeze poison). The disadvantage is that if it has multiple uses its value may have to be materialized. But I think it’s well worth it.
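For illustration, a minimal sketch in MLIR’s LLVM dialect (llvm.mlir.poison and llvm.freeze are the spellings I’m assuming here):

// freeze pins down one arbitrary but fixed value, so every use of %u
// observes the same bits.
%p = llvm.mlir.poison : i32
%u = llvm.freeze %p : i32
%a = llvm.add %u, %u : i32  // always even: both operands are the same value
// The consistency across uses is also why %u may need to be
// materialized in a register when it has multiple uses.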

Exactly.

Correct, I do think we would want to be able to read the value in speculatively executed code. Would that be a problem for a poison value?

Yes, good point.

This is a bit specific, but for instance the GPU dialect has some special types that I would like to be able to use in the pipelined IR.

Thanks for pointing this out. In MLIR, Values can only be operation results or block arguments (they cannot be constants, constant expressions, etc.), so an undef Value would not be possible. Are you suggesting defining an operation with poison semantics then?

Ah ok, then you are good to go. Your undef is equivalent to LLVM’s freeze poison. It cannot take a different value on each use (as LLVM’s undef value can).
You don’t seem to need poison in MLIR for now. So don’t introduce it 🙂

I agree with the general guidance, but the LLVM dialect should directly reflect the LLVM IR design (for better or worse), so having llvm.undef and llvm.poison ops seems fine.

What about higher-level dialects? Are we okay adding an undef op in a different dialect (should it be in the arith dialect?) that can be lowered to llvm.undef, or are there still concerns about the semantics?

If a new arith.undef is added to arith, I would vote for lowering it into %c = llvm.undef + llvm.freeze %c, not plain llvm.undef.
Otherwise, using arith.undef might invalidate existing transformations by introducing undefined behavior.
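Concretely, the suggested lowering would be something like this (a sketch; llvm.mlir.undef and llvm.freeze are the current LLVM-dialect spellings, using i32 for the example):

%c = llvm.mlir.undef : i32
%f = llvm.freeze %c : i32  // replace all uses of the arith.undef with %f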

Specifically, we have a transformation that would be invalidated if arith.undef were defined as llvm.undef.

Given %z = arith.select %cond, %x, %y, if evaluating %x and %y requires expensive operations, it is beneficial to explicitly turn the arith.select and relevant expressions into a conditional branch:

%z = scf.if %cond -> (...) {
  %x = ... // (expensive op 1)
  scf.yield %x
} else {
  %y = ... // (expensive op 2)
  scf.yield %y
}

However, if %cond is an undef op, this transformation introduces UB, because branching on undef in LLVM is UB.
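To spell that out, here is a minimal sketch of the lowered IR under a direct lowering (llvm.mlir.undef and llvm.cond_br are the spellings I’m assuming):

// If arith.undef lowers directly to llvm.mlir.undef, the hoisted
// condition becomes a branch on undef, which is UB as noted above.
%cond = llvm.mlir.undef : i1
llvm.cond_br %cond, ^bb1, ^bb2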

Our compiler has to perform this transformation at the MLIR level (not in LLVM): we explicitly mark expensive arith.selects with a custom attribute in MLIR during codegen, and lowering the attribute into LLVM and correctly implementing this transformation there would incur extra development cost.
Note that %x and %y can be tensor types as well, so things become complicated.

That makes sense and sounds in line with what @nlopes was suggesting.

Respectfully, I don’t think you’re looking at this the right way. You can’t define an operation in terms of its lowering: you need to define its semantics, and then decide the lowering from that. The semantics are a much bigger question than just how the op gets lowered; they impact folding and lots of other things.