unrealized_conversion_cast makes bufferization passes fail

I’m making use of MLIR bufferization passes as shown below:

// Bufferize
  addFilteredPassToPassManager(pm, mlir::createTensorConstantBufferizePass(),
                               options.enablePass);
  addFilteredPassToPassManager(pm, mlir::createStdBufferizePass(),
                               options.enablePass);
  addFilteredPassToPassManager(pm, mlir::createTensorBufferizePass(),
                               options.enablePass);
  addFilteredPassToPassManager(pm, mlir::createLinalgBufferizePass(),
                               options.enablePass);
  addFilteredPassToPassManager(pm, mlir::createConvertLinalgToLoopsPass(),
                               options.enablePass);
  addFilteredPassToPassManager(pm, mlir::createFuncBufferizePass(),
                               options.enablePass);
  addFilteredPassToPassManager(pm, mlir::createFinalizingBufferizePass(),
                               options.enablePass);

I use them to lower linalg.generic ops that contain an unrealized_conversion_cast. Below are two dummy examples, with and without the cast: the first one fails, while the second works.

Example with unrealized_conversion_cast:

#map0 = affine_map<(d0) -> (d0)>
#map1 = affine_map<(d0) -> (0)>
func @main(%arg0: tensor<2xi64>, %arg1: tensor<2xi8>, %arg2: i64) -> i64 {
  %0 = tensor.from_elements %arg2 : tensor<1xi64>
  %1 = linalg.generic {indexing_maps = [#map0, #map0, #map1], iterator_types = ["reduction"]} ins(%arg0, %arg1 : tensor<2xi64>, tensor<2xi8>) outs(%0 : tensor<1xi64>) {
  ^bb0(%arg3: i64, %arg4: i8, %arg5: i64):  // no predecessors
    %c0_1 = constant 0 : index
    %fail = unrealized_conversion_cast %c0_1 : index to i64
    linalg.yield %fail : i64
  } -> tensor<1xi64>
  %c0 = constant 0 : index
  %2 = tensor.extract %1[%c0] : tensor<1xi64>
  return %2 : i64
}

Failing with /llvm-project/mlir/lib/Dialect/Linalg/Transforms/Loops.cpp:423: llvm::Optional<llvm::SmallVector<mlir::Operation*, 4> > linalgOpToLoopsImpl(mlir::PatternRewriter&, mlir::linalg::LinalgOp) [with LoopTy = mlir::scf::ForOp]: Assertion `linalgOp.hasBufferSemantics() && "expected linalg op with buffer semantics"' failed. Aborted (core dumped)

Example without unrealized_conversion_cast:

#map0 = affine_map<(d0) -> (d0)>
#map1 = affine_map<(d0) -> (0)>
func @main(%arg0: tensor<2xi64>, %arg1: tensor<2xi8>, %arg2: i64) -> i64 {
  %0 = tensor.from_elements %arg2 : tensor<1xi64>
  %1 = linalg.generic {indexing_maps = [#map0, #map0, #map1], iterator_types = ["reduction"]} ins(%arg0, %arg1 : tensor<2xi64>, tensor<2xi8>) outs(%0 : tensor<1xi64>) {
  ^bb0(%arg3: i64, %arg4: i8, %arg5: i64):  // no predecessors
    linalg.yield %arg3 : i64
  } -> tensor<1xi64>
  %c0 = constant 0 : index
  %2 = tensor.extract %1[%c0] : tensor<1xi64>
  return %2 : i64
}

Working fine!

Is there anything specific to unrealized_conversion_cast that’s making it fail? Other custom operations can be part of linalg.generic without this issue.

It would be interesting to see what the IR looks like right before mlir::createConvertLinalgToLoopsPass().

My suspicion is that the bufferization pattern for the linalg.generic fails. It is currently implemented in a way that re-creates all operations inside of the region, which requires the re-created operations to be legal in order for the pattern to successfully apply. However, currently only affine, math, memref, std and tensor are legal.

If this is true, the best way to fix it would be to move away from copying the body op by op and instead use the rewriter's region cloning support. The simplest way to just move forward would be to change the legal dialects in LinalgBufferizePass.

Hey herhut,

Thanks for replying, that is indeed exactly what is happening: LinalgBufferize doesn't do its job, which is why the conversion to loops fails.
But what is strange is that it works for custom operators; we tried replacing the unrealized_conversion_cast with a custom op like this:

#map0 = affine_map<(d0) -> (d0)>
#map1 = affine_map<(d0) -> (0)>
func @main(%arg0: tensor<2xi64>, %arg1: tensor<2xi8>, %arg2: i64) -> i64 {
  %0 = tensor.from_elements %arg2 : tensor<1xi64>
  %1 = linalg.generic {indexing_maps = [#map0, #map0, #map1], iterator_types = ["reduction"]} ins(%arg0, %arg1 : tensor<2xi64>, tensor<2xi8>) outs(%0 : tensor<1xi64>) {
  ^bb0(%arg3: i64, %arg4: i8, %arg5: i64):  // no predecessors
    %c0_1 = constant 0 : index
    %works = "custom.magic"(%c0_1) : (index) -> i64
    linalg.yield %works : i64
  } -> tensor<1xi64>
  %c0 = constant 0 : index
  %2 = tensor.extract %1[%c0] : tensor<1xi64>
  return %2 : i64
}

From your explanation this should not work, but it does. As you said, the LinalgBufferizePass marks only some dialects as legal, so it should consider the custom op illegal as well, right?

Sorry, I replied too fast, I must have screwed up my tests… It doesn't work with a custom op either, just as you expected.

Thanks for trying. You could also use markUnknownOpDynamicallyLegal to mark all operations that are not explicitly marked illegal as legal. That is just a kludge but might unblock you.
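A minimal sketch of that kludge, assuming it is wired into the ConversionTarget that LinalgBufferizePass already sets up (relaxTarget is a hypothetical helper name):

#include "mlir/IR/Operation.h"
#include "mlir/Transforms/DialectConversion.h"

// Kludge: treat every op that is not explicitly marked illegal as legal, so
// that unknown ops (unrealized_conversion_cast, custom dialect ops) inside
// the linalg.generic body no longer block the bufferization pattern.
static void relaxTarget(mlir::ConversionTarget &target) {
  target.markUnknownOpDynamicallyLegal(
      [](mlir::Operation *op) { return true; });
}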

But, as I said, the proper fix is to move to something using the rewriter. The bufferization of the scf.for operation has an example of how this can be done. See mlir/lib/Dialect/SCF/Transforms/StructuralTypeConversions.cpp.

Thanks for your help @herhut, I think the problem is clear now. I wonder if supporting Builtin as a legal dialect (since unrealized_conversion_cast comes from there) would be acceptable in this specific pass?
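Concretely, I imagine something like this in the pass's conversion-target setup (just a sketch, with target being the ConversionTarget the pass already builds):

target.addLegalDialect<mlir::BuiltinDialect>();
// or, more narrowly, only the cast op:
target.addLegalOp<mlir::UnrealizedConversionCastOp>();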

The IR before the LinalgToLoops pass:

#map0 = affine_map<(d0) -> (d0)>
#map1 = affine_map<(d0) -> (0)>
module  {
  func @main(%arg0: tensor<2xi64>, %arg1: tensor<2xi8>, %arg2: i64) -> i64 {
    %0 = memref.alloc() : memref<1xi64>
    %c0 = constant 0 : index
    memref.store %arg2, %0[%c0] : memref<1xi64>
    %1 = memref.tensor_load %0 : memref<1xi64>
    %2 = linalg.generic {indexing_maps = [#map0, #map0, #map1], iterator_types = ["reduction"]} ins(%arg0, %arg1 : tensor<2xi64>, tensor<2xi8>) outs(%1 : tensor<1xi64>) {
    ^bb0(%arg3: i64, %arg4: i8, %arg5: i64):  // no predecessors
      %c0_1 = constant 0 : index
      %5 = unrealized_conversion_cast %c0_1 : index to i64
      linalg.yield %5 : i64
    } -> tensor<1xi64>
    %c0_0 = constant 0 : index
    %3 = memref.buffer_cast %2 : memref<1xi64>
    %4 = memref.load %3[%c0_0] : memref<1xi64>
    return %4 : i64
  }
}

Why do you have an unrealized_conversion_cast there in the first place? Those are generally part of type conversion during lowering and should fall away within some bounded amount of time by cancelling with another unrealized_conversion_cast. I wouldn't want to be bufferizing with one still in there. It seems that should really be an std.index_cast, which would then fold into just a constant i64.
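For the dummy example above, the region body would then read something like this (a sketch reusing the same names; the index_cast then folds with the constant):

^bb0(%arg3: i64, %arg4: i8, %arg5: i64):
  %c0_1 = constant 0 : index
  %0 = index_cast %c0_1 : index to i64
  linalg.yield %0 : i64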

This is really just a dummy example; a real example would involve custom ops and types, which require the use of unrealized_conversion_cast.

The way we solved it was by introducing a pass that removes all those unrealized_conversion_cast ops and simplifies the types before the bufferization passes run.

This should be fixed by D109581: [mlir][linalg] Fix bufferize pattern to allow unknown operations in body of generic.

This bounded amount of time can sometimes be relatively long, and we should not require those ops to be absent. For instance, one might have done the signed/signless conversion on the linalg op but not yet on the compute operations in its body.

But it does not matter for this bug anyway: linalg.generic bufferization should simply support ops from other dialects.


@herhut Awesome, thank you!

As @youben says, we fixed it on our side by introducing a pass that simplifies our types and removes the unrealized_conversion_cast ops earlier, but your fix is surely the right one, as linalg.generic should not need to be aware of what is happening inside its body.

Thanks!

Fair, that makes sense. I think bounding that time ties in with "PSA: run -reconcile-unrealized-casts after all -convert-*-to-llvm from now on".

Just because it's not clear from your examples, I hope you don't mean a pass that operates on unrealized_conversion_cast directly and does something other than folding away a cast A → B followed by a cast B → A, i.e. rewriting

%1 = builtin.unrealized_conversion_cast %0 : i32 to index
%2 = builtin.unrealized_conversion_cast %1 : index to i32
%3 = "foo.op"(%2) : (i32) -> i32

into

%3 = "foo.op"(%0) : (i32) -> i32

because those are the only semantics that unrealized_conversion_cast has :)

A concrete example is having C functions that can operate on parameters of different MLIR types, so we need a generic declaration such as func private @cfunc(generictype) -> generictype2, and then cast the MLIR types that should be compatible with this function to the generic type before emitting the CallOp.
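In IR, that pattern looks roughly like this (a hypothetical sketch; !generic.type, !generic.type2 and the !mydialect types are placeholders for our real types):

func private @cfunc(!generic.type) -> !generic.type2

func @caller(%arg0: !mydialect.foo) -> !mydialect.bar {
  // Casts bridge our custom types to the generic signature; a later pass
  // reconciles or removes them.
  %0 = unrealized_conversion_cast %arg0 : !mydialect.foo to !generic.type
  %1 = call @cfunc(%0) : (!generic.type) -> !generic.type2
  %2 = unrealized_conversion_cast %1 : !generic.type2 to !mydialect.bar
  return %2 : !mydialect.bar
}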