Lowering matrix multiplication with tiling fails due to an illegal bufferization op

I want to lower the following matrix multiplication program, tiled with the transform dialect, into an executable format (e.g. through LLVM IR).

module {
  func.func @matmul_tile(%arg0: tensor<1024x1024xf32>, 
                         %arg1: tensor<1024x1024xf32>, 
                         %arg2: tensor<1024x1024xf32>) {
    %0 = linalg.matmul ins(%arg0, %arg1: tensor<1024x1024xf32>, tensor<1024x1024xf32>)
                       outs(%arg2: tensor<1024x1024xf32>) -> tensor<1024x1024xf32>
    func.return
  }

  transform.sequence failures(propagate) {
  ^bb0(%arg0: !transform.any_op, %arg1: !transform.op<"linalg.matmul">):
    transform.structured.tile %arg1 [4, 4] : (!transform.op<"linalg.matmul">) -> (!transform.any_op, !transform.any_op, !transform.any_op)
  }
 
  func.func @main() -> i32 {
    %A = bufferization.alloc_tensor() : tensor<1024x1024xf32>
    %B = bufferization.alloc_tensor() : tensor<1024x1024xf32>
    %C = bufferization.alloc_tensor() : tensor<1024x1024xf32>  
    
    %c = arith.constant 1.0 : f32
      
    call @matmul_tile(%A, %B, %C) : (tensor<1024x1024xf32>, tensor<1024x1024xf32>, tensor<1024x1024xf32>) -> ()
    %r = arith.constant 42 : i32
    return %r : i32
  } 
}

These are the mlir-opt options I use to lower the program:

mlir-opt matmul_tile.mlir \
  -test-transform-dialect-interpreter="bind-first-extra-to-ops=linalg.matmul" \
  -test-transform-dialect-erase-schedule \
  -one-shot-bufferize \
  -linalg-bufferize \
  -tensor-bufferize \
  -func-bufferize \
  -convert-scf-to-cf \
  -convert-cf-to-llvm \
  -convert-linalg-to-loops \
  -convert-linalg-to-llvm \
  -convert-func-to-llvm \
  -finalize-memref-to-llvm   \
  -finalizing-bufferize -bufferization-bufferize  -convert-bufferization-to-memref

But I get a legalization error for a bufferization op:

matmul_tile.mlir:4:26: error: failed to legalize operation 'bufferization.to_tensor' that was explicitly marked illegal
                         %arg2: tensor<1024x1024xf32>) {
                         ^
matmul_tile.mlir:4:26: note: see current operation: %35 = "bufferization.to_tensor"(%26) : (memref<1024x1024xf32>) -> tensor<1024x1024xf32>

I tried every option related to bufferization, but none of them worked. Which options do we need to successfully lower this program into the LLVM dialect? Is it possible to make this program executable in the first place?

Hi @Lewuathe, one problem is that you are mixing one-shot bufferization with other partial bufferization passes (linalg-bufferize). I suggest using only one-shot bufferization (-one-shot-bufferize="bufferize-function-boundaries"). Also, you need to return the result of your matmul from matmul_tile; otherwise, the matmul is a dead op and will get eliminated.

Try this:

-test-transform-dialect-interpreter="bind-first-extra-to-ops=linalg.matmul" -test-transform-dialect-erase-schedule --one-shot-bufferize="bufferize-function-boundaries allow-return-allocs" --convert-linalg-to-loops --convert-scf-to-cf -lower-affine -expand-strided-metadata -lower-affine -convert-arith-to-llvm -finalize-memref-to-llvm -convert-func-to-llvm -reconcile-unrealized-casts

You should get MLIR LLVM IR.
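
To make the point about returning the matmul result concrete, here is a minimal sketch of how matmul_tile from the question might be changed. It is only an illustration and has not been run through the pipeline above. The one assumption is that the call site in @main is updated to match the new signature, as noted in the comment.

```mlir
// Sketch: matmul_tile now returns the matmul result, so the op is no
// longer dead and will not be eliminated.
func.func @matmul_tile(%arg0: tensor<1024x1024xf32>,
                       %arg1: tensor<1024x1024xf32>,
                       %arg2: tensor<1024x1024xf32>) -> tensor<1024x1024xf32> {
  %0 = linalg.matmul ins(%arg0, %arg1 : tensor<1024x1024xf32>, tensor<1024x1024xf32>)
                     outs(%arg2 : tensor<1024x1024xf32>) -> tensor<1024x1024xf32>
  func.return %0 : tensor<1024x1024xf32>
}

// Assumed matching change at the call site in @main:
//   %D = call @matmul_tile(%A, %B, %C)
//       : (tensor<1024x1024xf32>, tensor<1024x1024xf32>, tensor<1024x1024xf32>)
//       -> tensor<1024x1024xf32>
```

With bufferize-function-boundaries enabled, one-shot bufferization also rewrites the function signature, so the tensor arguments and the returned tensor become memrefs in the bufferized IR.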

Thank you so much! It worked as expected and turned the given IR into LLVM IR.

Side note: I also added a test transform that lowers to LLVM, which removes some of that pass-pipeline boilerplate from invocations.

If that is more generally useful, or if people are repeatedly bitten by the complexity of the mlir-opt incantation, we could move it to a more accessible place.