Hello,
I am running into an issue with the linalg-bufferize pass when bufferizing the tensor.insert_slice operation.
mlir-opt --linalg-bufferize <input file>
with the <input file> being
func @rank_reducing_insert_slice_canonicalize(%arg0 : tensor<?x?xf32>, %arg1 : index,
                                              %arg2 : index, %arg3 : tensor<?x?x?xf32>) -> tensor<?x?x?xf32> {
  %c0 = arith.constant 0 : index
  %c1 = arith.constant 1 : index
  %c4 = arith.constant 4 : index
  %0 = tensor.insert_slice %arg0 into %arg3[%c0, %arg1, %c1] [%c4, 1, %arg2] [%c1, %c1, %c1] : tensor<?x?xf32> into tensor<?x?x?xf32>
  return %0 : tensor<?x?x?xf32>
}
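In case it helps narrow things down: this insert_slice is rank-reducing, i.e. the rank-2 source is inserted into a rank-3 slice whose static unit dimension (the size-1 entry in [%c4, 1, %arg2]) has been dropped from the source type. For comparison, a rank-matched variant (a hypothetical rewrite on my part, not from the test suite, with %src standing in for a rank-3 value) would look like:

```mlir
// Hypothetical non-rank-reducing variant: the source rank matches the
// destination slice rank, so no unit dimension has to be reconciled.
%0 = tensor.insert_slice %src into %arg3[%c0, %arg1, %c1] [%c4, 1, %arg2] [%c1, %c1, %c1]
    : tensor<?x1x?xf32> into tensor<?x?x?xf32>
```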
produces the following error:
/home/sflab/mlir-tv-testsuite/opts/canonicalize/canonicalize_00129.src.mlir:9:8: error: 'linalg.copy' op expected indexing_map #1 to have 2 dim(s) to match the number of loops
%0 = tensor.insert_slice %arg0 into %arg3[%c0, %arg1, %c1] [%c4, 1, %arg2] [%c1, %c1, %c1] : tensor<?x?xf32> into tensor<?x?x?xf32>
^
/home/sflab/mlir-tv-testsuite/opts/canonicalize/canonicalize_00129.src.mlir:9:8: note: see current operation: "linalg.copy"(%3, %12) ( {
^bb0(%arg4: f32, %arg5: f32): // no predecessors
"linalg.yield"(%arg4) : (f32) -> ()
}) : (memref<?x?xf32>, memref<?x1x?xf32, affine_map<(d0, d1, d2)[s0, s1, s2, s3] -> (d0 * s1 + s0 + d1 * s2 + d2 * s3)>>) -> ()
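From the diagnostic, my current reading (an assumption on my part, not verified against the pass source) is that bufferization rewrites the insert_slice into a subview of the destination buffer followed by a copy, roughly as below. The names %arg3_buf, %src_buf, and #layout are placeholders for the bufferized operands and the strided layout map from the diagnostic:

```mlir
// Sketch of what the bufferization appears to produce. The subview keeps
// the unit dimension, so the copy sees a rank-2 source against a rank-3
// destination, which would explain the indexing_map rank mismatch.
%dst = memref.subview %arg3_buf[%c0, %arg1, %c1] [%c4, 1, %arg2] [%c1, %c1, %c1]
    : memref<?x?x?xf32> to memref<?x1x?xf32, #layout>
linalg.copy(%src_buf, %dst) : memref<?x?xf32>, memref<?x1x?xf32, #layout>
```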
The test case comes from llvm-project/mlir/test/Dialect/Tensor/canonicalize.mlir.
I wanted to ask whether there is an optimization pass prior to linalg-bufferize that replaces tensor.insert_slice with other operations. It would help to know which pass performs this replacement, and how the rewrite works, so that I can better understand the semantics of tensor.insert_slice.
Thank you!