There is a basic linalg.matmul op with m=128, n=128, k=128, and the following transform script:
transform.sequence failures(propagate) {
^bb0(%arg1: !pdl.operation):
  %0 = transform.structured.match ops{["linalg.matmul"]} in %arg1 : (!pdl.operation) -> !pdl.operation
  %1, %loops:3 = transform.structured.tile %0 [16, 16, 4] : (!pdl.operation) -> (!pdl.operation, !pdl.operation, !pdl.operation, !pdl.operation)
}
func.func @tile_linalg_matmul(
    %arg0: tensor<128x128xf32>, %arg1: tensor<128x128xf32>, %arg2: tensor<128x128xf32>)
    -> tensor<128x128xf32> {
  %0 = linalg.matmul ins(%arg0, %arg1 : tensor<128x128xf32>, tensor<128x128xf32>)
                     outs(%arg2 : tensor<128x128xf32>)
    -> tensor<128x128xf32>
  return %0 : tensor<128x128xf32>
}
The matmul is then tiled to blocks of m=16, n=16, k=4 with the command "./mlir-opt linalg_struct.mlir -test-transform-dialect-interpreter -split-input-file --verify-diagnostics".
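For reference, here is a hand-written sketch of roughly what the tiled IR looks like after the interpreter runs (SSA names and constants are mine, and details may differ from the actual output):

func.func @tile_linalg_matmul(
    %arg0: tensor<128x128xf32>, %arg1: tensor<128x128xf32>, %arg2: tensor<128x128xf32>)
    -> tensor<128x128xf32> {
  %c0 = arith.constant 0 : index
  %c4 = arith.constant 4 : index
  %c16 = arith.constant 16 : index
  %c128 = arith.constant 128 : index
  // m loop: the full result tensor is threaded through iter_args.
  %0 = scf.for %i = %c0 to %c128 step %c16 iter_args(%accM = %arg2) -> (tensor<128x128xf32>) {
    // n loop
    %1 = scf.for %j = %c0 to %c128 step %c16 iter_args(%accN = %accM) -> (tensor<128x128xf32>) {
      // k (reduction) loop
      %2 = scf.for %k = %c0 to %c128 step %c4 iter_args(%accK = %accN) -> (tensor<128x128xf32>) {
        %lhs = tensor.extract_slice %arg0[%i, %k] [16, 4] [1, 1]
            : tensor<128x128xf32> to tensor<16x4xf32>
        %rhs = tensor.extract_slice %arg1[%k, %j] [4, 16] [1, 1]
            : tensor<128x128xf32> to tensor<4x16xf32>
        %out = tensor.extract_slice %accK[%i, %j] [16, 16] [1, 1]
            : tensor<128x128xf32> to tensor<16x16xf32>
        // The block matmul takes the current C block as its outs operand ...
        %mm = linalg.matmul ins(%lhs, %rhs : tensor<16x4xf32>, tensor<4x16xf32>)
                            outs(%out : tensor<16x16xf32>) -> tensor<16x16xf32>
        // ... and the block result is inserted straight back into the full
        // tensor, with no explicit add in between.
        %upd = tensor.insert_slice %mm into %accK[%i, %j] [16, 16] [1, 1]
            : tensor<16x16xf32> into tensor<128x128xf32>
        scf.yield %upd : tensor<128x128xf32>
      }
      scf.yield %2 : tensor<128x128xf32>
    }
    scf.yield %1 : tensor<128x128xf32>
  }
  return %0 : tensor<128x128xf32>
}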
Why is each block's computed result written straight back into the result matrix instead of being accumulated?
I understand that Linalg's TileUsingForOp just tiles an op, and that PartialReductionOpInterface is what Linalg provides for tiling reduction ops such as matmul; that much is clear. But applying TileUsingForOp to a matmul could easily lead to confusion, as in the sketch below.
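For comparison, a minimal sketch of what I mean by the PartialReductionOpInterface path, assuming the transform.structured.tile_reduction_using_for op (spelled tile_reduction_using_scf in older releases; the exact result handles and syntax may differ by version):

transform.sequence failures(propagate) {
^bb0(%arg1: !pdl.operation):
  %0 = transform.structured.match ops{["linalg.matmul"]} in %arg1 : (!pdl.operation) -> !pdl.operation
  // Tile only k by 4: this creates an expanded partial accumulator
  // initialized by a linalg.fill, partial block matmuls inside the loop,
  // and a final linalg op that merges the partial results.
  %fill, %partial, %merge, %loop = transform.structured.tile_reduction_using_for %0 by tile_sizes = [0, 0, 4]
}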
Thanks!