I’m working on some experiments and I’d like to split the tiling into two separate ops. Basically, I want to apply a transform between the tiling of the first loop and the tiling of the second. For example, let’s say we have:
%1, %2 = transform.structured.tile %0[32, 16] : (!pdl.operation) -> (!pdl.operation, !pdl.operation)
and I want to split it in two, something like:
%1, %2 = transform.structured.tile %0[32] : (!pdl.operation) -> (!pdl.operation, !pdl.operation)
// the transform I want to apply would be here
%3, %4 = transform.structured.tile %1[16] : (!pdl.operation) -> (!pdl.operation, !pdl.operation)
However, the output I get is different, which means that tiling the already-tiled op is not equivalent to tiling once with two tile sizes. I don’t understand why they are not equivalent, since conceptually it seems like they should be. Is there any way to split the TileOp in two and still get the same, equivalent code?
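To make explicit what I mean by “conceptually equivalent”, here is a small Python sketch of my mental model of tiling (just partitioning an M×N iteration space into index chunks; this is my own illustration, not what the transform dialect actually does under the hood):

```python
# Mental-model sketch: tiling an M x N iteration space into index chunks.

def tile_1d(n, size):
    """Split range(n) into chunks of at most `size` iterations."""
    return [range(i, min(i + size, n)) for i in range(0, n, size)]

def tile_2d_once(m, n, s0, s1):
    """Tile both dimensions in one step, like tile %0[32, 16]."""
    return [(r0, r1) for r0 in tile_1d(m, s0) for r1 in tile_1d(n, s1)]

def tile_2d_split(m, n, s0, s1):
    """Tile dim 0 first, then tile dim 1 of each resulting tile."""
    tiles = []
    for r0 in tile_1d(m, s0):        # first tiling step: dim 0 by s0
        for r1 in tile_1d(n, s1):    # second tiling step: dim 1 by s1
            tiles.append((r0, r1))
    return tiles

# In this model, both strategies cover the iteration space with the same tiles.
assert tile_2d_once(64, 48, 32, 16) == tile_2d_split(64, 48, 32, 16)
```

In this simplified model the two strategies produce identical tiles, which is why I expected the two-step tiling in the transform IR to be equivalent as well.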
Complete Example
For the sake of completeness, here is a complete example using one TileOp:
#map0 = affine_map<(d0, d1) -> (d0, d1)>
module {
  func.func @gemm(%arg0: tensor<?x?xf32>, %arg1: tensor<?x?xf32>, %init: tensor<?x?xf32>) -> tensor<?x?xf32> {
    %0 = linalg.generic {indexing_maps = [#map0, #map0, #map0], iterator_types = ["parallel", "parallel"]} ins(%arg0, %arg1 : tensor<?x?xf32>, tensor<?x?xf32>) outs(%init : tensor<?x?xf32>) {
    ^bb0(%arg6: f32, %arg7: f32, %arg8: f32):
      %1 = arith.mulf %arg6, %arg7 : f32
      linalg.yield %1 : f32
    } -> tensor<?x?xf32>
    return %0 : tensor<?x?xf32>
  }

  transform.sequence failures(propagate) {
  ^bb0(%arg0: !pdl.operation):
    %0 = transform.structured.match ops{["linalg.generic"]} attributes {iterator_types = [#linalg.iterator_type<parallel>, #linalg.iterator_type<parallel>]} in %arg0 : (!pdl.operation) -> !pdl.operation
    %tiled_linalg_op, %loops:2 = transform.structured.tile %0[32, 16] : (!pdl.operation) -> (!pdl.operation, !pdl.operation, !pdl.operation)
  }
}
and using two separate ops:
#map0 = affine_map<(d0, d1) -> (d0, d1)>
module {
  func.func @gemm(%arg0: tensor<?x?xf32>, %arg1: tensor<?x?xf32>, %init: tensor<?x?xf32>) -> tensor<?x?xf32> {
    %0 = linalg.generic {indexing_maps = [#map0, #map0, #map0], iterator_types = ["parallel", "parallel"]} ins(%arg0, %arg1 : tensor<?x?xf32>, tensor<?x?xf32>) outs(%init : tensor<?x?xf32>) {
    ^bb0(%arg6: f32, %arg7: f32, %arg8: f32):
      %1 = arith.mulf %arg6, %arg7 : f32
      linalg.yield %1 : f32
    } -> tensor<?x?xf32>
    return %0 : tensor<?x?xf32>
  }

  transform.sequence failures(propagate) {
  ^bb0(%arg0: !pdl.operation):
    %0 = transform.structured.match ops{["linalg.generic"]} attributes {iterator_types = [#linalg.iterator_type<parallel>, #linalg.iterator_type<parallel>]} in %arg0 : (!pdl.operation) -> !pdl.operation
    %tiled_linalg_op, %loops = transform.structured.tile %0[32] : (!pdl.operation) -> (!pdl.operation, !pdl.operation)
    %tiled_linalg_op2, %loops2 = transform.structured.tile %tiled_linalg_op[16] : (!pdl.operation) -> (!pdl.operation, !pdl.operation)
  }
}