I would like to perform multi-level tiling with the --affine-loop-tile pass, but I am struggling to do so. Is it perhaps not supported?
- I would like to do something like the following:
Start with a 104 x 104 matmul:
func.func @matmul104x104(%arg0: memref<104x104xi8, strided<[?, ?], offset: ?>>, %arg1: memref<104x104xi8, strided<[?, ?], offset: ?>>, %arg2: memref<104x104xi32, strided<[?, ?], offset: ?>>) -> memref<104x104xi32, strided<[?, ?], offset: ?>> {
  affine.for %arg3 = 0 to 104 {
    affine.for %arg4 = 0 to 104 {
      affine.for %arg5 = 0 to 104 {
        %0 = affine.load %arg0[%arg3, %arg5] : memref<104x104xi8, strided<[?, ?], offset: ?>>
        %1 = affine.load %arg1[%arg5, %arg4] : memref<104x104xi8, strided<[?, ?], offset: ?>>
        %2 = affine.load %arg2[%arg3, %arg4] : memref<104x104xi32, strided<[?, ?], offset: ?>>
        %3 = arith.extsi %0 : i8 to i32
        %4 = arith.extsi %1 : i8 to i32
        %5 = arith.muli %3, %4 : i32
        %6 = arith.addi %2, %5 : i32
        affine.store %6, %arg2[%arg3, %arg4] : memref<104x104xi32, strided<[?, ?], offset: ?>>
      }
    }
  }
  return %arg2 : memref<104x104xi32, strided<[?, ?], offset: ?>>
}
Then tile the third loop twice, so that the %arg5 dimension is split into tiles of size 26, each of which is split again into smaller tiles of size 13 (the %arg3 and %arg4 loops are tiled only once, with tile size 8). Something like:
#map = affine_map<(d0) -> (d0)>
#map1 = affine_map<(d0) -> (d0 + 8)>
#map2 = affine_map<(d0) -> (d0 + 26)>
#map3 = affine_map<(d0) -> (d0 + 13)>
func.func @matmul104x104(%arg0: memref<104x104xi8, strided<[?, ?], offset: ?>>, %arg1: memref<104x104xi8, strided<[?, ?], offset: ?>>, %arg2: memref<104x104xi32, strided<[?, ?], offset: ?>>) -> memref<104x104xi32, strided<[?, ?], offset: ?>> {
  affine.for %arg3 = 0 to 104 step 8 {
    affine.for %arg4 = 0 to 104 step 8 {
      affine.for %arg5 = 0 to 104 step 26 {
        affine.for %arg6 = #map(%arg3) to #map1(%arg3) {
          affine.for %arg7 = #map(%arg4) to #map1(%arg4) {
            affine.for %arg8 = #map(%arg5) to #map2(%arg5) step 13 {
              affine.for %arg9 = #map(%arg8) to #map3(%arg8) {
                %0 = affine.load %arg0[%arg6, %arg9] : memref<104x104xi8, strided<[?, ?], offset: ?>>
                %1 = affine.load %arg1[%arg9, %arg7] : memref<104x104xi8, strided<[?, ?], offset: ?>>
                %2 = affine.load %arg2[%arg6, %arg7] : memref<104x104xi32, strided<[?, ?], offset: ?>>
                %3 = arith.extsi %0 : i8 to i32
                %4 = arith.extsi %1 : i8 to i32
                %5 = arith.muli %3, %4 : i32
                %6 = arith.addi %2, %5 : i32
                affine.store %6, %arg2[%arg6, %arg7] : memref<104x104xi32, strided<[?, ?], offset: ?>>
              } // end of innermost for
            }
          }
        }
      }
    }
  }
  return %arg2 : memref<104x104xi32, strided<[?, ?], offset: ?>>
}
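To make the intended loop structure concrete, here is a minimal Python sketch (not MLIR, and not what the pass produces) of the %arg5 dimension tiled twice. The function names `flat_k` and `two_level_k` are hypothetical, chosen only for illustration. It assumes, as in my example, that 104 is divisible by 26 and 26 by 13, so no min/max bounds are needed on the inner loops.

```python
def flat_k(n=104):
    """Original reduction loop: k runs over 0 .. n-1."""
    return list(range(n))


def two_level_k(n=104, outer=26, inner=13):
    """The same dimension tiled twice: tiles of `outer`, each split into tiles of `inner`."""
    order = []
    for k0 in range(0, n, outer):                # like %arg5: 0 to 104 step 26
        for k1 in range(k0, k0 + outer, inner):  # like %arg8: d0 to d0 + 26 step 13
            for k in range(k1, k1 + inner):      # like %arg9: d0 to d0 + 13
                order.append(k)
    return order


# Both versions visit the same k values in the same order.
assert two_level_k() == flat_k()
```

Every loop bound here is still a simple offset from the enclosing induction variable (d0, d0 + 26, d0 + 13), which is why I would expect the nest to remain rectangular per tile.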
Is this currently supported, and I am just not using the affine tiling pass correctly, or is it indeed impossible with the current pass?
- I have been looking at this post, which talks about handling non-hyperrectangular loop nests, but I am not convinced that multi-level tiling is the same thing as handling non-hyperrectangular loop nests. Is it the same thing? Do I really need to perform computationally expensive Fourier-Motzkin elimination just to tile the same dimension of a matrix twice?
Help and thoughts on this would be greatly appreciated!