Multi-level tiling in affine

I would like to perform multi-level tiling with the --affine-loop-tile pass, but I am struggling to do so. Is it perhaps not supported?

  1. I would like to do something like the following:
    Start with a 104 x 104 matmul:
  func.func @matmul104x104(%arg0: memref<104x104xi8, strided<[?, ?], offset: ?>>, %arg1: memref<104x104xi8, strided<[?, ?], offset: ?>>, %arg2: memref<104x104xi32, strided<[?, ?], offset: ?>>) -> memref<104x104xi32, strided<[?, ?], offset: ?>> {
    affine.for %arg3 = 0 to 104 {
      affine.for %arg4 = 0 to 104 {
        affine.for %arg5 = 0 to 104 {
          %0 = affine.load %arg0[%arg3, %arg5] : memref<104x104xi8, strided<[?, ?], offset: ?>>
          %1 = affine.load %arg1[%arg5, %arg4] : memref<104x104xi8, strided<[?, ?], offset: ?>>
          %2 = affine.load %arg2[%arg3, %arg4] : memref<104x104xi32, strided<[?, ?], offset: ?>>
          %3 = arith.extsi %0 : i8 to i32
          %4 = arith.extsi %1 : i8 to i32
          %5 = arith.muli %3, %4 : i32
          %6 = arith.addi %2, %5 : i32
          affine.store %6, %arg2[%arg3, %arg4] : memref<104x104xi32, strided<[?, ?], offset: ?>>
        }
      }
    }
    return %arg2 : memref<104x104xi32, strided<[?, ?], offset: ?>>
  }

Then tile the third loop (%arg5, the reduction dimension) twice, so that it is broken into tiles of size 26, each of which is then broken into smaller tiles of size 13. The first two loops are tiled only once, with size 8. Something like:

#map = affine_map<(d0) -> (d0)>
#map1 = affine_map<(d0) -> (d0 + 8)>
#map2 = affine_map<(d0) -> (d0 + 26)>
#map3 = affine_map<(d0) -> (d0 + 13)>
func.func @matmul104x104(%arg0: memref<104x104xi8, strided<[?, ?], offset: ?>>, %arg1: memref<104x104xi8, strided<[?, ?], offset: ?>>, %arg2: memref<104x104xi32, strided<[?, ?], offset: ?>>) -> memref<104x104xi32, strided<[?, ?], offset: ?>> {
    affine.for %arg3 = 0 to 104 step 8 {
      affine.for %arg4 = 0 to 104 step 8 {
        affine.for %arg5 = 0 to 104 step 26 {
          affine.for %arg6 = #map(%arg3) to #map1(%arg3) {
            affine.for %arg7 = #map(%arg4) to #map1(%arg4) {
              affine.for %arg8 = #map(%arg5) to #map2(%arg5) step 13 {
                affine.for %arg9 = #map(%arg8) to #map3(%arg8) {
                  %0 = affine.load %arg0[%arg6, %arg9] : memref<104x104xi8, strided<[?, ?], offset: ?>>
                  %1 = affine.load %arg1[%arg9, %arg7] : memref<104x104xi8, strided<[?, ?], offset: ?>>
                  %2 = affine.load %arg2[%arg6, %arg7] : memref<104x104xi32, strided<[?, ?], offset: ?>>
                  %3 = arith.extsi %0 : i8 to i32
                  %4 = arith.extsi %1 : i8 to i32
                  %5 = arith.muli %3, %4 : i32
                  %6 = arith.addi %2, %5 : i32
                  affine.store %6, %arg2[%arg6, %arg7] : memref<104x104xi32, strided<[?, ?], offset: ?>>
                } // end of %arg9 loop
              }
            }
          }
        }
      }
    }
    return %arg2 : memref<104x104xi32, strided<[?, ?], offset: ?>>
  }
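
To make the intent concrete, here is a small Python model of the three %arg5/%arg8/%arg9 loops from the hand-written IR above. It is only a sketch of the iteration order I am after, not anything the pass produces; the function name and parameters are made up for illustration. Because 104 = 4 x 26 and 26 = 2 x 13, no min/max bounds are needed on any of the loops.

```python
def two_level_tiled_k(n=104, outer_tile=26, inner_tile=13):
    """Yield the reduction index in the order the hand-written loops visit it.

    Hypothetical helper: models the %arg5 / %arg8 / %arg9 loops only.
    Assumes outer_tile divides n and inner_tile divides outer_tile.
    """
    for k_outer in range(0, n, outer_tile):                        # %arg5: 0 to 104 step 26
        for k_mid in range(k_outer, k_outer + outer_tile, inner_tile):  # %arg8: %arg5 to %arg5 + 26 step 13
            for k in range(k_mid, k_mid + inner_tile):             # %arg9: %arg8 to %arg8 + 13
                yield k


visited = list(two_level_tiled_k())
# Every k in [0, 104) is visited exactly once, in the original order.
assert visited == list(range(104))
```

So the two-level nest is still a plain rectangular iteration space in this case; each inner loop's bounds depend only on the induction variable of the loop directly enclosing it in the same dimension.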

Is this currently supported, and am I simply not using the affine tiling pass correctly, or is it indeed impossible with the current pass?

  2. I have been looking at this post, which talks about handling non-hyper-rectangular loop nests, but I am not convinced that multi-level tiling is the same problem. Is it? Do I really need to perform computationally expensive Fourier-Motzkin elimination just to tile the same dimension of a matrix twice?

Help and thoughts on this would be greatly appreciated!