Multi-level tiling in affine

blue_hoppip · October 2, 2024, 4:20pm

I would like to perform multi-level tiling with the --affine-loop-tile pass, but I am struggling to get it to work. Is it perhaps not supported?

I would like to do something like the following. Start with a 104 x 104 matmul:

  func.func @matmul104x104(%arg0: memref<104x104xi8, strided<[?, ?], offset: ?>>, %arg1: memref<104x104xi8, strided<[?, ?], offset: ?>>, %arg2: memref<104x104xi32, strided<[?, ?], offset: ?>>) -> memref<104x104xi32, strided<[?, ?], offset: ?>> {
    affine.for %arg3 = 0 to 104 {
      affine.for %arg4 = 0 to 104 {
        affine.for %arg5 = 0 to 104 {
          %0 = affine.load %arg0[%arg3, %arg5] : memref<104x104xi8, strided<[?, ?], offset: ?>>
          %1 = affine.load %arg1[%arg5, %arg4] : memref<104x104xi8, strided<[?, ?], offset: ?>>
          %2 = affine.load %arg2[%arg3, %arg4] : memref<104x104xi32, strided<[?, ?], offset: ?>>
          %3 = arith.extsi %0 : i8 to i32
          %4 = arith.extsi %1 : i8 to i32
          %5 = arith.muli %3, %4 : i32
          %6 = arith.addi %2, %5 : i32
          affine.store %6, %arg2[%arg3, %arg4] : memref<104x104xi32, strided<[?, ?], offset: ?>>
        }
      }
    }
    return %arg2 : memref<104x104xi32, strided<[?, ?], offset: ?>>
  }

Then tile the third loop (%arg5) twice: first into tiles of size 26, and then each of those into smaller tiles of size 13. The %arg3 and %arg4 loops are tiled once, with tile size 8. Something like:

#map = affine_map<(d0) -> (d0)>
#map1 = affine_map<(d0) -> (d0 + 8)>
#map2 = affine_map<(d0) -> (d0 + 26)>
#map3 = affine_map<(d0) -> (d0 + 13)>
func.func @matmul104x104(%arg0: memref<104x104xi8, strided<[?, ?], offset: ?>>, %arg1: memref<104x104xi8, strided<[?, ?], offset: ?>>, %arg2: memref<104x104xi32, strided<[?, ?], offset: ?>>) -> memref<104x104xi32, strided<[?, ?], offset: ?>> {
    affine.for %arg3 = 0 to 104 step 8 {
      affine.for %arg4 = 0 to 104 step 8 {
        affine.for %arg5 = 0 to 104 step 26 {
          affine.for %arg6 = #map(%arg3) to #map1(%arg3) {
            affine.for %arg7 = #map(%arg4) to #map1(%arg4) {
              affine.for %arg8 = #map(%arg5) to #map2(%arg5) step 13 {
                affine.for %arg9 = #map(%arg8) to #map3(%arg8) {
                  %0 = affine.load %arg0[%arg6, %arg9] : memref<104x104xi8, strided<[?, ?], offset: ?>>
                  %1 = affine.load %arg1[%arg9, %arg7] : memref<104x104xi8, strided<[?, ?], offset: ?>>
                  %2 = affine.load %arg2[%arg6, %arg7] : memref<104x104xi32, strided<[?, ?], offset: ?>>
                  %3 = arith.extsi %0 : i8 to i32
                  %4 = arith.extsi %1 : i8 to i32
                  %5 = arith.muli %3, %4 : i32
                  %6 = arith.addi %2, %5 : i32
                  affine.store %6, %arg2[%arg6, %arg7] : memref<104x104xi32, strided<[?, ?], offset: ?>>
                } // end of hoodle for
              }
            }
          }
        }
      }
    }
    return %arg2 : memref<104x104xi32, strided<[?, ?], offset: ?>>
  }
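
To make the intended iteration order concrete, here is a minimal Python sketch that models the loop structure of the hand-written nest above. It is only a model of the loops, not of the pass: it says nothing about what --affine-loop-tile can produce. The names `tiled_nest`, `visits_each_point_once` and the parameter names are hypothetical, chosen for illustration; the sizes (104, 8, 26, 13) are the ones from the listing.

```python
N = 104  # matrix extent from the example above


def tiled_nest(n=N, tile_ij=8, tile_k=26, subtile_k=13):
    """Yield (i, j, k) in the order the hand-written tiled nest visits them.

    Assumes n is a multiple of tile_ij and tile_k, and tile_k is a
    multiple of subtile_k, as in the 104 / 8 / 26 / 13 example.
    """
    for i0 in range(0, n, tile_ij):                # %arg3, step 8
        for j0 in range(0, n, tile_ij):            # %arg4, step 8
            for k0 in range(0, n, tile_k):         # %arg5, step 26
                for i in range(i0, i0 + tile_ij):  # %arg6
                    for j in range(j0, j0 + tile_ij):  # %arg7
                        # Second tiling level on the reduction dimension.
                        for k1 in range(k0, k0 + tile_k, subtile_k):  # %arg8, step 13
                            for k in range(k1, k1 + subtile_k):       # %arg9
                                yield i, j, k


def visits_each_point_once(n=N):
    """Check that the tiled nest covers the n*n*n space exactly once."""
    seen = bytearray(n * n * n)
    for i, j, k in tiled_nest(n):
        seen[(i * n + j) * n + k] += 1
    return all(count == 1 for count in seen)


if __name__ == "__main__":
    print(visits_each_point_once())
```

In this model every inner bound is the enclosing tile's induction variable plus a constant (d0, d0 + 26, d0 + 13), which is what the #map, #map2 and #map3 bounds in the listing express.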

Is this currently supported, and I am just not using the affine tiling pass correctly, or is it indeed impossible with the current affine tiling pass?

I have been looking at this post, which talks about handling non-hyper-rectangular loop nests, but I am not convinced that multi-level tiling is the same problem. Is it? Do I really need to perform computationally expensive Fourier-Motzkin elimination just to tile the same dimension of a matrix twice?

Help and thoughts on this would be greatly appreciated!
