Question about --affine-loop-tile

I experimented with --affine-loop-tile and noticed that it tiles only the loop nest, not the underlying buffers. The pass seems aimed at improving memory hierarchy performance rather than at working around memory resource restrictions. Is there a switch I have missed that makes --affine-loop-tile tile the buffers as well? By comparison, --linalg-tile and --linalg-tile-and-fuse-on-tensors do tile the underlying buffers. Is --affine-loop-tile not designed for working around memory size restrictions?

For example, here is the IR before and after running -affine-loop-tile="tile-size=32" on the function simple_matmul from the test mlir\test\Dialect\Affine\loop-tiling.mlir.

  • Before -affine-loop-tile="tile-size=32"
  func @simple_matmul(%arg0: memref<256x256xvector<64xf32>>, %arg1: memref<256x256xvector<64xf32>>, %arg2: memref<256x256xvector<64xf32>>) -> memref<256x256xvector<64xf32>> {
    affine.for %arg3 = 0 to 256 {
      affine.for %arg4 = 0 to 256 {
        affine.for %arg5 = 0 to 250 {
          %0 = affine.load %arg0[%arg3, %arg5] : memref<256x256xvector<64xf32>>
          %1 = affine.load %arg1[%arg5, %arg4] : memref<256x256xvector<64xf32>>
          %2 = affine.load %arg2[%arg3, %arg4] : memref<256x256xvector<64xf32>>
          %3 = mulf %0, %1 : vector<64xf32>
          %4 = addf %2, %3 : vector<64xf32>
          affine.store %4, %arg2[%arg3, %arg4] : memref<256x256xvector<64xf32>>
        }
      }
    }
    return %arg2 : memref<256x256xvector<64xf32>>
  }
  • After -affine-loop-tile="tile-size=32"
#map0 = affine_map<(d0) -> (d0)>
#map1 = affine_map<(d0) -> (d0 + 32)>
#map2 = affine_map<(d0) -> (d0 + 32, 250)>
module  {
  func @simple_matmul(%arg0: memref<256x256xvector<64xf32>>, %arg1: memref<256x256xvector<64xf32>>, %arg2: memref<256x256xvector<64xf32>>) -> memref<256x256xvector<64xf32>> {
    affine.for %arg3 = 0 to 256 step 32 {
      affine.for %arg4 = 0 to 256 step 32 {
        affine.for %arg5 = 0 to 250 step 32 {
          affine.for %arg6 = #map0(%arg3) to #map1(%arg3) {
            affine.for %arg7 = #map0(%arg4) to #map1(%arg4) {
              affine.for %arg8 = #map0(%arg5) to min #map2(%arg5) {
                %0 = affine.load %arg0[%arg6, %arg8] : memref<256x256xvector<64xf32>>
                %1 = affine.load %arg1[%arg8, %arg7] : memref<256x256xvector<64xf32>>
                %2 = affine.load %arg2[%arg6, %arg7] : memref<256x256xvector<64xf32>>
                %3 = mulf %0, %1 : vector<64xf32>
                %4 = addf %2, %3 : vector<64xf32>
                affine.store %4, %arg2[%arg6, %arg7] : memref<256x256xvector<64xf32>>
              }
            }
          }
        }
      }
    }
    return %arg2 : memref<256x256xvector<64xf32>>
  }
}

Loop tiling by itself only tiles the loop nests; it does not imply that the multi-dimensional arrays or data being accessed are replaced or transformed in any way.
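To make the distinction concrete, here is a minimal Python sketch (not MLIR, and not what any pass actually emits). The function names, the list-of-lists matrices, and the small sizes are all assumptions chosen for illustration. `matmul_loop_tiled` mirrors the tiled IR above: the iteration order changes and the innermost bounds are clamped with `min`, but every access still goes to the original full-size arrays. `matmul_buffer_tiled` shows what tiling the data as well would look like: each tile is first copied into a small local buffer, and the compute loops index into that buffer.

```python
def matmul_plain(a, b, n, k):
    """Reference: c[i][j] += a[i][p] * b[p][j] over the untiled loop nest."""
    c = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for p in range(k):
                c[i][j] += a[i][p] * b[p][j]
    return c


def matmul_loop_tiled(a, b, n, k, tile):
    """Loop tiling only: new iteration order, same full-size arrays."""
    c = [[0] * n for _ in range(n)]
    for ii in range(0, n, tile):
        for jj in range(0, n, tile):
            for pp in range(0, k, tile):
                # Intra-tile loops; min() clamps partial tiles, playing the
                # role of the `min` upper bound in the tiled IR.
                for i in range(ii, min(ii + tile, n)):
                    for j in range(jj, min(jj + tile, n)):
                        for p in range(pp, min(pp + tile, k)):
                            c[i][j] += a[i][p] * b[p][j]
    return c


def matmul_buffer_tiled(a, b, n, k, tile):
    """Loop tiling plus explicit tile-sized copies of the inputs."""
    c = [[0] * n for _ in range(n)]
    for ii in range(0, n, tile):
        for jj in range(0, n, tile):
            for pp in range(0, k, tile):
                i_hi = min(ii + tile, n)
                j_hi = min(jj + tile, n)
                p_hi = min(pp + tile, k)
                # Copy the needed tiles into small local buffers
                # (at most tile x tile elements each).
                a_tile = [row[pp:p_hi] for row in a[ii:i_hi]]
                b_tile = [row[jj:j_hi] for row in b[pp:p_hi]]
                # Compute using tile-local indices only.
                for i in range(i_hi - ii):
                    for j in range(j_hi - jj):
                        for p in range(p_hi - pp):
                            c[ii + i][jj + j] += a_tile[i][p] * b_tile[p][j]
    return c
```

All three functions compute the same result. The difference is only in which memory the inner loops touch: the loop-tiled version has the same footprint as the original, while the buffer-tiled version needs only tile-sized working buffers for the inputs at any one time.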
