Suppose I have the following IR, a modification of the first test in mlir/test/Transforms/loop-fusion.mlir in which the two loops are wrapped in an outer loop:
func @example() {
  %m = memref.alloc() : memref<10xf32>
  %cf7 = constant 7.0 : f32
  affine.for %i2 = 0 to 20 {
    affine.for %i0 = 0 to 10 {
      affine.store %cf7, %m[%i0] : memref<10xf32>
    }
    affine.for %i1 = 0 to 10 {
      %v0 = affine.load %m[%i1] : memref<10xf32>
    }
  }
  return
}
Running
mlir-opt -affine-loop-fusion /path/to/mlir
yields
func @example() {
  %0 = memref.alloc() : memref<10xf32>
  %cst = constant 7.000000e+00 : f32
  affine.for %arg0 = 0 to 20 {
    affine.for %arg1 = 0 to 10 {
      affine.store %cst, %0[%arg1] : memref<10xf32>
    }
    affine.for %arg1 = 0 to 10 {
      %1 = affine.load %0[%arg1] : memref<10xf32>
    }
  }
  return
}
Clearly, the two inner loops were not fused, even though the same pair of loops is fused when it is not nested inside an outer loop.
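For comparison, here is a hand-written sketch (not mlir-opt output) of one legal fused form of the inner loops, assuming the store and load are simply merged into a single loop over 0 to 10 with the memref left at its original size. It is legal because the store to %m[%i0] precedes the load from the same element within each iteration, so every load still reads the value just stored:

```mlir
func @example() {
  %m = memref.alloc() : memref<10xf32>
  %cf7 = constant 7.0 : f32
  affine.for %i2 = 0 to 20 {
    // Hand-fused inner loop: producer (store) and consumer (load)
    // now share one iteration space.
    affine.for %i0 = 0 to 10 {
      affine.store %cf7, %m[%i0] : memref<10xf32>
      %v0 = affine.load %m[%i0] : memref<10xf32>
    }
  }
  return
}
```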
Is this expected, or a bug? If it’s expected, how would I do loop fusion in a situation like this?
Thanks