Should affine loop fusion work on the contents of another loop?

Suppose I have the following IR (which is a modification of the first test in mlir/test/Transforms/loop-fusion.mlir):

func @example() {
  %m = memref.alloc() : memref<10xf32>
  %cf7 = constant 7.0 : f32

  affine.for %i2 = 0 to 20 {
    affine.for %i0 = 0 to 10 {
      affine.store %cf7, %m[%i0] : memref<10xf32>
    }
    affine.for %i1 = 0 to 10 {
      %v0 = affine.load %m[%i1] : memref<10xf32>
    }
  }
  return
}

Running

mlir-opt -affine-loop-fusion /path/to/mlir

yields

func @example() {
  %0 = memref.alloc() : memref<10xf32>
  %cst = constant 7.000000e+00 : f32
  affine.for %arg0 = 0 to 20 {
    affine.for %arg1 = 0 to 10 {
      affine.store %cst, %0[%arg1] : memref<10xf32>
    }
    affine.for %arg1 = 0 to 10 {
      %1 = affine.load %0[%arg1] : memref<10xf32>
    }
  }
  return
}

The two inner loops were not fused, even though the same two loops do fuse when they are not nested inside an outer loop.

Is this expected, or a bug? If it’s expected, how would I do loop fusion in a situation like this?

Thanks

This is expected and is a current limitation. Many of the initial use cases come from ML models that are lowered to a sequence of perfect loop nests (inside a FuncOp), so the fusion algorithm is geared toward those. The mechanics for fusing inner loop nests within an imperfect nest already exist in the fusion utilities, but the outer algorithm only considers "top-level" nests, i.e., loop nests that sit directly in the function body. It'd be good to extend it!
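For reference, here is a hand-written sketch (not output from mlir-opt) of what fusing the two inner loops inside the outer loop would produce, using the same dialect syntax as the example in the question. It is only meant to show the target shape of such an extension: the store and the load share a single loop over %i0, which is legal because each load reads the element stored earlier in the same iteration.

```mlir
func @example() {
  %m = memref.alloc() : memref<10xf32>
  %cf7 = constant 7.0 : f32

  // The outer loop is unchanged; only its body is rewritten.
  affine.for %i2 = 0 to 20 {
    // Producer (store) and consumer (load) now share one loop, so each
    // element of %m is written and then immediately read in the same iteration.
    affine.for %i0 = 0 to 10 {
      affine.store %cf7, %m[%i0] : memref<10xf32>
      %v0 = affine.load %m[%i0] : memref<10xf32>
    }
  }
  return
}
```

When the pass fuses top-level nests, it may additionally shrink or privatize the intermediate buffer; that step is deliberately left out of this sketch.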