Affine-loop-fusion for non-return memref

(I don’t have a bug tracker account, so I’m reporting the bug here.)

I see a lot of wrong results in the lit tests for affine-loop-fusion. For example:

func @should_fuse_raw_dep_for_locality() {
  %m = memref.alloc() : memref<10xf32>
  %cf7 = constant 7.0 : f32

  affine.for %i0 = 0 to 10 {
    affine.store %cf7, %m[%i0] : memref<10xf32>
  }
  affine.for %i1 = 0 to 10 {
    %v0 = affine.load %m[%i1] : memref<10xf32>
  }
  // CHECK:      affine.for %{{.*}} = 0 to 10 {
  // CHECK-NEXT:   affine.store %{{.*}}, %{{.*}}[0] : memref<1xf32>
  // CHECK-NEXT:   affine.load %{{.*}}[0] : memref<1xf32>
  // CHECK-NEXT: }
  // CHECK-NEXT: return
  return
}

As we can see, memref<1xf32> is totally wrong. Many lit tests in loop-fusion.mlir produce similar wrong results with memref<1x1x...x1xf32>.

I found that if we return %m as the function result, the output program is correct. The problem may be related to how memrefs that are not returned are handled.

These are not wrong: the memrefs have been contracted. This is an optimization that dramatically reduces the memory footprint of programs. In fact, if you run -affine-scalrep on the output, the memref will disappear altogether. However, when these memrefs are returned from the function or escape in other ways, they aren’t contracted, since this is an intra-function pass and we can’t reason about updating those external uses.
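
To make the contraction concrete, here is a rough sketch of what the fused function looks like, reconstructed from the CHECK lines in the test above. The SSA value names and the second function (@returns_memref, a hypothetical variant that is not in loop-fusion.mlir) are assumptions for illustration, not actual pass output.

```mlir
// Sketch of the fused, contracted form. Each iteration only touches the
// element it just wrote, so a one-element buffer is enough.
func @should_fuse_raw_dep_for_locality() {
  %cf7 = constant 7.0 : f32
  %m = memref.alloc() : memref<1xf32>   // contracted from memref<10xf32>
  affine.for %i = 0 to 10 {
    affine.store %cf7, %m[0] : memref<1xf32>
    %v0 = affine.load %m[0] : memref<1xf32>
  }
  return
}

// Hypothetical variant where the memref escapes through the return value.
// The caller expects all 10 elements, so the buffer cannot be shrunk here.
func @returns_memref() -> memref<10xf32> {
  %m = memref.alloc() : memref<10xf32>
  %cf7 = constant 7.0 : f32
  affine.for %i0 = 0 to 10 {
    affine.store %cf7, %m[%i0] : memref<10xf32>
  }
  affine.for %i1 = 0 to 10 {
    %v0 = affine.load %m[%i1] : memref<10xf32>
  }
  return %m : memref<10xf32>
}
```

In the first function no code outside the loop ever reads %m, which is why shrinking it is safe. In the second, the memref<10xf32> type is part of the function signature, so it has to stay as it is.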

Thank you for the explanation! I was confused and wrong. The optimization is really useful.