Possible Bug in Affine Loop Fusion

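Hi, I found a possible bug in the affine-loop-fusion pass. The output of the pass is not equivalent to the input: the fused loop stores only to the even indices of %m, whereas the original producer loop stores to all 100 elements.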

Input:

func @foo(%m: memref<100xf32>, %src: memref<100xf32>) {
  affine.for %i0 = 0 to 100 {
    %r1 = affine.load %src[%i0]: memref<100xf32>
    affine.store %r1, %m[%i0] : memref<100xf32>
  }
  affine.for %i2 = 0 to 100 step 2 {
    %v1 = affine.load %m[%i2] : memref<100xf32>
  }
  return
}

Output:

module  {
  func @foo(%arg0: memref<100xf32>, %arg1: memref<100xf32>) {
    affine.for %arg2 = 0 to 100 step 2 {
      %0 = affine.load %arg1[%arg2] : memref<100xf32>
      affine.store %0, %arg0[%arg2] : memref<100xf32>
      %1 = affine.load %arg0[%arg2] : memref<100xf32>
    }
    return
  }
}

Expected output (the producer loop should not be removed):

module  {
  func @foo(%arg0: memref<100xf32>, %arg1: memref<100xf32>) {
    %0 = memref.alloc() : memref<1xf32>
    affine.for %arg2 = 0 to 100 {
      %1 = affine.load %arg1[%arg2] : memref<100xf32>
      affine.store %1, %arg0[%arg2] : memref<100xf32>
    }
    affine.for %arg2 = 0 to 100 step 2 {
      %1 = affine.load %arg1[%arg2] : memref<100xf32>
      affine.store %1, %0[0] : memref<1xf32>
      %2 = affine.load %0[0] : memref<1xf32>
    }
    return
  }
}

Ran using: mlir-opt --affine-loop-fusion
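
To make the difference concrete, here is a small Python simulation of the input and of the fused output above. It is only a sketch of the loop semantics, not MLIR code: the function names are made up for illustration, and `None` stands for an element of %m that was never written. Since %m is a function argument, the caller can observe the missing stores.

```python
def run_original(src):
    """Simulate the input: full producer loop, then a step-2 consumer loop."""
    m = [None] * 100
    for i in range(0, 100):          # affine.for %i0 = 0 to 100
        m[i] = src[i]                # affine.store %r1, %m[%i0]
    loaded = [m[i] for i in range(0, 100, 2)]  # affine.for %i2 = 0 to 100 step 2
    return m, loaded


def run_fused(src):
    """Simulate the pass output: a single step-2 loop, producer loop removed."""
    m = [None] * 100
    loaded = []
    for i in range(0, 100, 2):       # affine.for %arg2 = 0 to 100 step 2
        m[i] = src[i]                # affine.store %0, %arg0[%arg2]
        loaded.append(m[i])          # affine.load %arg0[%arg2]
    return m, loaded


src = [float(i) for i in range(100)]
m_orig, loads_orig = run_original(src)
m_fused, loads_fused = run_fused(src)

# Indices of %m that the fused version never writes.
unwritten = [i for i in range(100) if m_fused[i] is None]
```

The values loaded by the consumer are the same in both versions, but `unwritten` contains every odd index, so the contents of %m after the function returns differ.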

The reason is probably that the fast check for equivalence of fused and producer iterations [1] does not handle local identifiers correctly. When I commented out this check so that the Presburger equality check [2] is used instead, I got the correct output.

[1] llvm-project/Utils.cpp at 374cd0fb6102a8726da0e6036b3c484aca32c61e · llvm/llvm-project · GitHub
[2] llvm-project/Utils.cpp at 374cd0fb6102a8726da0e6036b3c484aca32c61e · llvm/llvm-project · GitHub

Is this a bug in the loop fusion pass?

@dcaballe @bondhugula @nicolasvasilache

Thanks for reporting the issue! It looks like a bug to me. The fix should be simple. I think we just need to compare the steps of the src and dst loops here: llvm-project/Utils.cpp at 374cd0fb6102a8726da0e6036b3c484aca32c61e · llvm/llvm-project · GitHub

Do you think you could provide a fix using your test? :)

Sure, I'll send a patch fixing this later today.

Thanks for reporting this. The isSliceMaximalFastCheck is expected to be a fast check – so it shouldn’t make use of any expensive operations. If it’s unable to determine if the slice is maximal, the more expensive check is used.
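
The structure discussed in this thread (a cheap, conservative check that compares loop steps, with a fallback to an exact check when it cannot decide) can be sketched roughly as follows. This is a Python illustration under simplifying assumptions, not the code in Utils.cpp: `Loop`, `is_maximal_fast`, `is_maximal_exact` and `is_maximal` are hypothetical names, loops have constant bounds and a positive step, and the exact check enumerates iterations as a stand-in for the Presburger equality check.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Loop:
    lower: int   # inclusive lower bound
    upper: int   # exclusive upper bound
    step: int    # assumed positive


def is_maximal_fast(src: Loop, fused: Loop) -> Optional[bool]:
    """Cheap check: True if trivially identical, None if undecided."""
    if src.step != fused.step:
        return None  # steps differ: do not guess, defer to the exact check
    if (src.lower, src.upper) == (fused.lower, fused.upper):
        return True
    return None


def is_maximal_exact(src: Loop, fused: Loop) -> bool:
    """Expensive check: compare the full iteration sets."""
    src_iters = set(range(src.lower, src.upper, src.step))
    fused_iters = set(range(fused.lower, fused.upper, fused.step))
    return src_iters == fused_iters


def is_maximal(src: Loop, fused: Loop) -> bool:
    """Use the fast check first and fall back only when it is undecided."""
    fast = is_maximal_fast(src, fused)
    return fast if fast is not None else is_maximal_exact(src, fused)


producer = Loop(0, 100, 1)       # affine.for %i0 = 0 to 100
fused_slice = Loop(0, 100, 2)    # affine.for %arg2 = 0 to 100 step 2
keep_producer = not is_maximal(producer, fused_slice)
```

For the loops in the report, the fast check is undecided because the steps differ, the exact check finds that the fused slice covers only half of the producer iterations, and `keep_producer` is therefore true.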