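Hi, I found a possible bug in the affine-loop-fusion pass: the output of the pass is not semantically equivalent to the input. After fusion, only the even-indexed elements of the destination memref are stored to, whereas the original producer loop stores to all 100 elements.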
Input:
func @foo(%m: memref<100xf32>, %src: memref<100xf32>) {
  affine.for %i0 = 0 to 100 {
    %r1 = affine.load %src[%i0] : memref<100xf32>
    affine.store %r1, %m[%i0] : memref<100xf32>
  }
  affine.for %i2 = 0 to 100 step 2 {
    %v1 = affine.load %m[%i2] : memref<100xf32>
  }
  return
}
Output:
module {
  func @foo(%arg0: memref<100xf32>, %arg1: memref<100xf32>) {
    affine.for %arg2 = 0 to 100 step 2 {
      %0 = affine.load %arg1[%arg2] : memref<100xf32>
      affine.store %0, %arg0[%arg2] : memref<100xf32>
      %1 = affine.load %arg0[%arg2] : memref<100xf32>
    }
    return
  }
}
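To make the mismatch concrete, here is a minimal Python sketch that models the two loop nests with plain lists. It is only an illustration of the semantics, not anything produced by MLIR: the function names `run_input` and `run_fused` are hypothetical, and `None` is used as an assumed marker for "never stored to by this function".

```python
def run_input(src):
    """Models the original IR: the producer loop stores to all 100 elements."""
    m = [None] * 100                 # None = not written by this function (assumption)
    for i0 in range(0, 100):         # affine.for %i0 = 0 to 100
        m[i0] = src[i0]
    for i2 in range(0, 100, 2):      # affine.for %i2 = 0 to 100 step 2
        _ = m[i2]                    # the loaded value is unused, as in the IR
    return m


def run_fused(src):
    """Models the reported pass output: a single fused loop with step 2."""
    m = [None] * 100
    for i in range(0, 100, 2):       # affine.for %arg2 = 0 to 100 step 2
        m[i] = src[i]
        _ = m[i]
    return m


src = [float(i) for i in range(100)]
expected = run_input(src)
actual = run_fused(src)

# Indices whose final contents differ between the two versions.
missing = [i for i in range(100) if expected[i] != actual[i]]
print(len(missing), missing[:5])     # 50 [1, 3, 5, 7, 9]
```

Since `%m` is a function argument, its contents are visible to the caller, so the 50 odd-indexed stores that disappear are an observable change in behavior.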
Expected output (the producer loop should not be removed):
module {
  func @foo(%arg0: memref<100xf32>, %arg1: memref<100xf32>) {
    %0 = memref.alloc() : memref<1xf32>
    affine.for %arg2 = 0 to 100 {
      %1 = affine.load %arg1[%arg2] : memref<100xf32>
      affine.store %1, %arg0[%arg2] : memref<100xf32>
    }
    affine.for %arg2 = 0 to 100 step 2 {
      %1 = affine.load %arg1[%arg2] : memref<100xf32>
      affine.store %1, %0[0] : memref<1xf32>
      %2 = affine.load %0[0] : memref<1xf32>
    }
    return
  }
}
Ran using: mlir-opt --affine-loop-fusion
The likely cause is that the fast check for equivalence of the fused and producer iterations [1] does not handle local identifiers correctly. When I commented out this check so that the Presburger equality check [2] was used instead, I got the correct output.
[1] llvm-project/Utils.cpp at 374cd0fb6102a8726da0e6036b3c484aca32c61e · llvm/llvm-project · GitHub
[2] llvm-project/Utils.cpp at 374cd0fb6102a8726da0e6036b3c484aca32c61e · llvm/llvm-project · GitHub
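The suspected cause can be illustrated with a toy model in Python. This is a sketch under stated assumptions, not the code in Utils.cpp: `Domain`, `bounds_only_equal`, and `exact_equal` are hypothetical names, and a step other than 1 stands in for a local identifier `q` constrained by `iv == lb + step * q`.

```python
from typing import NamedTuple


class Domain(NamedTuple):
    lb: int    # inclusive lower bound on the loop induction variable
    ub: int    # exclusive upper bound on the loop induction variable
    step: int  # step != 1 models a local identifier q with iv == lb + step * q


def points(d: Domain) -> set:
    """Enumerates the integer points of the iteration domain."""
    return set(range(d.lb, d.ub, d.step))


def bounds_only_equal(a: Domain, b: Domain) -> bool:
    """Hypothetical fast check: compares bounds only, ignoring the local identifier."""
    return (a.lb, a.ub) == (b.lb, b.ub)


def exact_equal(a: Domain, b: Domain) -> bool:
    """Exact set comparison, playing the role of a Presburger equality check."""
    return points(a) == points(b)


producer = Domain(lb=0, ub=100, step=1)  # affine.for %i0 = 0 to 100
fused = Domain(lb=0, ub=100, step=2)     # affine.for %i2 = 0 to 100 step 2

print(bounds_only_equal(producer, fused))  # True
print(exact_equal(producer, fused))        # False
```

In this model, a check that looks only at the bounds concludes that the fused loop covers every producer iteration, which would justify removing the producer. The exact comparison shows that it covers only half of them.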
Is this a bug in the loop fusion pass?