Hi, I was experimenting with affine loop fusion on loops that contain affine.vector_load/affine.vector_store operations and came across this particular case:
func.func @main(%a: memref<64x512xf32>, %b: memref<64x512xf32>, %c: memref<64x512xf32>, %d: memref<64x4096xf32>, %e: memref<64x4096xf32>) {
  affine.for %j = 0 to 8 {
    %lhs = affine.vector_load %a[0, %j * 64] : memref<64x512xf32>, vector<64x64xf32>
    %rhs = affine.vector_load %b[0, %j * 64] : memref<64x512xf32>, vector<64x64xf32>
    %res = arith.addf %lhs, %rhs : vector<64x64xf32>
    affine.vector_store %res, %c[0, %j * 64] : memref<64x512xf32>, vector<64x64xf32>
  }
  affine.for %j = 0 to 8 {
    %lhs = affine.vector_load %c[0, 0] : memref<64x512xf32>, vector<64x512xf32>
    %rhs = affine.vector_load %d[0, %j * 512] : memref<64x4096xf32>, vector<64x512xf32>
    %res = arith.subf %lhs, %rhs : vector<64x512xf32>
    affine.vector_store %res, %d[0, %j * 512] : memref<64x4096xf32>, vector<64x512xf32>
  }
  func.return
}
When I run affine-loop-fusion on this IR with the following command:
mlir-opt --pass-pipeline='builtin.module(affine-loop-fusion)' test.mlir
I see that the loops are getting fused as follows:
func.func @main(%arg0: memref<64x512xf32>, %arg1: memref<64x512xf32>, %arg2: memref<64x512xf32>, %arg3: memref<64x4096xf32>, %arg4: memref<64x4096xf32>) {
  %c0 = arith.constant 0 : index
  %alloc = memref.alloc() : memref<1x1xf32>
  %c0_0 = arith.constant 0 : index
  affine.for %arg5 = 0 to 8 {
    %0 = affine.vector_load %arg0[0, %c0 * 64] : memref<64x512xf32>, vector<64x64xf32>
    %1 = affine.vector_load %arg1[0, %c0 * 64] : memref<64x512xf32>, vector<64x64xf32>
    %2 = arith.addf %0, %1 : vector<64x64xf32>
    affine.vector_store %2, %arg2[0, %c0 * 64] : memref<64x512xf32>, vector<64x64xf32>
    %3 = affine.vector_load %arg0[0, %c0_0 * 64] : memref<64x512xf32>, vector<64x64xf32>
    %4 = affine.vector_load %arg1[0, %c0_0 * 64] : memref<64x512xf32>, vector<64x64xf32>
    %5 = arith.addf %3, %4 : vector<64x64xf32>
    affine.vector_store %5, %alloc[0, 0] : memref<1x1xf32>, vector<64x64xf32>
    %6 = affine.vector_load %alloc[0, 0] : memref<1x1xf32>, vector<64x512xf32>
    %7 = affine.vector_load %arg3[0, %arg5 * 512] : memref<64x4096xf32>, vector<64x512xf32>
    %8 = arith.subf %6, %7 : vector<64x512xf32>
    affine.vector_store %8, %arg3[0, %arg5 * 512] : memref<64x4096xf32>, vector<64x512xf32>
  }
  return
}
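Isn’t this an invalid transformation? Every iteration of the 2nd loop loads the full 64x512 contents of %c, so the 2nd loop can only start once the 1st loop has finished all of its iterations and produced the complete result. The fused loop also stores a vector<64x64xf32> into, and then loads a vector<64x512xf32> from, a memref<1x1xf32> buffer, which looks wrong to me. Are there any specific flags for affine-loop-fusion that enable the dependence analysis to take vector types into account?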
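To make the concern concrete, here is a small Python sketch (plain Python, not MLIR, and not a claim about how the pass itself reasons). It assumes that a vector access covers the full vector shape starting at the given indices, and it only models the column ranges of %c touched in each iteration of the two original loops. The helper names are my own and purely illustrative.

```python
# Illustrative footprint model of the two original loops (helper names are hypothetical).
# Assumption: a vector<64xNxf32> access at [0, col] touches columns [col, col + N) of the memref.
N_ITERS = 8


def producer_write_cols(j):
    """Columns of %c written by iteration j of the 1st loop (vector<64x64xf32> at column j * 64)."""
    return set(range(j * 64, j * 64 + 64))


def consumer_read_cols(j):
    """Columns of %c read by iteration j of the 2nd loop (vector<64x512xf32> at column 0)."""
    return set(range(0, 512))


def producer_iters_needed(j):
    """Iterations of the 1st loop whose writes overlap what iteration j of the 2nd loop reads."""
    needed = consumer_read_cols(j)
    return [i for i in range(N_ITERS) if producer_write_cols(i) & needed]


if __name__ == "__main__":
    for j in range(N_ITERS):
        print(f"consumer iteration {j} depends on producer iterations {producer_iters_needed(j)}")
```

Under this model, every iteration of the 2nd loop depends on all 8 iterations of the 1st loop, so a single producer iteration placed inside the fused loop body cannot supply the data the consumer reads.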
Note: My LLVM checkout is based on a commit from Nov 12th, 2024.