I came across this test for the affine-loop-fusion pass:
func @should_not_fuse_if_inst_at_top_level() {
  %m = memref.alloc() : memref<10xf32>
  %cf7 = constant 7.0 : f32
  affine.for %i0 = 0 to 10 {
    affine.store %cf7, %m[%i0] : memref<10xf32>
  }
  affine.for %i1 = 0 to 10 {
    %v0 = affine.load %m[%i1] : memref<10xf32>
  }
  %c0 = constant 4 : index
  affine.if #set0(%c0) {
  }
  // Top-level IfOp should prevent fusion.
  // CHECK: affine.for %{{.*}} = 0 to 10 {
  // CHECK-NEXT: affine.store %{{.*}}, %{{.*}}[%{{.*}}] : memref<10xf32>
  // CHECK-NEXT: }
  // CHECK: affine.for %{{.*}} = 0 to 10 {
  // CHECK-NEXT: affine.load %{{.*}}[%{{.*}}] : memref<10xf32>
  // CHECK-NEXT: }
  return
}
I think the affine.if is unrelated to the first two affine.for loops, so those two loops should still be fused.
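For concreteness, this is roughly the result I would expect for the first two loops. It is a hand-written sketch, not actual output from the pass; the function name @fused_sketch is made up for illustration, and the real pass might produce something different (for example, by forwarding the stored value or introducing a private buffer).

```mlir
// Hand-written sketch of the expected fusion; NOT actual pass output.
// @fused_sketch is a hypothetical name used only for this illustration.
func @fused_sketch() {
  %m = memref.alloc() : memref<10xf32>
  %cf7 = constant 7.0 : f32
  // Producer store and consumer load share one loop nest.
  affine.for %i0 = 0 to 10 {
    affine.store %cf7, %m[%i0] : memref<10xf32>
    %v0 = affine.load %m[%i0] : memref<10xf32>
  }
  // The top-level affine.if from the test would remain here, untouched.
  return
}
```

The affine.if in the test does not read or write %m, which is why I would expect it to have no effect on whether these two loops can be fused.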
Another example: when an affine.if appears inside an affine.for near the end of a program, none of the other affine.for loops get fused.
#set = affine_set<(d0) : (d0 - 1 >= 0)>
func @test_fusion() {
  %a = memref.alloc() : memref<10x10xf32>
  %b = memref.alloc() : memref<10x10xf32>
  %cf7 = constant 7.0 : f32
  affine.for %i0 = 0 to 10 {
    affine.for %i1 = 0 to 10 {
      affine.store %cf7, %a[%i0, %i1] : memref<10x10xf32>
    }
  }
  affine.for %i2 = 0 to 10 {
    affine.for %i3 = 0 to 10 {
      %v0 = affine.load %a[%i3, %i2] : memref<10x10xf32>
      affine.store %v0, %b[%i2, %i3] : memref<10x10xf32>
    }
  }
  affine.for %i4 = 0 to 10 {
    affine.for %i5 = 0 to 10 {
      affine.if #set(%i5) {
        %v1 = affine.load %b[%i4, %i5] : memref<10x10xf32>
      } else {}
    }
  }
  return
}
Is this the expected behavior, or a bug that should be fixed? To me, it looks like a bug.