Can you please elaborate on how this can be done with block arguments? I’m new to MLIR and not sure where to look. Concretely, I’m trying to optimize the following function:
func.func @stencil1D(%input: memref<10xf32>{llvm.noalias}, %output: memref<10xf32>{llvm.noalias}) {
// Center of the stencil.
affine.for %i = 1 to 9 {
%0 = affine.load %input[%i] : memref<10xf32>
affine.store %0, %output[%i] : memref<10xf32>
}
// Left.
affine.for %i = 1 to 9 {
%0 = affine.load %input[%i - 1] : memref<10xf32>
%1 = affine.load %output[%i] : memref<10xf32>
%2 = arith.addf %0, %1 : f32
affine.store %2, %output[%i] : memref<10xf32>
}
// Right.
affine.for %i = 1 to 9 {
%0 = affine.load %input[%i + 1] : memref<10xf32>
%1 = affine.load %output[%i] : memref<10xf32>
%2 = arith.addf %0, %1 : f32
affine.store %2, %output[%i] : memref<10xf32>
}
return
}
I want to fuse these three loops and then remove the redundant stores, turning this into a stencil with 3 loads and 1 store. With the affine loop fusion and scalar replacement (affine-scalrep) passes, I can get something like this:
module {
func.func @stencil1D(%arg0: memref<10xf32> {llvm.noalias}, %arg1: memref<10xf32> {llvm.noalias}) {
affine.for %arg2 = 1 to 9 {
%0 = affine.load %arg0[%arg2] : memref<10xf32>
affine.store %0, %arg1[%arg2] : memref<10xf32>
%1 = affine.load %arg0[%arg2 - 1] : memref<10xf32>
%2 = arith.addf %1, %0 : f32
affine.store %2, %arg1[%arg2] : memref<10xf32>
%3 = affine.load %arg0[%arg2 + 1] : memref<10xf32>
%4 = arith.addf %3, %2 : f32
affine.store %4, %arg1[%arg2] : memref<10xf32>
}
return
}
}
However, the affine passes can’t remove the redundant stores, I’m guessing because the input and output memrefs may alias. If I rewrite the MLIR code to
func.func @stencil1D() -> memref<10xf32> {
%input = memref.alloc() : memref<10xf32>
%output = memref.alloc() : memref<10xf32>
// Center of the stencil.
affine.for %i = 1 to 9 {
%0 = affine.load %input[%i] : memref<10xf32>
affine.store %0, %output[%i] : memref<10xf32>
}
// Left.
affine.for %i = 1 to 9 {
%0 = affine.load %input[%i - 1] : memref<10xf32>
%1 = affine.load %output[%i] : memref<10xf32>
%2 = arith.addf %0, %1 : f32
affine.store %2, %output[%i] : memref<10xf32>
}
// Right.
affine.for %i = 1 to 9 {
%0 = affine.load %input[%i + 1] : memref<10xf32>
%1 = affine.load %output[%i] : memref<10xf32>
%2 = arith.addf %0, %1 : f32
affine.store %2, %output[%i] : memref<10xf32>
}
return %output : memref<10xf32>
}
then things get rewritten to what I would expect:
module {
func.func @stencil1D() -> memref<10xf32> {
%alloc = memref.alloc() : memref<10xf32>
%alloc_0 = memref.alloc() : memref<10xf32>
affine.for %arg0 = 1 to 9 {
%0 = affine.load %alloc[%arg0] : memref<10xf32>
%1 = affine.load %alloc[%arg0 - 1] : memref<10xf32>
%2 = arith.addf %1, %0 : f32
%3 = affine.load %alloc[%arg0 + 1] : memref<10xf32>
%4 = arith.addf %3, %2 : f32
affine.store %4, %alloc_0[%arg0] : memref<10xf32>
}
return %alloc_0 : memref<10xf32>
}
}
So I would like to be able to tell the affine passes that this aliasing isn’t present; adding {llvm.noalias} to the memref arguments doesn’t make that reasoning go through.
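
To make the rewrite I’m after concrete, here is a minimal Python sketch of the semantics (plain Python, not MLIR, and not a model of what any pass does internally). It assumes the input and output are distinct buffers, which is exactly the no-alias property in question. The function names and the zero-initialized output are illustrative assumptions; the MLIR version leaves elements 0 and 9 of the output untouched.

```python
def stencil_three_loops(inp):
    """Mirror of the original function: three separate loops over 1..n-2."""
    n = len(inp)
    out = [0.0] * n  # assumption: a fresh buffer that does not alias inp
    # Center of the stencil.
    for i in range(1, n - 1):
        out[i] = inp[i]
    # Left.
    for i in range(1, n - 1):
        out[i] = inp[i - 1] + out[i]
    # Right.
    for i in range(1, n - 1):
        out[i] = inp[i + 1] + out[i]
    return out


def stencil_fused(inp):
    """Desired result: one loop with 3 loads and 1 store per iteration."""
    n = len(inp)
    out = [0.0] * n  # assumption: a fresh buffer that does not alias inp
    for i in range(1, n - 1):
        center = inp[i]
        left = inp[i - 1]
        right = inp[i + 1]
        # Same addition order as the fused MLIR: right + (left + center).
        out[i] = right + (left + center)
    return out


if __name__ == "__main__":
    data = [float(i) for i in range(10)]
    assert stencil_three_loops(data) == stencil_fused(data)
```

The two functions agree only because out and inp are separate lists, so dropping the intermediate stores cannot change what later loads observe.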