I have been studying MLIR by expanding the Toy example, and have come across this problem which I can’t quite figure out. I don’t have a compilers background, so there might be something obvious I’m missing here:
My example consists of this basic program, in which two "linalg.fill" ops fill tensors with two separate values obtained by calling out to a system time function ("toy.time" and "toy.sleep" get lowered into C calls). I subtract the two tensors and print the resulting values of all three:
#map = affine_map<(d0, d1) -> (d0, d1)>
module {
  func @main() {
    %cst = constant dense<1.000000e+00> : tensor<f32>
    %0 = "toy.time"() : () -> f64
    %1 = linalg.init_tensor [2, 2] : tensor<2x2xf64>
    %2 = linalg.fill(%0, %1) : f64, tensor<2x2xf64> -> tensor<2x2xf64>
    "toy.sleep"(%cst) : (tensor<f32>) -> ()
    %3 = "toy.time"() : () -> f64
    %4 = linalg.init_tensor [2, 2] : tensor<2x2xf64>
    %5 = linalg.fill(%3, %4) : f64, tensor<2x2xf64> -> tensor<2x2xf64>
    %6 = linalg.init_tensor [2, 2] : tensor<2x2xf64>
    %7 = linalg.generic {indexing_maps = [#map, #map, #map], iterator_types = ["parallel", "parallel"]} ins(%5, %2 : tensor<2x2xf64>, tensor<2x2xf64>) outs(%6 : tensor<2x2xf64>) {
    ^bb0(%arg0: f64, %arg1: f64, %arg2: f64):  // no predecessors
      %8 = subf %arg0, %arg1 : f64
      linalg.yield %8 : f64
    } -> tensor<2x2xf64>
    toy.print %2 : tensor<2x2xf64>
    toy.print %5 : tensor<2x2xf64>
    toy.print %7 : tensor<2x2xf64>
    return
  }
}
If I run the pass sequence of “CSE, linalgBufferizePass, linalgToAffineLoopsPass”, the result is:
module {
  func @main() {
    %cst = constant dense<1.000000e+00> : tensor<f32>
    %0 = "toy.time"() : () -> f64
    %1 = memref.alloc() : memref<2x2xf64>
    affine.for %arg0 = 0 to 2 {
      affine.for %arg1 = 0 to 2 {
        affine.store %0, %1[%arg0, %arg1] : memref<2x2xf64>
      }
    }
    %2 = memref.tensor_load %1 : memref<2x2xf64>
    "toy.sleep"(%cst) : (tensor<f32>) -> ()
    %3 = "toy.time"() : () -> f64
    affine.for %arg0 = 0 to 2 {
      affine.for %arg1 = 0 to 2 {
        affine.store %3, %1[%arg0, %arg1] : memref<2x2xf64>
      }
    }
    %4 = memref.tensor_load %1 : memref<2x2xf64>
    %5 = memref.alloc() : memref<2x2xf64>
    affine.for %arg0 = 0 to 2 {
      affine.for %arg1 = 0 to 2 {
        %7 = affine.load %1[%arg0, %arg1] : memref<2x2xf64>
        %8 = affine.load %1[%arg0, %arg1] : memref<2x2xf64>
        %9 = subf %7, %8 : f64
        affine.store %9, %5[%arg0, %arg1] : memref<2x2xf64>
      }
    }
    %6 = memref.tensor_load %5 : memref<2x2xf64>
    toy.print %2 : tensor<2x2xf64>
    toy.print %4 : tensor<2x2xf64>
    toy.print %6 : tensor<2x2xf64>
    return
  }
}
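For reference, I believe the same pipeline can be reproduced from the command line with mlir-opt; the flag names below are the standard registered ones for this era of MLIR (they may differ in newer releases, where linalg-bufferize was eventually replaced):

```shell
# Assumed mlir-opt flags corresponding to the in-tree pass sequence:
# CSE, then Linalg bufferization, then lowering Linalg to affine loops.
mlir-opt example.mlir -cse -linalg-bufferize -convert-linalg-to-affine-loops
```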
Obviously this is not what I want: both "linalg.fill" ops now write into the same buffer (%1), so the "linalg.generic" ends up subtracting the buffer from itself. Removing the CSE pass corrects this behavior. What am I doing wrong here? The "toy.time" calls are preserved, and both fill operations survive, but CSE merges the two identical "linalg.init_tensor" output tensors before bufferization, so after bufferization both fills target a single allocation.
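To make sure I understand why the merge happens at all, here is a toy sketch of value-numbering CSE in Python (my assumption about the behavior, not MLIR's actual implementation): two side-effect-free ops with the same name, operands, and attributes are collapsed into one value. In my program the two "toy.time" calls survive because they have side effects, and the two fills survive because their operands differ, but the two "linalg.init_tensor" ops are bit-identical and get merged:

```python
# Toy sketch of SSA-level CSE (an assumption about how MLIR's CSE pass
# behaves, not its real implementation). Each op is a tuple of
# (result, op name, operand tuple, has_side_effects).

def cse(ops):
    seen = {}     # (op name, operands) -> canonical result value
    replace = {}  # duplicated result -> canonical result
    kept = []
    for result, opname, operands, side_effects in ops:
        # Rewrite operands through earlier replacements first.
        operands = tuple(replace.get(o, o) for o in operands)
        key = (opname, operands)
        if not side_effects and key in seen:
            # Duplicate of an earlier pure op: drop it, remap its result.
            replace[result] = seen[key]
            continue
        if not side_effects:
            seen[key] = result
        kept.append((result, opname, operands, side_effects))
    return kept, replace

# Simplified model of the program above (attributes/types elided).
program = [
    ("%0", "toy.time",           (),            True),
    ("%1", "linalg.init_tensor", (),            False),
    ("%2", "linalg.fill",        ("%0", "%1"),  False),
    ("%3", "toy.time",           (),            True),
    ("%4", "linalg.init_tensor", (),            False),  # identical to %1
    ("%5", "linalg.fill",        ("%3", "%4"),  False),
]

kept, replace = cse(program)
print(replace)  # {'%4': '%1'} -- the second init_tensor is merged away,
# so the second fill now takes %1, and bufferization gives both fills
# the same backing buffer.
```

This matches what I see in the IR: CSE is legal on the tensor-level program (SSA values are immutable), and the aliasing only becomes observable once bufferization turns the shared init_tensor into a single memref.alloc.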