Why are some optimizations not performed?

Hello, I extended mlir-opt with a dialect of mine, for which I wrote
just the lowering pass. I then call:

./main --lower-mydialect --convert-linalg-to-affine-loops --inline
--affine-loop-fusion --fusion-maximal --canonicalize
--inline examples/inline-bug.mlir

But when I look at the generated code of the inner loop of my code,
I can see:

      affine.store %6, %4[%arg1, %arg2] : memref<10x20xf64>
      %7 = affine.load %4[%arg1, %arg2] : memref<10x20xf64>
      affine.store %7, %1[0, 0] : memref<1x1xf64>
      %8 = affine.load %1[0, 0] : memref<1x1xf64>
      affine.store %8, %5[%arg2, %arg1] : memref<20x10xf64>
      %9 = affine.load %0[%arg0, 0] : memref<10x1xf64>
      %10 = affine.load %5[%arg2, %arg1] : memref<20x10xf64>

Obviously, there are store/load sequences where the load is useless.
Is this normal?

You may want to try -memref-dataflow-opt as well.
(also if you can, please provide actual reproducers in general)

As @mehdi_amini points out, you need -memref-dataflow-opt to forward the stores to the loads; the fusion pass won’t do it for you. I just checked, and -memref-dataflow-opt does get rid of those store/load pairs on your snippet; it also removes intermediate allocations that become dead as a result.
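
To make "forwarding the stores to the loads" concrete, here is a toy Python model of the idea applied to the snippet from the question. It is only a sketch, not how the MLIR pass is implemented: the tuple encoding of the ops and the `forward_stores` helper are made up for illustration, distinct memrefs are assumed not to alias, and any store to a memref conservatively invalidates what is known about that memref.

```python
def forward_stores(ops):
    """Toy store-to-load forwarding over one straight-line block.

    Each op is ("store", value, memref, indices) or
    ("load", result, memref, indices). This encoding is hypothetical.
    """
    last_store = {}  # (memref, indices) -> value most recently stored there
    replace = {}     # load result -> value it can be replaced with
    kept = []        # ops that remain after forwarding

    for kind, name, memref, indices in ops:
        if kind == "store":
            # Use the forwarded value if the stored value came from a load
            # that was already eliminated.
            value = replace.get(name, name)
            # Conservative: forget everything known about this memref,
            # since other index expressions might hit the same element.
            for key in [k for k in last_store if k[0] == memref]:
                del last_store[key]
            last_store[(memref, indices)] = value
            kept.append(("store", value, memref, indices))
        elif (memref, indices) in last_store:
            # The load reads exactly what was just stored: drop it.
            replace[name] = last_store[(memref, indices)]
        else:
            kept.append((kind, name, memref, indices))
    return kept, replace


# The inner-loop body from the question, in the toy encoding.
ops = [
    ("store", "%6", "%4", ("%arg1", "%arg2")),
    ("load", "%7", "%4", ("%arg1", "%arg2")),
    ("store", "%7", "%1", ("0", "0")),
    ("load", "%8", "%1", ("0", "0")),
    ("store", "%8", "%5", ("%arg2", "%arg1")),
    ("load", "%9", "%0", ("%arg0", "0")),
    ("load", "%10", "%5", ("%arg2", "%arg1")),
]

kept, replace = forward_stores(ops)
for op in kept:
    print(op)
print(replace)
```

In this model the loads %7, %8 and %10 all resolve to %6, and only the load of %9 survives. The model never deletes stores; whether a store (and the allocation behind it) can go away depends on the remaining uses of that memref, which the snippet alone does not show.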
