MLIR for Arm SME: Reducing tile data transfers

This exact lowering is what you get if you use IREE, which already does this hoisting. It is not included in the -test-lower-to-arm-sme pipeline, as that pipeline was mainly intended for functional correctness tests (not optimal code).

Edit: In IREE, the --iree-codegen-optimize-tensor-insert-extract-slices pass (which runs right after vectorization) is what hoists the reads/writes. I think everything that pass does is available upstream (just maybe not packaged into a single pass).
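
To make the effect of the hoisting concrete, here is a rough plain-Python model (not MLIR or IREE code). It assumes a reduction loop that accumulates outer products into one accumulator tile, and it counts tile-sized transfers with and without hoisting. The function names and the transfer counter are hypothetical, made up for illustration only.

```python
def accumulate_unhoisted(a_cols, b_rows, c_tile):
    """Reload and store the accumulator tile on every loop iteration."""
    memory = [row[:] for row in c_tile]  # stands in for the tile's backing memory
    transfers = 0
    for a_col, b_row in zip(a_cols, b_rows):
        tile = [row[:] for row in memory]  # tile read inside the loop
        transfers += 1
        for i, a in enumerate(a_col):
            for j, b in enumerate(b_row):
                tile[i][j] += a * b  # outer-product accumulate
        memory = tile  # tile write inside the loop
        transfers += 1
    return memory, transfers


def accumulate_hoisted(a_cols, b_rows, c_tile):
    """Read the tile once before the loop and write it once after."""
    tile = [row[:] for row in c_tile]  # hoisted read
    transfers = 1
    for a_col, b_row in zip(a_cols, b_rows):
        for i, a in enumerate(a_col):
            for j, b in enumerate(b_row):
                tile[i][j] += a * b  # tile stays live across iterations
    transfers += 1  # hoisted write
    return tile, transfers
```

Both versions compute the same result, but the unhoisted one does two tile transfers per iteration while the hoisted one does two in total, regardless of the trip count.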
