I see two switches, --linalg-tile and --linalg-tile-and-fuse-tensor-ops, that can tile linalg ops. Apart from the support for fusion, could someone shed some light on how these two switches are positioned within Linalg? For example, is one meant to replace the other in the long term? Thanks.
--linalg-tile - Tile operations in the linalg dialect
  --distribution-types=<string> - DistributionTypes (if loop-type=tiled_loop)
  --loop-type=<string> - Specify the type of loops to generate: for, parallel or tiled_loop
  --tile-sizes=<long> - Tile sizes
--linalg-tile-and-fuse-tensor-ops - Tile a LinalgOp and fuse its producers.
  --tile-interchange=<long> - Tile loop interchange
  --tile-sizes=<long> - Tile sizes
--linalg-tile performs only tiling. It has existed for quite some time and has a lot of features. For example, you can select the loop type or distribute the loops.
--linalg-tile-and-fuse-tensor-ops performs tiling followed by fusion. It is relatively new and still under development. For example, it supports only one loop type (scf.for) and it does not work together with padding and hoisting. However, we are working on improving it and making it fully functional.
I would not say that one will replace the other. We definitely aim for more code reuse internally, but the flags will more or less remain. Also note that these days we use these flags mostly for testing. CodegenStrategy is our current way to control the optimization and combine the different transformations.
Thanks for the information @gysit. If I want to do a quick experiment, which switch would fuse tiled operators after tiling with --linalg-tile? I can't find a switch to fuse tiled loops for Linalg on tensors. Is their fusion expected to happen at a lower level of abstraction, such as Affine or scf? Thanks.
In linalg you can only do fusion and tiling together. That means if you use --linalg-tile, there is currently no linalg way to do fusion afterwards. The reason is that --linalg-tile usually generates scf.for loops, meaning we are not fully in linalg land anymore, and doing fusion would require analysis to detect which scf.for is tiling which iterator dimension (or even to detect whether it is a tiling loop at all). Instead, we have --linalg-tile-and-fuse-tensor-ops, which does both tiling and fusion in one go.
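To make the difference concrete, here is a minimal conceptual sketch in plain Python. It is not MLIR and not what the passes actually emit; the function names (producer, consumer, tile_only, tile_and_fuse) and the two elementwise ops are hypothetical stand-ins chosen for illustration. It only shows the idea that tiling alone leaves the producer computing a full-size intermediate, whereas tile-and-fuse recomputes the producer on just the slice each consumer tile needs.

```python
def producer(xs):
    """Hypothetical elementwise producer: doubles every element."""
    return [2 * x for x in xs]


def consumer(xs):
    """Hypothetical elementwise consumer: adds one to every element."""
    return [x + 1 for x in xs]


def tile_only(data, tile_size):
    """Tile the consumer only; the producer still runs over the whole input."""
    intermediate = producer(data)  # full-size temporary, computed up front
    out = []
    for start in range(0, len(data), tile_size):
        # Each loop iteration consumes one tile of the precomputed intermediate.
        out.extend(consumer(intermediate[start:start + tile_size]))
    return out


def tile_and_fuse(data, tile_size):
    """Tile the consumer and pull the producer into the tile loop."""
    out = []
    for start in range(0, len(data), tile_size):
        tile = data[start:start + tile_size]
        # The producer is evaluated only on the slice this tile needs,
        # so no full-size intermediate is ever materialized.
        out.extend(consumer(producer(tile)))
    return out


data = list(range(10))
result_tiled = tile_only(data, 4)
result_fused = tile_and_fuse(data, 4)
```

Both variants compute the same values; they differ only in where the producer runs. In the fused version the tile loop and the producer slice are created together, which is why no after-the-fact analysis of existing loops is needed.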
You can do fusion in affine using -affine-loop-fusion, if I am not mistaken. The scf dialect does not implement fusion as far as I know (but I may be wrong there).