[RFC] Changes to linalg::TiledLoopOp to unblock reductions

MaheshRavishankar · July 16, 2021, 7:01pm

In general, I think it makes sense to have the body yield a tile instead of the whole tensor, so +1 for the direction. (Note that this is roughly what we do in IREE with flow.dispatch.tensor.load and flow.dispatch.tensor.store.)

I mostly have a nit about

linalg.tiled_yield %transpose_sub in %out_sub : tensor<10x10xf32>

What does “%transpose_sub in %out_sub” mean?

Would something like

linalg.tiled_yield %transpose_sub as %out_sub

be more readable? Essentially, it says that %transpose_sub replaces what was %out_sub.

Also,

 tiled_loop.yield %sub_sum in %out_

was this a typo, or does tiled_loop.yield signify something else?

Side note: this does seem to fit well with the RFC for `TilingInterface`, the interface for tiling operations that don't fit the Linalg structured operation definition. That RFC also has the tiled implementation return only the tile, and makes the tensor.insert_slice an implementation detail of the generated tiled code.
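
To make the "body returns only the tile" idea concrete, here is a rough sketch in plain Python rather than MLIR. Everything in it is hypothetical and only models the semantics: the helper names (extract_slice, insert_slice, transpose_tile, tiled_transpose) are not MLIR or IREE APIs, and it assumes a square matrix whose size is a multiple of the tile size.

```python
# Hypothetical model of "yield a tile" semantics; none of these names are MLIR APIs.

def extract_slice(matrix, row, col, size):
    """Return the size x size tile whose top-left corner is (row, col)."""
    return [r[col:col + size] for r in matrix[row:row + size]]

def insert_slice(tile, matrix, row, col):
    """Return a copy of matrix with tile written at (row, col)."""
    result = [list(r) for r in matrix]
    for i, tile_row in enumerate(tile):
        result[row + i][col:col + len(tile_row)] = tile_row
    return result

def transpose_tile(tile):
    """Loop body: computes and returns only the output tile."""
    return [list(r) for r in zip(*tile)]

def tiled_transpose(matrix, tile_size):
    """Driver: owns the insert step, so the body never touches the full output."""
    n = len(matrix)  # assumes a square matrix with n divisible by tile_size
    out = [[0] * n for _ in range(n)]
    for i in range(0, n, tile_size):
        for j in range(0, n, tile_size):
            in_tile = extract_slice(matrix, i, j, tile_size)
            out_tile = transpose_tile(in_tile)
            # The tile read at (i, j) lands at (j, i) in the transposed output.
            out = insert_slice(out_tile, out, j, i)
    return out

matrix = [[4 * r + c for c in range(4)] for r in range(4)]
result = tiled_transpose(matrix, 2)
```

The point of the split is that transpose_tile has no knowledge of where its result goes; the insert into the full output lives entirely in the driver, which is the role the generated tiled code would play.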
