[RFC] Changes to linalg::TiledLoopOp to unblock reductions

This is a long thread, but as a fly-on-the-wall comment: another approach, isomorphic to what is being proposed, is

```mlir
%sum = linalg.tiled_loop (%i, %j) = (%c0, %c0) to (%size_0, %size_1)
    step (%c10, %c10)
    ins (%in_ = %in: tensor<100x100xf32>)
    outs (%out_ = %out: tensor<100x100xf32>)
    iterator_types ("parallel", "parallel") {
  %in_sub = tensor.extract_slice %in_[%i, %j][%c10, %c10][%c1, %c1]
      : tensor<100x100xf32> to tensor<10x10xf32>
  // Init tensor for the result tile; note: no slice of %out_ is needed.
  %out_sub = linalg.init_tensor [10, 10] : tensor<10x10xf32>

  %transpose_sub = linalg.generic {
      indexing_maps = [#id, #tr],
      iterator_types = ["parallel", "parallel"]}
      ins(%in_sub: tensor<10x10xf32>)
      outs(%out_sub: tensor<10x10xf32>) {
    ^bb0(%in_elem: f32, %out_elem: f32):
      linalg.yield %in_elem : f32
  } -> tensor<10x10xf32>
  linalg.tiled_loop_terminator {
    tiled_yield %transpose_sub at [%j, %i][%c10, %c10][%c1, %c1]
  }
}
```

That is, the terminator has a region with one op per `outs` operand, and each such op holds the offsets/sizes/strides to insert at for this iteration. I think this dodges the weirdness of `linalg.tiled_yield %transpose_sub in %out_sub : tensor<10x10xf32>` needing `%out_sub` to be defined by a `tensor.extract_slice`. It is the same information, but represented without needing to traverse a use-def chain to reach the dummy "read".
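For contrast, here is my reading of the `tiled_yield`-based form this dodges; the syntax is assumed from the snippet quoted above, with the bodies elided:

```mlir
%sum = linalg.tiled_loop (%i, %j) = (%c0, %c0) to (%size_0, %size_1)
    step (%c10, %c10)
    ins (%in_ = %in: tensor<100x100xf32>)
    outs (%out_ = %out: tensor<100x100xf32>)
    iterator_types ("parallel", "parallel") {
  %in_sub = tensor.extract_slice %in_[%i, %j][%c10, %c10][%c1, %c1]
      : tensor<100x100xf32> to tensor<10x10xf32>
  // The dummy "read": %out_sub exists only to name the destination tile.
  %out_sub = tensor.extract_slice %out_[%j, %i][%c10, %c10][%c1, %c1]
      : tensor<100x100xf32> to tensor<10x10xf32>
  %transpose_sub = linalg.generic { /* as above */ }
      ins(%in_sub: tensor<10x10xf32>)
      outs(%out_sub: tensor<10x10xf32>) { /* as above */ } -> tensor<10x10xf32>
  // Recovering the insertion offsets/sizes/strides requires walking
  // back through %out_sub to the extract_slice that defined it.
  linalg.tiled_yield %transpose_sub in %out_sub : tensor<10x10xf32>
}
```

The terminator-region variant keeps exactly this information, but attaches it directly to the yield instead of to a slice of the output.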