Any plan to fuse consumers?

AHuskie · January 15, 2024, 6:48am

Generally, we can fuse the producers of an op into the containing op, but if a producer has multiple users outside the containing op, we can’t fuse it into the containing op completely.

For example:

func.func @add(%a: tensor<16x10xf32>, %b: tensor<10x10xf32>) -> (tensor<16x10xf32>, tensor<16x10xf32>) {
  %init_0 = tensor.empty() :tensor<16x10xf32>
  %matmul = linalg.matmul ins(%a, %b : tensor<16x10xf32>, tensor<10x10xf32>)
                          outs(%init_0 : tensor<16x10xf32>) -> tensor<16x10xf32>
  %init_1 = tensor.empty() :tensor<16x10xf32>
  %res0 = linalg.elemwise_unary {fun = #linalg.unary_fn<abs>, "res0"}
                                 ins(%matmul : tensor<16x10xf32>) outs(%init_1 : tensor<16x10xf32>) -> tensor<16x10xf32>
  %init_2 = tensor.empty() :tensor<16x10xf32>
  %res1 = linalg.elemwise_unary {fun = #linalg.unary_fn<ceil>, "res1"}
                                 ins(%matmul : tensor<16x10xf32>) outs(%init_2 : tensor<16x10xf32>) -> tensor<16x10xf32>
  return %res0, %res1 : tensor<16x10xf32>, tensor<16x10xf32>
}

transform.with_pdl_patterns {
^bb0(%arg0: !pdl.operation):
  transform.sequence %arg0 : !pdl.operation failures(propagate) {
    ^bb0(%arg1: !pdl.operation):
      %res0 = transform.structured.match attributes{"res0"} in %arg1 : (!pdl.operation) -> !pdl.operation
      %foreach_thread_op1, %tiled_op1 = transform.structured.tile_to_foreach_thread_op %res0 num_threads [4, 0]
      %matmul = transform.structured.match ops{["linalg.matmul"]} in %arg1 : (!pdl.operation) -> !pdl.operation
      transform.structured.fuse_into_containing_op %matmul into %foreach_thread_op1
  }
}

We have to keep a copy of the ‘matmul’ op outside ‘foreach_thread_op1’, because it still has a user outside ‘foreach_thread_op1’ (the ‘res1’ op).

I want to tile and fuse all operations into one foreach_thread_op, but I can’t do it with the current infrastructure.

I think that if we could fuse consumers into the containing op, we could tile the ‘matmul’ and then fuse its two consumers. Of course, we would need to add a ‘generateOperandTileValue’ method to the tiling interface, the counterpart of ‘generateResultTileValue’, and also sink the insert slice past the fused consumers.

I want to know whether MLIR plans to provide infrastructure for fusing consumers into the containing op in the future. If not, is there a better way to fuse all operations into one foreach_thread_op?

rengolin · January 15, 2024, 10:06am

If I understand your query, this is certainly doable, but not as trivial as it sounds.

Your IR has two (independent) fusion opportunities:

  %res0 = abs ( matmul ( %a : <16x10>, %b: <10x10> ) );
  %res1 = ceil ( matmul ( %a: <16x10>, %b: <10x10> ) );

With CFG:

   matmul
   /    \
abs     ceil

You cannot do both paths in-place; you need to choose one. But you can tile and at least fuse all the ops in the same inner loop. However, you still need a separate buffer for each output, potentially making the last one in-place.
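To make the "same inner loop, separate output buffers" idea concrete, here is a minimal sketch in plain Python rather than MLIR. It is only an analogy, not what any MLIR transform produces: nested lists stand in for tensors, and the names `matmul_tile` and `tiled_fused` are hypothetical. Each iteration computes one row tile of the matmul and feeds it straight to both consumers, each of which writes into its own buffer.

```python
import math


def matmul_tile(a_rows, b):
    """Multiply a block of rows of A by the full matrix B."""
    inner = len(b)
    cols = len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a_rows]


def tiled_fused(a, b, num_tiles=4):
    """Tile along rows; run matmul, abs and ceil in the same loop body."""
    rows = len(a)
    cols = len(b[0])
    # One buffer per output: the two consumers cannot share storage.
    res0 = [[0.0] * cols for _ in range(rows)]
    res1 = [[0.0] * cols for _ in range(rows)]
    tile = max(1, -(-rows // num_tiles))  # ceiling division
    for start in range(0, rows, tile):
        stop = min(start + tile, rows)
        # Producer tile: only this slice of the matmul result exists at once.
        mm = matmul_tile(a[start:stop], b)
        for i, row in enumerate(mm):
            res0[start + i] = [abs(v) for v in row]                # consumer 1
            res1[start + i] = [float(math.ceil(v)) for v in row]   # consumer 2
    return res0, res1
```

With the 16x10 and 10x10 shapes from the example and `num_tiles=4`, each iteration handles four rows. The full matmul result is never materialized, but both `res0` and `res1` are.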

Extending this to arbitrary CFGs can lead to complex bufferization logic. It creates a lot of corner cases that interact with other passes, and it needs work that doesn’t just add the transform but actually considers the side effects.

cxy · January 22, 2024, 6:01am

I have proposed a new RFC ([RFC] Tiling interface supports fuse consumer) to address this issue. You are welcome to join the discussion there.
