How to bufferize `scf.execute_region` op?

When I tried to bufferize this temp.mlir:

  func.func nested @func1(%arg0: tensor<10x6xi1>, %arg1: memref<5x5xi1>, %arg2: i32) {
    %13 = tensor.empty() : tensor<10x6xf32>
    %173 = scf.execute_region -> tensor<10x6xf32> {
      scf.yield %13 : tensor<10x6xf32>
    }
    return
  }

with `mlir-opt --empty-tensor-to-alloc-tensor --func-bufferize --scf-bufferize --bufferization-bufferize`, the output is:

module {
  func.func nested @func1(%arg0: memref<10x6xi1>, %arg1: memref<5x5xi1>, %arg2: i32) {
    %alloc = memref.alloc() {alignment = 64 : i64} : memref<10x6xf32>
    %0 = bufferization.to_tensor %alloc : memref<10x6xf32>
    %1 = scf.execute_region -> tensor<10x6xf32> {
      scf.yield %0 : tensor<10x6xf32>
    }
    return
  }
}

The scf.execute_region still yields a tensor type.

Is there any step I missed?

@matthias-springer

--func-bufferize, --scf-bufferize, etc. are deprecated; I will turn them into test passes soon. Use -one-shot-bufferize instead, then it should bufferize.
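
For the example above, something along these lines should work (a sketch; the exact flag spellings may vary between MLIR versions):

mlir-opt temp.mlir \
  --empty-tensor-to-alloc-tensor \
  --one-shot-bufferize="bufferize-function-boundaries"

With `bufferize-function-boundaries`, the function signature is bufferized as well, so a separate `--func-bufferize` run is not needed.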

Is there any other ‘proper’ way to bufferize mixed tensor/memref code? For context: we have our own custom tensor dialect that allows in-place modifications. We lower the parts we can prove safe to linalg-on-tensors and the rest to memrefs, and bufferize the resulting mixed code later using these passes, which works reasonably well for us (although we have a couple of custom patterns).

There is no good way to bufferize mixed tensor/memref code. We cannot analyze through memref code to decide when copies must be inserted during bufferization. You probably noticed that passes like --tensor-bufferize introduce many copies. E.g., tensor.insert will always bufferize to alloc + copy + memref.store. That’s why these passes are not very useful in general, apart from small unit tests.
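
To illustrate the point, here is a rough sketch (not the literal pass output) of what a single tensor.insert turns into when no analysis is available:

// %t is the tensor being inserted into; %d is its dynamic size (e.g. obtained via memref.dim).
%m = bufferization.to_memref %t : memref<?xf32>
%alloc = memref.alloc(%d) : memref<?xf32>                  // fresh buffer for the result
memref.copy %m, %alloc : memref<?xf32> to memref<?xf32>    // copy the entire tensor
memref.store %cst, %alloc[%idx] : memref<?xf32>            // the actual insert
%r = bufferization.to_tensor %alloc : memref<?xf32>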

You could try bufferizing your code with --one-shot-bufferize="allow-unknown-ops". Your custom ops that don’t implement BufferizableOpInterface will be skipped. Then you can run your own custom bufferization for the remaining code.
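
Roughly, ops without the interface stay on tensors and get bridged with to_tensor/to_memref casts, e.g. (a sketch; "mydialect.op" is a placeholder for one of your custom ops):

%0 = bufferization.to_tensor %buf : memref<10xf32>
%1 = "mydialect.op"(%0) : (tensor<10xf32>) -> tensor<10xf32>
%2 = bufferization.to_memref %1 : memref<10xf32>

Your own bufferization can then pattern-match these casts around the remaining ops.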

We don’t generate insert_slice, I think, but we had to do some custom lowering to avoid copies on extract_slice. Still, passes like func/scf bufferize just change types on op boundaries and are useful on their own.

Regarding extract_slice: the issue was that it has to insert a copy because the conversion expects identity-layout memrefs on op boundaries, while the memref.subview result is strided. We handled this by introducing a change_layout op, so extract_slice bufferizes like this:

%1 = ... : memref<?xf32>
%2 = memref.subview %1 ... -> memref<?xf32, strided<[?], offset: ?>>
%3 = change_layout %2 -> memref<?xf32>

And then we have a set of patterns that tries to propagate and cancel out these change_layout ops.

Can you elaborate a bit more with an example? I’m curious :slight_smile:

A bit of background: When we designed One-Shot Bufferize, we had two design options:

  1. Analyze tensor IR and insert buffer copies only when needed.
  2. Insert copies on every write (without analyzing anything). Then run a memref analysis to remove copies again.

We went with the first option. I have no good answer as to which design is better. My gut feeling says variant 1 is simpler because we can utilize SSA use-def chains for the analysis and implement special rules to bufferize certain tensor ops efficiently in the absence of difficult analyses (e.g., range analyses). Also, a tensor-based analysis fits better with destination style, which we have already been utilizing in other components (e.g., tiling).

The bufferization analysis is driven by the BufferizableOpInterface. There are two methods that model the flow of data through the program: getAliasingOpOperands and getAliasingOpResults. The former is a property of tensor results, the latter of tensor operands.

E.g.:

// getAliasingOpOperands(%r) = {%t}
// getAliasingOpResults(%t) = {%r}
%r = tensor.insert %cst into %t[%idx] : tensor<?xf32>

This tells us that if %t bufferizes in-place, buffer(%r) == buffer(%t). The bufferization analysis maintains alias sets based on this information.
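
For example, if %t is read again after the insert, the insert cannot be in-place and the analysis will allocate a copy for %r (a sketch):

%r = tensor.insert %cst into %t[%idx] : tensor<?xf32>
// This read of %t happens after the write above. Writing into buffer(%t)
// in-place would change the value read here, so One-Shot Bufferize
// bufferizes the insert out-of-place: buffer(%r) != buffer(%t).
%v = tensor.extract %t[%idx] : tensor<?xf32>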

Operations that are not tensor-based do not implement the BufferizableOpInterface, so there is no getAliasingOpOperands/Results property that could be queried. Instead, we have bufferization.to_memref at the boundary, for which the BufferizableOpInterface can be queried.

E.g.:

%m = bufferization.to_memref %t : memref<?xf32>
// Do something with %m

Our analysis stops at bufferization.to_memref. We don’t know what’s happening to %m. In particular, we don’t know whether some op is going to read from %m and/or write to %m. So we have to be conservative and assume that the answer is “yes”, which can lead to unnecessary buffer copies.
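
To make this concrete (a sketch; "some.consumer" is a placeholder op):

%r = tensor.insert %cst into %t[%idx] : tensor<?xf32>
%m = bufferization.to_memref %t : memref<?xf32>
// We cannot tell whether this op reads buffer(%t) after the insert's write,
// so the insert would have to bufferize out-of-place (extra alloc + copy).
"some.consumer"(%m) : (memref<?xf32>) -> ()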

Then there’s bufferization.to_tensor on the other end.
E.g.:

// ...
%t = bufferization.to_tensor %m : memref<?xf32>

Our analysis does not know where %t is coming from. But it has to implement BufferizableOpInterface::getAliasingOpOperands. Usually, we would look up the alias set of the OpOperand and maybe union the alias sets of %t and %m. But that doesn’t work because %m is a memref. So we have to be conservative and assume that buffer(%t) may, after bufferization, alias with any other SSA value whose definition dominates the bufferization.to_tensor op.

(We don’t do this at the moment. Instead, we assert that there’s no to_tensor/to_memref in the program.)
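
To illustrate why such a conservative assumption would be needed (a sketch; "some.producer" and "some.consumer" are placeholders, not actual One-Shot Bufferize output):

%m = "some.producer"() : () -> memref<?xf32>
%t = bufferization.to_tensor %m : memref<?xf32>
%r = tensor.insert %cst into %t[%idx] : tensor<?xf32>
// If the insert above were bufferized in-place into %m, this memref consumer
// would observe the modified data. Without analyzing the memref side, the
// safe choice is to copy.
"some.consumer"(%m) : (memref<?xf32>) -> ()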