A bit of background: When we designed One-Shot Bufferize, we had two design options:
- Analyze tensor IR and insert buffer copies only when needed.
- Insert copies on every write (without analyzing anything). Then run a memref analysis to remove copies again.
We went with the first option. I have no good answer as to which design is better. My gut feeling says the first option is simpler because we can utilize SSA use-def chains for the analysis and implement special rules to bufferize certain tensor ops efficiently without difficult analyses (e.g., range analyses). Also, a tensor-based analysis fits better with destination-style, which we’ve already been utilizing in other components (e.g., tiling).
The bufferization analysis is driven by BufferizableOpInterface. There are two methods that model the flow of data through the program: getAliasingOpOperands and getAliasingOpResults. These are properties of tensor operands.
E.g.:
// getAliasingOpOperands(%r) = {%t}
// getAliasingOpResults(%t) = {%r}
%r = tensor.insert %cst into %t[%idx] : tensor<?xf32>
This tells us that if %t bufferizes in-place, buffer(%r) == buffer(%t). The bufferization analysis maintains alias sets based on this information.
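To make the alias-set idea concrete, here is a minimal toy model in Python. It is not the MLIR implementation: the `AliasSets` class and the `record_in_place` helper are hypothetical names, and SSA values are represented as plain strings. It only shows how an in-place decision merges the result and the operand into one set.

```python
class AliasSets:
    """Toy union-find over SSA value names (hypothetical, not MLIR's data structure)."""

    def __init__(self):
        self.parent = {}

    def find(self, value):
        # Values seen for the first time start in their own singleton set.
        self.parent.setdefault(value, value)
        # Walk up to the root, halving the path along the way.
        while self.parent[value] != value:
            self.parent[value] = self.parent[self.parent[value]]
            value = self.parent[value]
        return value

    def union(self, a, b):
        # Merge the set containing a into the set containing b.
        self.parent[self.find(a)] = self.find(b)

    def may_alias(self, a, b):
        return self.find(a) == self.find(b)


def record_in_place(alias_sets, result, aliasing_operands):
    # If an operand bufferizes in-place, the result reuses its buffer,
    # so both values end up in the same alias set.
    for operand in aliasing_operands:
        alias_sets.union(result, operand)


sets = AliasSets()
# %r = tensor.insert %cst into %t[%idx], with getAliasingOpOperands(%r) = {%t}
record_in_place(sets, "%r", ["%t"])
```

After the call, `sets.may_alias("%r", "%t")` holds, while an unrelated value such as `"%other"` stays in its own set.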
Operations that are not tensor-based do not implement the BufferizableOpInterface, so there’s no getAliasingOpOperands/Results property that could be queried. Instead, we have bufferization.to_memref at the boundary, for which the BufferizableOpInterface could be queried.
E.g.:
%m = bufferization.to_memref %t : memref<?xf32>
// Do something with %m
Our analysis stops at bufferization.to_memref. We don’t know what’s happening to %m. In particular, we don’t know if some op is going to read from %m and/or write to %m. So we have to be conservative and assume that the answer is “yes” in both cases, which could potentially insert unnecessary buffer copies.
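As a rough illustration of why the conservative assumption can cost a copy, here is a toy model in Python. Everything in it is an assumption made for illustration: the `KNOWN_EFFECTS` table, the `operand_effects` and `needs_copy` helpers, and the simplified rule "a write needs a copy if the old tensor value is read later" are not the actual One-Shot Bufferize logic.

```python
# Hypothetical effect table for the toy model: op name -> (reads, writes)
# for the op's tensor operand.
KNOWN_EFFECTS = {
    "tensor.extract": (True, False),
}


def operand_effects(op_name):
    # The analysis cannot see what happens to the memref produced by
    # to_memref, so it assumes the buffer is both read and written.
    if op_name == "bufferization.to_memref":
        return (True, True)
    return KNOWN_EFFECTS[op_name]


def needs_copy(op_name, old_value_read_later):
    # Simplified rule: writing in place is only safe if nobody reads
    # the old tensor value afterwards.
    _, writes = operand_effects(op_name)
    return writes and old_value_read_later
```

In this model, `needs_copy("bufferization.to_memref", True)` is true even if the users of %m only ever read from it, which is the kind of unnecessary copy described above.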
Then there’s bufferization.to_tensor on the other end.
E.g.:
// ...
%t = bufferization.to_tensor %m : memref<?xf32>
Our analysis does not know where %t is coming from. But it has to implement BufferizableOpInterface::getAliasingOpOperands. Usually, we would look up the alias set of the OpOperand and maybe union the sets of %t and %m. But that doesn’t work because %m is a memref. So we have to be conservative and assume that, after bufferization, buffer(%t) may alias with any other SSA value whose definition dominates the bufferization.to_tensor op.
(We don’t do this at the moment. Instead, we assert that there’s no to_tensor/to_memref in the program.)
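For reference, the conservative to_tensor rule described above could be sketched as follows. This is a toy model under strong assumptions: the program is a single straight-line block, so a definition dominates the to_tensor op exactly when it appears earlier in the list, and the `conservative_aliases` helper is a hypothetical name, not part of MLIR.

```python
def conservative_aliases(defs_in_order, to_tensor_result):
    """Return the values that the to_tensor result may alias in the toy model.

    defs_in_order lists SSA value names in definition order within one block.
    In a single block, every earlier definition dominates the to_tensor op.
    """
    index = defs_in_order.index(to_tensor_result)
    return set(defs_in_order[:index])


# %a and %b are defined before %t = bufferization.to_tensor %m; %c comes after.
aliases = conservative_aliases(["%a", "%b", "%t", "%c"], "%t")
```

Here `aliases` contains %a and %b but not %c, since %c is defined after the to_tensor op and therefore does not dominate it.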