A bit of background: When we designed One-Shot Bufferize, we had two design options:
- Analyze tensor IR and insert buffer copies only when needed.
- Insert copies on every write (without analyzing anything). Then run a memref analysis to remove copies again.
We went with the first option. I have no good answer as to which design is better. My gut feeling says option 1 is simpler because we can use SSA use-def chains for the analysis and implement special rules to bufferize certain tensor ops efficiently without needing difficult analyses (e.g., range analyses). Also, a tensor-based analysis fits better with destination style, which we’ve already been using in other components (e.g., tiling).
The bufferization analysis is driven by BufferizableOpInterface. There are two methods that model the flow of data through the program: getAliasingOpOperands and getAliasingOpResults. These are properties of tensor OpResults and tensor OpOperands, respectively.
E.g.:
// getAliasingOpOperands(%r) = {%t}
// getAliasingOpResults(%t) = {%r}
%r = tensor.insert %cst into %t[%idx] : tensor<?xf32>
This tells us that if %t bufferizes in-place, buffer(%r) == buffer(%t). The bufferization analysis maintains alias sets based on this information.
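As a rough illustration of how alias sets could be maintained from this information, here is a toy union-find model in Python. It is only a sketch: the class `AliasSets`, the helper `record_in_place`, and the `aliasing_op_results` table are made up for this example and are not the data structures used in MLIR.

```python
class AliasSets:
    """Toy union-find over SSA value names (not the MLIR data structure)."""

    def __init__(self):
        self.parent = {}

    def find(self, value):
        # Unknown values start out in their own singleton set.
        self.parent.setdefault(value, value)
        # Walk up to the representative, halving the path on the way.
        while self.parent[value] != value:
            self.parent[value] = self.parent[self.parent[value]]
            value = self.parent[value]
        return value

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra

    def may_alias(self, a, b):
        return self.find(a) == self.find(b)


def record_in_place(alias_sets, aliasing_op_results, operand):
    """If `operand` bufferizes in-place, merge it with its aliasing results."""
    for result in aliasing_op_results.get(operand, ()):
        alias_sets.union(operand, result)


# Hypothetical table mirroring getAliasingOpResults(%t) = {%r} from above.
aliasing_op_results = {"%t": ["%r"]}
sets = AliasSets()
record_in_place(sets, aliasing_op_results, "%t")
print(sets.may_alias("%r", "%t"))      # True
print(sets.may_alias("%r", "%other"))  # False
```

The union is only performed once the analysis has decided that %t bufferizes in-place; an out-of-place decision would leave %r in a fresh set of its own.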
Operations that are not tensor-based do not implement BufferizableOpInterface, so there’s no getAliasingOpOperands/getAliasingOpResults property to query. Instead, we have bufferization.to_memref at the boundary, for which the BufferizableOpInterface can be queried.
E.g.:
%m = bufferization.to_memref %t : memref<?xf32>
// Do something with %m
Our analysis stops at bufferization.to_memref. We don’t know what’s happening to %m. In particular, we don’t know if some op is going to read from %m and/or write to %m. So we have to be conservative and assume that the answer is “yes”, which may cause unnecessary buffer copies to be inserted.
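To make the conservative assumption concrete, here is a small Python sketch. The effect table `KNOWN_EFFECTS` and both helper functions are hypothetical and only model the idea that any user the analysis cannot see through, such as bufferization.to_memref, is treated as both reading and writing.

```python
# Hypothetical effect table: (reads_operand, writes_operand) for ops that
# this toy analysis understands. Not taken from MLIR.
KNOWN_EFFECTS = {
    "tensor.extract": (True, False),
    "tensor.dim": (False, False),
}


def operand_effects(op_name):
    """Effects on the tensor operand; unknown ops are assumed to read and write."""
    return KNOWN_EFFECTS.get(op_name, (True, True))


def must_copy_before_write(later_users):
    """An in-place write is only safe if no later user of the old value reads it."""
    return any(operand_effects(op)[0] for op in later_users)


print(must_copy_before_write(["tensor.dim"]))               # False
print(must_copy_before_write(["bufferization.to_memref"]))  # True
```

In the second call a copy is requested even if nothing ever reads from %m, which is exactly the kind of unnecessary copy mentioned above.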
Then there’s bufferization.to_tensor on the other end.
E.g.:
// ...
%t = bufferization.to_tensor %m : memref<?xf32>
Our analysis does not know where %t is coming from. But the op still has to implement BufferizableOpInterface::getAliasingOpOperands. Usually, we would look up the alias set of the OpOperand and maybe union the sets of %t and %m. But that doesn’t work because %m is a memref. So we have to be conservative and assume that, after bufferization, buffer(%t) may alias with any other SSA value whose definition dominates the bufferization.to_tensor op.
(We don’t do this at the moment. Instead, we assert that there’s no to_tensor/to_memref in the program.)
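For illustration only, the conservative to_tensor assumption could look like this in the simplest case, a single straight-line block where "dominates" just means "is defined earlier". The function `conservative_aliases` and the list-of-results block encoding are assumptions of this sketch; real dominance across regions and blocks is more involved.

```python
def conservative_aliases(block_ops, to_tensor_index):
    """Results of all ops that precede the to_tensor op in one straight-line block."""
    aliases = set()
    for results in block_ops[:to_tensor_index]:
        aliases.update(results)
    return aliases


# Each entry lists the results of one op, in program order.
# Index 2 is the bufferization.to_tensor op that defines %t.
block = [["%a"], ["%m"], ["%t"], ["%b"]]
print(sorted(conservative_aliases(block, 2)))  # ['%a', '%m']
```

%b is defined after the to_tensor op, so it is not part of the conservative set at that point.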