As for the Comprehensive Bufferization (we’ve been talking about renaming it to “One Shot Bufferization” or some other name), the following things would go into the new dialect:
- ComprehensiveBufferize transform and pass. This thing is doing the bufferization. It has dependencies on the tensor and the memref dialect.
- BufferizableOpInterface: Ops that implement this interface can be bufferized.
- “External model” implementations of the op interface. E.g., interface implementations for ops in the vector dialect, so that these ops can be bufferized. These should ideally live in their respective dialects (e.g., vector), but it may be too early for this. Or we may even want to have them in the bufferization dialect forever, so that other dialects stay small. There is a separate build target for each supported “external” dialect, so the only build targets that have dependencies on another dialect (apart from memref and tensor) are the ones that contain the external model.
- An attribute and its verifier.
For those unaware of what Comprehensive Bufferize is: It’s a single-pass bufferization that does a whole-function analysis, whereby ops can be from different dialects. This whole-function analysis makes it easy to detect “inplace bufferization” opportunities (i.e., avoiding buffer copies).
E.g., one main use case that we have is bufferizing matching tensor.extract_slice / tensor.insert_slice pairs inplace during tiling of Linalg ops.
%0 = tensor.extract_slice %A [%offset] [%sz] [1]
%1 = "do_something"(%0)
%2 = "do_something_else"(%1)
%3 = tensor.insert_slice %2 into %A [%offset] [%sz] [1]
Comprehensive Bufferize can trace back the source of a tensor.insert_slice in the reverse SSA use-def chain, until it eventually ends up at a tensor.extract_slice, which extracts from the same buffer that the insert_slice is inserting into. If that’s the case, no buffer copy is needed.
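To make the trace concrete, here is a minimal Python sketch of the idea. It is a toy model, not the actual implementation: the `Op` class, the `aliases_first_operand` flag and the `matching_extract_slice` function are all made up for illustration. The real pass works on MLIR ops in C++ and presumably has to check more than this one pattern before deciding that an op can bufferize inplace.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class Op:
    """Toy stand-in for an SSA op with one result (hypothetical, not an MLIR class)."""
    name: str
    operands: List["Op"] = field(default_factory=list)
    # (offset, size, stride) for slice ops, None otherwise.
    slice_spec: Optional[tuple] = None
    # Assumption: True if the result may reuse the buffer of operand 0.
    # In the real pass this kind of information would come from the op interface.
    aliases_first_operand: bool = False


def matching_extract_slice(insert: Op) -> Optional[Op]:
    """Walk the reverse use-def chain starting at the inserted value.

    Returns the tensor.extract_slice that reads the same slice of the same
    destination tensor, or None if the chain does not lead to one.
    """
    source, dest = insert.operands  # insert_slice %source into %dest
    current = source
    while True:
        if current.name == "tensor.extract_slice":
            same_tensor = current.operands[0] is dest
            same_slice = current.slice_spec == insert.slice_spec
            return current if (same_tensor and same_slice) else None
        # Stop as soon as an op on the chain does not forward its operand's buffer.
        if not current.aliases_first_operand or not current.operands:
            return None
        current = current.operands[0]


# The example from above, rebuilt with the toy classes.
A = Op("func.arg")
s0 = Op("tensor.extract_slice", [A], ("%offset", "%sz", 1))
s1 = Op("do_something", [s0], aliases_first_operand=True)
s2 = Op("do_something_else", [s1], aliases_first_operand=True)
ins = Op("tensor.insert_slice", [s2, A], ("%offset", "%sz", 1))

needs_copy = matching_extract_slice(ins) is None
```

In this sketch, `needs_copy` is `False` for the example, because the chain ends at an extract_slice of the same slice of `A`. If the insert_slice targeted a different tensor, or an op on the chain did not forward its operand's buffer, the function would return `None` and a copy would be needed.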
This is not something that the existing multi-pass bufferization can do at the moment, because the tensor-bufferize pass has no knowledge about ops from other dialects that are on the reverse SSA use-def chain path. It is my understanding that the existing bufferization pass would insert a buffer copy. The “copy removal” pass could be extended, so that it would remove the copy again. However, it is easier to make the right decision in the first place; reasoning about SSA values is easier than reasoning about memref aliases.
On the other hand, there are also cases that the Comprehensive Bufferize pass cannot handle. E.g., returning newly allocated buffers from a function or block. So we need both and we have to make them composable. For more details, I wrote up a high-level description of the design of Comprehensive Bufferize.
comprehensive_bufferize.pdf (353.7 KB)