Remove tight coupling of the BufferDeallocation pass to std and linalg operations

Sounds good to me. However, would this also imply that we should revive the discussion about std.copy at this point?

On the one hand, I share your point of view in general. On the other hand, according to my understanding, MLIR should always try to preserve “high-level knowledge” within the IR instead of creating nested patterns that might represent the same meaning but are more difficult to reason about. Taking this into account makes the single-copy-op approach more appealing :smile: What do you think?

Hmm… implementing MLIR interfaces is always a good thing. However, I guess that dialect-specific operations are almost always “slightly magical”, as they express specific dialect/domain knowledge. The question is whether we can live with this small amount of magic that allows us to reason about bufferization in a more meaningful way :nerd_face:

Every operation has its own semantics of course, but all of them fit into a more general, abstract model: an operation can write, read, subview, allocate, or free a memref. But IIRC the operations here want to impose more constraints on the memref that can’t be described in that abstract model and that no other pass/analysis can reason about.
We had a similar discussion recently on the linalg.padded_view revision where I had related concerns as well: ⚙ D93704 [mlir][Linalg] Introduce linalg.pad_tensor op.

I definitely share your concerns about opaque ops that we can’t even reason about in any way in the scope of the MLIR core passes. The right way might be to simply split the operations off from the standard dialect and try to avoid further potential issues regarding (potentially different) copy operations in the first place. Since there is currently a lot of activity regarding splitting/bridging of dialects, it might be a good idea to come back to this discussion after we have decided how to proceed with the memref dialect :slight_smile:

Having a copy with implicit allocation is beneficial when optimizing buffer allocation, as one does not need to track the corresponding alloc for the copy and furthermore has the guarantee that the resulting buffer is otherwise unused. This special variant of copy could have the additional constraint that it is illegal to mutate its result, which is a general assumption in the current bufferization. Bufferization assumes that buffers behave like values to some degree (they do not alias and are not mutated once computed).

For code generation, one would definitely want to lower this to alloc + some implementation of copy. In the case of linalg, likely a linalg.copy.
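A minimal sketch of what this could look like, using a placeholder op name in generic syntax (not an agreed-upon design) and one possible lowering to an explicit alloc plus linalg.copy:

```mlir
// Copy with implicit allocation (placeholder name, generic syntax):
// the result is a fresh buffer that must not be mutated afterwards.
%copy = "bufferize.copy"(%src) : (memref<?xf32>) -> memref<?xf32>

// Possible lowering for code generation: explicit alloc + linalg.copy.
%c0 = constant 0 : index
%d0 = dim %src, %c0 : memref<?xf32>
%dst = alloc(%d0) : memref<?xf32>
linalg.copy(%src, %dst) : memref<?xf32>, memref<?xf32>
```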

Replying to myself here to pick this up again.

So, where do we want to take this? With the recent discussion about casts in dialect conversion, we can either have bufferize use the standard cast operation or define its own. In any case, it should be a specialized operation just for the purpose of dialect conversion and not a general-purpose one (hence its own dialect to make this clear; we could also use memref.bufferize_cast if that sounds better).

I would vote for having a specialized cast, as I would want a range of canonicalization patterns that are useful in partial bufferization to move scalars across the bufferization boundary. Specifying those on the generic cast based on the types involved seems wrong to me.

And, for the reasons I stated above, I’d also like a copy operation with implicit alloc similar to what tensor_to_memref does today. Again, memref.bufferize_copy works for me, too.

Can you give examples of this?

(I’m +1 on everything you said though)

I’d call it memref.clone: it seems like its semantics don’t have to be tied to the bufferization process.

+1 here :slight_smile:

Works for me. That would give us memref.bufferize_cast and memref.clone. If that is what we can agree on, maybe @dfki-mako could get these added and integrated?

As that will replace the tensor_to_memref operation, should we remove that one again?

I think @_sean_silva discovered one of these patterns already, which is memref.load(memref.bufferize_cast). Other interesting cases are dim, rank, shape.shape_of. All of these get created by bufferize patterns on the bufferized form but can be forwarded to the tensor directly.
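For illustration, a rough sketch of the load case (using the proposed memref.bufferize_cast name in generic syntax; reading “forwarded to the tensor directly” as folding the load into a tensor.extract):

```mlir
// Before: the load goes through the materialization cast ...
%m = "memref.bufferize_cast"(%t) : (tensor<?xf32>) -> memref<?xf32>
%v = memref.load %m[%i] : memref<?xf32>

// ... after forwarding, it reads from the tensor directly:
%v = tensor.extract %t[%i] : tensor<?xf32>
```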