Remove tight coupling of the BufferDeallocation pass to std and linalg operations

This proposal goes beyond linalg.copy vs. lhlo.copy vs. std.copy: it is about allowing different copies for different allocation kinds. Specifying which copy to use is just one side-effect of making this configurable (using callbacks, like in type conversion, or via an interface).

For instance, we currently hard-code the fact that we do not emit a free for alloca-allocated memory (and maybe we should not copy it either). Instead, one could configure that this particular allocation operation has no-ops for copy and free.

To support reference counting, one could have an rc.alloc with rc.increment and rc.decrement operations.
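As a purely illustrative sketch (none of these `rc.*` ops exist; the names and syntax are hypothetical), such a reference-counting scheme could look like:

```mlir
// Hypothetical reference-counted allocation dialect; ops are illustrative only.
func @rc_example() {
  %buf = rc.alloc() : memref<16xf32>    // allocate with refcount 1
  rc.increment %buf : memref<16xf32>    // share the buffer with another consumer
  "some.use"(%buf) : (memref<16xf32>) -> ()
  rc.decrement %buf : memref<16xf32>    // drop one reference
  rc.decrement %buf : memref<16xf32>    // count reaches 0, storage may be freed
  return
}
```

The deallocation pass would then insert `rc.increment`/`rc.decrement` instead of copy/free for buffers from this allocator.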

I agree that this can be done by lowering alloc and copy differently but that approach does not allow mixing reference counted allocations with normal ones.

If we can agree to have a std.copy, then using allocation resources seems a good way to model this. We could then go ahead and have a reference counted resource.


I’m running into the need for this as well, but it goes a step further: this pass should not have any dependencies on the type of the memory being dealt with. Even if linalg.copy were changed to std.copy, that is still memref-specific, and std.alloc/std.dealloc/etc. are memref-only as well. memref is a pretty specific thing, and that really limits the usefulness of these kinds of passes.

Having a type interface-like thing that provided memory-like functions (alloc, dealloc, copy, set, etc) would be useful here in this pass as well as many others that just want to pass through what they are working on with minimal coupling.

I’m not sure there’s anything like a type interface today, though one could probably implement it with a dialect interface that was queried through the type (type → parent dialect → buffer interface) - interested in hearing other ideas. Definitely can’t do this with C++ templates (as then, if you wanted to mix memrefs and other buffer-like things, you’d have to stamp out this pass once per combination).


We have this now. Attributes and types can implement interfaces just like operations. Using this facility to try to re-envision the ShapedType hierarchy has been brought up several times. I suspect the actual answer to that is not to s/ShapedType/ShapedTypeInterface/ but instead to do what you are suggesting and realize the specific abstractions that things need to operate generically on the types.

The interfaces doc is updated to describe the mechanics.

Aside from some design work, I think these generalizations are mostly in the “just needs someone to care and do it” category.


We have this now.

Of course we do, MLIR rocks :heart:

Actually, we think that such a query could be useful, but it is still difficult to implement. However, considering the case that there are several types of allocs (e.g. my_alloc and other_alloc), we cannot derive operations from such an interface, since we cannot distinguish between the two types. As described here:

Maybe it would be better to extend the allocation resources as described by

What do you think?

So we want to generalize this across two dimensions:

  1. Be able to work on different types beyond memref.
  2. Be able to have different alloc/free operations even for the same type.

If we wanted to go with AllocationResource as the central abstraction, then types that are different than memrefs would need to allocate from a different resource. Would this be sufficient for your needs, @benvanik?

If we model this via type interfaces as the sole means, then we can no longer allocate memref from different resources. This is currently needed to differentiate alloc from alloca but I can also envision using this to separate memrefs that are reference counted from ones that are not by using a different allocation resource for them.

In the end, this boils down to the question of how one wants to compose things. I’d like to be able to use differently allocated memref values in the same context. The current approach of using the MemoryEffect interface to identify allocations works well for this.
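To make the composition concrete, here is a hedged sketch of mixing differently allocated memrefs in one function via allocation resources (the `resource` attribute and its string values are hypothetical, following the `std.alloc() {resource = "gpu.on_device"}` idea from earlier in the thread):

```mlir
// Illustrative only: the resource attribute and its values are not existing IR.
func @mixed_allocations() {
  %a = std.alloc()  {resource = "std.heap"}   : memref<8xf32> // heap: needs dealloc + copy
  %b = std.alloca() {resource = "std.stack"}  : memref<8xf32> // stack: no-op free/copy
  %c = std.alloc()  {resource = "rc.counted"} : memref<8xf32> // ref-counted free/copy
  "some.use"(%a, %b, %c) : (memref<8xf32>, memref<8xf32>, memref<8xf32>) -> ()
  return
}
```

The deallocation pass could then look up per-resource copy/free behavior instead of hard-coding one scheme per type.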

I read the description, but I fail to see the motivation here?

Ref-counting schemes are fine, but they call for different types than memref to model them: it isn’t clear to me why we should have different allocation operations for the same type when we can extend the type system.

Sorry for the late reply. This fell off my radar.

I totally agree with @herhut that generalizing this to non-memref types is a little bit different from different alloc/free ops for the same type. Not sure how we want to disentangle that. I agree that wanting std.alloca and std.alloc to both return memref seems desirable and at odds with only relying on a type interface.

Given that we don’t currently have a requirement for non-memref types (or anybody actively pushing on that), I think we should consider that as a soft requirement at this point, unless we find a really obvious way to handle it.

This seems promising. You mean something like std.alloc() {resource = "gpu.on_device"} as in my original reply?

@mehdi_amini The basic motivation was to enable the emission of different copy operations than linalg.copy (and potentially different dialect-specific alloc and dealloc ops?). Currently, this tight coupling does not allow the BufferDeallocation pass to be applied to arbitrary dialect-specific scenarios. However, our intention was to make it as “generic as possible”.

As @_sean_silva mentions, I understand why we’d want to generate different alloc/dealloc/copy for different types, but it seems like you’re going beyond this?

This looks like a very welcome change – to provide the flexibility to use dialect-specific copy operations and custom alloc/free. Could we restart this discussion? The fact that the bufferization pass depends on linalg simply due to linalg.copy is already a layering problem.

We discussed this a bit more recently and came to the conclusion that the best way forward could be to add a new bufferize dialect that would contain a copy operation (with implicit allocation) and two cast operations, from tensor to memref and back, to allow for gradual bufferization.

Users of this could then implement additional patterns to lower the copy operation to whatever target operation they prefer.

Also, by having the cast operations in the bufferize dialect, we can give them semantics that model their use more closely. In particular, they would no longer need to have any side-effects, given their constrained use.
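For illustration, gradual bufferization with such a dialect might look like the sketch below (the `bufferize.*` op names and syntax are assumptions for this discussion, not existing ops):

```mlir
// Hypothetical bufferize.* ops for gradual (partial) bufferization.
func @partial(%t: tensor<4xf32>) -> tensor<4xf32> {
  // Materialization at the boundary between tensor and memref worlds.
  %m = bufferize.cast %t : tensor<4xf32> to memref<4xf32>
  // Copy with implicit allocation; result buffer is guaranteed unused elsewhere.
  %copy = bufferize.copy %m : memref<4xf32>
  "some.memref_op"(%copy) : (memref<4xf32>) -> ()
  // Cast back so not-yet-bufferized consumers keep operating on tensors.
  %r = bufferize.cast %copy : memref<4xf32> to tensor<4xf32>
  return %r : tensor<4xf32>
}
```

Downstream users would pattern-match `bufferize.copy` and lower it to their preferred allocation and copy ops.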

Would this address your use-case, too, @bondhugula?

@dfki-mako @dfki-jugr


While adding a memref copy operation sounds fine, do you really need a new dialect for just these three ops that are going to be interspersed among numerous other ops from another dialect? The copy operation could be added to the same dialect where memref casting, allocation and load/store ops live.

Also, what are the benefits of a copy with an implicit allocation over the standard pattern of alloc + copy? The latter form is already well handled by various patterns, utilities, and passes, and is consistent with other patterns. E.g., adding a zero memref is a copy; you shouldn’t have to fold the alloc into it.

I’d still have concerns with “fake” ops that we can’t reason about properly using the usual tools (side-effects, etc.) and rely instead on “magic” properties.

Sounds good to me. However, would this also imply that we should revive the discussion about std.copy at this point?

I share your point of view in general on the one hand. On the other hand, according to my understanding, MLIR should always try to preserve “high-level knowledge” within the IR instead of creating nested patterns that might represent the same meaning but are more difficult to reason about. Taking this into account makes the single-copy-op approach more appealing :smile: What do you think?

Hmm… implementing MLIR interfaces is always a good thing. However, I guess that dialect-specific operations are almost always “slightly magical”, as they express specific dialect/domain knowledge. The question is whether we can live with this small amount of magic that allows us to reason about bufferization in a more meaningful way :nerd_face:

Every operation has its own semantics of course, but all of them fit into a more general, abstract model: an operation can write, read, subview, allocate, or free a memref. But IIRC these operations want to impose more constraints on the memref that can’t be described in that abstract model and that no other pass/analysis can reason about.
We had a similar discussion recently on the linalg.padded_view revision where I had related concerns as well: ⚙ D93704 [mlir][Linalg] Introduce linalg.pad_tensor op.

I definitely share your concerns about opaque ops that we can’t even reason about in any way in the scope of the MLIR core passes. The right way might be to simply split the operations off from the standard dialect and try to avoid further potential issues regarding (potentially different) copy operations in the first place. Since there is currently a lot of activity regarding splitting/bridging of dialects, it might be a good idea to come back to this discussion after we have decided how to proceed with the memref dialect :slight_smile:

Having a copy with implicit allocation is beneficial when optimizing buffer allocation, as one does not need to track the corresponding alloc for the copy and furthermore has the guarantee that it is unused. This special variant of copy could have the additional constraint that it is illegal to mutate its result, which is a general assumption in the current bufferization. Bufferization assumes that buffers behave like values to some degree (not aliasing, not mutated once computed).

For code generation, one would definitely want to lower this to alloc + some implementation of copy. In the case of linalg, likely a linalg.copy.
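As a sketch of that lowering (the `bufferize.copy` op name is an assumption from this thread, not an existing op), the rewrite would expand the implicit allocation into the standard explicit pattern:

```mlir
// Before lowering: copy with implicit allocation (hypothetical op).
%dst = bufferize.copy %src : memref<16xf32>

// After lowering for code generation: explicit alloc plus a concrete copy.
%dst = std.alloc() : memref<16xf32>
linalg.copy(%src, %dst) : memref<16xf32>, memref<16xf32>
```

At that point the usual buffer-deallocation machinery sees an ordinary alloc and can place the matching dealloc.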


Replying to myself here to pick this up again.

So, where do we want to take this? With the recent discussion about casts in dialect conversion, we can either have bufferize use the standard cast operation or define its own. In any case, it should be a specialized operation just for the purpose of dialect conversion and not general purpose (hence its own dialect, to make this clear; we could also use memref.bufferize_cast if that sounds better).

I would vote for having a specialized cast, as I would want a range of canonicalization patterns that are useful in partial bufferization to move scalars across the bufferization boundary. Specifying those on the generic cast based on the types involved seems wrong to me.

And, for the reasons I stated above, I’d also like a copy operation with implicit alloc similar to what tensor_to_memref does today. Again, memref.bufferize_copy works for me, too.

Can you give examples of this?

(I’m +1 on everything you said though)