Bufferization: function call conventions?

Following the discussions of the last few days, and after wrestling with various bufferization options for about a week, I was wondering whether you have defined, and possibly also written down, the calling conventions governing bufferization.

As I see it now, there are two conventions already used in practice by various bufferization options when a function outputs a tensor:

  1. The output tensor becomes an output memref. This seems to be the default option. The implicit assumption seems to be that the callee allocates it.
  2. The output tensor becomes an input memref. This effect is attained using option --buffer-results-to-out-params. The implicit assumption is that the caller allocates the buffer.

However, what I just wrote does not, by itself, allow tensor-based code to be compiled in a way that excludes memory leaks and memory errors. For rule 1, the minimum extra rules seem to be:

  • the callee must itself allocate the output memref, not just send back the pointer to an input memref (e.g. through a view).
  • the caller must deallocate the memref itself or must output it to a higher level.

For rule 2, things seem a bit simpler. It’s just that the callee should never deallocate memrefs received as input.

These extra assumptions allow, for instance, the automatic synthesis of dealloc operations for memrefs allocated during a function call.
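
To make the two conventions and their extra rules concrete, here is a minimal sketch in Python rather than MLIR. Everything in it is hypothetical and only for illustration: `ToyHeap` stands in for an allocator that tracks live buffers, `add_one_returning` follows rule 1 (the callee allocates a fresh result, the caller frees it), and `add_one_out_param` follows rule 2 (the caller allocates, the callee never frees its inputs).

```python
class ToyHeap:
    """Hypothetical allocator that records which buffers are still live."""

    def __init__(self):
        self.live = set()
        self._next_id = 0

    def alloc(self, size):
        buf_id = self._next_id
        self._next_id += 1
        self.live.add(buf_id)
        return {"id": buf_id, "data": [0] * size}

    def dealloc(self, buf):
        # Raises KeyError on a double free or on freeing an unknown buffer.
        self.live.remove(buf["id"])


def add_one_returning(heap, src):
    """Rule 1: the callee allocates the result and returns it."""
    out = heap.alloc(len(src["data"]))  # fresh buffer, never a view of src
    out["data"] = [x + 1 for x in src["data"]]
    return out  # ownership passes to the caller


def add_one_out_param(src, out):
    """Rule 2: the caller passes the result buffer in."""
    out["data"][:] = [x + 1 for x in src["data"]]
    # The callee deallocates neither src nor out.


def run_both(heap):
    src = heap.alloc(3)
    src["data"][:] = [1, 2, 3]

    r1 = add_one_returning(heap, src)
    result1 = list(r1["data"])
    heap.dealloc(r1)  # rule 1: the caller frees what the callee allocated

    r2 = heap.alloc(3)  # rule 2: the caller allocates the result buffer
    add_one_out_param(src, r2)
    result2 = list(r2["data"])
    heap.dealloc(r2)

    heap.dealloc(src)
    return result1, result2
```

If both functions follow their rules, `heap.live` is empty after `run_both`, which is the property that automatic dealloc synthesis relies on.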

With respect to these considerations, my main question is: do you have a written document covering the current development on bufferization? I could help with reviewing it, and possibly add a few words on the way synchronous languages handle this (in a completely different way).

@albertcohen @Ulysse

I am presenting on bufferization at next week’s MLIR Open Design Meeting. I’ll add documentation once a patch or two have landed and things are in a final state that is ready to document.

You are correct that calling conventions are tricky. I don’t think there’s a one-size-fits-all solution. The initial bufferization conversion process creates form 1, but it doesn’t prescribe any particular calling convention. The current one is well-defined even if views into inputs are returned, provided that callers check the “allocated pointer” in the memref descriptor to see whether it matches any of the inputs they provided.
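
The “allocated pointer” check can be sketched as follows. This is a toy Python model, not MLIR code: `MemRefDescriptor` mirrors the fields of the descriptor used when lowering memrefs (allocated pointer, aligned pointer, offset, sizes, strides), while `subview_1d` and `caller_owns_result` are hypothetical helpers. The sketch also assumes that any result which does not alias an input was freshly allocated by the callee.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class MemRefDescriptor:
    allocated: int            # base address of the underlying allocation
    aligned: int              # aligned address used for element access
    offset: int               # element offset from the aligned address
    sizes: Tuple[int, ...]
    strides: Tuple[int, ...]


def subview_1d(src, start, length):
    """Hypothetical 1-D view: shares the allocation, shifts the offset."""
    return MemRefDescriptor(
        allocated=src.allocated,
        aligned=src.aligned,
        offset=src.offset + start * src.strides[0],
        sizes=(length,),
        strides=src.strides,
    )


def caller_owns_result(result, inputs):
    """True if the result shares no allocation with any input.

    Assumption: such a result was allocated by the callee, so the
    caller is responsible for freeing it.
    """
    return all(result.allocated != arg.allocated for arg in inputs)
```

A caller would free a returned memref only when `caller_owns_result` is true; a returned view into an input is left alone because the input's owner will free the underlying allocation.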

We also don’t yet have any infrastructure around indicating the properties of memref function arguments. We will eventually want features like indicating that a particular argument does not alias any other arguments. Another useful property to annotate is that the function can mutate an input argument buffer in place – e.g. reuse it as a scratchpad – to reduce the total memory used by the program.

Nice. I will attend.

I did not attend that presentation but I would like to pick this up again.

The current implementation is only correct in case 2 where buffers for results are passed in, as in that form no global analysis is required. The bufferization passes only deallocate memrefs that were created by an alloc operation (as evidenced by having the corresponding memory effect). As a consequence, function arguments are never freed.
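
As a rough illustration of that rule, here is a toy Python sketch over a made-up straight-line IR; it is not the actual pass. Operations are tuples such as `("alloc", "tmp")`, the op names and the `insert_deallocs` helper are hypothetical, and control flow is ignored entirely.

```python
def insert_deallocs(body):
    """Insert a dealloc before the final return for every local alloc.

    `body` is a list of tuples `(op_name, *operands)` ending in
    `("return",)`. Only values defined by an "alloc" op are freed, so
    function arguments (including out-params) are never deallocated.
    """
    assert body and body[-1][0] == "return"
    allocated = [operands[0] for op, *operands in body if op == "alloc"]
    frees = [("dealloc", value) for value in allocated]
    return body[:-1] + frees + [body[-1]]


# Case 2 form: the result buffer "out" is an argument provided by the caller.
example = [
    ("alloc", "tmp"),
    ("compute", "arg0", "tmp"),
    ("copy", "tmp", "out"),
    ("return",),
]
rewritten = insert_deallocs(example)
```

Here only `tmp` receives a `dealloc`; `arg0` and `out` are left to the caller, which is why no global analysis is needed in this form.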

It would also be interesting to support the first case. Adding the extra constraints to avoid leaks could be a first step. This could be an extra pass that inserts copies for returned memrefs if they are not locally allocated and treats functions as allocating operations which require their results to be freed.
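
A minimal sketch of the copy-insertion half of such a pass, again in Python over a made-up straight-line IR of tuples. The op names, the `copy_non_local_returns` helper, and the `_copyN` naming scheme are all hypothetical, and anything that is not the direct result of a local `alloc` (including a view of one) is conservatively copied.

```python
def copy_non_local_returns(body):
    """Make every returned value a buffer allocated in this function.

    `body` is a list of tuples `(op_name, *operands)` whose last entry
    is `("return", *values)`. Returned values that are not the result
    of a local "alloc" are replaced by freshly allocated copies.
    """
    *ops, ret = body
    assert ret[0] == "return"
    local = {operands[0] for op, *operands in ops if op == "alloc"}

    new_ops = list(ops)
    new_returned = []
    for index, value in enumerate(ret[1:]):
        if value in local:
            new_returned.append(value)
        else:
            fresh = f"{value}_copy{index}"  # hypothetical naming scheme
            new_ops.append(("alloc", fresh))
            new_ops.append(("copy", value, fresh))
            new_returned.append(fresh)
    return new_ops + [("return", *new_returned)]


# "arg0" is a function argument; "tmp" is allocated locally.
example = [("alloc", "tmp"), ("return", "arg0", "tmp")]
rewritten = copy_non_local_returns(example)
```

After this rewrite every result is a fresh allocation, so a caller can treat the call like an allocating operation and free each result unconditionally.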

I don’t think we have the infrastructure to add the allocating memory effect onto functions conditionally. So one would likely need to define an additional call operation with that effect, or use a different way to configure the deallocation pass and how it detects allocations.

@dfki-mako FYI.

Sounds good to me. The question is whether we want to “annotate” function arguments, carry additional information in the type system, and/or express this knowledge using ops (plus, optionally, additional program analyses). The first proposal might not be the best solution since it does not really feel well integrated into the high-level infrastructure of MLIR (however, it would be the “traditional” way, which works out quite well for calling conventions). Adding knowledge to the type system might also raise the question of whether we want to express additional address spaces, which cannot interfere by definition, for instance. Not sure whether this is also on the agenda and/or should be covered in this scope.

@_sean_silva What do you think? In particular with regard to additional ops that could allow us to model abstract calling conventions on top of the existing MLIR dialects.

Extending the memref type with some sort of notion of alias sets doesn’t feel like the right design to me. That’s just my gut feeling – if someone has an interesting idea, I’m willing to look at it though.

As this is load-bearing on externally visible calling convention boundaries, it cannot be an analysis.

Annotating is a lightweight solution. Using function argument/result attrs is a simple solution. Some sort of lightweight “alias assertion ops” might be interesting to investigate as well if we want to express very complex situations other than simple alias sets and boolean flags like “can it be used as a scratchpad”.

The biggest difficulty I see is how a frontend will provide the calling convention information. Two ideas:

  • Bufferize and then run a frontend-specific pass that adjusts the calling conventions for external functions.
  • Frontends emit memref wrapper functions (around the rest of the tensor code) with their desired calling convention, and then have all the tensor functions be private. Then after we bufferize, we can inline into the wrappers and start to use the information that is annotated.

I totally agree and I think that this might not be the best way to do it.

Obviously, it cannot be an analysis on its own :wink:

Exactly :slight_smile: I guess investing more effort into lightweight ops might be the better way instead of dealing with boolean flags.

I think that both ideas might be appropriate. On the one hand, creating and running a frontend-specific pass helps us to hide “potentially unnecessary” information from other passes running in between. On the other hand, using wrapper functions might be useful to leverage the exposed information about contents/ownership/reuse etc. of buffers in all internal passes. However, it might be cleaner to use frontend-specific “adjustment” passes in the end… :thinking:

Should note that you aren’t really going to be able to get out of annotating, or at least the infra is going to have to support it as a mechanism for providing alias information/side effects/etc.

Along the lines of the above, annotating is kind of a requirement for the infra to support. I’m not sure what you mean by “does not really feel well integrated into the high-level infrastructure of MLIR”, but I would not use what is currently possible with respect to interprocedural analysis/aliasing/etc. in MLIR as any kind of indicator of what the infra can/will support. We’ve basically done little to no work in this area.

Can you elaborate on what you mean by this? You cannot infer aliasing properties for call-sites that you cannot see, so you can only infer this for private functions where the whole call graph is visible. For externally visible functions, you need a calling convention (that might feature annotations). Is that what you mean here?

Using wrappers has the added benefit of making the external API independent of how bufferization rewrites signatures. Also, it would give a natural place to put annotations on memref arguments while still being in tensor land.

One approach from there could be a pass that does naive bufferization to bring everything to buffer land. From there you can do a series of local and global analyses and rewrites to reduce memory pressure (phase ordering yeah!).

Yes, exactly. No analysis can recover the information about external calls, so IR annotations are required to be provided by some oracle such as the frontend.

Exactly :slight_smile: That is essentially what I meant. I totally agree that you cannot recover information about external calls using an analysis.

I agree completely and that is essentially what I was referring to :nerd_face: @herhut: Using wrapper functions definitely helps to make the API independent of any internal bufferization functionality.