[RFC] Lifetime Annotations of Memory Within MLIR

Background

One of the many benefits of using MLIR is the ability to analyze code at different levels of abstraction. One analysis that my team (@kaitingwang and @frank_gao) and I think is quite useful is lifetime analysis, in which the lifetime of a pointer (the points in execution during which the pointer is in use) is known and can be used for optimizations such as buffer reuse.

Buffers are created by optimization passes in MLIR, such as the bufferization passes, and it is at that point that their lifetimes are best known. Once these buffers are lowered to lower-level dialects, it becomes increasingly difficult to analyze their lifetimes (especially once they are mixed into different dialects).

Within LLVM IR, lifetime is marked using the llvm.lifetime.start and llvm.lifetime.end intrinsics, and there exist analysis passes within the LLVM project that use these lifetimes for optimizations.

https://llvm.org/docs/LangRef.html#int-lifestart

Proposed Addition

We propose that MLIR should also have the capability of explicitly declaring the lifetime of buffers within the IR itself. Since it is improper to work with pointers within MLIR, especially in higher-level dialects, we propose that two new ops be contributed to the MemRef dialect:

  • memref::LifetimeStartOp which would lower to the LLVM::LifetimeStartOp
  • memref::LifetimeEndOp which would lower to the LLVM::LifetimeEndOp

Within the IR, the code would look something like this:

func.func @lifetime(%d : index) {
  %0 = memref.alloca() : memref<32xf32>
  %1 = memref.alloca(%d) : memref<?xf32>  
  // ... (no previous uses)
  memref.lifetime_start %0 : memref<32xf32>
  // First use of %0
  memref.lifetime_start %1 : memref<?xf32>
  // ... (some uses of the memrefs)
  memref.lifetime_end %0 : memref<32xf32>
  memref.lifetime_end %1 : memref<?xf32>
  // ... (no further uses)
  return
}

This would then be lowered to the LLVM dialect like so:

func.func @lifetime(%d : index) {
  // ...
  %A = unrealized_conversion_cast %0 : memref<32xf32> -> llvm.struct<...>
  llvm.intr.lifetime.start %A.aligned, 128
  // ...
  %B = unrealized_conversion_cast %1 : memref<?xf32> -> llvm.struct<...>
  llvm.intr.lifetime.start %B.aligned, -1

  // ...
  return
}

The benefit of this approach is that it allows the lifetime of the buffers to be declared in a higher-level dialect (when the buffers are created), and the information can be passed down to LLVM for more complex lifetime analysis while remaining non-intrusive to other dialects during lowering. This would also open the opportunity for high-level lifetime analyses and optimizations to be implemented in MLIR itself.

Alternative Solution

Alternatively, the lifetime information could be encoded into an op like the memref::AllocaScopeOp:

https://mlir.llvm.org/docs/Dialects/MemRef/#memrefalloca_scope-memrefallocascopeop

However, there are a few issues with using this op for this purpose at the moment:

Firstly, this op is currently lowered to LLVM::StackSaveOp and LLVM::StackRestoreOp. Functionality could be added to the MemRefToLLVM conversion pass to emit the lifetime information; however, this would require adding more options so that a toolchain could decide whether the op emits stack save/restore or lifetime information.

Secondly, lifetime is not specific to stack allocations; the lifetime of any MemRef is of interest for lifetime analysis and transformations. Since AllocaScopeOp is specifically for stack allocations, the lifetimes of heap-allocated MemRefs (memref::AllocOps) would be left behind. This would be hard to fix, since the semantics of the op are specific to stack allocations.

Finally, although using a scoped region to define the lifetime of a single memref is quite elegant, using one for each and every allocation would not work, since the lifetimes of two buffers may overlap in ways that nested scopes cannot express. This causes a problem when the lifetime of buffer A begins before that of buffer C but also ends before it, as shown in the example below:

memref.alloca_scope {
  %A = memref.alloca(): memref<32xf32>
  // Initialize %A
  memref.alloca_scope {
    %B = memref.alloca(): memref<32xf32>
    // Initialize %B
    memref.alloca_scope {
      %C = memref.alloca(): memref<32xf32>
      affine.for %i = 0 to 32 {
        %0 = affine.load %A[%i] : memref<32xf32>
        %1 = affine.load %B[%i] : memref<32xf32>
        %2 = arith.addf %0, %1 : f32
        affine.store %2, %C[%i] : memref<32xf32>
      }
      // Buffer A's lifetime ends here; however, C's may not necessarily end here.
    }
  }
  // Cannot continue to use %C here since its scope (nested inside A's) has already closed
}

For these reasons, we believe that adding the lifetime annotations as explicit ops within the MemRef dialect is more appropriate.

3 Likes

I welcome this addition, both for stack and heap objects. Stack colouring, thread-aware hoisting, and buffer reuse become much easier with these markers.

Practically speaking, bufferization can add the markers by just looking at def-use chains. But buffer reuse is a complex and global analysis that is easier handled in multiple steps. For example, bufferization can be naive and emit redundant buffers and the markers, and then some later liveness analysis pass marks reuse and a final rewrite removes the duplicates.

We can always improve the one-shot bufferization, but with dialect intermixing (upstream/downstream), it becomes hard to infer semantics at the boundaries. There will always be a place for post-bufferization analyses, rewrites and cleanups.

However, unlike LLVM IR, buffer semantics may differ depending on the dialect, and we can't control downstream dialects, so I'd avoid carrying lifetime markers for too long. Or at least make it "safe" to ignore them.

An unrelated pass may move some buffers and not update the markers, which has the same effect as LLVM IR metadata. So, if your downstream pass / dialect needs them, you could make sure your passes happen “just after” bufferization, or re-run lifetime analysis.

Alternatively, we can make it so all operations on memrefs become conscious of lifetime (ex. in the builder API), which would be a great cost to all dialects that don’t care about it, and easy to introduce bugs on unrelated operations.

1 Like

Why can lifetimes not be inferred when lowering from memref to LLVM by looking at all memref.alloca and walking through uses from there? If a downstream dialect uses the result of the alloca in a way that creates a view in it, there could be an interface to “join” the lifetime of another variable to the lifetime of the allocated value (typically a result of the alloca user). Is implementing such an interface too much work for memref users perhaps?

I’m saying this because I am a bit worried about the bookkeeping burden of making sure lifetime annotations remain correct over transformations. But I feel like what I am suggesting only moves the burden to not forgetting to implement the interface as a third-party dialect.

1 Like

It can. That’s what liveness analysis does.

That’s why I compare the lifetime markers with LLVM metadata.

Yup. There is no silver bullet, AFAICS.

1 Like

Thank you both for your excellent comments.

@rengolin I completely agree with your points on the effects that other transformation passes may have on these annotations and on the importance of not carrying these markers for too long. I like your point about running the liveness analysis pass (the pass that may generate these annotations) just before performing any lifetime-based transformations. In my opinion, lifetime analysis fits into MLIR either as a pseudo front-end for lower-level IRs (such as LLVM IR) to emit these annotations or as a mechanism for higher-level optimizations within MLIR. Through a general liveness analysis pass (that can be run at different levels of abstraction), both of these goals can be realized.

On @Moxinilian 's point, you are absolutely correct: the bookkeeping burden can be quite large if the lifetime annotations persist for a long time; therefore, performing the liveness analysis just before lowering the dialects (where transformations that could invalidate it are less likely) would be an ideal place for it. The goal is to use the high-level MemRef semantics to aid in liveness analysis (since performing the same analysis on bare pointers is considerably more difficult). You make an excellent point that an interface may be needed to "join" the lifetime of one MemRef to another. I do believe, however, that this analysis should be kept separate from the MemRefToLLVM lowering pass, since it is not needed by those who do not want it, and it would be ideal to run it slightly earlier than when MemRefToLLVM may be called. This is because some other lowering passes may use UnrealizedConversionCasts on the MemRefs as part of their lowering, which effectively turns the MemRefs into bare pointers and makes the analysis more challenging. Ideally, this would be implemented as an analysis pass in the MemRef dialect that could be run just before lowering towards a target.

The goal of this contribution is to enable the annotation of MemRefs with lifetime information. This would allow external projects to implement their own form of liveness analysis (based on the assumptions of their own application) and/or enable a general form of liveness analysis to be contributed to upstream MLIR in the future for all to enjoy (as previously described).

2 Likes

Just a quick note to say that the design of LLVM's lifetime intrinsics is not great. You may want to avoid making the same mistakes.

For example, when LLVM sees an alloca, it must decide whether it's alive or not. To conclude that, it must peek into the future to check if there might be a lifetime.start on that alloca. If yes, then the alloca is dead; otherwise it's alive. Add branching to the equation, and it's nuts.
A solution is to explicitly mark allocas as dead so there’s no guessing.
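In MLIR terms, one way to picture that suggestion is a sketch like the following (the memref.lifetime_dead op below is purely hypothetical, invented here for illustration; only lifetime_start/lifetime_end are actually proposed in this RFC):

```mlir
func.func @explicit_dead() {
  %0 = memref.alloca() : memref<32xf32>
  // Hypothetical marker: %0 is declared dead at creation, so no pass has to
  // scan forward for a lifetime_start to decide its initial state.
  memref.lifetime_dead %0 : memref<32xf32>
  // ...
  memref.lifetime_start %0 : memref<32xf32>  // %0 becomes live here, unambiguously
  // ... uses of %0 ...
  memref.lifetime_end %0 : memref<32xf32>
  return
}
```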

As far as using lifetime markers on heap allocations goes, although that is documented in the LangRef, LLVM's code doesn't support it. Someone demanded that text, but I don't think anything was ever implemented to use it. Actually, I'm thinking of removing that text until there's an implementation.

Search the archives for discussions on LLVM’s lifetime markers. There’s plenty of stuff there. I’ve written just a short summary.

5 Likes

@nlopes We share your concern with the design of LLVM's lifetime intrinsics. Our hope is to use MLIR's ability to represent IR in richer ways than LLVM to better enforce and analyze lifetime within the IR. In the Alternative Solution section of the RFC, I mentioned our attempt to use a scoped region to define the lifetime of MemRefs, but due to potentially overlapping lifetimes that idea could not succeed.

As explained in a previous comment, liveness analysis should occur at a higher level of abstraction (before lowering), and its results need to persist until lowering (if used for targeting a backend like LLVM) or until a pipeline flow has concluded (as in the case of bufferization). From this, it is clear that the lifetime markings must persist from one pass to another; therefore they need to be baked into the IR somehow.

We agree that the use of intrinsics to label lifetime within LLVM has made it tricky to determine whether a buffer is alive or dead at any given instruction; however, other than the aforementioned scoped-region approach, we feel that using distinct ops to mark the points where a lifetime begins and ends is the best way to handle this situation. Dominance analysis, combined with MLIR's higher-level control-flow representation, would then make analyzing and working with lifetimes easier. Our hope is that the built-in verification and canonicalization passes within MLIR can alleviate some of the pain points of the LLVM implementation by better enforcing the semantics of these ops within the IR itself.

On the topic of lifetime markers on heap allocations, although LLVM's code currently does not support them, we still believe they have a place in MLIR. External projects and future passes that implement buffer reuse, for example, would be indifferent to whether a buffer is allocated on the stack or the heap. Since in MLIR both are allocated at a high level (using the alloc op for the heap and the alloca op for the stack), if the code allocates two buffers of the same size when it could get away with just one, that is something that can and should be optimized. When lowering to LLVM IR, these ops can simply be removed if they are used on anything other than an alloca (through a pass like the memref-expand pass that legalizes the MemRef dialect for export to LLVM IR); but within MLIR itself, the opportunity is still there.

1 Like

For “why llvm.lifetime.start/end is broken”, see in particular discussion on incorrect transformation around icmp: unclear semantics of "lifetime" intrinsics leads to miscompilation · Issue #45725 · llvm/llvm-project · GitHub .

I understand wanting to handle “lifetimes” on both the stack and heap, but I’m not sure it actually makes sense to use the same solution for both. In particular, on the stack, you want the compiler to manage allocas to minimize stack usage. On the heap, you don’t want that: you want the compiler to allocate memory exactly where you told it to. (If you didn’t want that, you’d be calling memory management functions like malloc/free, instead of trying to explicitly reuse memory.)

For heap and other explicitly managed memory, you probably want an API that looks more like C++ placement new/delete: you construct an object in a given location in memory, use it, then delete it. So something like “ptr lifetime.new(ptr, i64)” to allocate a new object at a given address, then “void lifetime.delete(ptr)” to end that object’s lifetime. Then when you want to use the memory again, you construct another object. This exposes the maximum amount of useful information, and it’s easier to analyze because you can treat each value returned by lifetime.new as a distinct object.
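At the memref level, a hypothetical sketch of that API could look like the following (the memref.lifetime.new and memref.lifetime.delete op names are invented here for illustration; nothing like them exists upstream):

```mlir
func.func @placement(%arena : memref<64xf32>) {
  // Construct a distinct object in caller-provided memory; the op returns a
  // new SSA value, so each constructed object can be reasoned about
  // independently of the underlying allocation.
  %obj0 = memref.lifetime.new %arena : memref<64xf32> -> memref<64xf32>
  // ... uses of %obj0 ...
  memref.lifetime.delete %obj0 : memref<64xf32>
  // Reusing the same memory constructs a second, distinct object.
  %obj1 = memref.lifetime.new %arena : memref<64xf32> -> memref<64xf32>
  // ... uses of %obj1 ...
  memref.lifetime.delete %obj1 : memref<64xf32>
  return
}
```

Because %obj0 and %obj1 are distinct values, an analysis can treat them as separate objects even though they occupy the same memory, which is the property the placement new/delete analogy is after.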

1 Like

(ignoring below all the problems related to the practical experience from LLVM for now)

There is something not clear to me in the proposal: what is “lifetime” for a memref?
Is this from alloc to dealloc, or is it instead intended for something more conceptual about "when is the memref 'valid' to use", in the way C++ differentiates memory allocation from object construction (which is also what LLVM lifetime was intended to model)?
That is, for example, could one memref allocation have multiple (non-overlapping) sections of lifetime.start / lifetime.end in this model?

1 Like

Is a memref a reference to the data or to the buffer that contains the data? If it's the former, then when the "data is dead" we can reuse its buffer, somewhat like a register allocator. If it's the latter, then we can treat it like an arena allocator and placement new (as @efriedma-quic suggests).

But that still doesn’t clear up what a memref lifetime means!

Exactly! Do we model the lifetime of the data (the program-variable mapping) through a memref (a pointer to a buffer region)? Or do we link memrefs to variables, so that the lifetime is that of the memrefs, which can then be reused with some colouring algorithm? Would this make the heap look more like a stack?

@efriedma-quic I think this is an excellent point. We originally had in mind that, since heap allocation in MLIR is done through the alloc op much like stack allocation is done through the alloca op, heap allocation would carry the same semantics as stack allocation (the user wants memory allocated and does not care where or how; they just want it available). However, you bring up an excellent point: heap allocations inherently carry the assumption that the user has full control of the memory, which is released only when the user frees it. Also, as explained in previous LLVM discussions on this topic, heap allocations have the free op, which can act as a lifetime end.

With this in mind, we think it would be best (at least in this initial proposal) to focus only on stack-based allocations. It would also be easier to relax this restriction to allow heap allocations in the future than it would be to disallow them later (causing some of the problems we see in the LLVM implementation today).

Our initial hope for including heap allocations as well as stack allocations was to reuse the analysis such that both can benefit from this proposal; however, you are correct that the semantics of heap allocations should be preserved.

We truly appreciate all of the discussion on this topic. We understand that lifetime markers in LLVM have been a hotly contested topic, and we are happy that people do not want to repeat the same problems within MLIR. We have done our best to look through the resources discussed in this thread and to come up with the high-level assumptions that the MLIR version of the lifetime markers should have.

We have raised a revision on Phabricator introducing the lifetime start and end op as described in this thread. We detail the assumptions we believe the lifetime markers should have in the description of the lifetime_start op. Any comments on this would be appreciated:
https://reviews.llvm.org/D158088

@mehdi_amini This is an important point that I hope I can clarify. We see these markers as notations to the compiler indicating where the buffer was expected to be used when it was created; that is, before the lifetime begins or after it ends, the buffer can be changed into any state without consequence to program functionality. If the buffer was generated by a frontend, then the scope in which the buffer was defined can be used as its lifetime (or the information could even be passed in from the user somehow). If the buffer was generated by an optimization pass (such as bufferization within MLIR, which turns tensors into buffers), then the lifetime can be deduced by looking at the use-def chain of the original tensor, inserting the lifetime_start just before its first use and the lifetime_end just after its last use, if possible.

This lifetime information can then be passed through the compiler to indicate where the buffer was intended to be valid. Before the lifetime start or after the lifetime end, the creator of the buffer does not expect the buffer to be used. This enables passes such as buffer reuse, which can use these markers to detect that two buffers' lifetimes do not overlap and legally reuse one for the other. Our goal is to make these lifetime markers useful first and foremost.

With this in mind, I can answer your example question: "would one memref allocation have possibly multiple (non-overlapping) sections of lifetime.start / lifetime.end in this model?" Hypothetically, it could. It is possible for the live range of a buffer to be split into two discrete regions (where, in between, the buffer can be changed without any consequence to the functionality of the program); however, this complicates the analysis. Such split ranges would be difficult to insert automatically (in the bufferization example), difficult to validate semantically (if the user were to insert them), and (I think) impossible to produce from a frontend if scoped regions are used to generate them.

We want these markers to be as usable as possible. By restricting each buffer to a single lifetime section, we feel the analysis is simplified by more than we lose in expressiveness. We have recently included this in the documentation for the lifetime markers within the revision we raised:

4. A path of execution within the program cannot execute a `lifetime_start`
   op on the same MemRef more than once.

This should help analysis passes that use these markers distinguish the live ranges of the buffers more easily.

I guess I don't understand how you can express buffer reuse without having multiple non-overlapping lifetimes for a buffer. Similarly, what about subviews of a buffer creating spatially non-overlapping (but temporally overlapping) lifetimes?

It seems your current model describes just the lifetime of the buffer allocation? I don't quite understand why that is useful right now.

3 Likes

Indeed. It needs to be the lifetime of values, not buffers. The best way to do that would be to bring it from tensors (through bufferization), as I mention above: either as an existing annotation on tensors (via some liveness analysis pass at the tensor level) or introduced by bufferization by doing that analysis on the fly.

Right, which could be a good candidate for stack promotion, and given lifetime markers (and thread information), also hoist it to the right loop nest level.

2 Likes

I apologize in advance for this longer response, but I felt an example would best explain where we see this contribution fitting into MLIR. Where we find this contribution most useful is when combating the loss of information when lowering from the Tensor level of abstraction to the MemRef level of abstraction.

Take the following example:

func.func @example(%A : tensor<32xf32>) -> tensor<32xf32> {
  %0 = dialectA.op %A : tensor<32xf32>
  %1 = dialectB.op %0 : tensor<32xf32>
  %2 = dialectC.op %1 : tensor<32xf32>
  return %2 : tensor<32xf32>
}

Since tensor values in MLIR are SSA values, they inherently carry lifetime information. The lifetime begins when the tensor is created (it cannot be used before this point) and ends at its last use (there are no further uses, and there should not exist any other values pointing to the same region of memory).

However, in order to be compiled for CPUs, GPUs, accelerators, or anything with memory, it needs to be lowered into the MemRef dialect. Since the above example is entirely statically shaped, and the sizes of the tensors are relatively small, we can lower the tensors into buffers allocated on the stack, which may look something like the following:

func.func @example(%A : memref<32xf32>, %OUT : memref<32xf32>) {
  %buff0 = memref.alloca() : memref<32xf32>
  dialectA.op %A, %buff0 : memref<32xf32> -> memref<32xf32>
  %buff1 = memref.alloca() : memref<32xf32>
  dialectB.op %buff0, %buff1 : memref<32xf32> -> memref<32xf32>
  %buff2 = memref.alloca() : memref<32xf32>
  dialectC.op %buff1, %buff2 : memref<32xf32> -> memref<32xf32>
  memref.copy %buff2, %OUT : memref<32xf32> -> memref<32xf32>
}

Notice, however, that we have lost information. MLIR allows stack allocations anywhere within a function (not just at the entry, as in LLVM), so we can express when the lifetimes of the buffers begin; however, we have lost the information about when they end. Prior to this conversion, the end of each buffer's lifetime was known (it was the last use of the original tensor), but that information can no longer be captured at the MemRef level. Reconstructing where a MemRef's lifetime ends would require expensive analysis, since later in the pipeline the MemRefs may be subviewed or cast in ways the compiler may not expect (for example, in the lowering of the hypothetical dialectA, dialectB, and dialectC ops).

We argue that if the MemRef dialect had the ability to express lifetimes, this information could be preserved for lower-level dialects that may use it. Continuing with the prior example, we propose that the conversion do the following:

func.func @example(%A : memref<32xf32>, %OUT : memref<32xf32>) {
  %buff0 = memref.alloca() : memref<32xf32>
  %buff1 = memref.alloca() : memref<32xf32>
  %buff2 = memref.alloca() : memref<32xf32>
  memref.lifetime_start %buff0 : memref<32xf32> // [ lifetime 0
  dialectA.op %A, %buff0 : memref<32xf32> -> memref<32xf32>
  memref.lifetime_start %buff1 : memref<32xf32> // [ lifetime 1
  dialectB.op %buff0, %buff1 : memref<32xf32> -> memref<32xf32>
  memref.lifetime_end %buff0 : memref<32xf32> //     lifetime 0 ]
  memref.lifetime_start %buff2 : memref<32xf32> // [ lifetime 2
  dialectC.op %buff1, %buff2 : memref<32xf32> -> memref<32xf32>
  memref.lifetime_end %buff1 : memref<32xf32> //     lifetime 1 ]
  memref.copy %buff2, %OUT : memref<32xf32> -> memref<32xf32>
  memref.lifetime_end %buff2 : memref<32xf32> //     lifetime 2 ]
}

With this information now expressed in the MemRef dialect, it is trivial to notice that %buff0 can be reused instead of allocating more stack space for %buff2, since their lifetimes do not overlap. It is important to note that these markers were derived simply from the declaration and last use of the tensors that were bufferized.

This information can also be turned off (by not emitting these ops) if the user does not want these annotations.

I hope this example helps to clarify why we think this is useful to contribute to the MLIR infrastructure.

1 Like

I understand what it is and what you're doing; you're not really providing anything new right now, so it still isn't clear to me that this is the most useful model.
You are trying to tie the "lifetime" of the memory allocation to the lifetime of the objects in memory. I think a couple of folks mentioned this and asked why, but you haven't addressed that question yet.

Take for example a compiler like XLA, which will allocate a single buffer and subview into it. The lifetimes of the "tensors" (actually memrefs, of course) are not tied in any way to the allocation.

It could look (pseudo code):

func.func @example(%A : memref<32xf32>, %OUT : memref<32xf32>) {
  %buff = memref.alloca() : memref<64xf32>
  %t1 = memref.view %buff[0...32] : memref<32xf32>
  %t2 = memref.view %buff[32...64] : memref<32xf32> // disjoint from %t1
  %t3 = memref.view %buff[0...64] : memref<64xf32> // overlap with %t1 and %t2
  // ... more temporary overlapping views here.

  // ... some computations below

Now all these %tx "tensors" have different lifetimes within the function, and it can be interesting to track them, but none of them are tied to the memory allocation itself.
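Continuing the pseudo-code above, per-value markers would have to attach to the views rather than to %buff for the tracking to be meaningful (whether the proposed ops would even be legal on views is not specified by the RFC; this is only a sketch of the mismatch):

```mlir
func.func @views() {
  %buff = memref.alloca() : memref<64xf32>   // allocation lives for the whole function
  %t1 = memref.view %buff[0...32] : memref<32xf32>
  %t2 = memref.view %buff[32...64] : memref<32xf32>
  memref.lifetime_start %t1 : memref<32xf32>
  memref.lifetime_start %t2 : memref<32xf32> // overlaps %t1 in time, but not in space
  // ... computation using %t1 and %t2 ...
  memref.lifetime_end %t1 : memref<32xf32>
  // %t2 may stay live after %t1 ends; none of this constrains %buff itself.
  memref.lifetime_end %t2 : memref<32xf32>
  return
}
```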

This proposal suffers the same flaw that LLVM’s lifetimes and debug information intrinsics suffer: they are asynchronous with respect to dataflow. That is, they take as input the value, but they don’t produce a new value. In the general case, getting in the way hinders optimization, though it also makes it harder for other transforms to break invariants (e.g. by replacing the value, by reordering the intrinsic, etc.). In this specific case, getting in the way seems like exactly how you’d want to do things, because then the only uses (to be optimized) would exist between the two lifetime calls.

Thus, why not something like:

%buff0_lifeline = memref.lifetime_start %buff0 : memref<32xf32> // [ lifetime 0
...  // uses are all `%buff0_lifeline`, not `%buff0`.
memref.lifetime_end %buff0_lifeline : memref<32xf32> //     lifetime 0 ]

The validation logic for lifetime_start would require that its input have no other uses, and the validation logic for lifetime_end would be similar to your current logic.

3 Likes

Yep, I think that makes optimizations harder than they should be.
Creating a new value, a la memory SSA, where memory is a register, is the way to go IMHO.

You're still marking the lifetime of the buffers, not of the variables that live in them, as if there were a direct connection between the two. With destination-passing style, for example, there will be buffer reuse (in-place ops) where the buffer's own lifetime is longer than that of any variable that inhabits it.

I think @pag is going in the right direction: making the markers themselves the objects tracking liveness, not the buffers. They would represent the values living in the buffers, not the buffers themselves.

That could track multiple objects in the same buffer, or multiple buffers for the same object, and we could even construct an "array of markers" to cover the case @mehdi_amini is talking about.