Motivation & Background
Many real-world workloads are expressed as structured, sequential computation graphs such as pipelines, processing chains, and iterative blocks. In these patterns, intermediate buffers are allocated, used, and released in strict sequence, resulting in lifetimes that do not overlap.
After the bufferization and deallocation pipeline, the IR contains explicit `memref.alloc`/`memref.dealloc` pairs. Buffers with non-overlapping lifetimes each receive their own heap allocation even when they could use shared memory. As a result, peak memory usage scales with the total number of intermediate allocations rather than with the maximum number of simultaneously live buffers.
// Example IR with two allocations whose lifetimes do not overlap
func.func @example(%arg0: memref<1024xf32>, %arg1: memref<512xf64>) {
  %a = memref.alloc() : memref<1024xf32> // allocates 4096 bytes
  linalg.generic ... ins(%arg0) outs(%a) ...
  memref.dealloc %a : memref<1024xf32>
  %b = memref.alloc() : memref<512xf64> // 4096 bytes
  linalg.generic ... ins(%arg1) outs(%b) ...
  memref.dealloc %b : memref<512xf64>
}
// These could share one 4096-byte allocation, a significant drop in peak memory requirement.
The bufferization documentation acknowledges this gap:
“This implies reusing already allocated buffers when possible, turning bufferization into an algorithmically complex problem with similarities to register allocation.”
Source: mlir/docs/Bufferization.md
Today, there is no upstream buffer reuse pass in MLIR, which forces every major downstream stack (IREE, TVM, and XLA) to build its own. This duplicates effort, fragments memory optimizations across ecosystems, and raises the barrier for new MLIR adopters, who must either accept inflated memory usage or build complex infrastructure themselves.
Proposal
This project proposes two components to address this gap:
- An analysis pass that computes allocation lifetimes and reports peak live bytes and reuse opportunities without mutating IR.
- An opt-in rewrite pass that merges non-overlapping `memref.alloc`/`memref.dealloc` pairs into a shared memory pool using `memref.view`.
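To make the analysis component more concrete, here is a minimal Python sketch (not MLIR code) of what such a report might compute. It assumes each allocation has already been reduced to a hypothetical `Lifetime` record whose `start` and `end` are the operation indices of the alloc and dealloc within one block. The names `Lifetime`, `peak_live_bytes`, and `reuse_pairs` are illustrative and do not correspond to any existing MLIR API.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Lifetime:
    """Hypothetical record: one alloc/dealloc pair in a single block."""
    name: str
    start: int       # operation index of the alloc
    end: int         # operation index of the dealloc
    size_bytes: int


def peak_live_bytes(lifetimes):
    """Sweep over alloc/dealloc events and track the maximum live total."""
    events = []
    for lt in lifetimes:
        events.append((lt.start, 1, lt.size_bytes))   # alloc event
        events.append((lt.end, 0, -lt.size_bytes))    # dealloc event
    # Tuples sort by index first; at equal indices deallocs (0) precede allocs (1).
    events.sort()
    live = peak = 0
    for _, _, delta in events:
        live += delta
        peak = max(peak, live)
    return peak


def reuse_pairs(lifetimes):
    """Pairs of buffers whose lifetimes are disjoint and could share storage."""
    return [
        (a.name, b.name)
        for a, b in combinations(lifetimes, 2)
        if a.end < b.start or b.end < a.start
    ]


# The two buffers from the motivating example, with assumed operation indices:
# %a lives over ops 0..2 and %b over ops 3..5.
example = [Lifetime("%a", 0, 2, 4096), Lifetime("%b", 3, 5, 4096)]
total_bytes = sum(lt.size_bytes for lt in example)
print(total_bytes, peak_live_bytes(example), reuse_pairs(example))
```

For the motivating example this reports 8192 bytes allocated in total, 4096 bytes live at the peak, and one reuse opportunity (`%a` with `%b`). A real pass would derive the intervals from the IR rather than from hand-written indices.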
Pass placement:
The pass runs after `lower-deallocations` and `promote-buffers-to-stack`:
one-shot-bufferize
→ buffer-deallocation-pipeline (ownership-based-buffer-deallocation → lower-deallocations)
→ optimize-allocation-liveness // shrinks lifetimes (more reuse chances)
→ promote-buffers-to-stack // small allocs → alloca (exclude from reuse)
→ buffer-reuse // proposed pass
→ lower to LLVM
This placement avoids conflicts with ownership-based deallocation because:
- `bufferization.dealloc` ops are already lowered, so ownership flags and base-pointer aliasing checks are no longer present.
- Lifetimes are already optimized.
- Small allocs are already promoted to the stack.
- Only large heap-allocated buffers with an explicit dealloc remain.
Scope
Our current scope is deliberately conservative to guarantee correctness while leaving a clear path for future extensions.
An allocation is eligible for pooling only if all conditions hold:
- `memref.alloc` and `memref.dealloc` are in the same block
- Static shape
- Identity (contiguous) layout
- Default memory space
- Non-escaping (not returned, not passed to calls, not captured by region ops)
- Proven non-overlapping lifetime with another eligible allocation
Uncertain cases will be skipped and tracked with per-reason statistics.
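As an illustration of how the eligibility filter and the per-reason statistics could fit together, here is a small Python sketch. It is only a model: `AllocInfo`, its fields, and the reason strings are hypothetical names chosen for this example, and a real pass would compute these properties from the IR. The lifetime-overlap condition is left out here because it depends on pairs of allocations and belongs to the slot assignment step.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class AllocInfo:
    """Hypothetical summary of one memref.alloc; field names are illustrative."""
    name: str
    same_block_dealloc: bool = True
    static_shape: bool = True
    identity_layout: bool = True
    default_memory_space: bool = True
    escapes: bool = False


def skip_reason(info):
    """Return the first failed condition, or None if the alloc is eligible."""
    checks = [
        (info.same_block_dealloc, "dealloc-not-in-same-block"),
        (info.static_shape, "dynamic-shape"),
        (info.identity_layout, "non-identity-layout"),
        (info.default_memory_space, "non-default-memory-space"),
        (not info.escapes, "escapes"),
    ]
    for ok, reason in checks:
        if not ok:
            return reason
    return None


def partition(allocs):
    """Split allocs into eligible names and a per-reason skip counter."""
    eligible, skipped = [], Counter()
    for info in allocs:
        reason = skip_reason(info)
        if reason is None:
            eligible.append(info.name)
        else:
            skipped[reason] += 1
    return eligible, skipped


allocs = [
    AllocInfo("%a"),
    AllocInfo("%b", static_shape=False),
    AllocInfo("%c", escapes=True),
    AllocInfo("%d", static_shape=False, escapes=True),
]
eligible, skipped = partition(allocs)
print(eligible, dict(skipped))
```

Only the first failing condition is recorded per allocation, so `%d` counts as `dynamic-shape` even though it also escapes. Recording every failing condition instead would be an equally reasonable design if the statistics are meant to guide future scope extensions.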
Mechanism
- Collect alloc/dealloc pairs via `MemoryEffectOpInterface`.
- Compute lifetime intervals using operation numbering and `BufferViewFlowAnalysis::resolve()`.
- Assign reusable slots using greedy linear-scan allocation.
- Compute pool layout:
  - Total pool size = sum of slot sizes with alignment padding.
- Rewrite IR:
  - Insert pool allocation
  - Replace original allocs with `memref.view`
  - Remove individual deallocs
  - Insert a single pool deallocation
// After Rewrite Example:
func.func @example(%arg0: memref<1024xf32>, %arg1: memref<512xf64>) {
  %c0 = arith.constant 0 : index
  %pool = memref.alloc() {alignment = 64} : memref<4096xi8>
  %a = memref.view %pool[%c0][] : memref<4096xi8> to memref<1024xf32>
  linalg.generic ... ins(%arg0) outs(%a) ...
  %b = memref.view %pool[%c0][] : memref<4096xi8> to memref<512xf64>
  linalg.generic ... ins(%arg1) outs(%b) ...
  memref.dealloc %pool : memref<4096xi8>
}
// One 4096-byte pool replaces two 4096-byte allocations (8192 bytes requested in total).
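The slot assignment and pool layout steps can be modeled outside MLIR. The Python sketch below shows one possible greedy linear scan over lifetime intervals followed by an aligned layout. It assumes intervals are given as hypothetical `Interval` records with operation indices, and it uses a 64-byte alignment only because the example above does. The function names are illustrative, not part of any existing API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Hypothetical lifetime record for one eligible allocation."""
    name: str
    start: int       # operation index of the alloc
    end: int         # operation index of the dealloc
    size_bytes: int


def align_up(value, alignment):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def assign_slots(intervals):
    """Greedy linear scan: reuse the first slot that is free at `start`."""
    slot_end = []    # slot index -> end of the interval currently occupying it
    slot_size = []   # slot index -> largest buffer ever placed in the slot
    assignment = {}
    for iv in sorted(intervals, key=lambda i: i.start):
        for slot, end in enumerate(slot_end):
            if end < iv.start:       # previous occupant is already freed
                break
        else:
            slot = len(slot_end)     # no free slot: open a new one
            slot_end.append(0)
            slot_size.append(0)
        slot_end[slot] = iv.end
        slot_size[slot] = max(slot_size[slot], iv.size_bytes)
        assignment[iv.name] = slot
    return assignment, slot_size


def layout_pool(slot_size, alignment=64):
    """Place slots back to back at aligned offsets; return offsets and pool size."""
    offsets, cursor = [], 0
    for size in slot_size:
        cursor = align_up(cursor, alignment)
        offsets.append(cursor)
        cursor += size
    return offsets, cursor


# The two buffers from the example, with assumed operation indices.
example = [Interval("%a", 0, 2, 4096), Interval("%b", 3, 5, 4096)]
assignment, slot_size = assign_slots(example)
offsets, pool_size = layout_pool(slot_size)
print(assignment, offsets, pool_size)
```

For the two-buffer example both allocations land in slot 0 at byte offset 0, and the pool is 4096 bytes, which matches the rewritten IR above. Each slot offset would become the byte-shift operand of the corresponding `memref.view`. First-fit by start index is only one heuristic; sorting by size or using a best-fit rule may pack better when buffer sizes vary widely.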
Expected Impact
- Peak memory reduction for any pipeline with sequential non-overlapping temporary buffers (common in ML inference workloads).
- Reusable lifetime analysis infrastructure that other passes or downstream users can build on.
- An upstream alternative to ad-hoc buffer reuse strategies, potentially reducing the need for each downstream user to develop and maintain separate solutions.
I'm considering this as a GSoC project. I'm posting this RFC to find a potential mentor, gather feedback, and ensure the direction aligns with MLIR expectations. I would appreciate feedback and guidance from maintainers or contributors in this area.