Welcome to the 66th issue of the MLIR Newsletter, covering developments in MLIR and related projects in the ecosystem. We welcome your contributions (contact: javed.absar@gmail.com). Click here to see previous editions.
Highlights, Discussions and Ecosystem:
- Matthias Springer wrote an RFC to design a new (additional) dialect conversion driver without the quirks of the current one. Why? What’s wrong with the current dialect conversion? The proposal, [RFC] A New "One-Shot" Dialect Conversion Driver, answers those questions and proposes… “a new, simpler dialect conversion driver. (In addition to the existing dialect conversion driver. No changes needed for existing users of the current dialect conversion driver, if they do not want to use the new driver).”
- Renato triggered a discussion on deallocation: “I noticed that the ownership based deallocation pass adds the `bufferization.dealloc` op at the end of the scope, and thus moves the deallocations for all buffers in the context block to the end. This is safe to do, however it also keeps buffers alive when they’re long dead and can make larger models run out of memory.” [click here].
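To illustrate the concern, here is a minimal hand-written sketch (op names are from upstream, but the function, shapes, and condition values are invented for illustration) of how end-of-block placement extends buffer lifetimes:

```mlir
func.func @scoped_deallocs() {
  %true = arith.constant true
  %a = memref.alloc() : memref<1024xf32>
  // ... last use of %a here; %a is dead from this point on ...
  %b = memref.alloc() : memref<4096xf32>
  // ... uses of %b ...
  // The pass emits the deallocations at the end of the block, so the
  // memory backing %a stays live for the entire lifetime of %b.
  bufferization.dealloc (%a : memref<1024xf32>) if (%true)
  bufferization.dealloc (%b : memref<4096xf32>) if (%true)
  return
}
```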
- Sheen re-initiated a discussion on How to extend linalg ops without modifying MLIR code?, e.g. "`linalg.softmax` example: I want to tile and fuse the softmax and its producers and consumers directly, and treat the softmax as a whole, ignoring its actual computation". One interesting point from the discussion: “…so far this has been points made in random discussions, forum and conferences. We’ve been dancing around the idea for over a year, and most people feel sympathetic to it, but not quite to create one.” The main controversies around the topic are:
  - A: Is it isolated from above or not? If it is, then why not just outline into a function?
  - B: Why not just use a generic? For non perfectly nested ops, can we even safely define semantics?
  - C: Does it propagate interfaces through? Should it require that all ops have the same interfaces?
  - D: If all ops implement an interface, does that mean the group does, too? Are all interfaces composable that way? If not, how do we describe composability?
  - E: Can I merge any two groups together? Split them apart?
- Menookar proposes a compile-time optimization on existing `memref.alloc` ops to reduce memory usage and improve memory locality: [RFC] Compile-time memref.alloc Scheduling/Merging optimization. The RFC “… proposes an optimization to consolidate multiple allocations (`memref.alloc` ops) into a single `memref.alloc` op, and each static-shaped `memref.alloc` op will be transformed into a “slice” from the single allocated buffer with `memref.view` and some compile-time decided offsets. This optimization works on `memref` instead of `tensor` ops, so it should be executed after bufferization pass, and before buffer-deallocation.”
- Discussions on the [RFC] Add op for semantics of nullptr in memref dialect. Alex: "It would be nice to take this opportunity to better specify the semantics of null memrefs. I suppose accessing any element in such a memref should be treated as undefined behavior. Do we want the load to result in `poison`? Separately, what should the lowering of `memref.null` be? Memref constructed from a null pointer, but what offset/sizes/strides?"
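For context, the op under discussion might look roughly like this (hypothetical syntax; the op is only proposed in the RFC, and the open questions quoted above are precisely about its semantics and lowering):

```mlir
// Hypothetical: a "null" memref value as proposed in the RFC.
%null = memref.null : memref<?x?xf32>
// Per the discussion, loading through it would presumably be undefined
// behavior or yield poison; the lowered offset/sizes/strides of the
// descriptor are an open question.
```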
- Discussions around the semantics of named ops - theory (what it should be) and practice (what is currently implicitly implied via implementation) - in Notes from the MLIR Upstream Round Table @ EuroLLVM 2024.
MLIR Commits Recently:
- Mogball added [mlir][ods] Optimize FoldAdaptor constructor (#93219) · llvm/llvm-project@264aaa5 · GitHub
- Mubashar added [mlir][vector] Add deinterleave operation to vector dialect (#92409) · llvm/llvm-project@bf4d99e · GitHub
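The new op splits the even- and odd-indexed lanes of a vector into two results; a minimal example (element type and width chosen for illustration):

```mlir
// %v    = [a0, a1, a2, a3, a4, a5, a6, a7]
// %even = [a0, a2, a4, a6], %odd = [a1, a3, a5, a7]
%even, %odd = vector.deinterleave %v : vector<8xi32> -> vector<4xi32>
```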
- Aart Bik: “This is a proof of concept recognition of the most basic forms of ReLu operations, used to show-case sparsification of end-to-end PyTorch models. In the long run, we must avoid lowering such constructs too early (with this need for raising them back).” [click here].
- Renato added more linalg named ops [click here].
- Javed added a patch to identify linalg named ops such as `linalg.fill`/elementwise, working towards a more comprehensive `linalg.generic` specialization. [click here].
- Kunwar added support for reducing operations with multiple results using PartialReductionOpInterface, and also an implementation of PartialReductionOpInterface for multiple results for `linalg.generic`. [click here].
- Adam added a fold for pack and unpack of an empty input tensor (#92247).
- Felix added a patch to include the “no signed wrap” and “no unsigned wrap” flags, which can be used to annotate some ops in the `arith` dialect and also in LLVM IR, in the integer range inference. [click here].
- Christian consolidated the different topological sort utilities into one place, adding them to the analysis folder because `SliceAnalysis` uses some of them. [click here].
- Spencer implemented folding and rewrite logic to eliminate no-op tensor and memref operations. This handles two specific cases: A: `tensor.insert_slice` operations where the size of the inserted slice is known to be 0; B: `memref.copy` operations where either the source or target memref is known to be empty. [click here].
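A sketch of the two no-op cases (all shapes invented for illustration):

```mlir
// A: inserting a zero-sized slice leaves %dest unchanged,
//    so the op folds to %dest.
%r = tensor.insert_slice %src into %dest[0, 0] [0, 4] [1, 1]
    : tensor<0x4xf32> into tensor<8x4xf32>

// B: copying between empty memrefs moves no data,
//    so the op can simply be erased.
memref.copy %e0, %e1 : memref<0x4xf32> to memref<0x4xf32>
```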
- Jeremy added simple canonicalization rules to the polynomial dialect, mainly to get the boilerplate incorporated before more substantial canonicalization patterns are added. [click here].
- Matthias added a commit to turn the 1:N dialect conversion pattern for function signatures into a pattern for `FunctionOpInterface`. This is similar to the interface-based pattern that is provided with the 1:1 dialect conversion (`populateFunctionOpInterfaceTypeConversionPattern`). No change in functionality apart from supporting all `FunctionOpInterface` ops and not just `func::FuncOp`. [click here].
- Andrzej split the `TransposeOpLowering` into two patterns: 1. `Transpose2DWithUnitDimToShapeCast`, which rewrites a 2-D `vector.transpose` as `vector.shape_cast` (there has to be at least one unit dim); 2. `TransposeOpLowering`, the original pattern without the part extracted into `Transpose2DWithUnitDimToShapeCast`. [click here].
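The unit-dim case in a small, illustrative example (types invented):

```mlir
// A 2-D transpose where one dimension is a unit dim does not actually
// move any data, so it can be rewritten as a shape_cast:
%t = vector.transpose %v, [1, 0] : vector<4x1xf32> to vector<1x4xf32>
// becomes
%cast = vector.shape_cast %v : vector<4x1xf32> to vector<1x4xf32>
```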
- Benoit added a vector-dialect interleave-to-shuffle pattern and enabled it in VectorToSPIRV. [click here].
- Corentin created an expansion pattern to lower `math.rsqrt(x)` into `fdiv(1, sqrt(x))`. [click here].
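The expansion in MLIR terms (a sketch; scalar f32 chosen for illustration):

```mlir
// math.rsqrt %x expands to 1.0 / sqrt(x):
%one = arith.constant 1.0 : f32
%s = math.sqrt %x : f32
%r = arith.divf %one, %s : f32
```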
- Adam added a pass that “packs a matmul MxNxK operation into 4D blocked layout. Any present batch dimensions remain unchanged and the result is unpacked back to the original layout.” [click here]([mlir][linalg] Block pack matmul pass (#89782) · llvm/llvm-project@4c3db25 · GitHub).
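For intuition, one matmul operand of such a blocked layout could be produced like this (tile sizes and shapes are made up for illustration; the pass chooses its own blocking):

```mlir
// Pack a 128x128 matmul operand into a 4-D blocked layout with
// 32x32 inner tiles; the pass unpacks the result back afterwards.
%packed = tensor.pack %A inner_dims_pos = [0, 1] inner_tiles = [32, 32]
    into %dest : tensor<128x128xf32> -> tensor<4x4x32x32xf32>
```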
Related Projects
- Triton community meeting - https://www.youtube.com/watch?v=uRlqolhNbRk
- IREE community meeting - https://www.youtube.com/watch?v=b779to--7es
- OpenXLA community meeting - https://www.youtube.com/watch?v=YK1CLzIcsJ8&t=2s
Useful Links