MLIR News, 61st edition (28th Jan 2024)

javedabsar · January 27, 2024, 4:43pm

Welcome to the 61st issue of the MLIR Newsletter, covering developments in MLIR and related projects in the ecosystem. We welcome your contributions (contact: javed.absar@gmail.com). Click here to see the previous edition.

Highlights and Ecosystem:

Lots of improvements to the bufferizer (e.g. deallocation, see details below) by Matthias; an arith-to-arm-sme conversion pass and other arm-sme work by Andrzej and Cullen; lots of clang-tidy fixes by Mehdi; tiling and fusion changes.
Some RFCs in active discussion: [RFC] `ptr` dialect & modularizing ptr ops in the LLVM dialect; [RFC] Add XeGPU dialect for Intel GPUs; [RFC] Tiling interface supports fuse consumer.
EuroLLVM 2024 in Vienna, Austria: Important dates & More

MLIR Commits Recently:

Matthias Springer improved the buffer deallocation of MLIR's one-shot-bufferizer. The buffer deallocation pass checks the IR (“operation preconditions”) to make sure that it contains nothing unsupported; if it does, the pass signals a failure. The pass now rejects all ops with unknown memory effects: since we do not know whether such an op allocates memory, the pass cannot know whether a deallocation op should be inserted. Memory effects are queried from the MemoryEffectOpInterface interface. Ops that do not implement this interface but have the RecursiveMemoryEffects trait do not have any side effects (apart from the ones that their nested ops may have). Unregistered ops are now rejected by the pass because they do not implement the MemoryEffectOpInterface, and we also do not know whether they have RecursiveMemoryEffects. All test cases that previously had unregistered ops are updated to use registered ops. [click here].
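
The rejection rule described above can be pictured with a small toy model. This is only a sketch in plain Python, not the MLIR implementation: the `Op` class, its fields, and the op names (`toy.alloc`, `toy.region`, `toy.opaque`, `unregistered.op`) are all hypothetical stand-ins for "implements MemoryEffectOpInterface", "has RecursiveMemoryEffects" and "is registered".

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Op:
    """Hypothetical stand-in for an operation; not an MLIR class."""
    name: str
    registered: bool = True
    # None models "does not implement a memory-effect interface".
    memory_effects: Optional[List[str]] = None
    # Models the RecursiveMemoryEffects trait.
    recursive_memory_effects: bool = False
    nested: List["Op"] = field(default_factory=list)


def has_unknown_memory_effects(op: Op) -> bool:
    """True if we cannot tell whether the op allocates memory."""
    if not op.registered:
        return True   # nothing is known about unregistered ops
    if op.memory_effects is not None:
        return False  # effects can be queried directly
    if op.recursive_memory_effects:
        return False  # effects come only from nested ops, visited separately
    return True


def find_unsupported_ops(root: Op) -> List[str]:
    """Collect the names of all ops (including nested ones) that a
    precondition check of this kind would reject."""
    rejected = []
    stack = [root]
    while stack:
        op = stack.pop()
        if has_unknown_memory_effects(op):
            rejected.append(op.name)
        stack.extend(op.nested)
    return rejected


root = Op("toy.region", recursive_memory_effects=True, nested=[
    Op("toy.alloc", memory_effects=["alloc"]),
    Op("toy.opaque"),                         # registered, effects unknown
    Op("unregistered.op", registered=False),  # unregistered
])
print(sorted(find_unsupported_ops(root)))
```

In this model a non-empty result corresponds to the pass signalling failure instead of guessing where deallocations belong.
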
Cullen Rhodes added an arith-to-arm-sme conversion pass (#78197) to convert the Arith dialect to the ArmSME dialect. [click here]. Only a single op is converted at the moment, but this will grow in the future as things like in-tile add are implemented. Also, ‘createLoopOverTileSlices’ is moved to ArmSME utils since it’s relevant for both conversions.
Mahesh added changes that bring the tiling and fusion capabilities using scf.forall on par with what was already supported by scf.for. Using LoopLikeOpInterface as the basis for the implementation unifies all the tiling logic for both scf.for and scf.forall. The only difference is the actual loop generation. This is a follow-up to #72178 [click here].
Andrzej extended `vector.insert_strided_slice` and `vector.extract_strided_slice` to allow scalable input and output vectors. For scalable sizes, the corresponding slice size has to match the corresponding dimension in the output/input vector (insert/extract, respectively). [click here].
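
A rough way to picture the constraint on scalable sizes is the toy check below. It is a sketch under simplifying assumptions, not the actual verifier: shapes are modelled as lists of `(size, is_scalable)` pairs, the slice and the full vector are assumed to have the same rank, and offsets and strides are ignored. The function name `slice_dims_are_valid` is made up for illustration.

```python
from typing import List, Tuple

# (size, is_scalable): (4, True) models a scalable dimension such as [4].
Dim = Tuple[int, bool]


def slice_dims_are_valid(slice_shape: List[Dim], full_shape: List[Dim]) -> bool:
    """Toy check: a scalable dimension must match exactly; a fixed
    dimension of the slice only has to fit into the full vector."""
    if len(slice_shape) != len(full_shape):
        return False  # simplification: same-rank shapes only
    for (s_size, s_scalable), (f_size, f_scalable) in zip(slice_shape, full_shape):
        if s_scalable or f_scalable:
            # Scalable sizes: slice and full vector must agree on this dim.
            if not (s_scalable and f_scalable and s_size == f_size):
                return False
        elif s_size > f_size:
            return False
    return True


# A 2x[4] slice of a 4x[4] vector: the scalable dimension matches.
print(slice_dims_are_valid([(2, False), (4, True)], [(4, False), (4, True)]))
# A 2x[2] slice of a 4x[4] vector: the scalable dimension does not match.
print(slice_dims_are_valid([(2, False), (2, True)], [(4, False), (4, True)]))
```
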
Kohei Yamaguchi fixed SimplifyClones for the case of a dealloc before the cloneOp. The SimplifyClones pass relies on the assumption that the deallocOp follows the cloneOp. However, a crash occurs when there is a redundantDealloc preceding the cloneOp. This PR addresses the issue by ensuring the presence of a deallocOp after the cloneOp. The verification is performed by checking whether the loop over the nodes subsequent to the cloneOp reaches the tail of the list. [click here].
Ryan added folding of cmp(x, x) when x isn’t a constant (#78812). Such cases show up in the middle of optimization passes, e.g., after some rewrites followed by CSE. The existing folder could fold such cases only when the inputs are constant; this patch improves it to fold even if the inputs are non-constant. [click here].
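
The idea behind such a fold can be sketched in a few lines. This is a toy model, not the folder from the patch: operands are represented by SSA value names as strings, the predicate names are assumed to follow the usual integer-comparison spelling (`eq`, `ne`, `slt`, ...), and `fold_cmp` is a hypothetical helper. The reasoning only holds for integer-like values; for floating point, NaN compares unequal to itself.

```python
from typing import Optional

# Predicates whose result is known when both operands are the same integer.
_TRUE_ON_SAME_OPERAND = {"eq", "sle", "sge", "ule", "uge"}
_FALSE_ON_SAME_OPERAND = {"ne", "slt", "sgt", "ult", "ugt"}


def fold_cmp(pred: str, lhs: str, rhs: str) -> Optional[bool]:
    """Return the folded result of cmp(pred, lhs, rhs), or None if the
    comparison cannot be folded without knowing the operand values."""
    if lhs != rhs:
        return None  # different SSA values: nothing to fold here
    if pred in _TRUE_ON_SAME_OPERAND:
        return True
    if pred in _FALSE_ON_SAME_OPERAND:
        return False
    return None      # unknown predicate: stay conservative


print(fold_cmp("eq", "%x", "%x"))   # same value on both sides
print(fold_cmp("slt", "%x", "%x"))
print(fold_cmp("eq", "%x", "%y"))   # not foldable
```
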
Matthias added a new interface method to BufferizableOpInterface: hasTensorSemantics. This method returns “true” if the op has tensor semantics and should be bufferized. Until now, we assumed that an op has tensor semantics if it has tensor operands and/or tensor op results. However, there are ops like ml_program.global that do not have any results/operands but must still be bufferized (#75103). The new interface method can return “true” for such ops. [click here].
Currently the tileConsumerAndFuseProducerGreedilyUsingSCFFor method greedily fuses through all slices that are generated during the tile and fuse flow. That is not the normal use case. Ideally the caller would like to control which slices get fused and which don’t. This patch [click here] introduces a new field in SCFTileAndFuseOptions to specify this control. The control function also allows the caller to specify whether the replacement for the fused producer needs to be yielded from within the tiled computation. This allows replacing the fused producers in case they have other uses. Without this, the original producers still survive, negating the utility of the fusion. The change here also means that the name of the function tileConsumerAndFuseProducerGreedily... can be updated.
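
To illustrate the shape of such a caller-supplied control function, here is a toy sketch in plain Python. It does not reproduce the actual field or signature in SCFTileAndFuseOptions: `Candidate`, `ControlFn`, `tile_and_fuse` and the producer names are hypothetical, and "fusion" is reduced to bookkeeping of which producers were fused and which replacements were yielded.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Candidate:
    """Hypothetical fusion candidate: a producer reached through a slice."""
    producer: str
    has_other_uses: bool


# Returns (should_fuse, yield_replacement) for a candidate.
ControlFn = Callable[[Candidate], Tuple[bool, bool]]


def tile_and_fuse(candidates: List[Candidate],
                  control: ControlFn) -> Tuple[List[str], List[str]]:
    """Ask the control function about each candidate instead of fusing
    greedily; return (fused producers, producers whose replacement is
    yielded from the tiled computation)."""
    fused, yielded = [], []
    for candidate in candidates:
        should_fuse, yield_replacement = control(candidate)
        if not should_fuse:
            continue
        fused.append(candidate.producer)
        if yield_replacement:
            yielded.append(candidate.producer)
    return fused, yielded


def my_control(candidate: Candidate) -> Tuple[bool, bool]:
    # Example policy: skip one producer, and yield a replacement whenever
    # the producer has other uses so the original can be replaced.
    should_fuse = candidate.producer != "toy.expensive"
    return should_fuse, should_fuse and candidate.has_other_uses


candidates = [
    Candidate("toy.fill", has_other_uses=False),
    Candidate("toy.matmul", has_other_uses=True),
    Candidate("toy.expensive", has_other_uses=True),
]
print(tile_and_fuse(candidates, my_control))
```

A policy that always returns `(True, False)` would correspond to the old greedy behaviour in this model.
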

Related Projects

Triton community meeting - https://www.youtube.com/watch?v=uRlqolhNbRk
IREE community meeting - https://www.youtube.com/watch?v=b779to--7es
OpenXLA community meeting - https://www.youtube.com/watch?v=YK1CLzIcsJ8&t=2s

Useful Links

Latest Community topics - LLVM Discussion Forums
MLIR Open Design Meetings
Deprecations & Current Refactoring
TensorFlow Forum
Alex Bradbury’s LLVM Weekly
IREE (openxla.github.io)
