Welcome to the 61st issue of the MLIR Newsletter, covering developments in MLIR and related projects in the ecosystem. We welcome your contributions (contact: javed.absar@gmail.com). Click here to see the previous edition.
Highlights and Ecosystem:
-
Lots of improvements to the bufferizer (e.g. deallocation; see details below) by Matthias; an arith-to-arm-sme conversion pass and ArmSME work by Andrzej and Cullen; lots of clang-tidy fixes (Mehdi); tiling and fusion changes.
-
Some RFCs in active discussion: [RFC] `ptr` dialect & modularizing ptr ops in the LLVM dialect; [RFC] Add XeGPU dialect for Intel GPUs; [RFC] Tiling interface supports fuse consumer.
MLIR Commits Recently:
-
Matthias Springer improved the buffer deallocation of MLIR's one-shot bufferizer. The buffer deallocation pass checks the IR ("operation preconditions") to make sure that it contains no unsupported IR; if it does, the pass signals a failure. The pass now rejects all ops with unknown memory effects: since we do not know whether such an op allocates memory, the pass cannot know whether a deallocation op should be inserted. Memory effects are queried from the `MemoryEffectOpInterface` interface. Ops that do not implement this interface but have the `RecursiveMemoryEffects` trait do not have any side effects (apart from the ones that their nested ops may have). Unregistered ops are now rejected by the pass because they do not implement `MemoryEffectOpInterface`, and we do not know whether they have `RecursiveMemoryEffects`. All test cases that previously used unregistered ops have been updated to use registered ops. [click here].
-
Cullen Rhodes added an `arith-to-arm-sme` conversion pass (#78197) to convert the Arith dialect to the ArmSME dialect. [click here]. Only a single op is converted at the moment, but this will grow as things like in-tile add are implemented. Also, `createLoopOverTileSlices` was moved to the ArmSME utils since it is relevant for both conversions.
-
Mahesh added changes that bring the tiling and fusion capabilities using `scf.forall` on par with what was already supported by `scf.for`. Using `LoopLikeOpInterface` as the basis for the implementation unifies all the tiling logic for both `scf.for` and `scf.forall`; the only difference is the actual loop generation. This is a follow-up to #72178. [click here].
-
Andrzej extended `vector.insert_strided_slice` and `vector.extract_strided_slice` to allow scalable input and output vectors. For scalable sizes, the corresponding slice size has to match the corresponding dimension in the output/input vector (insert/extract, respectively). [click here].
-
Kohei Yamaguchi fixed SimplifyClones for the case of a dealloc before the cloneOp. The SimplifyClones pass relies on the assumption that the deallocOp follows the cloneOp, so a crash occurs when there is a `redundantDealloc` preceding the cloneOp. This PR addresses the issue by ensuring the presence of the deallocOp after the cloneOp. The verification is performed by checking whether the loop over the nodes subsequent to the cloneOp reaches the tail of the list. [click here].
-
Ryan added a fold for cmp(x, x) when x isn't a constant (#78812). Such cases show up in the middle of optimization passes, e.g., after some rewrites followed by CSE. The current folder can fold such cases only when the inputs are constant; this patch extends it to fold even if the inputs are non-constant. [click here].
-
Matthias added a new interface method to `BufferizableOpInterface`: `hasTensorSemantics`. This method returns "true" if the op has tensor semantics and should be bufferized. Until now, we assumed that an op has tensor semantics if it has tensor operands and/or tensor op results. However, there are ops like `ml_program.global` that do not have any results/operands but must still be bufferized (#75103). The new interface method can return "true" for such ops. [click here].
-
Currently the `tileConsumerAndFuseProducerGreedilyUsingSCFFor` method greedily fuses through all slices that are generated during the tile and fuse flow. That is not the normal use case; ideally the caller would like to control which slices get fused and which don't. This patch [click here] introduces a new field in `SCFTileAndFuseOptions` to specify this control. The control function also allows the caller to specify whether the replacement for the fused producer needs to be yielded from within the tiled computation, which allows replacing the fused producers in case they have other uses. Without this, the original producers still survive, negating the utility of the fusion. The change also means that the name of the function `tileConsumerAndFuseProducerGreedily...` can be updated.
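
The fusion control described in the last item can be pictured with a small toy model. This is only a sketch in plain Python under stated assumptions, not the MLIR C++ API: `Producer`, `tile_and_fuse`, and the `(should_fuse, yield_replacement)` return shape are hypothetical names chosen for illustration, and each slice is reduced to the name of the producer it reads from.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


# Toy stand-in for a producer op (hypothetical, not an MLIR class).
@dataclass
class Producer:
    name: str
    operands: List[str] = field(default_factory=list)  # producers it reads from
    has_other_uses: bool = False  # used outside the tiled computation?


# Hypothetical control signature: producer -> (should_fuse, yield_replacement).
ControlFn = Callable[[Producer], Tuple[bool, bool]]


def tile_and_fuse(
    root_operands: List[str],
    producers: Dict[str, Producer],
    control: ControlFn,
) -> Tuple[List[str], List[str]]:
    """Walk the slices reachable from the tiled op and fuse what `control` allows.

    Returns (fused producers, producers whose replacement is yielded).
    """
    fused: List[str] = []
    yielded: List[str] = []
    worklist = deque(root_operands)
    while worklist:
        name = worklist.popleft()
        if name in fused:
            continue
        producer = producers[name]
        should_fuse, yield_replacement = control(producer)
        if not should_fuse:
            continue  # the caller opted out: this slice stays unfused
        fused.append(name)
        if yield_replacement:
            yielded.append(name)  # replacement is returned from the tiled loop
        # Fusing a producer exposes the slices of its own operands.
        worklist.extend(producer.operands)
    return fused, yielded


producers = {
    "matmul": Producer("matmul", operands=["fill"], has_other_uses=True),
    "fill": Producer("fill"),
    "bias": Producer("bias"),
}


# Old behaviour in this model: fuse every slice, yield nothing.
def greedy(p: Producer) -> Tuple[bool, bool]:
    return (True, False)


# Caller-controlled: skip "bias", and yield replacements for producers
# that have other uses so the originals can be replaced.
def selective(p: Producer) -> Tuple[bool, bool]:
    return (p.name != "bias", p.has_other_uses)


greedy_result = tile_and_fuse(["matmul", "bias"], producers, greedy)
selective_result = tile_and_fuse(["matmul", "bias"], producers, selective)
```

In this model the greedy control fuses all three producers, while the selective control leaves `bias` unfused and yields a replacement for `matmul`, which is the situation the item describes where a fused producer has other uses.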
Related Projects
- Triton community meeting - https://www.youtube.com/watch?v=uRlqolhNbRk
- IREE community meeting - https://www.youtube.com/watch?v=b779to--7es
- OpenXLA community meeting - https://www.youtube.com/watch?v=YK1CLzIcsJ8&t=2s
Useful Links