MLIR News, 67th edition (18th June 2024)

javedabsar · June 15, 2024, 12:09pm

Welcome to the 67th issue of the MLIR Newsletter covering developments in MLIR, and related projects in the ecosystem. We welcome your contributions (contact: javed.absar@gmail.com). Click here to see previous editions.

Highlights, Discussions and Ecosystem:

Canonicalization in MLIR: discussions triggered by the RFC: Update to “General Design” section of Operation Canonicalizations in MLIR. Also read Dan Gohman’s [write-up on Canonicalization]. Key observations:
– Canonicalizations are run frequently and are intermixed with other canonicalizations.
– Because canonicalizations are iterated, you need a lattice-like approach in which canonicalizations cannot form cycles.
– Canonicalizations should not be required for correctness. If they are, then it’s a lowering, and canonicalize hooks are the wrong place to do it.
– You shouldn’t do something expensive that is O(n) in canonicalize.
– Canonicalize isn’t a great place to put things with complicated cost models, because the def-use graph changes a lot and the iterative nature means that the IR is changing a lot.
– At a given abstraction level, two semantically equivalent programs should, when canonicalized, be transformed into the same form as much as possible.
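To make the "no cycles" and "same form" observations concrete, here is a minimal sketch of a fixed-point canonicalizer over a toy expression language. It is not MLIR code: the tuple-based expressions and the names canonicalize_once and canonicalize are assumptions made for illustration only. Each rewrite either shrinks the expression or moves a constant to the right, so no rewrite can undo another.

```python
# Toy expressions: int constants, str variables, or ("add", lhs, rhs) tuples.
# All names here are hypothetical; this only illustrates iterating local
# rewrites to a fixed point without cycles.

def canonicalize_once(expr):
    """Apply one bottom-up sweep of local rewrites and return the result."""
    if not isinstance(expr, tuple):
        return expr  # leaf: constant or variable
    op, lhs, rhs = expr
    lhs, rhs = canonicalize_once(lhs), canonicalize_once(rhs)
    if op == "add":
        # Constant folding: both operands are known.
        if isinstance(lhs, int) and isinstance(rhs, int):
            return lhs + rhs
        # Commutative ordering: constants go on the right-hand side.
        if isinstance(lhs, int):
            lhs, rhs = rhs, lhs
        # Identity: x + 0 becomes x.
        if rhs == 0:
            return lhs
    return (op, lhs, rhs)


def canonicalize(expr, max_iterations=10):
    """Repeat sweeps until nothing changes (a fixed point)."""
    for _ in range(max_iterations):
        new_expr = canonicalize_once(expr)
        if new_expr == expr:
            return expr
        expr = new_expr
    raise RuntimeError("rewrites did not converge; a cycle is likely")


# Two equivalent spellings of "x + 3" end up in the same form.
a = canonicalize(("add", ("add", 1, 2), "x"))
b = canonicalize(("add", "x", 3))
```

Neither rewrite here needs a cost model or a global view of the program, which is in the spirit of the observations above.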
Fusion-by-Diffusion? [RFC]. YunFly: "We intend to fuse all other ops (pack+fill+broadcast+add+relu) into this deeply tiled loops in one pass/transform".
Continued discussion on the [RFC] A New "One-Shot" Dialect Conversion Driver.
Jeremy Kun: "With the new polynomial dialect getting settled in, I wanted to start work on a pass that replaces transcendental math ops (like sin or relu) with polynomial approximations." See [RFC]: A polynomial approximation pass.
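As a rough illustration of what replacing a transcendental op with a polynomial means, the sketch below approximates sin on [-pi/2, pi/2] with a degree-7 Taylor polynomial evaluated by Horner's rule. The choice of Taylor coefficients, the interval, and the names sin_poly and max_abs_error are assumptions of this sketch; the RFC may well use a different approximation scheme.

```python
import math

# Coefficients of x, x^3, x^5, x^7 in the Taylor series of sin(x) around 0.
# Taylor coefficients are an assumption of this sketch, not the RFC's method.
COEFFS = [1.0, -1.0 / 6.0, 1.0 / 120.0, -1.0 / 5040.0]


def sin_poly(x):
    """Evaluate the odd polynomial x * p(x^2) with Horner's rule."""
    x2 = x * x
    acc = 0.0
    for c in reversed(COEFFS):
        acc = acc * x2 + c
    return x * acc


def max_abs_error(samples=1001):
    """Largest |sin_poly(x) - sin(x)| over evenly spaced points in [-pi/2, pi/2]."""
    lo, hi = -math.pi / 2, math.pi / 2
    worst = 0.0
    for i in range(samples):
        x = lo + (hi - lo) * i / (samples - 1)
        worst = max(worst, abs(sin_poly(x) - math.sin(x)))
    return worst
```

On this interval the first omitted Taylor term bounds the error, so the approximation stays within about 2e-4 of math.sin.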
Mahesh Ravishankar updated the documentation for TilingInterface (thanks, Mahesh! Documentation is immensely useful to the community). [click here].
[RFC] Vector Distribution for CPU (convert vector to physical register size vector): "This pass is proposed so that we can use tile vector and vectorize for loop according to the hardware SIMD instructions."
Continued discussion on [RFC] Sharding Framework Design for Device Mesh - #118 by fschlimb.
[RFC] Add affine.parallel. The proposal is intended to provide an effective representation of loops over affine induction variables whose iterations may safely be run in parallel. It draws on techniques and experience from the Stripe dialect discussed in the open design meeting on 2019-11-7, as well as a proposal for a parallel loop operation / dialect made by Stephan Herhut.
[RFC] Add operandIndex to sideeffect instance
Extending `tileConsumerAndFuseProducer` to handle more patterns.
[RFC] New dialect to expose handy utilities.

MLIR Commits Recently:

zjgarvey added a named op: linalg.conv_2d_ngchw_gfchw_q. This op is similar to linalg.conv_2d_ngchw_gfchw, but additionally incorporates zero point offset corrections… [click here].
Han Chung added port for unrolling vector.bitcast ops. [click here].
Matthias Springer and Markus Bock simplified the design of the GreedyPatternRewriterDriver class. This class used to inherit from both PatternRewriter and RewriterBase::Listener and then attached itself as a listener. In the new design, the class has a PatternRewriter field instead of inheriting from PatternRewriter, which is generally preferred in object-oriented programming. [click here].
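The design change can be pictured with a small sketch: the driver still acts as a listener, but it owns a rewriter rather than being one. This is plain Python, not the MLIR C++ API; the Listener, Rewriter and Driver classes and their methods are hypothetical stand-ins.

```python
class Listener:
    """Hypothetical stand-in for a rewriter listener interface."""

    def notify_operation_inserted(self, op):
        pass


class Rewriter:
    """Hypothetical stand-in for a rewriter that reports IR changes."""

    def __init__(self, listener=None):
        self.listener = listener

    def create(self, op_name):
        # Report the newly created op to whoever is listening.
        if self.listener is not None:
            self.listener.notify_operation_inserted(op_name)
        return op_name


class Driver(Listener):
    """Composition: the driver has a Rewriter field instead of inheriting from it."""

    def __init__(self):
        self.worklist = []
        # The driver registers itself as the listener of the rewriter it owns.
        self.rewriter = Rewriter(listener=self)

    def notify_operation_inserted(self, op):
        # Newly created ops are queued so patterns get a chance to run on them.
        self.worklist.append(op)


driver = Driver()
driver.rewriter.create("arith.addi")
```

With this layout the driver's public surface no longer includes every rewriter method, only the one field it hands to patterns.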
Stephen Tozer set debug info format in MLIR → LLVM-IR translation. MLIR’s LLVM dialect does not internally support debug records, only converting to/from debug intrinsics. To smooth the transition from intrinsics to records, there is a step prior to IR->MLIR translation that switches the IR module to intrinsic-form; this patch adds the equivalent conversion to record-form at MLIR->IR translation. This is a partial reapply of #95098 which can be landed once the flang frontend has been updated by #95306. This is the counterpart to the earlier patch #89735 which handled the IR->MLIR conversion. [click here].
This change modifies the conversion of bufferization.clone to memref to generate the runtime calculations and allocation to allow for cloning an unranked memref. [click here].
Max added FoldPadWithProducerReshapeOpByExpansion fusion by collapsing and fusion by expansion patterns for tensor.pad ops in ElementwiseOpFusion. Pad ops can be expanded or collapsed as long as none of the padded dimensions will be expanded or collapsed. [click here].
Andrzej updated the tests in vector-transfer-collapse-inner-most-dims.mlir to cover scalable vectors. [click here].
Matthias added a commit to simplify and improve documentation for the part of the ConversionPatternRewriter API that deals with signature conversions. [click here].
Krzysztof Drewniak generalized and improved -int-range-optimizations (#94712). [click here]. When the integer range analysis was first developed, a pass that did integer range-based constant folding was developed and used as a test pass. There was an intent to add such a folding to SCCP, but that hasn’t happened. Meanwhile, -int-range-optimizations was added to the arith dialect’s transformations. The cmpi simplification in that pass is a strict subset of the constant folding that lived in -test-int-range-inference. This commit moves the former test pass into -int-range-optimizations, subsuming its previous contents.
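For readers new to range-based folding, the idea behind simplifying a comparison from known integer ranges can be sketched as follows. This is an illustration only, not the pass's implementation: IntRange and fold_slt are hypothetical names, and ranges are modelled as inclusive signed bounds.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IntRange:
    """Inclusive bounds [lo, hi] that a value is known to lie in (hypothetical type)."""
    lo: int
    hi: int


def fold_slt(lhs: IntRange, rhs: IntRange) -> Optional[bool]:
    """Try to decide the signed comparison `lhs < rhs` from ranges alone.

    Returns True or False when every pair of values agrees, else None.
    """
    if lhs.hi < rhs.lo:
        return True  # even the largest lhs is below the smallest rhs
    if lhs.lo >= rhs.hi:
        return False  # even the smallest lhs is not below the largest rhs
    return None  # ranges overlap: the result depends on runtime values
```

When the function returns a boolean, the comparison could be replaced by a constant; a None result leaves the op untouched.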
Fixed a crash in the ownership-based buffer deallocation pass when indirectly calling a function via an SSA value. Such functions must be conservatively assumed to be public. [click here].
Rolf Moren extended transform.foreach to take multiple arguments [click here]. This changes transform.foreach’s interface to take multiple arguments, e.g. `transform.foreach %ops1, %ops2, %params : … { ^bb0(%op1, %op2, %param): BODY }`. The semantics are that the payloads for these handles are iterated over as if they had been zipped together: BODY is executed once for each such tuple.
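The zipped iteration described for transform.foreach can be mimicked in a few lines of Python. This is only an analogy: the foreach function, the list-based "handles", and the equal-length check are assumptions of the sketch rather than the transform dialect's actual behavior or API.

```python
def foreach(body, *handles):
    """Run `body` once per tuple of payloads, zipping the handles together.

    Each handle is modelled as a list of payloads. Requiring equal lengths
    is an assumption made by this sketch.
    """
    if len({len(h) for h in handles}) > 1:
        raise ValueError("all handles must carry the same number of payloads")
    return [body(*payloads) for payloads in zip(*handles)]


# Three handles with two payloads each: the body runs twice.
ops1 = ["matmul_0", "matmul_1"]
ops2 = ["fill_0", "fill_1"]
params = [8, 16]
visited = foreach(lambda op1, op2, param: (op1, op2, param), ops1, ops2, params)
```

Here visited holds one tuple per iteration, pairing the i-th payload of every handle.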
Max added transpose + pack/unpack folding support for transpose ops in the form of linalg.generic ops. There were also some bugs with the permutation composing in the previous patterns, so this PR fixes these bugs and adds tests for them as well. [click here].
Guray added the gpu.cluster_dim_blocks and gpu.cluster_block_id ops. [click here].

Related Projects

Triton community meeting - https://www.youtube.com/watch?v=uRlqolhNbRk
IREE community meeting - https://www.youtube.com/watch?v=b779to--7es
OpenXLA community meeting - https://www.youtube.com/watch?v=YK1CLzIcsJ8&t=2s

Useful Links
