Welcome to the 67th issue of the MLIR Newsletter covering developments in MLIR and related projects in the ecosystem. We welcome your contributions (contact: javed.absar@gmail.com). Click here to see previous editions.
Highlights, Discussions and Ecosystem:
- Canonicalization in MLIR: discussions triggered by the RFC “Update to ‘General Design’ section of Operation Canonicalizations in MLIR”. Also read Dan Gohman’s [write-up on Canonicalization]. Key observations:
  - Canonicalization is run frequently, and its patterns are intermixed with other canonicalizations.
  - Because canonicalization patterns are applied iteratively, you need a lattice-like approach that guarantees they never cycle (for example, one pattern undoing what another just did).
  - Canonicalizations should not be required for correctness. If one is, then it is really a lowering, and canonicalization hooks are the wrong place for it.
  - You shouldn’t do anything expensive, such as O(n) work, in a canonicalization.
  - Canonicalization isn’t a good place for transformations with complicated cost models, because the def-use graph changes a lot and, given the iterative nature, the IR is constantly changing.
  - At a given abstraction level, two semantically equivalent programs should, as far as possible, be canonicalized into the same form.
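To make the “no cycles” and “same canonical form” observations concrete, here is a toy greedy fixpoint driver over a hypothetical tuple-based expression language (it is an illustration only, not MLIR’s API). The commutative rule only ever moves constants to the right, so no rule can undo another, and `a + 0` and `0 + a` end up in the same form:

```python
# Expression: a str (variable), an int (constant), or a tuple (op, lhs, rhs).
def rewrite_once(e):
    """Apply one local rewrite at the root of e, or return None."""
    if not isinstance(e, tuple):
        return None
    op, lhs, rhs = e
    # Canonical operand order for commutative ops: constant on the right.
    # This rule never fires in the opposite direction, so it cannot cycle.
    if op in ("add", "mul") and isinstance(lhs, int) and not isinstance(rhs, int):
        return (op, rhs, lhs)
    # Constant folding when both operands are constants.
    if isinstance(lhs, int) and isinstance(rhs, int):
        return lhs + rhs if op == "add" else lhs * rhs
    # Identities: x + 0 -> x and x * 1 -> x.
    if op == "add" and rhs == 0:
        return lhs
    if op == "mul" and rhs == 1:
        return lhs
    return None


def canonicalize(e):
    """Canonicalize children first, then rewrite the root until no rule applies."""
    if isinstance(e, tuple):
        op, lhs, rhs = e
        e = (op, canonicalize(lhs), canonicalize(rhs))
    new = rewrite_once(e)
    # A rewrite may expose new opportunities, so re-canonicalize its result.
    return e if new is None else canonicalize(new)
```

For example, `canonicalize(("mul", 5, "x"))` and `canonicalize(("mul", ("add", 2, 3), "x"))` both yield `("mul", "x", 5)`.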
- Fusion-by-Diffusion? [RFC]. YunFly: “We intend to fuse all other ops (`pack+fill+broadcast+add+relu`) into [these] deeply tiled loops in one pass/transform.”
- Continued discussion on the [RFC] A New “One-Shot” Dialect Conversion Driver.
- Jeremy Kun: “With the new polynomial dialect getting settled in, I wanted to start work on a pass that replaces transcendental math ops (like `sin` or `relu`) with polynomial approximations.” See [RFC]: A polynomial approximation pass.
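As a rough illustration of what such a rewrite computes, the sketch below replaces `sin` with a degree-9 polynomial evaluated by Horner’s rule. It assumes a truncated Taylor series on [-π/2, π/2]; a real pass would more likely use minimax coefficients plus range reduction, and `approx_sin` and the degree are illustrative choices, not taken from the RFC:

```python
import math


def taylor_sin_coeffs(degree):
    """Coefficients c[k] of x**k in the Taylor polynomial of sin about 0."""
    coeffs = []
    for k in range(degree + 1):
        if k % 2 == 0:
            coeffs.append(0.0)  # sin is odd: even powers vanish
        else:
            sign = -1.0 if (k // 2) % 2 else 1.0  # +x, -x^3/3!, +x^5/5!, ...
            coeffs.append(sign / math.factorial(k))
    return coeffs


def horner(coeffs, x):
    """Evaluate sum(coeffs[k] * x**k) with one multiply-add per coefficient."""
    acc = 0.0
    for c in reversed(coeffs):
        acc = acc * x + c
    return acc


def approx_sin(x, degree=9):
    """Polynomial stand-in for sin(x); only accurate for small |x|."""
    return horner(taylor_sin_coeffs(degree), x)
```

Horner form matters for a lowering like this because each step maps to a single fused multiply-add.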
- Mahesh Ravishankar updated the documentation for TilingInterface (thanks, Mahesh! Documentation is immensely useful to the community). [click here].
- [RFC] Vector Distribution for CPU (converting vectors to physical-register-sized vectors): “This pass is proposed so that we can use `tile vector` and vectorize `for loop` according to the hardware SIMD instructions.”
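The core idea of distributing a vector to register size can be sketched as splitting a 1-D vector into slices that each fit one SIMD register. This Python sketch is hypothetical: it assumes 1-D vectors and a fixed register width (for example 256 bits for AVX2), and the RFC’s actual design may differ:

```python
def distribute(vector_len, elem_bits, register_bits):
    """Split a 1-D vector into register-sized (offset, size) slices.

    The last slice may be shorter when vector_len is not a multiple of
    the number of lanes per register.
    """
    if register_bits % elem_bits:
        raise ValueError("element width must divide the register width")
    lanes = register_bits // elem_bits
    return [(off, min(lanes, vector_len - off))
            for off in range(0, vector_len, lanes)]
```

For example, `distribute(16, 32, 256)` splits a 16 x f32 vector into two 8-lane slices, `[(0, 8), (8, 8)]`.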
- Continued discussion on [RFC] Sharding Framework Design for Device Mesh (#118 by fschlimb).
- [RFC] Add affine.parallel. The proposal aims to provide an effective representation of loops over affine induction variables whose iterations may safely run in parallel. It draws on techniques and experience from the Stripe dialect, discussed in the open design meeting on 2019-11-7, as well as on a proposal for a parallel loop operation/dialect made by Stephan Herhut.
- Extending `tileConsumerAndFuseProducer` to handle more patterns.
Recent MLIR Commits:
- zjgarvey added the named op `linalg.conv_2d_ngchw_gfchw_q`. This op is similar to `linalg.conv_2d_ngchw_gfchw`, but additionally incorporates zero-point offset corrections… [click here].
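A convolution is a sum of products, so the zero-point correction can be illustrated on a single dot product. Assuming the new op follows the same `(input - input_zp) * (filter - filter_zp)` convention as MLIR’s other `_q` named ops, the sketch below checks that subtracting zero points inside each product equals a plain integer product plus three correction terms. It is an algebraic illustration, not the op’s actual lowering:

```python
def dot_zp_naive(x, w, zx, zw):
    """Sum of (x - zx) * (w - zw), subtracting zero points elementwise."""
    return sum((a - zx) * (b - zw) for a, b in zip(x, w))


def dot_zp_expanded(x, w, zx, zw):
    """Same value: plain dot product plus zero-point correction terms.

    sum((a - zx)(b - zw)) = sum(a*b) - zw*sum(a) - zx*sum(b) + n*zx*zw
    """
    n = len(x)
    return (sum(a * b for a, b in zip(x, w))
            - zw * sum(x) - zx * sum(w) + n * zx * zw)
```

The expanded form is attractive because `sum(a * b)` can run on raw integer data, with the corrections computed from per-window sums.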
- Han Chung added support for unrolling `vector.bitcast` ops. [click here].
- Matthias Springer and Markus Böck simplified the design of the `GreedyPatternRewriterDriver` class. This class used to inherit from both `PatternRewriter` and `RewriterBase::Listener` and then attached itself as a listener. In the new design, the class has a `PatternRewriter` field instead of inheriting from `PatternRewriter`, which is generally preferred in object-oriented programming (composition over inheritance). [click here].
- Stephen Tozer set the debug info format in MLIR → LLVM IR translation. MLIR’s LLVM dialect does not internally support debug records; it only converts to and from debug intrinsics. To smooth the transition from intrinsics to records, there is a step prior to IR → MLIR translation that switches the IR module to intrinsic form; this patch adds the equivalent conversion to record form at MLIR → IR translation. It is a partial reapply of #95098, which can be landed once the flang frontend has been updated by #95306, and is the counterpart to the earlier patch #89735, which handled the IR → MLIR conversion. [click here].
- The conversion of `bufferization.clone` to memref now generates the runtime size calculations and allocation needed to clone an unranked memref. [click here].
- Max added fusion-by-collapsing and fusion-by-expansion patterns for `tensor.pad` ops in ElementwiseOpFusion, including `FoldPadWithProducerReshapeOpByExpansion`. Pad ops can be expanded or collapsed as long as none of the padded dimensions is itself expanded or collapsed. [click here].
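The legality condition (no padded dimension may be expanded or collapsed) can be sketched as a check over a reshape’s reassociation map. In this hypothetical Python version the reassociation is a list of groups of source dims, one group per result dim, mirroring the reassociation indices of `tensor.collapse_shape`; the function names are illustrative and not taken from the patch:

```python
def can_fold_pad_through_reshape(reassociation, low_pad, high_pad):
    """True if every padded result dim comes from a single (unreshaped) source dim."""
    for group, lo, hi in zip(reassociation, low_pad, high_pad):
        if (lo or hi) and len(group) != 1:
            return False  # padding a dim that is collapsed from several dims
    return True


def expand_padding(reassociation, pad):
    """Map per-result-dim padding back onto the source (uncollapsed) dims.

    Only meaningful when can_fold_pad_through_reshape() holds, so any
    non-zero padding lands on a singleton group.
    """
    expanded = [0] * sum(len(g) for g in reassociation)
    for group, p in zip(reassociation, pad):
        expanded[group[0]] = p
    return expanded
```

For a collapse from `2x3x4` to `6x4` with reassociation `[[0, 1], [2]]`, padding the size-4 dim is foldable, while padding the size-6 dim is not.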
- Andrzej updated the tests for scalable vectors in `vector-transfer-collapse-inner-most-dims.mlir`. [click here].
- Matthias added a commit to simplify and improve the documentation for the part of the `ConversionPatternRewriter` API that deals with signature conversions. [click here].
- Krzysztof Drewniak generalized and improved `-int-range-optimizations` (#94712). [click here]. When the integer range analysis was first developed, a pass that performed integer-range-based constant folding was written and used as a test pass. There was an intent to add such folding to SCCP, but that hasn’t happened. Meanwhile, `-int-range-optimizations` was added to the arith dialect’s transformations. The `cmpi` simplification in that pass is a strict subset of the constant folding that lived in `-test-int-range-inference`. This commit moves the former test pass into `-int-range-optimizations`, subsuming its previous contents.
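The kind of `cmpi` folding that range analysis enables can be sketched in a few lines. This is illustrative Python, not the pass’s implementation: ranges are plain inclusive signed `(lo, hi)` tuples, only the `slt` predicate is shown, and the function names are hypothetical:

```python
from typing import Optional, Tuple

Range = Tuple[int, int]  # inclusive signed range (lo, hi)


def fold_cmpi_slt(lhs: Range, rhs: Range) -> Optional[bool]:
    """Fold `lhs < rhs` from value ranges, or return None if unknown."""
    (lhs_lo, lhs_hi), (rhs_lo, rhs_hi) = lhs, rhs
    if lhs_hi < rhs_lo:
        return True   # every lhs value is below every rhs value
    if lhs_lo >= rhs_hi:
        return False  # no lhs value is below any rhs value
    return None       # ranges overlap: the comparison cannot be folded


def as_constant(rng: Range) -> Optional[int]:
    """A single-point range means the SSA value is a known constant."""
    lo, hi = rng
    return lo if lo == hi else None
```

The second helper reflects the more general constant folding: any value whose inferred range collapses to one point can be replaced by a constant.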
- Fixed a crash in the ownership-based buffer deallocation pass when a function is called indirectly via an SSA value. Such functions must be conservatively assumed to be public. [click here].
- Rolf Morel extended `transform.foreach` to take multiple arguments. [click here]. This changes the interface of `transform.foreach` to take multiple arguments, e.g. `transform.foreach %ops1, %ops2, %params : … { ^bb0(%op1, %op2, %param): BODY }`. The payloads for these handles are iterated over as if they had been zipped together: BODY is executed once for each such tuple.
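The zip semantics can be sketched in Python. Here `foreach` and `body` are hypothetical stand-ins for the op and its region, and treating mismatched payload counts as an error is an assumption of this sketch:

```python
def foreach(body, *handles):
    """Run body once per tuple of payloads, zipping the handles together."""
    lengths = {len(h) for h in handles}
    if len(lengths) > 1:
        # Assumption: handles must map to the same number of payloads.
        raise ValueError("all handles must map to the same number of payloads")
    return [body(*items) for items in zip(*handles)]
```

With two ops, two values, and two params, the body runs twice: first on `(op0, v0, p0)`, then on `(op1, v1, p1)`.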
- Max added folding of transpose ops expressed as `linalg.generic` into `tensor.pack`/`tensor.unpack`. The previous patterns also had bugs in how they composed permutations; this PR fixes those bugs and adds tests for them. [click here].
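Folding a transpose into a pack or unpack requires composing two permutations, and the order of composition is easy to get backwards. The sketch below assumes the convention `result[i] = input[perm[i]]` (the one `linalg.transpose` documents). The names are hypothetical, and this is not a reconstruction of the specific bug fixed in the PR:

```python
def apply_perm(perm, dims):
    """Result dim i takes input dim perm[i]."""
    return [dims[p] for p in perm]


def compose(first, second):
    """Single permutation equivalent to applying `first`, then `second`."""
    return [first[p] for p in second]
```

Composition is not commutative. With `first = [1, 2, 3, 0]` and `second = [2, 0, 3, 1]`, `compose(first, second)` is `[3, 1, 0, 2]`, but `compose(second, first)` is `[0, 3, 1, 2]`.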
- Guray added the `gpu.cluster_dim_blocks` and `gpu.cluster_block_id` ops. [click here].
Related Projects
- Triton community meeting - https://www.youtube.com/watch?v=uRlqolhNbRk
- IREE community meeting - https://www.youtube.com/watch?v=b779to--7es
- OpenXLA community meeting - https://www.youtube.com/watch?v=YK1CLzIcsJ8&t=2s
Useful Links