See the previous published edition.
Welcome to the thirty-fourth issue of the MLIR (bi)Weekly, a newsletter covering developments in MLIR and related projects in the ecosystem. MLIR (bi)Weekly is brought to you by a collective effort of contributors; we welcome your contributions!
The sources for the MLIR logo (in all its variants) are now publicly available here.
MLIR Core
Infrastructure
- The Greedy Pattern Rewriter now supports two traversal modes (top-down and bottom-up). It also gained a configuration structure to control various options (see the sketch after this list).
- Block arguments can now hold a location.
- A new print-ir-after-failure IR printing flag was added to print the IR only after a pass fails.
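A minimal sketch of the new rewriter options (assuming the upstream GreedyRewriteConfig structure, used inside a pass that already has a frozen pattern set at hand):

```c++
#include "mlir/Transforms/GreedyPatternRewriteDriver.h"

// Inside a pass: `patterns` is a FrozenRewritePatternSet built elsewhere;
// getOperation() and signalPassFailure() come from the surrounding pass.
mlir::GreedyRewriteConfig config;
config.useTopDownTraversal = true; // visit ops top-down instead of bottom-up
config.maxIterations = 10;         // bound the number of rewrite iterations
if (mlir::failed(mlir::applyPatternsAndFoldGreedily(getOperation(), patterns,
                                                    config)))
  signalPassFailure();
```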
Codegen
- Sparse tensor support:
- Codegen now handles the “dimension ordering” part of the sparse tensor type (e.g. row-wise vs. column-wise for matrices, but generalized to n dimensions).
- Python bindings for the sparse tensor dialect and transformations have been implemented. As a proof of concept, a Python program was written that generates and runs kernels for all sparse-annotation combinations for SpMxSpM (64 in total). This is ideal for verifying the correctness of codegen, but also for performing a state-space search for the best-performing kernel variant.
- The MLIR Async dialect now supports error propagation (similar to TFRT's AsyncValue).
SPIR-V
- spv.module was changed to be SingleBlock + NoTerminator. This helps remove spv.mlir.endmodule.
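For context, a minimal sketch (a hypothetical check, not code from the change itself) of what the two traits guarantee in the C++ API, where `moduleOp` is an `Operation *`:

```c++
// SingleBlock: the op's region holds exactly one block. NoTerminator: that
// block needs no terminator op, so an explicit spv.mlir.endmodule is obsolete.
if (moduleOp->hasTrait<mlir::OpTrait::SingleBlock>() &&
    moduleOp->hasTrait<mlir::OpTrait::NoTerminator>()) {
  mlir::Block &body = moduleOp->getRegion(0).front();
  // Walk the module body's ops directly; there is no terminator to skip.
  for (mlir::Operation &op : body)
    (void)op;
}
```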
In the Ecosystem
IREE: An Experimental MLIR Execution Environment
- Replaced the VMLA reference back-end with VMVX. The motivation was to build a new reference back-end that is architecturally aligned with how the other back-ends have evolved.
- Added static library loading, in support of embedded use cases.
- LLVM backend now uses dynamic pass pipelines, part of a larger effort to make compilation more configurable and dynamic.
- The MobileBERT TFLite model now compiles successfully via TOSA.
- CUDA backend:
- Extended functional support to all XLA ops supported in IREE
- MobileNet runs successfully as well as some other smaller models
- Made the compilation flow more general to support targeting CUDA and ROCM through the same path
- Looking at solving the first level of obvious performance problems in MobileNet
TensorFlow / MLIR-HLO
Kernel Generator
- Generalized rank specialization has landed. It is now possible to generate code for unranked kernels that result in an expression of element-wise (cwise) TF ops.
- Support for TF ops on complex types has been further expanded. mHLO operations can now be lowered to complex.mul, complex.eq, complex.neq, and complex.div. An IEEE 754 implementation of complex division is coming (see the sketch after this list).
- We are further experimenting with linalg vectorization for broadcasting logic. First results show that MLIR-generated kernels with vectorization are faster than the Eigen-generated ones.
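For reference, a scalar sketch of the computation a complex.div lowering corresponds to (the textbook formula, not the actual generated code); the IEEE 754 implementation mentioned above additionally has to avoid spurious overflow and underflow in the intermediate products:

```c++
#include <complex>

// Textbook complex division: (a+bi)/(c+di) = ((ac+bd) + (bc-ad)i) / (c^2+d^2).
// A robust IEEE 754 variant (e.g. Smith's algorithm) scales the operands so
// that c*c + d*d and the cross products cannot overflow or underflow.
std::complex<double> complexDiv(std::complex<double> lhs,
                                std::complex<double> rhs) {
  double a = lhs.real(), b = lhs.imag();
  double c = rhs.real(), d = rhs.imag();
  double denom = c * c + d * d;
  return {(a * c + b * d) / denom, (b * c - a * d) / denom};
}
```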
MLIR-HLO / XLA
Following the presentation on DISC last month, Alibaba started upstreaming their code (PR 1, 2, 3, 4).
Recent Talks
Recent Publications
“The Evolution of Domain-Specific Computing for Deep Learning” in IEEE Circuits and Systems Magazine teases how Xilinx is using MLIR dialects to represent multicore devices with explicit data movement.
[…] this section provides an overview of how we are leveraging MLIR infrastructure to build next-generation domain-specific tooling for Xilinx Versal devices.