See the previous published edition.
Welcome to the thirty-fourth issue of the MLIR (bi)Weekly, a newsletter covering developments in MLIR and related projects in the ecosystem. MLIR (bi)Weekly is brought to you by a collective effort of contributors; we welcome your contributions!
The sources for the MLIR logo (in all its variants) are now publicly available here.
- The Greedy Pattern Rewriter now supports two modes of traversal. It also gained a configuration structure to control various options.
- Block arguments can now hold a location.
- A new `print-ir-after-failure` IR printing flag was added to print the IR only after a pass fails.
- Sparse tensor support:
- Codegen now handles the “dimension ordering” part of the sparse tensor type (e.g., row-wise vs. column-wise for matrices, generalized to n dimensions).
- Python bindings for the sparse tensor dialect and transformations have been implemented. As a proof of concept, a Python program was written that generates and runs kernels for all sparse annotation combinations for SpMxSpM (64 in total). This is ideal for verifying the correctness of codegen, but also for performing a state-space search for the best-performing kernel variant.
- MLIR Async dialect supports error propagation (similar to TFRT AsyncValue).
- `spv.module` was changed to be `NoTerminator`. This helps remove the need for a dedicated module terminator op.
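The two traversal modes of a greedy rewrite driver can be illustrated with a toy (this is not MLIR's API; it only sketches how root-first and leaves-first visitation differ while converging to the same fixpoint):

```python
# Toy greedy rewriter: constant-fold nested ("add", lhs, rhs) tuples.
# "top-down" visits an op before its operands; "bottom-up" visits the
# operands first. Both reach the same folded result.
def fold(expr, order, trace):
    if not isinstance(expr, tuple):
        return expr
    op, lhs, rhs = expr
    if order == "top-down":
        trace.append(expr)                       # root seen first
        lhs, rhs = fold(lhs, order, trace), fold(rhs, order, trace)
    else:                                        # bottom-up: operands first
        lhs, rhs = fold(lhs, order, trace), fold(rhs, order, trace)
        trace.append((op, lhs, rhs))
    if isinstance(lhs, int) and isinstance(rhs, int):
        return lhs + rhs                         # the "rewrite pattern"
    return (op, lhs, rhs)

expr = ("add", ("add", 1, 2), 4)
t1, t2 = [], []
assert fold(expr, "top-down", t1) == fold(expr, "bottom-up", t2) == 7
assert t1[0] == ("add", ("add", 1, 2), 4)        # outermost op visited first
assert t2[0] == ("add", 1, 2)                    # innermost op visited first
```

In MLIR the choice of traversal, along with other knobs, is what the new configuration structure controls.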
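As a rough sketch of where the 64 kernel variants come from (our decomposition, assuming each 2-D operand gets a dense/compressed annotation per dimension plus a row-wise or column-wise dimension ordering):

```python
# Enumerate sparse-annotation variants for SpMxSpM (A * B, both 2-D).
# 2 levels per dim * 2 dims * 2 orderings = 8 variants per matrix,
# and 8 * 8 = 64 kernels for the pair.
import itertools

levels = ("dense", "compressed")
orders = ((0, 1), (1, 0))              # row-wise vs. column-wise
per_matrix = list(itertools.product(levels, levels, orders))
kernels = list(itertools.product(per_matrix, per_matrix))
assert len(per_matrix) == 8
assert len(kernels) == 64              # matches the count in the text
```

A search over such an enumeration is exactly the kind of state-space exploration the bullet above describes.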
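The error-propagation idea behind the Async dialect change can be sketched with a toy value type in the spirit of TFRT's AsyncValue (this is not the actual MLIR Async runtime API): a value is either available or an error, and errors short-circuit through subsequent stages.

```python
# Toy async value: either a result or an error; errors propagate
# through chained computations without running them.
class AsyncValue:
    def __init__(self, value=None, error=None):
        self.value, self.error = value, error

    def and_then(self, fn):
        if self.error is not None:          # short-circuit: propagate error
            return self
        try:
            return AsyncValue(value=fn(self.value))
        except Exception as e:              # a failing stage becomes an error
            return AsyncValue(error=e)

v = AsyncValue(value=4).and_then(lambda x: x * 2).and_then(lambda x: x // 0)
chained = v.and_then(lambda x: x + 1)       # never runs; error flows through
assert v.error is not None and chained.error is v.error
assert AsyncValue(value=3).and_then(lambda x: x + 1).value == 4
```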
IREE: An Experimental MLIR Execution Environment
- Replaced the VMLA reference back-end with VMVX. The motivation was to build a new reference back-end that is architecturally aligned with how the other back-ends have evolved.
- Added static library loading, in support of embedded use cases.
- LLVM backend now uses dynamic pass pipelines, part of a larger effort to make compilation more configurable and dynamic.
- The MobileBERT TFLite model now compiles successfully via TOSA.
- CUDA backend:
- Extended functional support to all XLA ops supported in IREE
- MobileNet runs successfully as well as some other smaller models
- Made the compilation flow more general to support targeting CUDA and ROCM through the same path
- Looking at solving the first level of obvious performance problems in MobileNet
- Generalized rank specialization has landed. It is now possible to generate code for unranked kernels that result from an expression of cwise TF ops.
- Support for TF ops on complex types was further expanded: mHLO operations can now be lowered to `complex.div`. An IEEE 754 implementation of complex division is coming.
- We are further experimenting with linalg vectorization for broadcasting logic. First results show that MLIR-generated kernels with vectorization are faster than the Eigen-generated ones.
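The idea behind rank specialization for cwise (element-wise) ops can be sketched as: run the op on a flattened 1-D view and restore the shape afterwards, so one kernel serves every rank. A toy version over nested lists (all helper names here are hypothetical, not the real codegen):

```python
# Rank-agnostic element-wise application: flatten, apply, unflatten.
def shape(t):
    return [len(t)] + shape(t[0]) if isinstance(t, list) else []

def flatten(t):
    return [x for row in t for x in flatten(row)] if isinstance(t, list) else [t]

def unflatten(flat, shp):
    if not shp:
        return flat[0]
    step = len(flat) // shp[0]
    return [unflatten(flat[i * step:(i + 1) * step], shp[1:])
            for i in range(shp[0])]

def cwise(fn, t):
    """Apply fn element-wise to a nested list of any rank."""
    shp = shape(t)
    return unflatten([fn(x) for x in flatten(t)], shp)

assert cwise(lambda x: x + 1, [[1, 2], [3, 4]]) == [[2, 3], [4, 5]]
assert cwise(lambda x: x * 2, [1, 2, 3]) == [2, 4, 6]
```

The same `cwise` handles rank 1, 2, or n, which is the payoff of specializing on rank once rather than per shape.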
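A common IEEE-minded approach to complex division is Smith's algorithm, which rescales operands to avoid overflow in the intermediate products. This is a sketch of that standard technique, not necessarily the exact lowering MLIR will adopt:

```python
# Smith's algorithm for (a + bi) / (c + di): divide by the larger of
# |c|, |d| first so c*c + d*d is never formed directly.
from math import isclose

def smith_div(a, b, c, d):
    if abs(c) >= abs(d):
        r = d / c
        t = 1.0 / (c + d * r)
        return (a + b * r) * t, (b - a * r) * t
    r = c / d
    t = 1.0 / (c * r + d)
    return (a * r + b) * t, (b * r - a) * t

def naive_div(a, b, c, d):
    denom = c * c + d * d               # overflows for large c, d
    return (a * c + b * d) / denom, (b * c - a * d) / denom

# For huge magnitudes the textbook formula overflows to nan/inf,
# while Smith's algorithm recovers the exact answer (x / x = 1):
big = 2e154
assert naive_div(big, big, big, big) != (1.0, 0.0)
re, im = smith_div(big, big, big, big)
assert isclose(re, 1.0) and im == 0.0
# On well-scaled inputs both formulas agree:
assert all(isclose(x, y)
           for x, y in zip(smith_div(1, 2, 3, 4), naive_div(1, 2, 3, 4)))
```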
The Evolution of Domain-Specific Computing for Deep Learning in IEEE Circuits and Systems Magazine teases how Xilinx is using MLIR dialects to represent multicore devices with explicit data movement.
[…] this section provides an overview of how we are leveraging the MLIR infrastructure to build next-generation domain-specific tooling for Xilinx Versal devices.