See the previous published edition.
Welcome to the thirty-fourth issue of the MLIR (bi)Weekly, a newsletter covering developments in MLIR and related projects in the ecosystem. MLIR (bi)Weekly is brought to you by a collective effort of contributors; we welcome your contributions!
The sources for the MLIR logo (in all its variants) are now publicly available here.
MLIR Core
Infrastructure
- The Greedy Pattern Rewriter now supports two traversal modes (top-down and bottom-up). It also gained a configuration structure to control various options (see the sketch after this list).
- Block arguments can now hold a location.
- A new print-ir-after-failure IR printing flag was added to print the IR only after a pass fails.
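A minimal sketch of the new rewriter options (assuming the upstream GreedyRewriteConfig structure, used inside a pass that already has a frozen pattern set at hand):

```c++
#include "mlir/Transforms/GreedyPatternRewriteDriver.h"

// Inside a pass: `patterns` is a FrozenRewritePatternSet built elsewhere;
// getOperation() and signalPassFailure() come from the surrounding pass.
mlir::GreedyRewriteConfig config;
config.useTopDownTraversal = true; // visit ops top-down instead of bottom-up
config.maxIterations = 10;         // bound the number of rewrite iterations
if (mlir::failed(mlir::applyPatternsAndFoldGreedily(getOperation(), patterns,
                                                    config)))
  signalPassFailure();
```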
Codegen
- Sparse tensor support:
- Codegen now handles the “dimension ordering” part of the sparse tensor type (e.g. row-wise vs. column-wise for matrices, but generalized to n dimensions).
- Python bindings for the sparse tensor dialect and transformations have been implemented. As a proof of concept, a Python program was written that generates and runs kernels for all sparse-annotation combinations for SpMxSpM (64 in total). This is ideal for verifying the correctness of codegen, but also for performing a state-space search for the best-performing kernel variant.
- The MLIR Async dialect now supports error propagation (similar to TFRT's AsyncValue).
SPIR-V
- spv.module was changed to be SingleBlock + NoTerminator. This helps remove spv.mlir.endmodule.
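For context, a minimal sketch (a hypothetical check, not code from the change itself) of what the two traits guarantee in the C++ API, where `moduleOp` is an `Operation *`:

```c++
// SingleBlock: the op's region holds exactly one block. NoTerminator: that
// block needs no terminator op, so an explicit spv.mlir.endmodule is obsolete.
if (moduleOp->hasTrait<mlir::OpTrait::SingleBlock>() &&
    moduleOp->hasTrait<mlir::OpTrait::NoTerminator>()) {
  mlir::Block &body = moduleOp->getRegion(0).front();
  // Walk the module body's ops directly; there is no terminator to skip.
  for (mlir::Operation &op : body)
    (void)op;
}
```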
In the Ecosystem
IREE: An Experimental MLIR Execution Environment
- Replaced the VMLA reference back-end with VMVX. The motivation was to build a new reference back-end that is architecturally aligned with how the other back-ends have evolved.
- Added static library loading, in support of embedded use cases.
- LLVM backend now uses dynamic pass pipelines, part of a larger effort to make compilation more configurable and dynamic.
- The MobileBERT TFLite model now compiles successfully via TOSA.
- CUDA backend:
- Extended functional support to all XLA ops supported in IREE
- MobileNet runs successfully as well as some other smaller models
- Made the compilation flow more general to support targeting CUDA and ROCM through the same path
- Looking at solving the first level of obvious performance problems in MobileNet
TensorFlow / MLIR-HLO
Kernel Generator
- Generalized rank specialization has landed. It is now possible to generate code for unranked kernels that result in an expression of element-wise (cwise) TF ops.
- Support for TF ops on complex types has been further expanded. mHLO operations can now be lowered to complex.mul, complex.eq, complex.neq, and complex.div. An IEEE 754 implementation of complex division is coming (see the sketch after this list).
- We are further experimenting with linalg vectorization for broadcasting logic. First results show that MLIR-generated kernels with vectorization are faster than the Eigen-generated ones.
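For reference, a scalar sketch of the computation a complex.div lowering corresponds to (the textbook formula, not the actual generated code); the IEEE 754 implementation mentioned above additionally has to avoid spurious overflow and underflow in the intermediate products:

```c++
#include <complex>

// Textbook complex division: (a+bi)/(c+di) = ((ac+bd) + (bc-ad)i) / (c^2+d^2).
// A robust IEEE 754 variant (e.g. Smith's algorithm) scales the operands so
// that c*c + d*d and the cross products cannot overflow or underflow.
std::complex<double> complexDiv(std::complex<double> lhs,
                                std::complex<double> rhs) {
  double a = lhs.real(), b = lhs.imag();
  double c = rhs.real(), d = rhs.imag();
  double denom = c * c + d * d;
  return {(a * c + b * d) / denom, (b * c - a * d) / denom};
}
```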
MLIR-HLO / XLA
Following the presentation on DISC last month, Alibaba started upstreaming their code (PR 1, 2, 3, 4).
Recent Talks
Recent Publications
“The Evolution of Domain-Specific Computing for Deep Learning” in IEEE Circuits and Systems Magazine teases how Xilinx is using MLIR dialects to represent multicore devices with explicit data movement.
[…] this section provides an overview of how we are leveraging MLIR infrastructure to build next-generation domain-specific tooling for Xilinx Versal devices.