MLIR News, 34th edition (5/15 - 5/28/2021)

See the previous published edition.
Welcome to the thirty-fourth issue of MLIR (bi)Weekly, a newsletter covering developments in MLIR and related projects in the ecosystem. MLIR (bi)Weekly is brought to you by a collective effort of contributors; we welcome your contributions!

The sources for the MLIR logo (in all its variants) are now publicly available here.

Highlights
  • Sparse tensor support:
    • Codegen now handles the “dimension ordering” part of the sparse tensor type (e.g. row-wise vs. column-wise for matrices, generalized to n dimensions).
    • Python bindings for the sparse tensor dialect and transformations have been implemented. As a proof of concept, a Python program was written that generates and runs kernels for all sparse-annotation combinations for SpMxSpM (64 in total). This is ideal not only for verifying the correctness of codegen but also for performing a state-space search for the best-performing kernel variant.
  • MLIR Async dialect supports error propagation (similar to TFRT AsyncValue).
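The effect of “dimension ordering” can be illustrated outside MLIR. Below is a hypothetical pure-Python sketch (not the actual sparse codegen): the same 2-D tensor traversed row-wise vs. column-wise, which is exactly the kind of loop-nest permutation the dimension ordering controls, generalized to n dimensions in MLIR.

```python
# Hypothetical illustration of "dimension ordering" for a 2-D sparse tensor:
# the same nonzeros, visited row-wise (CSR-like) vs. column-wise (CSC-like).
# This is a plain-Python sketch of the concept, not MLIR-generated code.

def coords_in_order(dense, dim_order):
    """Return ((i, j), value) pairs of nonzeros, traversed per dim_order.

    dim_order = (0, 1) visits row by row; (1, 0) visits column by column.
    An n-D generalization would permute an n-deep loop nest the same way.
    """
    shape = (len(dense), len(dense[0]))
    out = []
    for a in range(shape[dim_order[0]]):       # outer dimension first
        for b in range(shape[dim_order[1]]):   # then the inner dimension
            idx = [0, 0]
            idx[dim_order[0]], idx[dim_order[1]] = a, b
            i, j = idx
            if dense[i][j] != 0:
                out.append(((i, j), dense[i][j]))
    return out

A = [[1, 0, 2],
     [0, 3, 0]]

row_wise = coords_in_order(A, (0, 1))  # [((0,0),1), ((0,2),2), ((1,1),3)]
col_wise = coords_in_order(A, (1, 0))  # [((0,0),1), ((1,1),3), ((0,2),2)]
```

Both orderings enumerate the same nonzeros; only the traversal order (and hence the best-suited storage layout) changes.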
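The error-propagation model can be approximated with Python's asyncio — a loose analogy only, not the Async dialect's runtime: an asynchronously produced value carries either a result or an error, and awaiting it surfaces the error in the dependent computation.

```python
import asyncio

# Loose analogy to async-dialect error propagation (cf. TFRT AsyncValue):
# a token produced asynchronously stores either a result or an error, and
# awaiting the token re-raises the producer's error in the consumer.

async def producer():
    raise ValueError("async region failed")

async def consumer(token):
    try:
        await token          # the error stored in the token re-raises here
    except ValueError as e:
        return f"propagated: {e}"
    return "ok"

async def main():
    token = asyncio.ensure_future(producer())
    return await consumer(token)

result = asyncio.run(main())   # "propagated: async region failed"
```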

MLIR Core
  • spv.module was changed to be SingleBlock + NoTerminator, which allowed removing spv.mlir.endmodule.

In the Ecosystem

IREE: An Experimental MLIR Execution Environment

  • Replaced the VMLA reference back-end with VMVX. The motivation was to build a new reference back-end that is architecturally aligned with how the other back-ends have evolved.
  • Added static library loading, in support of embedded use cases.
  • LLVM backend now uses dynamic pass pipelines, part of a larger effort to make compilation more configurable and dynamic.
  • MobileBERT TFLite model compiling successfully via TOSA.
  • CUDA backend:
    • Extended functional support to all XLA ops supported in IREE
    • MobileNet runs successfully as well as some other smaller models
    • Made the compilation flow more general to support targeting CUDA and ROCm through the same path
    • Looking at solving the first level of obvious performance problems in MobileNet

TensorFlow / MLIR-HLO

Kernel Generator

  • Generalized rank specialization has landed. It is now possible to generate code for unranked kernels that consist of an expression of cwise TF ops.
  • Support for TF ops on complex types has been further expanded. mHLO operations can now be lowered to complex.mul, complex.eq, complex.neq, and complex.div. An IEEE 754 implementation of complex division is coming.
  • We are further experimenting with linalg vectorization for broadcasting logic. First results show that MLIR-generated kernels with vectorization are faster than the Eigen-generated ones.
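The idea behind rank specialization for cwise (element-wise) expressions can be sketched as: instead of emitting a kernel per rank, collapse the operand to 1-D, apply the expression once, and restore the shape. The Python below is a hypothetical illustration using nested lists in place of ranked tensors — not the actual kernel-generator code.

```python
# Hypothetical sketch of rank specialization for element-wise (cwise) ops:
# flatten an operand of arbitrary rank to 1-D, run a single rank-1 kernel,
# and rebuild the original shape. Not the actual TF/MLIR codegen.

def shape_of(x):
    if not isinstance(x, list):
        return ()
    return (len(x),) + shape_of(x[0])

def flatten(x):
    if not isinstance(x, list):
        return [x]
    out = []
    for item in x:
        out.extend(flatten(item))
    return out

def unflatten(flat, shape):
    if not shape:
        return flat.pop(0)
    return [unflatten(flat, shape[1:]) for _ in range(shape[0])]

def cwise(op, x):
    """Apply a unary element-wise op to a 'tensor' of any rank."""
    shape = shape_of(x)
    flat = [op(v) for v in flatten(x)]   # the single rank-1 kernel
    return unflatten(flat, shape)

t3 = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]   # rank-3 "tensor"
doubled = cwise(lambda v: 2 * v, t3)
```

The same `cwise` works unchanged for scalars, vectors, or any deeper nesting, which is the point of specializing on rank only once.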
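Lowering complex division means expanding it into real arithmetic. As a sketch, here is the textbook formula such a lowering might produce; note this naive form can overflow or lose precision for extreme magnitudes, which is what a robust IEEE 754 implementation (e.g. based on Smith's algorithm) additionally handles.

```python
# Textbook expansion of complex division into real arithmetic, as a sketch
# of what lowering a complex-divide op produces. The naive formula below can
# overflow/underflow for extreme magnitudes; an IEEE 754-robust version also
# scales operands and handles inf/NaN special cases.

def complex_div(ar, ai, br, bi):
    """(ar + ai*i) / (br + bi*i) computed with real ops only."""
    denom = br * br + bi * bi
    real = (ar * br + ai * bi) / denom
    imag = (ai * br - ar * bi) / denom
    return real, imag

# Matches Python's built-in complex division for well-scaled inputs:
r, i = complex_div(1.0, 2.0, 3.0, 4.0)   # (1+2j)/(3+4j) = (0.44, 0.08)
```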


Following the presentation on DISC last month, Alibaba started upstreaming their code (PR 1, 2, 3, 4).

Recent Talks

Recent Publications

The Evolution of Domain-Specific Computing for Deep Learning in IEEE Circuits and Systems Magazine teases how Xilinx is using MLIR dialects to represent multicore devices with explicit data movement.

[…] this section provides an overview of how we are leveraging MLIR infrastructure to build next-generation domain-specific tooling for Xilinx Versal devices.