MLIR News, 17th edition (10/2/2020)

See the previous published edition.

Welcome to the seventeenth issue of the MLIR (bi)Weekly, a newsletter (published on Friday) covering developments in MLIR and related projects in the ecosystem. MLIR (bi)Weekly is brought to you by a collective effort of contributors; we welcome your contributions!



Optimizations and Code Generation

Retargetable and Structured CodeGen / Linalg

  • Linalg on tensors now supports specifying reductions.
  • Work on transformations on Linalg on tensors has started.
  • Subview op gained rank-reducing semantics, which prepares for removing SliceOp as well as the ConvertLinalgToLLVM pass.
  • Work on mapping vectors to dynamic SSA values (e.g. loop ivs, SPMD ids) has started.
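
“Rank-reducing” here means that unit-sized result dimensions of a subview can be dropped from the result type. A loose NumPy analogy (an illustration only, not the MLIR API):

```python
import numpy as np

A = np.zeros((8, 16), dtype=np.float32)

# A plain (rank-preserving) subview keeps unit dimensions:
s = A[0:1, 0:4]  # shape (1, 4): still rank 2
# A rank-reducing subview additionally drops the unit dimensions:
r = s[0]         # shape (4,): rank 1, same underlying elements
```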

CPU codegen

  • Generalized vector reductions (and printing for testing/debugging) to work on arbitrarily sized integers (i1, i2, i4, i6, etc.) and on signless, signed, and unsigned variants (i32, si32, ui32).
    • Added tests for various sizes
    • This prepares for a future XLA:CPU lowering of Linalg into vector reductions
  • Continued investigation of XLA:CPU matvec performance
    • Generalized the matmul strategy to include matvec; experimented with several tile sizes and lowering strategies (dot vs axpy)
    • Extracted a stand-alone benchmark for some of the problematic kernels (e.g. 128x16384xf32), but due to lack of reuse (without further fusion), performance remains far from peak
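
The small-integer reduction semantics can be sketched in plain Python (a hypothetical helper, not the MLIR implementation): an add-reduction over N-bit signless values wraps modulo 2^N.

```python
def reduce_add_int(values, bits):
    """Add-reduce N-bit signless integers with modular wraparound,
    mimicking how a vector add-reduction on e.g. i4 behaves (sketch only)."""
    mask = (1 << bits) - 1
    acc = 0
    for v in values:
        acc = (acc + v) & mask
    return acc

# i4 example: 7 + 7 + 7 = 21, which wraps to 21 mod 16 = 5
assert reduce_add_int([7, 7, 7], bits=4) == 5
```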
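
The two matvec lowering strategies can be pictured as the two loop orders over y = A·x (a NumPy illustration, not the actual Linalg lowering): the dot form computes one inner product per row, while the axpy form accumulates scaled columns.

```python
import numpy as np

def matvec_dot(A, x):
    # "dot" strategy: one inner dot product per row of A;
    # each row of A is streamed once, x is reused across rows.
    m, n = A.shape
    y = np.zeros(m)
    for i in range(m):
        y[i] = A[i, :] @ x
    return y

def matvec_axpy(A, x):
    # "axpy" strategy: accumulate scaled columns of A into y;
    # y is reused across columns, each column of A is streamed once.
    m, n = A.shape
    y = np.zeros(m)
    for j in range(n):
        y += x[j] * A[:, j]
    return y
```

Note that under either ordering every element of A is read exactly once, so without further fusion there is no reuse of A to exploit, which is consistent with the benchmark staying far from peak.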


SPIR-V

  • Target-device-related information has been moved from the resource limits to the target environment.
  • SPIR-V function control is now properly supported in (de)serialization.

In the Ecosystem

IREE: An Experimental MLIR Execution Environment

  • Published 2020 Q4 OKRs
  • Open sourced a test for MobileBERT on the SQuAD task, for use in Q4 targeted perf burndown
  • Started uVkCompute, a micro Vulkan compute pipeline and a collection of compute shaders. It will be used to assess performance across many GPUs in support of IREE’s Vulkan back-end.
  • Initial bring-up of Metal HAL driver

mlir-npcomp: Prototype for compiling numpy programs

(see the talk from last week)

TensorFlow / MLIR-HLO

  • GPU Kernel CodeGen:
    • Support for generating host- and device-side code for unary operations has finally landed in TensorFlow (not yet enabled). We also added support for multiple CUDA architectures (fatbin). Next up is fixing performance issues.
    • We continue to build out missing pieces of our lowering pipeline to support further unary operations. This involves adding lowerings between dialects all the way down to LLVM.
  • XLA GPU Backend:
    • Started working on ElementalIrEmitter input → LHLO migration.
    • Experimented with “layout canonicalization”, i.e. permuting dimensions so that the physical layout is major-to-minor. This canonicalization removes the need for explicit layouts in MHLO. We also plan to try running layout assignment on layout-less MHLO.
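
The idea can be sketched in NumPy (a hypothetical `canonicalize_layout` helper, not the actual pass): given an operand’s layout as a minor-to-major dimension list (XLA style), permute the logical dimensions so the layout becomes the default major-to-minor one, after which no explicit layout needs to be tracked.

```python
import numpy as np

def canonicalize_layout(data, minor_to_major):
    """Permute logical dimensions so the physical layout becomes the
    default major-to-minor (row-major) one (illustrative sketch)."""
    # Listing dimensions from major to minor yields the permutation under
    # which the default row-major layout matches the physical order.
    perm = tuple(reversed(minor_to_major))
    return np.transpose(data, perm)

# A rank-2 operand with layout [0, 1] (dim 0 minor-most, i.e. column-major)
# is canonicalized by swapping its two dimensions; the default row-major
# layout [1, 0] leaves the dimensions unchanged.
a = np.arange(6).reshape(2, 3)
b = canonicalize_layout(a, [0, 1])
```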

Recent Talks