MLIR News, 15th edition (9/4/2020)

See the previous published edition.

Welcome to the fifteenth issue of the MLIR (bi)Weekly, a newsletter (published on Friday) covering developments in MLIR and related projects in the ecosystem. MLIR (bi)Weekly is brought to you by a collective effort of contributors; we welcome your contributions!



Table-driven Infrastructure

Optimizations and Code Generation

CPU codegen

  • Published MLIR case-study documents related to CPU codegen for the Vector dialect:
  • Implemented a lowering option to use 32-bit indices when this improves speed (findings reported in the transfer document listed above):
    • 32-bit index comparison speeds up by 4x (AVX2) and 2x (AVX512)
    • benefits vector.create_mask and vector.transfer_read/write
  • Hosted a Sparse Tensors “meet-and-greet” with participants from University of Utah, Arizona, Pacific Northwest National Laboratory, Stanford, and Google. Please express your interest in the thread to be invited to future sessions.
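The 32-bit index option mentioned above applies to ops that compare indices against vector lanes. As a minimal sketch (the function name is illustrative, not from the patch), the kind of op that benefits looks like:

```mlir
// vector.create_mask produces a mask by comparing lane indices against
// a runtime bound; lowering those comparisons with i32 instead of the
// 64-bit index type is what the new option enables.
func @mask_example(%size : index) -> vector<16xi1> {
  %mask = vector.create_mask %size : vector<16xi1>
  return %mask : vector<16xi1>
}
```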

SPIR-V

  • More ops are supported in the dialect, including Intel subgroup ops and more GLSL ops.
  • Resource limit fields now have default values.
  • An optional name can be given to spv.module now.
  • The SPIR-V to LLVM conversion GSoC project is wrapping up. The final batch of code, mainly for bringing up mlir-spirv-cpu-runner, is ready, and the patches are landing one by one.


  • OpenMP: Added a conversion pass, -convert-openmp-to-llvm, from “OpenMP with the Standard dialect” to “OpenMP with the LLVM dialect”.
  • OpenACC: Updates to acc.loop operands and attributes.
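To illustrate what -convert-openmp-to-llvm operates on, here is a minimal sketch of its input (the function name and body are illustrative): an OpenMP region whose body still uses Standard-dialect ops, which the pass converts to the LLVM dialect while leaving the OpenMP ops in place.

```mlir
// Input sketch: omp.parallel wrapping a Standard-dialect addf.
// After -convert-openmp-to-llvm, addf becomes an llvm.fadd while
// the omp.parallel / omp.terminator structure is preserved.
func @parallel_add(%a : f32, %b : f32) {
  omp.parallel {
    %sum = addf %a, %b : f32
    omp.terminator
  }
  return
}
```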

In the Ecosystem

IREE: An Experimental MLIR Execution Environment

mlir-npcomp: Prototype for compiling numpy programs

NPCOMP gained an initial framework for declaratively specifying PyTorch bindings in Python. This is intended to replace the existing mechanism, which is based on an unstable interface exported from PyTorch.

TensorFlow / MLIR-HLO

Further explored the XLA->LMHLO FusionOp migration. A few issues encountered:

  • MHLO is used for fused ops. It doesn’t have layouts, which XLA/GPU uses for codegen. Options considered:
    • Make XLA/GPU not depend on layouts during fusion codegen. This caused performance regressions for various reasons.
    • Canonicalize fused computations to always use descending physical layouts. This caused a large performance regression, for reasons that are not obvious.
    • Make MHLO ops carry layouts as an optimization hint. This works, but requires small fixes here and there.
  • XLA/GPU handles constants in a way that depends on op names. Serializing existing XLA HLO to MHLO doesn’t preserve these names. XLA/GPU refactoring is required to remove this dependency.
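As a hypothetical sketch of the third option (carrying layouts as a hint), a layout could ride along as an optional attribute on an MHLO op; the attribute name below is illustrative, not an established API.

```mlir
// Hypothetical: an XLA-style minor-to-major layout attached as an
// optional attribute, usable by XLA/GPU codegen and ignorable by
// everyone else.
%0 = "mhlo.add"(%lhs, %rhs)
    {minor_to_major = dense<[0, 1]> : tensor<2xindex>}
    : (tensor<8x16xf32>, tensor<8x16xf32>) -> tensor<8x16xf32>
```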

CIRCT: Circuit IR Compilers and Tools, aka ‘MLIR for hardware’

The LLHD Dialect gained some optimizations for the ‘extract’ operation, and better handling for memories.
