MLIR News, 36th edition (6/12 - 6/25/2021)

See the previous published edition.
Welcome to the thirty-sixth issue of the MLIR (bi)Weekly, a newsletter covering developments in MLIR and related projects in the ecosystem. MLIR (bi)Weekly is brought to you by a collective effort of contributors; we welcome your contributions!



Table-driven Infrastructure


  • The Async dialect now has recursive work splitting (“Eigen style”) for parallel operations, which significantly improves performance.
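
For readers unfamiliar with the idea, recursive work splitting can be sketched as follows. This is a minimal Python illustration, not the Async dialect's actual lowering or Eigen's code: the names parallel_for, handle_range, block_size and max_workers are hypothetical. A range is repeatedly halved on block boundaries; each task hands the upper half to a thread pool and keeps the lower half for itself, so the work of dispatching blocks is spread across workers instead of being done by a single launching thread.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def parallel_for(begin, end, block_size, body, max_workers=4):
    """Run body(i) for every i in [begin, end) using recursive range splitting.

    Illustrative sketch only; all names here are hypothetical.
    """
    if end <= begin:
        return
    # Number of leaf blocks the range will be cut into (ceiling division).
    remaining = [-(-(end - begin) // block_size)]
    lock = threading.Lock()
    done = threading.Event()

    with ThreadPoolExecutor(max_workers=max_workers) as pool:

        def handle_range(lo, hi):
            # Keep halving: give the upper half to the pool, keep the lower half.
            while hi - lo > block_size:
                blocks = -(-(hi - lo) // block_size)
                mid = lo + (blocks // 2) * block_size  # split on a block boundary
                pool.submit(handle_range, mid, hi)
                hi = mid
            try:
                # At most one block is left: run it in the current thread.
                for i in range(lo, hi):
                    body(i)
            finally:
                # Count the finished block; tasks never block on one another.
                with lock:
                    remaining[0] -= 1
                    if remaining[0] == 0:
                        done.set()

        handle_range(begin, end)
        done.wait()  # only the calling thread waits for the last block


squares = [0] * 1000


def store_square(i):
    squares[i] = i * i


parallel_for(0, 1000, 64, store_square)
```

Because no task ever waits for another task, a bounded pool cannot deadlock here; only the caller blocks, on a single completion event. Exceptions raised by body inside worker threads are not reported in this sketch.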

In the Ecosystem

IREE : An Experimental MLIR Execution Environment

  • IREE CPU backend can compile and execute a dynamically shaped MLP model (PR)
    • The IREE stack does not rely on the shape dialect for backend shape management. Once converted, Linalg ops (and IREE’s high level ops) themselves carry all the shape information explicitly by design.
    • So far, representing broadcasts in a more canonical form, avoiding dynamic-dim broadcasts (and the associated reshapes), has gone a long way. We are still looking for cases where this approach falls short, and for possible solutions.
    • Work to enable more dynamism is proceeding jointly at the PyTorch/CHLO/TOSA levels of abstraction. We do not believe that MHLO proper is the right level to be doing (ranked) dynamic shapes, but redirecting more work to CHLO has simplified things and revealed patterns that can be applied across the frontends.
  • Most instances of static pass registrations have been removed from IREE (PR), with about 50 remaining in some low-level dialects. Refactoring is in progress to clean up legacy code and better represent the current state of IREE compilation (PR, PR).
  • Target triples and data layout information for LLVM targets are now plumbed through (PR).
  • CUDA Backend:
    • Added a new op to the GPU dialect to represent a constant MMA matrix.
    • Expanded the lowering of vector to GPU MMA ops to support scf ops.
  • New facility for saving traces from Python model execution for later replay. It includes updated iree-run-trace and iree-benchmark-trace standalone tools for replaying and benchmarking a trace (minimal dependencies, C-based for maximum portability), and will replace more ad-hoc mechanisms for adding benchmarking workloads.

mlir-npcomp: Prototype for compiling numerical python programs

  • 2021Q3 roadmap (PR)
  • The torch dialect is now standalone: it no longer depends on the basicpy dialect or on builtin/std types/ops (PR, PR, PR, PR, PR, PR, PR, PR, PR).
  • Continued progress on ResNet: batch norm (PR) and ReLU support (PR).

Recent Talks