MLIR News, 8th edition (5/29/2020)

See the previous published edition.

Welcome to the eighth issue of the MLIR (bi)Weekly, a newsletter (published on Friday) covering developments in MLIR and related projects in the ecosystem. MLIR (bi)Weekly is brought to you by a collective effort of contributors; we welcome your contributions!

Highlights

MLIR Core

Infrastructure

  • PatternRewriter now supports erasing blocks (see the sketch after this list).
  • PatternRewriter is now aware of the implicit terminator management in region-holding operations.
  • A fatal error is now reported when C++ (or ODS) types and operations are used (built or cast) without being registered in the context. This catches compiler misconfigurations where everything “seems to work” but, for instance, folding or canonicalization patterns aren’t available. These additions were motivated by subtle bugs that had been hard to track down.
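
As a minimal sketch of the new block-erasure support, here is a pattern that removes unreachable blocks through the rewriter; `MyRegionOp` and its `getBody()` accessor are hypothetical names used only for illustration:

```cpp
#include "mlir/IR/PatternMatch.h"

using namespace mlir;

// Erase unreachable blocks from a (hypothetical) region-holding op through
// the rewriter, so the erasure is tracked by the pattern driver instead of
// happening behind its back.
struct EraseDeadBlocks : public OpRewritePattern<MyRegionOp> {
  using OpRewritePattern<MyRegionOp>::OpRewritePattern;

  LogicalResult matchAndRewrite(MyRegionOp op,
                                PatternRewriter &rewriter) const override {
    // Skip the entry block: it is reachable by construction.
    for (Block &block : llvm::drop_begin(op.getBody().getBlocks(), 1)) {
      if (block.hasNoPredecessors()) {
        // New API: erase the block through the rewriter.
        rewriter.eraseBlock(&block);
        return success();
      }
    }
    return failure();
  }
};
```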

Table-driven Infrastructure

Optimizations and Code Generation

  • AMD contributed some refactoring for ROCm/HIP (1, 2, 3), preparing to land an mlir-rocm-runner tool.
  • Buffer Assignment is being extended to support dynamic shapes and more control-flow patterns. Initial support for regions has also landed, but more work is needed.
  • The first patches implementing constraint checking for shape calculations have landed. Previously, generated code assumed that all constraints were met and that shape calculations (like broadcasting) were always possible.
  • TensorFromElementsOp has graduated from the XLA HLO dialect to standard. It is the first in a series of operations for constructing tensors, currently used mostly in shape computations (see the first sketch after this list). For the unranked case, we are looking for an operation that allows iteratively constructing tensors of dynamic rank.
  • The View and SubView ops have been revamped to have more closely aligned semantics. Great care went into the various canonicalizations and into the documentation.
  • Linalg transforms have been refactored and made more usable and composable via options configuration objects. Patterns are exposed for finer-grained programmatic control, and a staged-pattern application function has been added so that specific pattern compositions, canonicalizations, and more global transformations compose better (see the second sketch after this list).
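
As a first sketch, a hedged example of using TensorFromElementsOp from C++ to pack individually computed extents into a shape tensor; the builder overload taking only a ValueRange is an assumption:

```cpp
#include "mlir/Dialect/StandardOps/IR/Ops.h"

using namespace mlir;

// Pack individually computed extents into a rank-1 tensor<Nxindex>, a
// common idiom in shape computations. The builder overload taking only a
// ValueRange is an assumption; the generated builders may also require an
// explicit result type.
Value buildShapeTensor(OpBuilder &b, Location loc, ValueRange extents) {
  return b.create<TensorFromElementsOp>(loc, extents);
}
```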
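
As a second sketch, the options-object style for configuring Linalg patterns; the pattern class name and constructor argument order follow the refactoring but are assumptions here:

```cpp
#include "mlir/Dialect/Linalg/IR/LinalgOps.h"
#include "mlir/Dialect/Linalg/Transforms/Transforms.h"
#include "mlir/IR/PatternMatch.h"

using namespace mlir;

// Configure tiling through a LinalgTilingOptions object rather than loose
// parameters, then register the corresponding pattern for MatmulOp. The
// constructor argument order is an assumption.
void populateTilingPatterns(MLIRContext *ctx,
                            OwningRewritePatternList &patterns) {
  patterns.insert<linalg::LinalgTilingPattern<linalg::MatmulOp>>(
      ctx, linalg::LinalgTilingOptions().setTileSizes({8, 16, 32}));
}
```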

SPIR-V

  • Several patches landed to define the types and ops for cooperative matrices. These lay the foundation for generating faster matmul implementations on NVIDIA GPUs (providing access to the Tensor Core units); see the sketch after this list.
  • The Standard to SPIR-V conversion now supports allocation/deallocation of workgroup memory.
  • The SubView conversion to SPIR-V has been updated to match the revamped SubView op.
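
A hedged sketch of constructing the new cooperative matrix type from C++; the parameter order of `get` is an assumption:

```cpp
#include "mlir/Dialect/SPIRV/SPIRVTypes.h"

using namespace mlir;

// Build the SPIR-V cooperative matrix type that maps onto Tensor Core
// tiles, e.g. an 8x8 f16 matrix cooperatively owned by a subgroup. The
// parameter order of `get` is an assumption.
Type buildCoopMatrixType(MLIRContext *ctx) {
  Type f16 = FloatType::getF16(ctx);
  return spirv::CooperativeMatrixNVType::get(
      f16, spirv::Scope::Subgroup, /*rows=*/8, /*columns=*/8);
}
```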

Other

  • OpenMP dialect: definitions of the flush, parallel, and master constructs have landed. Pretty printing and translation are to follow.
  • The Loop dialect was renamed to SCF (Structured Control Flow).

In the Ecosystem

Flang, the LLVM Fortran Compiler

Refactoring of the parse-tree-to-FIR lowering code is in progress. The first step is to upstream the changes to the PFT (pre-FIR tree, a lightweight tree with pointers into the parse tree). Review of the first patch lowering OpenMP constructs to the OpenMP dialect is also in progress.

IREE: An Experimental MLIR Execution Environment

  • Many upstream changes landed in support of promoting matmul input operands to workgroup memory in IREE (D80188, D80365, D80411, IREE commit).
  • ConstantOp + Linalg op fusion moved into MLIR core (D79838), enabling IREE to use the Linalg/Tensor fusion pass in core and delete its own.
  • Reached a milestone: a dynamically shaped MLP can now be compiled directly from TensorFlow and run on VMLA and GPU (modulo a patch for dynamic broadcast support in the interface to Linalg).
  • Work is progressing on AOT CPU compilation (vs. using the JIT).
  • Significant work went into validating and extending test coverage on real GPUs (currently NVIDIA and AMD).
  • Initial check-in and validation of a Keras training workload (currently only works with the SGD optimizer).

mlir-npcomp: Prototype for compiling numpy programs

  • Implemented sufficient lowerings from TCF/TCP->LLVMJit for a basic demo of the flow.
  • Work next week will focus on tying it all together with the Python side.

TensorFlow

  • Prototyped a solution for fully dynamic broadcasts on ranked tensors from LHLO, with an expansion that is compatible with Linalg and, in particular, with fusion.
  • Work is now in progress to replace the “XLA emitters” (the late part of XLA codegen) with MLIR-based codegen on CPU and GPU (more details to come).

Recent Publications