MLIR News, 8th edition (5/29/2020)

See the previous published edition.

Welcome to the eighth issue of the MLIR (bi)Weekly, a newsletter (published on Friday) covering developments in MLIR and related projects in the ecosystem. MLIR (bi)Weekly is brought to you by a collective effort of contributors; we welcome your contributions!

Highlights

MLIR Core

Infrastructure

  • PatternRewriter now supports erasing blocks (see the sketch after this list).
  • PatternRewriter is now aware of the implicit terminator management in region-holding operations.
  • A fatal error is now reported when C++ (or ODS) types and operations are used (built or cast) without being registered in the context. This catches compiler misconfigurations where everything “seems to work” but, for instance, folding or canonicalization patterns aren’t available. These additions were motivated by subtle bugs that had been hard to track down.
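
As a minimal sketch of the new block-erasure support, here is a pattern that removes unreachable blocks through the rewriter; `MyRegionOp` and its `getBody()` accessor are hypothetical names used only for illustration:

```cpp
#include "mlir/IR/PatternMatch.h"

using namespace mlir;

// Erase unreachable blocks from a (hypothetical) region-holding op through
// the rewriter, so the erasure is tracked by the pattern driver instead of
// happening behind its back.
struct EraseDeadBlocks : public OpRewritePattern<MyRegionOp> {
  using OpRewritePattern<MyRegionOp>::OpRewritePattern;

  LogicalResult matchAndRewrite(MyRegionOp op,
                                PatternRewriter &rewriter) const override {
    // Skip the entry block: it is reachable by construction.
    for (Block &block : llvm::drop_begin(op.getBody().getBlocks(), 1)) {
      if (block.hasNoPredecessors()) {
        // New API: erase the block through the rewriter.
        rewriter.eraseBlock(&block);
        return success();
      }
    }
    return failure();
  }
};
```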

Table-driven Infrastructure

Optimizations and Code Generation

  • AMD contributed some refactoring for ROCm/HIP (1, 2, 3), preparing to land an mlir-rocm-runner tool.
  • Buffer Assignment is being extended to support dynamic shapes and more control-flow patterns. Initial support for regions has also landed, but more work is needed.
  • The first patches implementing constraint checking for shape calculations have landed. Previously, generated code assumed that all constraints were met and that shape calculations (like broadcasting) were always possible.
  • TensorFromElementsOp has graduated from the XLA HLO dialect to standard. It is the first in a series of operations for constructing tensors, currently used mostly in shape computations (see the first sketch after this list). For the unranked case, we are looking for an operation that allows iteratively constructing tensors of dynamic rank.
  • The View and SubView ops have been revamped to have more closely aligned semantics. Great care went into the various canonicalizations and into the documentation.
  • Linalg transforms have been refactored and made more usable and composable via options configuration objects. Patterns are exposed for finer-grained programmatic control, and a staged-pattern application function has been added so that specific pattern compositions, canonicalizations, and more global transformations compose better (see the second sketch after this list).
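
As a first sketch, a hedged example of using TensorFromElementsOp from C++ to pack individually computed extents into a shape tensor; the builder overload taking only a ValueRange is an assumption:

```cpp
#include "mlir/Dialect/StandardOps/IR/Ops.h"

using namespace mlir;

// Pack individually computed extents into a rank-1 tensor<Nxindex>, a
// common idiom in shape computations. The builder overload taking only a
// ValueRange is an assumption; the generated builders may also require an
// explicit result type.
Value buildShapeTensor(OpBuilder &b, Location loc, ValueRange extents) {
  return b.create<TensorFromElementsOp>(loc, extents);
}
```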
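
As a second sketch, the options-object style for configuring Linalg patterns; the pattern class name and constructor argument order follow the refactoring but are assumptions here:

```cpp
#include "mlir/Dialect/Linalg/IR/LinalgOps.h"
#include "mlir/Dialect/Linalg/Transforms/Transforms.h"
#include "mlir/IR/PatternMatch.h"

using namespace mlir;

// Configure tiling through a LinalgTilingOptions object rather than loose
// parameters, then register the corresponding pattern for MatmulOp. The
// constructor argument order is an assumption.
void populateTilingPatterns(MLIRContext *ctx,
                            OwningRewritePatternList &patterns) {
  patterns.insert<linalg::LinalgTilingPattern<linalg::MatmulOp>>(
      ctx, linalg::LinalgTilingOptions().setTileSizes({8, 16, 32}));
}
```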

SPIR-V

  • Several patches landed to define the types and ops for cooperative matrices. These lay the foundation for generating faster matmul implementations on NVIDIA GPUs (providing access to the Tensor Core units); see the sketch after this list.
  • The Standard to SPIR-V conversion now supports allocation/deallocation of workgroup memory.
  • The SubView conversion to SPIR-V has been updated to match the revamped SubView op.
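
A hedged sketch of constructing the new cooperative matrix type from C++; the parameter order of `get` is an assumption:

```cpp
#include "mlir/Dialect/SPIRV/SPIRVTypes.h"

using namespace mlir;

// Build the SPIR-V cooperative matrix type that maps onto Tensor Core
// tiles, e.g. an 8x8 f16 matrix cooperatively owned by a subgroup. The
// parameter order of `get` is an assumption.
Type buildCoopMatrixType(MLIRContext *ctx) {
  Type f16 = FloatType::getF16(ctx);
  return spirv::CooperativeMatrixNVType::get(
      f16, spirv::Scope::Subgroup, /*rows=*/8, /*columns=*/8);
}
```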

Other

  • OpenMP dialect: definitions of the flush, parallel, and master constructs have landed. Pretty printing and translation are to follow.
  • The Loop dialect was renamed to SCF (Structured Control Flow).

In the Ecosystem

Flang, the LLVM Fortran Compiler

Refactoring of the parse-tree-to-FIR lowering code is in progress. The first step is to upstream the changes to the PFT (pre-FIR tree, a lightweight tree with pointers into the parse tree). Review of the first patch lowering OpenMP constructs to the OpenMP dialect is also in progress.

IREE: An Experimental MLIR Execution Environment

  • Many upstream changes landed in support of promoting matmul input operands to workgroup memory in IREE (D80188, D80365, D80411, IREE commit).
  • ConstantOp + Linalg op fusion moved into MLIR core (D79838), enabling IREE to use the Linalg/Tensor fusion pass in core and delete its own.
  • Reached a milestone: a dynamically shaped MLP can now be compiled directly from TensorFlow and run on VMLA and GPU (modulo a patch for dynamic broadcast support in the interface to Linalg).
  • Work is progressing on AOT CPU compilation (vs. using the JIT).
  • Significant work went into validating and extending test coverage on real GPUs (currently NVIDIA and AMD).
  • Initial check-in and validation of a Keras training workload (currently only works with the SGD optimizer).

mlir-npcomp: Prototype for compiling numpy programs

  • Implemented sufficient lowerings from TCF/TCP->LLVMJit for a basic demo of the flow.
  • Work next week will focus on tying it all together with the Python side.

TensorFlow

  • Prototyped a solution for fully dynamic broadcasts on ranked tensors from LHLO, with an expansion that is compatible with Linalg and, in particular, with fusion.
  • Work is now in progress to replace the “XLA emitters” (the late part of XLA codegen) with MLIR-based codegen on CPU and GPU (more details to come).

Recent Publications