MLIR News, 29th edition (3/6 - 3/19/2021)

mehdi_amini · March 9, 2021, 5:32am

See the previous published edition.
Welcome to the twenty-nineth issue of the MLIR (bi)Weekly, a newsletter covering developments in MLIR, and related projects in the ecosystem. MLIR (bi)Weekly is brought to you by a collective effort of contributors, we welcome your contributions!

Highlights

Two exciting papers in the “recent publications” section at the bottom of this newsletter, check them out!
The Intel AMX dialect was added and discussed at the last open meeting (see “recent talks” section below).
Multiple RFCs are in-flight with patches up for review:

MLIR Core

Infrastructure

The initial revision for the DataLayout infrastructure has landed.
New OpTraitRewritePattern and OpInterfaceRewritePattern classes are being added to make matching operations with traits and interfaces easier (and much more efficient).
OwningRewritePatternList and OwningRewritePatternList::insert are undergoing a rename to RewritePatternSet and RewritePatternSet::add respectively. (Likely more minor/major refactorings related to canonicalization patterns still to come)
OwningRewritePatternList now supports a simpler insert that accepts a function pointer that implements the matchAndRewrite of the pattern.
The greedy pattern rewrite driver now populates its initial worklist top-down (which may result in some test case churn for users).
The destructors of Attribute/Type storage objects are now called when a context is destroyed, i.e. complex destructors are now supported (e.g. for parameters that dynamically allocate memory).

Table-driven Infrastructure

Operation Asm Format: Support for “else” groups is being added to optional elements. This allows for specifying a group of elements to parse/print when an optional group is not present.

CPU codegen

A new AMX vector dialect for Intel Advanced Matrix Extensions has been added
- unleashes the power of AMX using MLIR concepts (2-d vectors, memrefs, etc.) with just a few new operations
- includes fully functional integration tests (running on a Sapphire Rapids emulator) which test correctness but also document usage
Sparse compiler: stress testing continues, but ran clean for millions of test without finding new issues; some more discussion of adding sparse tensor type, work will start shortly.

SPIR-V

The SPIR-V dialect sees more ops for Vulkan graphics: spv.Image.
A few more patches landed into the SPIR-V dialect to improve op naming consistency.

Other

OpenMP: Pretty printer and parser for the work-sharing loop landed.
The Memref type now accepts an arbitrary attribute to model the memory space:

The memory space of a memref is specified by a target-specific attribute. It might be an integer value, string, dictionary or custom dialect attribute. The empty memory space (attribute is None) is target specific.

In the Ecosystem

IREE : An Experimental MLIR Execution Environment

CUDA backend
- Enabled CUDA E2E tests in IREE CI for several HLO ops
- Wrote documentation about design choices for the CUDA backend and how to run CUDA E2E
- Starting codegen improvements by adding tiling and distribution to blocks for element-wise ops

mlir-npcomp: Prototype for compiling Numpy programs

First iteration of roadmap
Rewrite of GlobalizeObjectGraph to allow importing ResNet and a speech-to-text model from TorchHub PR (IR for speech-to-text model)

TensorFlow / MLIR-HLO

XLA GPU backend

Migrated While and Conditional ops emitter to use LMHLO operations.
We can now instantiated a full LMHLO graph from an XLA computation and use it to emit LLVM.

Kernel Generator

We fixed the handling of return values for c-wrappers to also support returning memrefs (or anything else that lowers to a struct type in LLVM). Returning structs is not well defined, so instead we now pass a pointer to the result struct as first argument.
We improved broadcast elimination for partially static shapes, enabling more fusion as a result.
Fixed a precision issue with the approximation for tanh.
Ongoing work in the area of auto-vectorizing at the MLIR level and enabling fusion with broadcasts and dynamic shapes.

CIRCT : Circuit IR Compilers and Tools aka ‘MLIR for hardware’

Initial Python bindings support is in progress

Recent Talks

2021-03-18: MLIR AMX Vector Dialect ; slides - recording
2021-03-11: Sparse Tensor Type Discussion ; recording

Recent Publications

EVEREST: A design environment for extreme-scale big data analytics on heterogeneous platforms

High-Performance Big Data Analytics (HPDA) applications are characterized by huge volumes of distributed and heterogeneous data that require efficient computation for knowledge extraction and decision making. […]
We present EVEREST, a novel H2020 project started on October 1st, 2020 that aims at developing a holistic environment for the co-design of HPDA applications on heterogeneous, distributed, and secure platforms. EVEREST focuses on programmability issues through a data-driven design approach, the use of hardware-accelerated AI, and an efficient runtime monitoring with virtualization support.
[…]
EVEREST aims at developing a unified MLIR representation for the transparent support of several high-level ML frameworks (e.g. TensorFlow or PyTorch) and high-level optimizers (e.g. XLA, Glow, TVM).

DISC: A Dynamic Shape Compiler for Machine Learning Workloads

Many recent machine learning models show dynamic shape characteristics. However, existing AI compiler optimization systems suffer a lot from problems brought by dynamic shape models, including compilation overhead, memory usage, optimization pipeline and deployment complexity. This paper provides a compiler system to natively support optimization for dynamic shape workloads, named DISC. DISC enriches a set of IR to form a fully dynamic shape representation. It generates the runtime flow at compile time to support processing dynamic shape based logic, which avoids the interpretation overhead at runtime and enlarges the opportunity of host-device co-optimization. It addresses the kernel fusion problem of dynamic shapes with shape propagation and constraints collecting methods. This is the first work to demonstrate how to build an end-to-end dynamic shape compiler based on MLIR infrastructure. Experiments show that DISC achieves up to 3.3x speedup than TensorFlow/PyTorch, and 1.8x than Nimble.

Topic	Replies	Views
MLIR News, 22nd edition (12/12/2020) Newsletter	797	November 30, 2020
MLIR News, 8th edition (5/29/2020) Newsletter	1485	May 18, 2020
MLIR News, 3rd edition (3/20/2020) Newsletter	1762	March 7, 2020
MLIR News, 23th edition (12/26/2020) Newsletter	787	December 14, 2020
MLIR News, 5th edition (4/17/2020) Newsletter	1254	April 5, 2020