MLIR News, 42nd edition (9/4 - 9/20/2021)

Work in progress: this is a wiki post, everyone is welcome to modify it directly

Please update with work done between 9/3 and 9/20; you can update it along the way (don’t wait until the end date to add entries here: add them as the work lands).

See the previous published edition
Welcome to the forty-second issue of the MLIR (bi)Weekly, a newsletter covering developments in MLIR and related projects in the ecosystem. MLIR (bi)Weekly is brought to you by a collective effort of contributors; we welcome your contributions!

MLIR Core

Infrastructure

DRR

Codegen

  • Sparse compiler progress:
    • Added support for general affine subscripts (dense tensors only at the moment)
    • Implemented cast operations (int/fp, int/int, fp/fp) within sparse linalg ops
    • Improved sparse tensor convert with folding
    • Started “sparse kernel” collection: matmul, convolution, quantized matmul, etc.
  • Conversion pipelines targeting the LLVM dialect must now run the -reconcile-unrealized-casts pass at the end, instead of (or in addition to) -convert-std-to-llvm, to remove leftover casts and surface incomplete partial conversions (see the sketch after this list).
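
The practical effect on downstream pipelines is sketched below. This is a minimal illustration only, assuming the in-tree C++ factory functions of this period (createLowerToLLVMPass for -convert-std-to-llvm and createReconcileUnrealizedCastsPass for -reconcile-unrealized-casts); header paths may differ in your checkout.

```cpp
// Minimal sketch of a conversion pipeline targeting the LLVM dialect.
// Header paths and factory names follow the in-tree conventions of this
// period and may need adjusting.
#include "mlir/Conversion/ReconcileUnrealizedCasts/ReconcileUnrealizedCasts.h"
#include "mlir/Conversion/StandardToLLVM/ConvertStandardToLLVMPass.h"
#include "mlir/IR/BuiltinOps.h"
#include "mlir/Pass/PassManager.h"

using namespace mlir;

static LogicalResult lowerToLLVMDialect(ModuleOp module) {
  PassManager pm(module.getContext());
  // Partial conversion: may leave unrealized_conversion_cast ops behind.
  pm.addPass(createLowerToLLVMPass());              // -convert-std-to-llvm
  // Fold away cast pairs that cancel out and report the remaining ones,
  // which indicate an incomplete partial conversion.
  pm.addPass(createReconcileUnrealizedCastsPass()); // -reconcile-unrealized-casts
  return pm.run(module);
}
```

The equivalent mlir-opt invocation simply appends the pass: -convert-std-to-llvm -reconcile-unrealized-casts.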

In the Ecosystem

IREE: An Experimental MLIR Execution Environment

  • The CPU backend has a new pipeline that runs a Tensors → Vectors pass and bufferizes late. Matmul- and batch-matmul-based codegen now goes through this path.
  • FFT is now tiled and distributed by default. This helps remove FFTs as a bottleneck on the SPIR-V/CUDA backends, since they are no longer completely serialized.

Recent Talks

Recent Publications

Most compilers have a single core intermediate representation (IR) (e.g., LLVM) sometimes complemented with vaguely defined IR-like data structures. This IR is commonly low-level and close to machine instructions. As a result, optimizations relying on domain-specific information are either not possible or require complex analysis to recover the missing information. In contrast, multi-level rewriting instantiates a hierarchy of dialects (IRs), lowers programs level-by-level, and performs code transformations at the most suitable level. We demonstrate the effectiveness of this approach for the weather and climate domain. In particular, we develop a prototype compiler and design stencil- and GPU-specific dialects based on a set of newly introduced design principles. We find that two domain-specific optimizations (500 lines of code) realized on top of LLVM’s extensible MLIR compiler infrastructure suffice to outperform state-of-the-art solutions. In essence, multi-level rewriting promises to herald the age of specialized compilers composed from domain- and target-specific dialects implemented on top of a shared infrastructure.

To meet the extreme compute demands for deep learning across commercial and scientific applications, dataflow accelerators are becoming increasingly popular. While these “domain-specific” accelerators are not fully programmable like CPUs and GPUs, they retain varying levels of flexibility with respect to data orchestration, i.e., dataflow and tiling optimizations to enhance efficiency. There are several challenges when designing new algorithms and mapping approaches to execute the algorithms for a target problem on new hardware. Previous works have addressed these challenges individually. To address these challenges as a whole, in this work, we present a HW-SW co-design ecosystem for spatial accelerators called Union within the popular MLIR compiler infrastructure. Our framework allows exploring different algorithms and their mappings on several accelerator cost models. Union also includes a plug-and-play library of accelerator cost models and mappers which can easily be extended. The algorithms and accelerator cost models are connected via a novel mapping abstraction that captures the map space of spatial accelerators which can be systematically pruned based on constraints from the hardware, workload, and mapper. We demonstrate the value of Union for the community with several case studies which examine offloading different tensor operations (CONV/GEMM/Tensor Contraction) on diverse accelerator architectures using different mapping schemes.