See the previous published edition.
Welcome to the thirty-third issue of MLIR (bi)Weekly, a newsletter covering developments in MLIR and related projects in the ecosystem. MLIR (bi)Weekly is brought to you by a collective effort of contributors; we welcome your contributions!
MLIR Core
Infrastructure
- Function argument/result attributes are now represented in a single dense `ArrayAttr`; this dropped the compilation time of a somewhat large TensorFlow model from ~650 seconds to ~400 seconds.
- A `debugName` can be associated with a `Pattern` and printed during application.
- Pass timing has been factored out of the pass manager into a more general timing manager. The new `TimingManager` can be used to time arbitrary code paths, either through start/stop calls on a timer handle or through an RAII-style timing scope. The pass manager becomes a client of this new infrastructure and reports passes to the provided timing manager or timing scope. To reflect the broadened scope, the `--pass-timing` options are now called `--mlir-timing` and are provided by a command-line options struct separate from the pass manager. Users can provide their own timing implementation by subclassing `TimingManager`; MLIR itself provides a `DefaultTimingManager`. `MlirOptMain` has been updated to use the new infrastructure and now reports the execution time of the parser and output emitter.
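The two usage styles described above (explicit start/stop calls on a timer handle versus an RAII-style scope that stops automatically) can be sketched in plain Python with a context manager. This is a hypothetical illustration of the pattern only; MLIR's actual `TimingManager` is a C++ API with nested timing scopes:

```python
import time

class Timer:
    """Minimal timer supporting both explicit start/stop and scoped use.

    Hypothetical sketch of the pattern; not MLIR's TimingManager API.
    """
    def __init__(self, name):
        self.name = name
        self.elapsed = 0.0
        self._start = None

    def start(self):
        self._start = time.perf_counter()

    def stop(self):
        self.elapsed += time.perf_counter() - self._start
        self._start = None

    # The context-manager protocol gives the RAII-style scope:
    # timing stops automatically when the block is exited.
    def __enter__(self):
        self.start()
        return self

    def __exit__(self, *exc):
        self.stop()
        return False

# Style 1: explicit start/stop on a timer handle.
t = Timer("parse")
t.start()
sum(range(100000))
t.stop()

# Style 2: RAII-style scope.
with Timer("emit") as t2:
    sum(range(100000))
```

The scoped form is harder to misuse: an early return or exception still stops the timer, which is the same motivation behind RAII timing scopes in C++.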
Table-driven Infrastructure
- ODS op definitions now allow specifying a namespace on a per-op basis.
Codegen
- Affine vectorization now supports vectorizing reduction loops along the reduction dimension.
- So does affine parallelization.
- The `affine.parallel` operation now supports min/max bounds, similarly to `affine.for`.
- Work on a Linalg-on-Tensors-specific bufferization strategy has started ([RFC] Linalg on Tensors Update and Comprehensive Bufferization RFC, by nicolasvasilache).
- Linalg `indexed_generic` unification is completed (RFC) and vectorization is now available for ops with `linalg.index`.
- The `vector.transfer` lowering refactoring is completed; it is now more progressive and composable.
- Linalg (tensor and buffer) vectorization has been generalized to go through an n-D `vector.multi_reduction` op (e.g., `vector.multi_reduction "add", %0 [1, 3]`). New vector transpose/broadcast/reduction canonicalizations will be added on a per-need basis.
- Migration to first-class-citizen sparse tensor types has completed (Discourse).
- All linalg code and annotations related to sparse tensors have been removed in favor of proper sparse tensor types.
- All glue, clutter, and switches have been replaced by proper use of the sparse tensor types.
- Full verification of type consistency in operations and sparse primitives has been added.
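For readers unfamiliar with the op mentioned above: `vector.multi_reduction "add", %0 [1, 3]` sums a vector along the listed reduction dimensions (here 1 and 3) while keeping the others. A plain-Python sketch of that semantics for the 4-D case, using nested lists in place of vectors (illustrative only, not MLIR code):

```python
import itertools

def multi_reduction_add_13(v):
    """Sum a 4-D nested list along dimensions 1 and 3, keeping 0 and 2.

    Plain-Python sketch of what `vector.multi_reduction "add", %0 [1, 3]`
    computes for a 4-D input; not MLIR code.
    """
    d0, d1, d2, d3 = len(v), len(v[0]), len(v[0][0]), len(v[0][0][0])
    # Result shape is (d0, d2): dims 1 and 3 are folded away by the sum.
    return [
        [
            sum(v[i][j][k][l]
                for j, l in itertools.product(range(d1), range(d3)))
            for k in range(d2)
        ]
        for i in range(d0)
    ]

# A 2x2x2x2 input of all ones reduces to a 2x2 result of fours.
ones = [[[[1] * 2 for _ in range(2)] for _ in range(2)] for _ in range(2)]
```

Expressing vectorized reductions through this single op lets the transpose/broadcast/reduction canonicalizations mentioned above apply uniformly, instead of each lowering handling reduction dimensions ad hoc.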
SPIR-V
- `spv.BranchConditional`'s (de)serialization is now properly implemented.
- More progress on supporting graphics: `spv.ImageQuerySize` is now defined.
- A few corner cases in the vector/std-to-SPIR-V conversion have been addressed.
In the Ecosystem
IREE: An Experimental MLIR Execution Environment
- Revamped docs site launched: https://google.github.io/iree/
- Initial commit to add an experimental ROCm backend.
- Performance work on convolution brought e2e MobileNetV3 (mobile, float) benchmark from 288ms → 58ms on CPU. Upcoming work on depthwise conv should bring this within 20% of TFLite.
- TOSA: Nearing broad coverage of TOSA ops. Focus will soon shift to supporting a representative set of TFLite models, both float and quantized.
- Working with partners on Tiny IREE, enabling inference on embedded devices by removing large deps (e.g., std library) and providing a bare-bones runtime.
- CUDA backend:
  - Added functional support for all convolution named ops.
  - Added support for linking with libdevice, plus scalarization for extended math ops; 40/48 of the IREE XLA op execution tests now pass.
TensorFlow / MLIR-HLO
Kernel CodeGen:
- We are further expanding support for operations on complex types. For this, we have added support for complex constants (scalars) and are now reworking the lowering of HLO operations on complex types to be compliant with respect to NaN semantics.
- A first prototype of generalized rank specialization is unblocking compound operations at the TensorFlow level, for example `tf.round`.
CIRCT: Circuit IR Compilers and Tools, aka 'MLIR for hardware'
- Handshake integration tests are coming online, demonstrating an end-to-end flow from a Standard dialect CFG to SystemVerilog output suitable for simulation or synthesis.
- InferWidths and CheckWidths passes were added to the FIRRTL dialect.
- The RTL dialect was renamed to HW.