MLIR News, 25th edition (1/22/2021)

See the previous published edition.

Welcome to the twenty-fifth issue of the MLIR (bi)Weekly, a newsletter covering developments in MLIR and related projects in the ecosystem. MLIR (bi)Weekly is brought to you by a collective effort of contributors; we welcome your contributions!

MLIR Core

Infrastructure

Table-driven Infrastructure

Optimizations and Code Generation

  • The LLVM dialect now uses built-in types whenever possible (a small sketch follows this list).
  • Masked and compress/expand load/store ops in the vector dialect were generalized to take memref operands with indices, and the syntactic conventions for all memory ops in the dialect were unified (see the syntax sketch after this list):
    • simplifies rewriting between all these ops
    • makes the syntax more consistent and easier to read
    • also prepares the vectorization strategy in the sparse compiler
  • The sparse codegen gained vectorization strategies; @aartbik tells a bit more about it on Discourse.
    • innermost loops, with a choice of dense or dense/sparse for-loop vectorization
    • handles both parallel and reduction loops
    • masked memory operations interact nicely with vector dialect folding/hoisting
    • a planned independent pass should partition loops into unconditional vector loops and scalar cleanup loops (see the loop-partitioning sketch after this list)
  • Improved sparse runtime support library
  • Started to implement the “backing store” for an MLIR sparse tensor type
    • this will greatly simplify using the sparse compiler and even enable setting up actual integration tests; more about this next time
  • Async dialect to LLVM lowering simplification
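
To make the built-in-types item above concrete, here is a minimal sketch (the function itself is hypothetical): where the LLVM dialect previously spelled integer types with its own syntax such as !llvm.i64, it now reuses the builtin i64 directly.

```mlir
// LLVM dialect IR written with builtin integer types; before this change
// the operand and result types below would have been !llvm.i64.
llvm.func @add(%a: i64, %b: i64) -> i64 {
  %0 = llvm.add %a, %b : i64
  llvm.return %0 : i64
}
```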
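
For the unified vector memory-op syntax, a minimal sketch, assuming the upstream assembly format introduced by this change (the wrapping function is illustrative): every memory op now takes a base memref with indices, followed by its mask and vector operands.

```mlir
// All vector-dialect memory ops now share the base[indices] convention.
func @masked_ops(%base: memref<?xf32>, %i: index, %mask: vector<16xi1>,
                 %pass: vector<16xf32>, %value: vector<16xf32>) {
  %0 = vector.maskedload %base[%i], %mask, %pass
         : memref<?xf32>, vector<16xi1>, vector<16xf32> into vector<16xf32>
  vector.maskedstore %base[%i], %mask, %value
    : memref<?xf32>, vector<16xi1>, vector<16xf32>
  vector.compressstore %base[%i], %mask, %value
    : memref<?xf32>, vector<16xi1>, vector<16xf32>
  return
}
```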
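
The planned loop partitioning can be pictured with the hypothetical sketch below; the names and the precomputed %split are illustrative, not output of the actual pass. The prefix that is a multiple of the vector length runs as an unconditional vector loop, and the remainder runs as a scalar cleanup loop.

```mlir
// Sum of a dynamically sized array, split into an unconditional 16-wide
// vector loop and a scalar cleanup loop. %split is assumed to be the
// largest multiple of 16 that is <= %n.
func @sum(%A: memref<?xf32>, %n: index, %split: index) -> f32 {
  %c0 = constant 0 : index
  %c1 = constant 1 : index
  %c16 = constant 16 : index
  %pad = constant 0.0 : f32
  %vzero = constant dense<0.0> : vector<16xf32>
  // Unconditional vector loop: no masking needed below %split.
  %vsum = scf.for %i = %c0 to %split step %c16
      iter_args(%acc = %vzero) -> (vector<16xf32>) {
    %v = vector.transfer_read %A[%i], %pad : memref<?xf32>, vector<16xf32>
    %a = addf %acc, %v : vector<16xf32>
    scf.yield %a : vector<16xf32>
  }
  %partial = vector.reduction "add", %vsum : vector<16xf32> into f32
  // Scalar cleanup loop for the remaining 0..15 elements.
  %total = scf.for %i = %split to %n step %c1
      iter_args(%acc = %partial) -> (f32) {
    %x = load %A[%i] : memref<?xf32>
    %a = addf %acc, %x : f32
    scf.yield %a : f32
  }
  return %total : f32
}
```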

SPIR-V

  • Code shuffling and refactoring for the SPIR-V dialect and conversions are done: they now follow MLIR’s conventions for dialects and lowering conversions.
  • The SPIR-V dialect now has traits like SignedOp, UnsignedOp, and UsableInSpecConstantOp to process ops in these categories uniformly.
  • spv.SpecConstantOperation is fully supported now, including serialization and deserialization.
  • A few operations (spv.GLSL.Fma, spv.Ordered/spv.Unordered, spv.IsInf) were added, with lowerings from upper layers (a small sketch follows this list).
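
A sketch of the new ops in use, assuming their assembly follows the dialect's usual conventions; the module, function, and combination of ops here are illustrative, not from the patches themselves.

```mlir
spv.module Logical GLSL450 {
  spv.func @new_ops(%a: f32, %b: f32, %c: f32) -> i1 "None" {
    // Fused multiply-add from the GLSL extended instruction set.
    %fma = spv.GLSL.Fma %a, %b, %c : f32
    // Floating-point classification / ordered-comparison ops.
    %ord = spv.Ordered %a, %b : f32
    %inf = spv.IsInf %fma : f32
    %r = spv.LogicalAnd %ord, %inf : i1
    spv.ReturnValue %r : i1
  }
}
```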

Other

  • Python bindings:
    • Got support for extendable OpView classes.
    • FuncOp and ModuleOp now have bindings.
  • The builtin f80 and f128 floating-point types were added (example below).
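
The new types slot in wherever other builtin float types do; a trivial sketch (the function is hypothetical):

```mlir
// f80 (x86 extended precision) and f128 (quad precision) are now
// ordinary builtin element types.
func @extended(%a: f80, %b: f128) -> (f80, f128) {
  return %a, %b : f80, f128
}
```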

In the Ecosystem

IREE: An MLIR Execution Environment for CPU and GPU

  • Quite a few productionization activities:
    • macOS binaries are now being built (in addition to Linux). Windows binaries still exclude the TensorFlow integration due to issues.
    • More dogfooding of public APIs, notably resulting in an e2e JAX training example.
    • Performance regression dashboard added for MobileBert on Samsung S20 (CPU and GPU).
    • IREE core build no longer has a dependency on the tensorflow repository.
  • The new HAL runtime and task scheduling system is now functional for threaded tile dispatch on CPU (ResNet before, ResNet now with intra-op parallelism; looking forward to inter-op parallelism once the compiler emits more fine-grained barriers).
  • Initial integration of Linalg-on-tensors to supplant the current codegen pipeline continues to make progress, with initial e2e results (this work is needed to get the most out of fusion, parallelism, and tunability). Targeting end of quarter for scale-out to all supported workloads.
  • TOSA-based TFLite importer brought up with initial support for a few ops on CPU and GPU.

mlir-npcomp: Prototype for compiling numpy programs

  • Community member experimenting with a data pipeline system.
  • Starting to look at how to de-dynamize TorchScript programs.

TensorFlow / MLIR-HLO

XLA GPU backend

  • BatchNorm, Infeed, Outfeed, and CustomCall migrated to use MLIR.
  • All ElementalIrEmitter-based ops are migrated.

Kernel Generator

  • The kernel generator now performs fusion at the “linalg-on-tensors” level, aligning our pipeline more closely with IREE and with upstream MLIR (see the sketch after this list).
  • We are tuning code generation for some broadcasting cases of binary operations. Currently we beat Eigen in some cases but are slower in others. Once this is fixed, we will launch a first binary kernel to production.
  • We are burning down the list of kernels that are missing implementations (5% to go). The remaining work is mostly on generalizing the TF bridge (TF-to-HLO legalization) for dynamic shapes by moving existing lowering patterns from the classic bridge to MLIR.
  • The two kernels we launched to production last year to gain confidence in our host-side implementation are faring well. We had one bug with zero-element tensors that was fixed, but otherwise we have not heard back from users (a good signal!). We will launch more unary kernels starting next week.
  • We started first investigations into bringing kernel generator to CPU. The goal is to build a rough prototype to better understand what the missing pieces are.
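
To illustrate what fusion at the “linalg-on-tensors” level buys (see the first bullet above), here is a minimal sketch, assuming roughly the upstream linalg syntax of the time; the functions and the choice of ops are illustrative. Two elementwise linalg.generic ops on tensors fuse into a single region before any buffers exist, so the intermediate tensor is never materialized.

```mlir
#id = affine_map<(d0) -> (d0)>

// Before fusion: exp(x), then square it, via two generics on tensors.
func @before(%x: tensor<8xf32>, %init: tensor<8xf32>) -> tensor<8xf32> {
  %0 = linalg.generic {indexing_maps = [#id, #id],
                       iterator_types = ["parallel"]}
      ins(%x : tensor<8xf32>) outs(%init : tensor<8xf32>) {
  ^bb0(%in: f32, %out: f32):
    %e = exp %in : f32
    linalg.yield %e : f32
  } -> tensor<8xf32>
  %1 = linalg.generic {indexing_maps = [#id, #id],
                       iterator_types = ["parallel"]}
      ins(%0 : tensor<8xf32>) outs(%init : tensor<8xf32>) {
  ^bb0(%in: f32, %out: f32):
    %sq = mulf %in, %in : f32
    linalg.yield %sq : f32
  } -> tensor<8xf32>
  return %1 : tensor<8xf32>
}

// After fusion: one generic; the exp result never hits memory.
func @after(%x: tensor<8xf32>, %init: tensor<8xf32>) -> tensor<8xf32> {
  %0 = linalg.generic {indexing_maps = [#id, #id],
                       iterator_types = ["parallel"]}
      ins(%x : tensor<8xf32>) outs(%init : tensor<8xf32>) {
  ^bb0(%in: f32, %out: f32):
    %e = exp %in : f32
    %sq = mulf %e, %e : f32
    linalg.yield %sq : f32
  } -> tensor<8xf32>
  return %0 : tensor<8xf32>
}
```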

CIRCT: Circuit IR Compilers and Tools, aka ‘MLIR for hardware’

  • The handshake-runner was reimplemented using MLIR interfaces.
  • The HandshakeToFIRRTL pass gained lowerings for the Load, Store, Memory, and Buffer ops.
  • George Lyon added a C API for emitting Verilog from a CIRCT MLIR module using the SV (SystemVerilog) dialect.
  • ESI Cosimulation has a limited working prototype as part of the CIRCT integration tests.
  • During the weekly meeting on January 13, Rachit talked about Calyx and proposed next steps; at the next meeting, on January 20, sequential logic and cosimulation were discussed.

Recent Talks
