See the previous published edition.
Welcome to the thirty-third issue of MLIR (bi)Weekly, a newsletter covering developments in MLIR and related projects in the ecosystem. MLIR (bi)Weekly is brought to you by a collective effort of contributors; we welcome your contributions!
MLIR Core
Infrastructure
- Function argument/result attributes are now represented in a single dense `ArrayAttr`; this dropped the compilation time of a somewhat large TensorFlow model from ~650 seconds to ~400 seconds.
- A `debugName` can be associated with a `Pattern` and printed during application.
- Pass timing has been factored out of the pass manager into a more general timing manager. The new `TimingManager` can be used to time arbitrary code paths, either through start/stop calls on a timer handle or through an RAII-style timing scope. The pass manager becomes a client of this new infrastructure and reports passes to the provided timing manager or timing scope. To reflect the broadened scope, the `--pass-timing` options are now called `--mlir-timing` and are provided by a command-line options struct separate from the pass manager. Users can provide their own timing implementation by subclassing `TimingManager`; MLIR itself provides a `DefaultTimingManager`. `MlirOptMain` has been updated to use the new infrastructure and now reports the execution time of the parser and output emitter.
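The two usage styles described above (explicit start/stop calls on a timer handle versus an RAII-style scope that stops automatically) can be sketched in plain Python with a context manager. This is a hypothetical illustration of the pattern only; MLIR's actual `TimingManager` is a C++ API with nested timing scopes:

```python
import time

class Timer:
    """Minimal timer supporting both explicit start/stop and scoped use.

    Hypothetical sketch of the pattern; not MLIR's TimingManager API.
    """
    def __init__(self, name):
        self.name = name
        self.elapsed = 0.0
        self._start = None

    def start(self):
        self._start = time.perf_counter()

    def stop(self):
        self.elapsed += time.perf_counter() - self._start
        self._start = None

    # The context-manager protocol gives the RAII-style scope:
    # timing stops automatically when the block is exited.
    def __enter__(self):
        self.start()
        return self

    def __exit__(self, *exc):
        self.stop()
        return False

# Style 1: explicit start/stop on a timer handle.
t = Timer("parse")
t.start()
sum(range(100000))
t.stop()

# Style 2: RAII-style scope.
with Timer("emit") as t2:
    sum(range(100000))
```

The scoped form is harder to misuse: an early return or exception still stops the timer, which is the same motivation behind RAII timing scopes in C++.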
Table-driven Infrastructure
- ODS op definitions now allow specifying a namespace on a per-op basis.
Codegen
- Affine vectorization now supports vectorizing reduction loops along the reduction dimension.
- So does affine parallelization.
- The `affine.parallel` operation now supports min/max bounds, similarly to `affine.for`.
- Work on a Linalg-on-Tensors-specific bufferization strategy has started ([RFC] Linalg on Tensors Update and Comprehensive Bufferization RFC, by nicolasvasilache).
- Linalg `indexed_generic` unification is completed (RFC) and vectorization is now available for ops with `linalg.index`.
- The `vector.transfer` lowering refactoring is completed; it is now more progressive and composable.
- Linalg (tensor and buffer) vectorization has been generalized to go through an n-D `vector.multi_reduction` op (e.g., `vector.multi_reduction "add", %0 [1, 3]`). New vector transpose/broadcast/reduction canonicalizations will be added on a per-need basis.
- Migration to first-class-citizen sparse tensor types has completed (Discourse).
- All linalg code and annotations related to sparse tensors have been removed in favor of proper sparse tensor types.
- All glue, clutter, and switches have been replaced by proper use of the sparse tensor types.
- Full verification of type consistency in operations and sparse primitives has been added.
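For readers unfamiliar with the op mentioned above: `vector.multi_reduction "add", %0 [1, 3]` sums a vector along the listed reduction dimensions (here 1 and 3) while keeping the others. A plain-Python sketch of that semantics for the 4-D case, using nested lists in place of vectors (illustrative only, not MLIR code):

```python
import itertools

def multi_reduction_add_13(v):
    """Sum a 4-D nested list along dimensions 1 and 3, keeping 0 and 2.

    Plain-Python sketch of what `vector.multi_reduction "add", %0 [1, 3]`
    computes for a 4-D input; not MLIR code.
    """
    d0, d1, d2, d3 = len(v), len(v[0]), len(v[0][0]), len(v[0][0][0])
    # Result shape is (d0, d2): dims 1 and 3 are folded away by the sum.
    return [
        [
            sum(v[i][j][k][l]
                for j, l in itertools.product(range(d1), range(d3)))
            for k in range(d2)
        ]
        for i in range(d0)
    ]

# A 2x2x2x2 input of all ones reduces to a 2x2 result of fours.
ones = [[[[1] * 2 for _ in range(2)] for _ in range(2)] for _ in range(2)]
```

Expressing vectorized reductions through this single op lets the transpose/broadcast/reduction canonicalizations mentioned above apply uniformly, instead of each lowering handling reduction dimensions ad hoc.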
SPIR-V
- `spv.BranchConditional`'s (de)serialization is now properly implemented.
- More progress on supporting graphics: `spv.ImageQuerySize` is now defined.
- A few corner cases in the vector/std-to-SPIR-V conversion have been addressed.
In the Ecosystem
IREE: An Experimental MLIR Execution Environment
- Revamped docs site launched: https://google.github.io/iree/
- Initial commit to add an experimental ROCm backend.
- Performance work on convolution brought e2e MobileNetV3 (mobile, float) benchmark from 288ms → 58ms on CPU. Upcoming work on depthwise conv should bring this within 20% of TFLite.
- TOSA: Nearing broad coverage of TOSA ops. Focus will soon shift to supporting a representative set of TFLite models, both float and quantized.
- Working with partners on Tiny IREE, enabling inference on embedded devices by removing large deps (e.g., std library) and providing a bare-bones runtime.
- CUDA backend:
  - Added functional support for all convolution named ops.
  - Added support for linking with libdevice, plus scalarization for extended math ops; 40/48 of the IREE XLA op execution tests now pass.
TensorFlow / MLIR-HLO
Kernel CodeGen:
- We are further expanding support for operations on complex types. For this, we have added support for complex constants (scalars) and are now reworking the lowering of HLO operations on complex types to be compliant with respect to NaN semantics.
- A first prototype of generalized rank specialization is unblocking compound operations at the TensorFlow level, for example `tf.round`.
CIRCT: Circuit IR Compilers and Tools, aka 'MLIR for hardware'
- Handshake integration tests are coming online, demonstrating an end-to-end flow from a Standard dialect CFG to SystemVerilog output suitable for simulation or synthesis.
- InferWidths and CheckWidths passes were added to the FIRRTL dialect.
- The RTL dialect was renamed to HW.