See the previous published edition.
Welcome to the fifteenth issue of the MLIR (bi)Weekly, a newsletter (published on Friday) covering developments in MLIR and related projects in the ecosystem. MLIR (bi)Weekly is brought to you by a collective effort of contributors; we welcome your contributions!
MLIR Core
Infrastructure
- Next steps on deprecating the global dialect registry:
- Upstream does not use the global registry anymore, we plan to remove it entirely “soon”.
- The global registry is now disabled by default: clients who still rely on the MLIRContext loading dialects from the global registry can temporarily call mlir::enableGlobalDialectRegistry(true); before creating an MLIRContext (see the sketch after this list).
- Some assertions have been added defensively to catch incorrect setups / missing dialect dependencies to help with the transition.
- Added a FAQ entry: Registered, loaded, dependent: what’s up with Dialects management?
- C API and Python Bindings: steady progress can be observed in the C API unit-test and the Python Bindings tests.
- A PDL Interpreter dialect was added, modeling a low-level API for pattern rewrites.
- This moves PDL one step closer to being usable within the PatternRewrite infra.
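To make the registry transition above concrete, here is a minimal sketch of the two options: explicitly loading the dialects a context needs, or temporarily re-enabling the global registry while migrating. The specific dialect and header paths are illustrative and depend on the MLIR revision in use.

```cpp
// Minimal sketch of dialect loading without the global registry.
// The dialect choice and header paths are illustrative; adjust to your tree.
#include "mlir/Dialect/StandardOps/IR/Ops.h"
#include "mlir/IR/MLIRContext.h"

int main() {
  // Preferred: explicitly load (or declare as dependent) the dialects used.
  mlir::MLIRContext context;
  context.loadDialect<mlir::StandardOpsDialect>();

  // Temporary escape hatch while migrating: re-enable the global registry
  // *before* creating any MLIRContext; this opt-in will be removed.
  // mlir::enableGlobalDialectRegistry(true);
  return 0;
}
```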
Table-driven Infrastructure
- Integer Attributes in ODS now use the equivalent C++ type when applicable.
- Declarative assembly:
- Regions are now supported.
- Support for specifying “custom” directives: this allows calling into user-defined C++ helpers for printing/parsing in cases where the declarative assembly format cannot express the desired syntax (a sketch follows this list).
- A special derived attribute was added to model Symbol names in ODS, with a special treatment in the declarative assembly for formatting them properly.
- Structure fields can now have a default value.
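As an illustration of the new “custom” directive hook, the sketch below shows the rough shape of the user-defined C++ helpers that a hypothetical custom<MyType>(type($result)) directive would dispatch to. The directive name, the helper signatures, and the shorthand being parsed are assumptions for illustration; the exact parameter list depends on what the directive captures and on the MLIR revision.

```cpp
// Hypothetical helpers behind a `custom<MyType>(type($result))` directive in an
// op's declarative assembly format. ODS dispatches to free functions named
// parseMyType/printMyType; signatures are sketched from the ODS docs and may
// differ between MLIR revisions.
#include "mlir/IR/OpImplementation.h"

using namespace mlir;

// Parsing: accept a bare `int` keyword as shorthand for i32, else a full type.
static ParseResult parseMyType(OpAsmParser &parser, Type &type) {
  if (succeeded(parser.parseOptionalKeyword("int"))) {
    type = parser.getBuilder().getIntegerType(32);
    return success();
  }
  return parser.parseType(type);
}

// Printing: emit the same shorthand back out.
static void printMyType(OpAsmPrinter &printer, Type type) {
  if (type.isInteger(32))
    printer << "int";
  else
    printer << type;
}
```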
Optimizations and Code Generation
CPU codegen
- Published MLIR case-study documents related to CPU codegen for the Vector dialect:
- Implemented a lowering option to use 32-bit indices when this improves speed (findings reported in the transfer document listed above):
- 32-bit index comparison speeds up by 4x (AVX2) and 2x (AVX512)
- benefits vector.create_mask and vector.transfer_read/write
- Hosted a Sparse Tensors “meet-and-greet” with participants from University of Utah, Arizona, Pacific Northwest National Laboratory, Stanford, and Google. Please express your interest in the thread to be invited to future sessions.
SPIR-V
- More ops are supported in the dialect, including Intel subgroup ops and more GLSL ops.
- Resource limit fields now have default values.
- An optional name can be given to spv.module now.
- The SPIR-V to LLVM conversion GSoC project is wrapping up. The final batch of code, mainly for bringing up mlir-spirv-cpu-runner, is ready and the patches are landing one by one.
Other
- OpenMP: Added a conversion, -convert-openmp-to-llvm, from “OpenMP with Standard dialect” to “OpenMP with LLVM dialect” (a usage sketch follows this list).
- OpenACC: Updates to acc.loop operands and attributes.
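As a usage sketch for the new OpenMP conversion, the snippet below runs it from a C++ pass pipeline. The header paths and the createConvertOpenMPToLLVMPass factory follow MLIR's usual conventions but are assumptions here; on the command line the pass is the -convert-openmp-to-llvm option mentioned above.

```cpp
// Sketch: running the OpenMP-to-LLVM-dialect conversion from a pass pipeline.
// Header paths and the factory name follow MLIR conventions but are assumed.
#include "mlir/Conversion/OpenMPToLLVM/ConvertOpenMPToLLVM.h"
#include "mlir/IR/Module.h" // ModuleOp (header location differs in newer trees)
#include "mlir/Pass/PassManager.h"
#include "mlir/Support/LogicalResult.h"

mlir::LogicalResult lowerOpenMPToLLVMDialect(mlir::ModuleOp module) {
  mlir::PassManager pm(module.getContext());
  pm.addPass(mlir::createConvertOpenMPToLLVMPass());
  return pm.run(module);
}
```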
In the Ecosystem
IREE: An Experimental MLIR Execution Environment
- Initial performance work on matmul for the CPU codegen back-end has MobileNetV2 running at ~43% of TFLite performance on a Pixel 4. Next steps are planned to close this gap.
- Java API milestone: simple_mul.mlir can now be run with IREE using Java (Adds support for invoking a function through the java api · openxla/iree@92689b8 · GitHub).
- Continued work on GPU codegen matmul performance. Also planning how to integrate the Vector dialect into the GPU back-end.
mlir-npcomp: Prototype for compiling numpy programs
NPCOMP gained an initial framework for declaratively specifying PyTorch bindings in Python. This is intended to replace the existing mechanism, which is based on an unstable interface exported from PyTorch.
TensorFlow / MLIR-HLO
Further explored XLA->LMHLO FusionOp migration. Here are a few issues encountered:
- MHLO is used for fused ops. It doesn’t have layouts, which XLA/GPU uses for codegen. Options considered:
- Make XLA/GPU not dependent on layouts during Fusion codegen. This caused performance regressions for various reasons.
- Canonicalize fused computations to always use descending physical layouts. This caused a large performance regression for reasons that are not obvious.
- Make MHLO ops carry layouts as an optimization hint. It works, but requires small fixes here and there.
- XLA/GPU handles constants in a way that depends on op names. Serializing existing XLA HLO to MHLO doesn’t preserve these names, so XLA/GPU refactoring is required to remove this dependency.
CIRCT : Circuit IR Compilers and Tools aka ‘MLIR for hardware’
The LLHD Dialect gained some optimizations for the ‘extract’ operation and better handling for memories.