See the previous published edition.
Welcome to the thirtieth issue of the MLIR (bi)Weekly, a newsletter covering developments in MLIR and related projects in the ecosystem. MLIR (bi)Weekly is brought to you by a collective effort of contributors; we welcome your contributions!
- Dialects got new abilities for better handling of unregistered operations.
- ModuleOp does not have a terminator anymore, and operations can opt out of this requirement (see also: [RFC] Making terminator optional (for single block / graph regions))
- The website documentation was updated with instructions for running the integration tests, in particular using the Intel emulator to run AVX-512, AMX, and other vector extensions without needing the most recent CPUs.
- Sparse compiler progress:
- A proposed changelist that adds a generic “format” attribute to the tensor type is out for discussion
- Extended the “secondary” storage types (pointers, indices) to also include compact 8-bit and 16-bit values, further reducing the memory footprint (this required adding zero extensions at the right places)
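To illustrate the idea behind narrow secondary storage, here is a minimal Python sketch (not the actual MLIR sparse compiler code) of a CSR layout where the pointers and indices arrays use an 8-bit unsigned type, with stored values widened back to a full index type at use sites:

```python
# Illustrative sketch: CSR storage with narrow (8-bit) secondary arrays.
# The names and layout here are assumptions for illustration only.
import array

def csr_from_dense(dense, index_typecode="B"):  # "B" = unsigned 8-bit
    """Build CSR storage; narrower typecodes shrink the secondary storage."""
    pointers = array.array(index_typecode, [0])
    indices = array.array(index_typecode)
    values = []
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                indices.append(j)  # fits in 8 bits while ncols < 256
                values.append(v)
        pointers.append(len(values))
    return pointers, indices, values

def row_nonzeros(pointers, indices, values, i):
    # The narrow stored value is zero-extended (here: widened to a Python
    # int) before being used as an index, mirroring the extensions the
    # sparse compiler must insert.
    lo, hi = int(pointers[i]), int(pointers[i + 1])
    return [(int(indices[k]), values[k]) for k in range(lo, hi)]

dense = [[0, 2, 0], [1, 0, 3]]
ptr, idx, val = csr_from_dense(dense)
assert row_nonzeros(ptr, idx, val, 1) == [(0, 1), (2, 3)]
# 8-bit secondary storage: 1 byte per entry instead of 8.
assert ptr.itemsize == 1 and idx.itemsize == 1
```

The trade-off is the usual one: narrow types only apply when the dimension sizes and nonzero counts fit in the narrow range.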
- Progress continues on TOSA support, in particular with more lowerings to Linalg.
- Support for registering runtime functions (or callbacks) was added to the C API for the JIT, and an example shows how to use this to/from Python.
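The registration pattern can be sketched conceptually in Python (hypothetical names; this is not the actual MLIR C API or its Python bindings): JIT-compiled code resolves external functions by name, and the host registers callbacks under those names before execution.

```python
# Conceptual sketch of runtime-symbol registration for a JIT.
# ToyJit, register_symbol, and invoke are illustrative stand-ins.

class ToyJit:
    """Stand-in for a JIT execution engine with a runtime symbol table."""
    def __init__(self):
        self._symbols = {}

    def register_symbol(self, name, fn):
        # Analogous to registering a runtime function with the real JIT:
        # compiled code referencing `name` will resolve to `fn`.
        self._symbols[name] = fn

    def invoke(self, name, *args):
        # Compiled code would look up `name` through the symbol table.
        return self._symbols[name](*args)

jit = ToyJit()
calls = []

def log_and_double(x):
    calls.append(x)      # side effect visible to the host (e.g. Python)
    return x * 2

jit.register_symbol("log_and_double", log_and_double)
assert jit.invoke("log_and_double", 21) == 42
assert calls == [21]
```

The point of exposing this through the C API is exactly this two-way bridge: host-language callbacks become callable from JIT-ed code, and the host observes their effects.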
IREE: An Experimental MLIR Execution Environment
- IREE’s CPU and GPU backends can use the Linalg-on-tensors-based lowering to lower MobileBERT and MobileNetV2
- Performance is on par with the previous Linalg-on-buffers-based lowering
- A PR is out to enable Linalg named ops (like Matmul and Conv variants) to be fused with consumer elementwise operations (which would cover Bias Add, Sigmoid, etc.). With this, MobileNetV2 performance is now better on the Linalg-on-tensors path. Updated MobileBERT numbers will be available once this lands.
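The kind of fusion described above can be sketched in plain Python (an illustrative model, not IREE/Linalg code): a matmul followed by an elementwise consumer such as a bias add can be computed in one fused loop nest, avoiding materializing the full intermediate tensor.

```python
# Illustrative sketch of producer/consumer fusion for matmul + bias add.

def matmul_then_bias(a, b, bias):
    """Unfused: materialize the matmul result, then apply the bias."""
    m, k, n = len(a), len(b), len(b[0])
    tmp = [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(n)]
           for i in range(m)]
    return [[tmp[i][j] + bias[j] for j in range(n)] for i in range(m)]

def fused_matmul_bias(a, b, bias):
    """Fused: the elementwise consumer is applied inside the matmul nest."""
    m, k, n = len(a), len(b), len(b[0])
    out = [[0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0
            for p in range(k):
                acc += a[i][p] * b[p][j]
            out[i][j] = acc + bias[j]  # consumer fused at the point of use
    return out

a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
bias = [10, 20]
assert fused_matmul_bias(a, b, bias) == matmul_then_bias(a, b, bias)
```

The fused version writes each output element once and never allocates the intermediate matmul result, which is the main benefit fusion buys on memory-bound workloads.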
- IREE’s CUDA backend now vectorizes all elementwise operations, essentially reaching peak performance for this class of operations. Basic (untuned) tiling and vectorization is enabled for matmul ops.
- Vectorization has been turned on by default for all GPU backends.
XLA/GPU is able to take a pure LMHLO module (with a few MLIR attributes specific to XLA) and run it end-to-end using the existing infrastructure. All XLA/GPU production now goes through LMHLO. Individual debugging tools still depend on XLA HLO, though.
Kernel Generator improved support for fusing ops with dynamic shapes in the presence of dynamic broadcasts, and improved Rank Specialization for ops with arity > 2.
- 2021-04-01: Discussion about MLIR Bindings (C API, Python Bindings, other languages) status; slides - recording
Polyhedral optimisation, a methodology that views nested loops as polyhedra and searches for their optimal transformation regarding specific objectives (parallelism, locality, etc.), sounds promising for mitigating difficulties in automatically optimising hardware designs described by high-level synthesis (HLS), which are typically software programs with nested loops. Nevertheless, existing polyhedral tools cannot meet the requirements from HLS developers for platform-specific customisation and software/hardware co-optimisation. This paper proposes ϕsm (phism), a polyhedral HLS framework built on MLIR, to address these challenges by progressively lowering multi-level intermediate representations (IRs) from polyhedra to HLS designs.
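The "loops as polyhedra" view can be made concrete with a small Python sketch (an assumed example, not code from the paper): the iteration domain of a triangular loop nest is exactly the set of integer points satisfying a few affine inequalities, and a legal loop transformation must preserve that set.

```python
# The iteration domain of
#     for i in 0..n-1:
#         for j in 0..i:
# is the polyhedron {(i, j) : 0 <= j <= i <= n-1} restricted to integers.

def iteration_domain(n):
    """Enumerate the lattice points of {(i, j) : 0 <= j <= i < n}."""
    return [(i, j) for i in range(n) for j in range(i + 1)]

def interchanged_domain(n):
    """The same polyhedron scanned after loop interchange (j outermost)."""
    return [(i, j) for j in range(n) for i in range(j, n)]

# A legal transformation preserves the set of executed iterations;
# only the order of enumeration changes.
assert sorted(iteration_domain(4)) == sorted(interchanged_domain(4))
assert len(iteration_domain(4)) == 4 * 5 // 2  # |domain| = n(n+1)/2
```

Polyhedral tools reason about such sets symbolically (without enumeration), which is what lets them search for schedules optimizing parallelism or locality, and what a framework like phism lowers progressively toward HLS designs.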