See the previous published edition.
Welcome to the thirty-first issue of the MLIR (bi)Weekly, a newsletter covering developments in MLIR and related projects in the ecosystem. MLIR (bi)Weekly is brought to you by a collective effort of contributors; we welcome your contributions!
Highlights
- The builtin `tensor` type has a new member: an opaque "encoding" attribute.
- The mlir-npcomp project reached an important end-to-end milestone with the ability to compile and execute simple PyTorch examples (see below for the specifics).
- A new publication about Tensor Processing Primitives was just posted, and it’ll be presented next week during the Open Meeting.
MLIR Core
Infrastructure
- The builtin tensor type now supports a generic `encoding` attribute (discussion); see the sketch after this list.
- Pass analyses will soon be able to depend on other analyses.
- A language server and VSCode extension for MLIR are being added.
- This will provide IDE language functionality (e.g. SSA value def/use tracking) when interacting with .mlir files in VSCode.
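As a minimal sketch of the new `encoding` slot (not taken from the patch itself): any attribute is accepted after the element type, and the string "sparse_csr" below is a made-up placeholder.

```mlir
// The trailing attribute is the new opaque encoding; any attribute is
// accepted, and its semantics are left to interfaces added later.
func @encoded(%arg0: tensor<8x16xf32, "sparse_csr">)
    -> tensor<8x16xf32, "sparse_csr"> {
  return %arg0 : tensor<8x16xf32, "sparse_csr">
}
```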
Table-driven Infrastructure
- Interfaces and Traits can now be attached to Attributes/Types in tablegen in a similar way to operations.
Codegen
- The builtin Tensor type now supports an opaque “encoding” attribute. Up next: interfaces to assign semantics to its contents.
- Sparse tensor support progress:
- Added support for integral values (i32, i16, i8) for the numerical part of sparse tensors.
- Now that vectors with `index` element type are allowed, the restriction on vectorization of sparse tensors with index-typed overhead storage has been lifted (see the sketch after this list).
- The AVX_512 dialect has been merged into a newly named `X86Vector` dialect.
- Added an AVX dot product operation to the X86Vector dialect.
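For illustration, here is a minimal function (not from the patch) that round-trips a vector with `index` elements, which the type system previously rejected:

```mlir
// `index` is now a legal vector element type.
func @index_vec(%v: vector<8xindex>) -> vector<8xindex> {
  return %v : vector<8xindex>
}
```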
SPIR-V
- SPIR-V conversion now allows explicitly controlling bitwidth emulation for bitwidths not supported in the target environment.
- A few fixes landed in SPIR-V conversion to better handle dynamic ranked `memref`s.
- A few utility functions were added in SPIR-V conversion for creating push constant blocks.
- Boolean `memref`s are now properly handled when converting to SPIR-V.
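As a hedged sketch of the kind of input the boolean fix covers (the function name and shapes are invented for the example):

```mlir
// Loading from a boolean memref; the i1 storage is legalized during the
// conversion to SPIR-V.
func @bool_load(%m: memref<4xi1>, %i: index) -> i1 {
  %0 = memref.load %m[%i] : memref<4xi1>
  return %0 : i1
}
```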
In the Ecosystem
IREE: An Experimental MLIR Execution Environment
- IREE has now moved to a Linalg-on-tensors based compilation flow by default. In the coming weeks, any potential regressions will be addressed and the legacy path will be deprecated.
- The CUDA backend now has promotion of operands to shared memory on NVIDIA GPUs enabled. This is one step closer to getting the CUDA backend on par with the SPIR-V backend (with the goal of targeting MMA intrinsics in NVVM). Eventually, the goal is to have the CUDA backend generate code similar to CUTLASS.
mlir-npcomp: Prototype for compiling numpy programs
- Basic infrastructure for annotating shapes and dtypes on arguments (PR).
- MILESTONE: TorchScript unary `tanh` runs on the reference backend (PR, PR).
TensorFlow / MLIR-HLO
Kernel Generator project:
- We added support for `select`, and kernels that require it are landing (see the sketch after this list). We are also expanding support for complex numbers in code generation.
- The next goal is to complete support for unsigned integers in HLO-based code generation.
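For context, a hedged sketch of the HLO-level operation in question (operand names and shapes are invented for the example):

```mlir
// Element-wise select: picks from %on_true or %on_false per predicate lane.
%result = "mhlo.select"(%pred, %on_true, %on_false)
    : (tensor<4xi1>, tensor<4xf32>, tensor<4xf32>) -> tensor<4xf32>
```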
TFRT: A New TensorFlow Runtime
TFRT JIT compilation can now specialize compiled kernels to operand shapes, which makes it possible to eliminate broadcasts at runtime and improve performance (github commit). The longer-term plan is to specialize to shape constraints, to support partial dynamism at runtime without recompilation.
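A hypothetical illustration of the idea (function names and shapes are made up): specializing the kernel signature to the shapes observed at runtime turns a dynamic broadcast into a static one that the compiler can fold.

```mlir
// Compiled for fully dynamic operands: the broadcast of the scalar operand
// must be resolved at runtime.
func private @add_dynamic(tensor<?x?xf32>, tensor<f32>) -> tensor<?x?xf32>
// Specialized to the operand shapes observed at runtime: the broadcast is
// static and can be folded at compile time.
func private @add_4x8(tensor<4x8xf32>, tensor<f32>) -> tensor<4x8xf32>
```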
CIRCT: Circuit IR Compilers and Tools aka 'MLIR for hardware'
Recent Talks
Recent Publications
TPPs define a compact, yet versatile set of 2D-tensor operators (or a virtual Tensor ISA), which subsequently can be utilized as building-blocks to construct complex operators on high-dimensional tensors. The TPP specification is platform-agnostic, thus code expressed via TPPs is portable, whereas the TPP implementation is highly-optimized and platform-specific.
[…] TPPs fit in the MLIR ecosystem/stack as a lowering dialect, and in this way the TPP back-end could be leveraged by multiple TC frameworks.
This line of research proposes a compiler-based approach for optimizing the accelerator memories on top of traditional HLS. The main idea is to use domain-specific annotations to pass useful information to the compiler, transform the intermediate representations, and interface directly with modern HLS tools.
[…] We target novel multi-level representations, like MLIR [6], to include more hardware-related information early in the compilation flow to make progressive refinements of the architecture at proper levels of abstraction