See the previous published edition.
Welcome to the thirty-eighth issue of the MLIR (bi)Weekly, a newsletter covering developments in MLIR and related projects in the ecosystem. MLIR (bi)Weekly is brought to you by a collective effort of contributors; we welcome your contributions!
Highlights
MLIR Core
Infrastructure
- `InferShapedTypeOpInterface` is split into two. One of the parts, `ReifyRankedShapedTypeOpInterface`, handles reification of the shape of a result type in terms of the op's operands when the type is ranked; this was previously done using the `reifyReturnTypeShapesPerResultDim` method of `InferShapedTypeOpInterface`. The newly created interface better matches the needs of compilation lower down the stack, which deals with ranked-shape code generation, whereas `InferShapedTypeOpInterface` better matches the needs of compilation higher up the stack, where unranked shapes are resolved. The pass `ResolveShapedTypeResultDims` is also split to reflect this change, with `ResolveRankedShapeTypeResultDims` using the newly created interface.
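As a rough illustration of what this reification enables in IR (the op `test.ranked_op` below is hypothetical, and we assume it implements the new interface by propagating its operand's shape):

```mlir
// Before: the first dimension of the result is opaque to consumers.
%c0 = constant 0 : index
%0 = "test.ranked_op"(%arg0) : (tensor<?x4xf32>) -> tensor<?x4xf32>
%d = tensor.dim %0, %c0 : tensor<?x4xf32>

// After resolving result dims through ReifyRankedShapedTypeOpInterface,
// the dim is expressed directly on the operand, so %0 may become dead:
%d_resolved = tensor.dim %arg0, %c0 : tensor<?x4xf32>
```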
Table-driven Infrastructure
- Support binding multi-results of `NativeCodeCall`
- The form `NativeCodeCall<"Foo(...)">:$__1` is available for results of `Value` type.
- Improved diagnostic messages when pattern matching fails
- Operand type mismatches and `NativeCodeCall` failures are now logged in debug mode.
Codegen
- Sparse compiler progress:
- Replaced `linalg.copy` with `memref.copy`, which fits more nicely with the bufferization improvements (this also removed the dependence on the linalg-to-loops pass); see the sketch after this list.
- Landed the first version of the software pipelining transformation.
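For reference, the swap looks as follows in IR (operand names and memref types are illustrative, not taken from the actual patch):

```mlir
// Before: lowering this op further required the linalg-to-loops pass.
linalg.copy(%src, %dst) : memref<?xf32>, memref<?xf32>

// After: a direct memref-level copy that the bufferization
// infrastructure can reason about natively.
memref.copy %src, %dst : memref<?xf32> to memref<?xf32>
```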
SPIR-V
- `spv.GLSL.FMix` was defined.
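`FMix` computes the GLSL linear blend `x * (1 - a) + y * a`. A minimal usage sketch (the exact assembly format is a best guess and may differ slightly):

```mlir
// Linear blend of %x and %y with weight %a: x * (1 - a) + y * a.
%blend = spv.GLSL.FMix %x : f32, %y : f32, %a : f32 -> f32
```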
Build
- `libMLIRPublicAPI.so` was removed. It was an early artifact of the Python integration and was not intended to force a shared-library API on everyone. Note that we are not opposed to an aggregate library similar in purpose existing, but the existing mechanism was wrong and needs to be revisited.
- Python build re-engineering: the Python build was re-engineered to directly incorporate downstream packaging needs in the core setup. The design is based on creating static, self-contained packages, as is a best practice for Python deployment. As a consequence, it also makes the “TypeID mismatches” that plagued previous versions impossible. Downstreams updated: npcomp, circt, mlir-hlo. We are looking for someone to finish an upstream sample and exercise Windows builds (which should now work across projects).
- Emit strong definitions for TypeID storage in Op/Type/Attribute definitions (and dialects). We believe that with this patch and the previous one, MLIR-based projects should no longer experience “TypeID mismatches”; please reach out if that is not the case.
In the Ecosystem
IREE: An Experimental MLIR Execution Environment
- CUDA backend:
- Many performance improvements across codegen and the HAL, based on profiling of the BertTraining model; it was optimized to run in 135 ms per iteration.
TensorFlow / MLIR-HLO
Kernel generator
- Enabled unsigned integer kernels for more TF ops (Cast, LeftShift, RightShift, NotEqual, Equal, BitwiseOr, BitwiseXor, and more)
- Lowering for AddOp/SubOp from the Complex dialect to Standard (sketched after this list).
- Conversion of math::Exp2Op to NVVM/ROCDL.
- Infrastructure for JIT-compiled kernels is being added.
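To illustrate the Complex-to-Standard direction mentioned above, a complex addition can be expanded into element-wise standard-dialect arithmetic (a hand-written sketch, not the exact pattern from the patch):

```mlir
// %lhs + %rhs on complex<f32>, decomposed into real/imaginary parts.
%lhs_re = complex.re %lhs : complex<f32>
%lhs_im = complex.im %lhs : complex<f32>
%rhs_re = complex.re %rhs : complex<f32>
%rhs_im = complex.im %rhs : complex<f32>
%sum_re = addf %lhs_re, %rhs_re : f32  // standard-dialect float addition
%sum_im = addf %lhs_im, %rhs_im : f32
%sum = complex.create %sum_re, %sum_im : complex<f32>
```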
Recent Talks
- 2021-07-22: MLIR data visualization using PassInstrumentation ; slides - recording
- 2021-07-15: From MHLO To Linalg in IREE ; slides - recording
Recent Publications
ScaleHLS: Scalable High-Level Synthesis through MLIR
High-level Synthesis (HLS) has been widely adopted as it significantly improves the hardware design productivity and enables efficient design space exploration (DSE). HLS tools can be used to deliver solutions for many different kinds of design problems, which are often better solved with different levels of abstraction. While existing HLS tools are built using compiler infrastructures largely based on a single-level abstraction (e.g., LLVM), we propose ScaleHLS, a next-generation HLS compilation flow, on top of a multi-level compiler infrastructure called MLIR, for the first time. By using an intermediate representation (IR) that can be better tuned to particular algorithms at different representation levels, we are able to build this new HLS tool that is more scalable and customizable towards various applications coming with intrinsic structural or functional hierarchies. ScaleHLS is able to represent and optimize HLS designs at multiple levels of abstraction and provides an HLS-dedicated transform and analysis library to solve the optimization problems at the suitable representation levels. On top of the library, we also build an automated DSE engine to explore the multi-dimensional design space efficiently. In addition, we develop an HLS C front-end and a C/C++ emission back-end to translate HLS designs into/from MLIR for enabling the end-to-end ScaleHLS flow. Experimental results show that, compared to the baseline designs only optimized by Xilinx Vivado HLS, ScaleHLS improves the performance with amazing quality-of-results – up to 768.1x better on computation kernel level programs and up to 3825.0x better on neural network models.