See the previous published edition.
Welcome to the twenty-fifth issue of the MLIR (bi)Weekly, a newsletter covering developments in MLIR and related projects in the ecosystem. MLIR (bi)Weekly is brought to you by a collective effort of contributors; we welcome your contributions!
MLIR Core
Infrastructure
- An interface was added to better support cast-like operations.
- A new builtin unrealized_conversion_cast operation was added to reduce the burden of intermixing type systems (see the sketch after this list). [discussion]
- FuncOpSignatureConversion (populateFuncOpTypeConversionPattern) now supports FunctionLike operations that are similar to FuncOp.
- A custom stream can now be provided when generating PassManager crash reproducers.
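A minimal sketch of how such a cast can bridge two type systems during a partial conversion; the foo/bar dialects and types below are made up for illustration:

```mlir
// A value still in the "source" type system...
%0 = "foo.produce"() : () -> !foo.box
// ...cast to the "target" type system so that partially converted IR keeps
// verifying; the cast is expected to fold away (or be erased) once both
// producer and consumer have been fully converted.
%1 = unrealized_conversion_cast %0 : !foo.box to i64
"bar.consume"(%1) : (i64) -> ()
```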
Table-driven Infrastructure
- A new RangeTypesMatchWith operation predicate was added
- This is a variant of TypesMatchWith that supports range equality.
- [OpFormatGen] Enum attributes are now formatted as keywords when possible.
- [OpFormatGen] Types, i.e. operand/result types, can now be used to anchor optional groups.
Optimizations and Code Generation
- The LLVM dialect now uses builtin types whenever possible.
- Masked and compressed load/store ops in the vector dialect were generalized to take memref operands with indices, and syntactic conventions were unified for all memory ops in the vector dialect (see the sketch after this list)
- simplifies rewriting between all these ops
- makes syntax more consistent, easier to read
- also prepares vectorization strategy in sparse compiler
- Vectorization strategies were added to the sparse codegen; @aartbik tells a bit more about it on Discourse.
- innermost loops, choice of dense or dense/sparse for-loop vectorization
- handles both parallel and reduction types
- masked mem operations interact nicely with vector dialect folding/hoisting
- a planned independent pass should partition loops into unconditional vector loops and scalar cleanup loops
- Improved sparse runtime support library
- after the Matrix Market Exchange format, the FROSTT file format for tensors is now also supported; see Sparse Matrix Market Exchange Format support in MLIR for more info.
- ability to lexicographically sort in memory by indices (important for sparse setup!)
- made format extension proposal to the FROSTT team
- Started to implement the "backing store" for an MLIR sparse tensor type
- this will simplify using the sparse compiler a lot, and even enable setting up actual integration tests; more about this next time
- Async dialect to LLVM lowering simplification
- coroutine and async runtime operations are now explicit operations instead of function calls
- coroutine intrinsics are operations in the LLVM dialect
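A minimal sketch of the unified conventions for the masked and compressed memory ops mentioned above; the function name, shapes, and 16-lane vectors are assumed for illustration:

```mlir
func @masked_and_compressed(%base: memref<?xf32>, %i: index,
                            %mask: vector<16xi1>, %pass: vector<16xf32>,
                            %val: vector<16xf32>) {
  // Masked load: disabled lanes take the pass-through value.
  %l = vector.maskedload %base[%i], %mask, %pass
      : memref<?xf32>, vector<16xi1>, vector<16xf32> into vector<16xf32>
  // Masked store: disabled lanes leave memory untouched.
  vector.maskedstore %base[%i], %mask, %val
      : memref<?xf32>, vector<16xi1>, vector<16xf32>
  // Compressed store: only the enabled lanes are written, contiguously.
  vector.compressstore %base[%i], %mask, %val
      : memref<?xf32>, vector<16xi1>, vector<16xf32>
  return
}
```

All three ops now share the memref-with-indices addressing style, which is what makes rewriting between them (and folding/hoisting around them) straightforward.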
SPIR-V
- Code shuffling and refactoring for the SPIR-V dialect and conversions are done: they now follow MLIR's conventions for dialects and lowering conversions.
- The SPIR-V dialect now knows traits like SignedOp, UnsignedOp, and UsableInSpecConstantOp to process ops in these categories uniformly.
- spv.SpecConstantOperation is fully supported now, including serialization and deserialization (a sketch follows this list).
- A few operations (spv.GLSL.Fma, spv.Ordered/Unordered, spv.IsInf) are added, with lowerings from upper layers.
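A minimal sketch of spv.SpecConstantOperation; the wraps form matches the dialect documentation, while the constant-op spelling and values here are assumed for illustration:

```mlir
// The wrapped spv.IAdd is evaluated as a specialization-constant expression
// rather than at runtime; its operands must themselves be (spec) constants.
%0 = spv.constant 1 : i32
%1 = spv.constant 2 : i32
%2 = spv.SpecConstantOperation wraps "spv.IAdd"(%0, %1) : (i32, i32) -> i32
```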
Other
- Python bindings:
- Got support for extendable OpView classes.
- FuncOp and ModuleOp now have bindings.
- f80 and f128 builtin floating point types were added.
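For example (a hypothetical function signature), the new types can be used anywhere a builtin float type is accepted:

```mlir
// f80 is x86 extended precision, f128 is IEEE quadruple precision.
func @extended_precision(%a: f80, %b: f128) -> f128
```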
In the Ecosystem
IREE: An MLIR Execution Environment for CPU and GPU
- Quite a few productionalization activities:
- MacOS binaries are now being built (in addition to Linux). Windows binaries still exclude TensorFlow integration due to issues.
- More dogfooding of public APIs, notably resulting in an e2e JAX training example.
- Performance regression dashboard added for MobileBert on Samsung S20 (CPU and GPU).
- IREE core build no longer has a dependency on the tensorflow repository.
- New HAL runtime and task scheduling system now functional for threaded tile dispatch on CPU (ResNet before, ResNet now with intra-op parallelism, looking forward to inter-op parallelism once the compiler emits more fine grained barriers).
- Initial integration of LinAlg/Tensors to supplant the current codegen pipeline continues to make progress with initial e2e results (this work is needed to get the most out of fusion, parallelism and tunability). Targeting end of quarter for scale out to all supported workloads.
- TOSA-based TFLite importer brought up with initial support for a few ops on CPU and GPU.
mlir-npcomp: Prototype for compiling numpy programs
- Community member experimenting with a data pipeline system.
- Starting to look at how to de-dynamize TorchScript programs.
TensorFlow / MLIR-HLO
- BatchNorm, Infeed, Outfeed, and CustomCall migrated to use MLIR.
- All ElementalIrEmitter-based ops are migrated.
Kernel Generator:
- Kernel generator now performs fusion at the "linalg-on-tensors" level, aligning our pipeline closer with IREE and with upstream MLIR.
- We are tuning code generation for some broadcasting cases of binary operations. Currently we beat Eigen in some cases but are slower in others. Once this is fixed, we will launch a first binary kernel to production.
- We are burning down the list of kernels that are missing implementations (5% to go). Remaining work is mostly on generalizing the tf bridge (tf to hlo legalization) for dynamic shapes by moving existing lowering patterns from the classic bridge to mlir.
- The two kernels we launched to production last year to gain confidence in our host-side implementation are faring well. We had one bug wrt. zero-element tensors that was fixed but otherwise have not heard back from users (a good signal!). We will launch more unary kernels starting next week.
- We started first investigations into bringing kernel generator to CPU. The goal is to build a rough prototype to better understand what the missing pieces are.
CIRCT: Circuit IR Compilers and Tools, aka "MLIR for hardware"
- The handshake-runner was reimplemented using an MLIR interface.
- The HandshakeToFIRRTL pass gained lowering support for Load, Store, Memory, and Buffer ops.
- George Lyon added a C API for emitting Verilog from a CIRCT MLIR module using the SV (SystemVerilog) dialect.
- ESI Cosimulation has a limited working prototype as part of the CIRCT integration tests.
- During the weekly meeting on January 13, Rachit talked about Calyx and proposed next steps; in the next meeting, scheduled for January 20, sequential logic and cosimulation were discussed.