MLIR News, 25th edition (1/22/2021)

See the previous published edition.

Welcome to the twenty-fifth issue of the MLIR (bi)Weekly, a newsletter covering developments in MLIR and related projects in the ecosystem. MLIR (bi)Weekly is brought to you by a collective effort of contributors; we welcome your contributions!

MLIR Core

Infrastructure

Table-driven Infrastructure

Optimizations and Code Generation

  • The LLVM dialect now uses built-in types whenever possible (a small sketch follows this list).
  • Masked and compress/expand load/store ops in the vector dialect were generalized to take memref operands with indices, and the syntactic conventions for all memory ops in the dialect were unified (see the syntax sketch after this list):
    • simplifies rewriting between all these ops
    • makes the syntax more consistent and easier to read
    • also prepares the vectorization strategy in the sparse compiler
  • The sparse codegen gained vectorization strategies; @aartbik tells a bit more about it on Discourse.
    • innermost loops, with a choice of dense or dense/sparse for-loop vectorization
    • handles both parallel and reduction loops
    • masked memory operations interact nicely with vector dialect folding/hoisting
    • a planned independent pass should partition loops into unconditional vector loops and scalar cleanup loops (see the loop-partitioning sketch after this list)
  • Improved sparse runtime support library
  • Started to implement the “backing store” for an MLIR sparse tensor type
    • this will greatly simplify using the sparse compiler and even enable setting up actual integration tests; more about this next time
  • Async dialect to LLVM lowering simplification
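
To make the built-in-types item above concrete, here is a minimal sketch (the function itself is hypothetical): where the LLVM dialect previously spelled integer types with its own syntax such as !llvm.i64, it now reuses the builtin i64 directly.

```mlir
// LLVM dialect IR written with builtin integer types; before this change
// the operand and result types below would have been !llvm.i64.
llvm.func @add(%a: i64, %b: i64) -> i64 {
  %0 = llvm.add %a, %b : i64
  llvm.return %0 : i64
}
```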
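
For the unified vector memory-op syntax, a minimal sketch, assuming the upstream assembly format introduced by this change (the wrapping function is illustrative): every memory op now takes a base memref with indices, followed by its mask and vector operands.

```mlir
// All vector-dialect memory ops now share the base[indices] convention.
func @masked_ops(%base: memref<?xf32>, %i: index, %mask: vector<16xi1>,
                 %pass: vector<16xf32>, %value: vector<16xf32>) {
  %0 = vector.maskedload %base[%i], %mask, %pass
         : memref<?xf32>, vector<16xi1>, vector<16xf32> into vector<16xf32>
  vector.maskedstore %base[%i], %mask, %value
    : memref<?xf32>, vector<16xi1>, vector<16xf32>
  vector.compressstore %base[%i], %mask, %value
    : memref<?xf32>, vector<16xi1>, vector<16xf32>
  return
}
```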
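
The planned loop partitioning can be pictured with the hypothetical sketch below; the names and the precomputed %split are illustrative, not output of the actual pass. The prefix that is a multiple of the vector length runs as an unconditional vector loop, and the remainder runs as a scalar cleanup loop.

```mlir
// Sum of a dynamically sized array, split into an unconditional 16-wide
// vector loop and a scalar cleanup loop. %split is assumed to be the
// largest multiple of 16 that is <= %n.
func @sum(%A: memref<?xf32>, %n: index, %split: index) -> f32 {
  %c0 = constant 0 : index
  %c1 = constant 1 : index
  %c16 = constant 16 : index
  %pad = constant 0.0 : f32
  %vzero = constant dense<0.0> : vector<16xf32>
  // Unconditional vector loop: no masking needed below %split.
  %vsum = scf.for %i = %c0 to %split step %c16
      iter_args(%acc = %vzero) -> (vector<16xf32>) {
    %v = vector.transfer_read %A[%i], %pad : memref<?xf32>, vector<16xf32>
    %a = addf %acc, %v : vector<16xf32>
    scf.yield %a : vector<16xf32>
  }
  %partial = vector.reduction "add", %vsum : vector<16xf32> into f32
  // Scalar cleanup loop for the remaining 0..15 elements.
  %total = scf.for %i = %split to %n step %c1
      iter_args(%acc = %partial) -> (f32) {
    %x = load %A[%i] : memref<?xf32>
    %a = addf %acc, %x : f32
    scf.yield %a : f32
  }
  return %total : f32
}
```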

SPIR-V

  • Code shuffling and refactoring for the SPIR-V dialect and conversions are done: they now follow MLIR’s conventions for dialects and lowering conversions.
  • The SPIR-V dialect now has traits like SignedOp, UnsignedOp, and UsableInSpecConstantOp to process ops in these categories uniformly.
  • spv.SpecConstantOperation is fully supported now, including serialization and deserialization.
  • A few operations (spv.GLSL.Fma, spv.Ordered/spv.Unordered, spv.IsInf) were added, with lowerings from upper layers (a small sketch follows this list).
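
A sketch of the new ops in use, assuming their assembly follows the dialect's usual conventions; the module, function, and combination of ops here are illustrative, not from the patches themselves.

```mlir
spv.module Logical GLSL450 {
  spv.func @new_ops(%a: f32, %b: f32, %c: f32) -> i1 "None" {
    // Fused multiply-add from the GLSL extended instruction set.
    %fma = spv.GLSL.Fma %a, %b, %c : f32
    // Floating-point classification / ordered-comparison ops.
    %ord = spv.Ordered %a, %b : f32
    %inf = spv.IsInf %fma : f32
    %r = spv.LogicalAnd %ord, %inf : i1
    spv.ReturnValue %r : i1
  }
}
```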

Other

  • Python bindings:
    • Got support for extendable OpView classes.
    • FuncOp and ModuleOp now have bindings.
  • The builtin f80 and f128 floating-point types were added (example below).
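
The new types slot in wherever other builtin float types do; a trivial sketch (the function is hypothetical):

```mlir
// f80 (x86 extended precision) and f128 (quad precision) are now
// ordinary builtin element types.
func @extended(%a: f80, %b: f128) -> (f80, f128) {
  return %a, %b : f80, f128
}
```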

In the Ecosystem

IREE: An MLIR Execution Environment for CPU and GPU

  • Quite a few productionization activities:
    • macOS binaries are now being built (in addition to Linux). Windows binaries still exclude the TensorFlow integration due to issues.
    • More dogfooding of public APIs, notably resulting in an e2e JAX training example.
    • Performance regression dashboard added for MobileBert on Samsung S20 (CPU and GPU).
    • IREE core build no longer has a dependency on the tensorflow repository.
  • The new HAL runtime and task scheduling system is now functional for threaded tile dispatch on CPU (ResNet before, ResNet now with intra-op parallelism; looking forward to inter-op parallelism once the compiler emits more fine-grained barriers).
  • Initial integration of Linalg-on-tensors to supplant the current codegen pipeline continues to make progress, with initial e2e results (this work is needed to get the most out of fusion, parallelism, and tunability). Targeting end of quarter for scale-out to all supported workloads.
  • TOSA-based TFLite importer brought up with initial support for a few ops on CPU and GPU.

mlir-npcomp: Prototype for compiling numpy programs

  • Community member experimenting with a data pipeline system.
  • Starting to look at how to de-dynamize TorchScript programs.

TensorFlow / MLIR-HLO

XLA GPU backend

  • BatchNorm, Infeed, Outfeed, and CustomCall migrated to use MLIR.
  • All ElementalIrEmitter-based ops are migrated.

Kernel Generator

  • The kernel generator now performs fusion at the “linalg-on-tensors” level, aligning our pipeline more closely with IREE and with upstream MLIR (see the sketch after this list).
  • We are tuning code generation for some broadcasting cases of binary operations. Currently we beat Eigen in some cases but are slower in others. Once this is fixed, we will launch a first binary kernel to production.
  • We are burning down the list of kernels that are missing implementations (5% to go). The remaining work is mostly on generalizing the TF bridge (TF-to-HLO legalization) for dynamic shapes by moving existing lowering patterns from the classic bridge to MLIR.
  • The two kernels we launched to production last year to gain confidence in our host-side implementation are faring well. We had one bug with zero-element tensors that was fixed, but otherwise we have not heard back from users (a good signal!). We will launch more unary kernels starting next week.
  • We started first investigations into bringing kernel generator to CPU. The goal is to build a rough prototype to better understand what the missing pieces are.
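
To illustrate what fusion at the “linalg-on-tensors” level buys (see the first bullet above), here is a minimal sketch, assuming roughly the upstream linalg syntax of the time; the functions and the choice of ops are illustrative. Two elementwise linalg.generic ops on tensors fuse into a single region before any buffers exist, so the intermediate tensor is never materialized.

```mlir
#id = affine_map<(d0) -> (d0)>

// Before fusion: exp(x), then square it, via two generics on tensors.
func @before(%x: tensor<8xf32>, %init: tensor<8xf32>) -> tensor<8xf32> {
  %0 = linalg.generic {indexing_maps = [#id, #id],
                       iterator_types = ["parallel"]}
      ins(%x : tensor<8xf32>) outs(%init : tensor<8xf32>) {
  ^bb0(%in: f32, %out: f32):
    %e = exp %in : f32
    linalg.yield %e : f32
  } -> tensor<8xf32>
  %1 = linalg.generic {indexing_maps = [#id, #id],
                       iterator_types = ["parallel"]}
      ins(%0 : tensor<8xf32>) outs(%init : tensor<8xf32>) {
  ^bb0(%in: f32, %out: f32):
    %sq = mulf %in, %in : f32
    linalg.yield %sq : f32
  } -> tensor<8xf32>
  return %1 : tensor<8xf32>
}

// After fusion: one generic; the exp result never hits memory.
func @after(%x: tensor<8xf32>, %init: tensor<8xf32>) -> tensor<8xf32> {
  %0 = linalg.generic {indexing_maps = [#id, #id],
                       iterator_types = ["parallel"]}
      ins(%x : tensor<8xf32>) outs(%init : tensor<8xf32>) {
  ^bb0(%in: f32, %out: f32):
    %e = exp %in : f32
    %sq = mulf %e, %e : f32
    linalg.yield %sq : f32
  } -> tensor<8xf32>
  return %0 : tensor<8xf32>
}
```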

CIRCT: Circuit IR Compilers and Tools, aka ‘MLIR for hardware’

  • The handshake-runner was reimplemented using MLIR interfaces.
  • The HandshakeToFIRRTL pass gained lowerings for the Load, Store, Memory, and Buffer ops.
  • George Lyon added a C API for emitting Verilog from a CIRCT MLIR module using the SV (SystemVerilog) dialect.
  • ESI Cosimulation has a limited working prototype as part of the CIRCT integration tests.
  • During the weekly meeting on January 13, Rachit talked about Calyx and proposed next steps; at the next meeting, on January 20, sequential logic and cosimulation were discussed.

Recent Talks
