MLIR News, 12th edition (7/25/2020)

See the previous published edition.

Welcome to the twelfth issue of the MLIR (bi)Weekly, a newsletter (published on Friday) covering developments in MLIR and related projects in the ecosystem. MLIR (bi)Weekly is brought to you by a collective effort of contributors; we welcome your contributions!

Highlights

The last open meeting provided a nice overview of, and discussion about, the work on asynchronous execution modeling:
[RFC] New dialect for modelling asynchronous execution at a higher-level
and [RFC] Async/Await dialect targeting LLVM coroutines (slides and recording).

MLIR Core

Infrastructure

  • The first commit in a large refactoring of DialectConversion type conversions has landed. This refactoring will ensure that patterns produce legal types, and that operands to conversion patterns have properly converted types (a TypeConverter sketch follows this list).
  • Types and attributes now support a “mutable” component. This is a precondition for supporting recursive types that require delayed initialization, in particular LLVM structure types (the delayed-initialization pattern is sketched after this list).
  • An RFC for “Debug Actions” was sent out. This framework would allow external entities to control various aspects of the compiler, and is intended to enable debug counters, optimization fuel, interactive compiler debugging, and more.
  • DialectConversion now emits an error if an operation or value was marked as “erased”, but has live users after the conversion process. (It previously asserted).
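
For background, type conversions in DialectConversion are driven by a TypeConverter. Here is a minimal sketch, assuming the callback-based addConversion API as of this writing (exact signatures may shift as the refactoring proceeds):

```cpp
#include "mlir/Transforms/DialectConversion.h"

using namespace mlir;

// Minimal sketch: a TypeConverter that rewrites i64 to i32 and treats every
// other type as already legal. The refactoring above is about guaranteeing
// that conversion patterns only see, and produce, types that went through
// such a converter.
struct ExampleTypeConverter : public TypeConverter {
  ExampleTypeConverter() {
    addConversion([](Type type) -> Type {
      if (auto intType = type.dyn_cast<IntegerType>())
        if (intType.getWidth() == 64)
          return IntegerType::get(/*width=*/32, type.getContext());
      return type; // Everything else is already legal.
    });
  }
};
```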

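As background on why a mutable component is needed: a recursive type must exist before its body can refer back to it. LLVM IR's own C++ API shows the create-then-mutate pattern that the new MLIR mechanism is meant to enable (this sketch uses LLVM's StructType, not the new MLIR API):

```cpp
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/LLVMContext.h"

// A recursive struct such as `struct Node { Node *next; }` cannot be built
// in one step: the type must already exist before its body can point at it.
llvm::StructType *makeNodeType(llvm::LLVMContext &ctx) {
  // Phase 1: create a named struct with no body (identity only).
  llvm::StructType *node = llvm::StructType::create(ctx, "Node");
  // Phase 2: mutate the body in place, now that `node` can be referenced.
  node->setBody(llvm::PointerType::getUnqual(node));
  return node;
}
```
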
Shape Dialect

  • Restructured the shape dialect so that its operations can operate on shape-specific values or on standard values. The shape-specific types like shape.shape and shape.size correspond to the standard types tensor<?xindex> and index, with the difference that they can hold error values. The shape operations can now operate on either and ensure error propagation where it occurs (see the sketch below).
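
To make the error-value distinction concrete, here is an illustrative C++ model (names are hypothetical, not the actual implementation): a shape is either a list of extents, as in tensor<?xindex>, or an error value, and operations propagate the error:

```cpp
#include <cstdint>
#include <string>
#include <variant>
#include <vector>

// Illustrative model: like tensor<?xindex>, a Shape carries extents, but
// like !shape.shape it may instead hold an error value.
using Shape = std::variant<std::vector<int64_t>, std::string /*error*/>;

// Sketch of an error-propagating broadcast of two shapes.
// For brevity this sketch only handles equal ranks.
Shape broadcast(const Shape &lhs, const Shape &rhs) {
  // Error propagation: if either operand is an error, so is the result.
  if (auto *err = std::get_if<std::string>(&lhs)) return *err;
  if (auto *err = std::get_if<std::string>(&rhs)) return *err;
  const auto &a = std::get<std::vector<int64_t>>(lhs);
  const auto &b = std::get<std::vector<int64_t>>(rhs);
  if (a.size() != b.size()) return std::string("rank mismatch");
  std::vector<int64_t> result;
  for (size_t i = 0; i < a.size(); ++i) {
    if (a[i] != b[i] && a[i] != 1 && b[i] != 1)
      return std::string("incompatible extents");
    result.push_back(a[i] == 1 ? b[i] : a[i]);
  }
  return result;
}
```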

Optimizations and Code Generation

  • An RFC to clean up casting support in MLIR core was shared, with the goal of upstreaming the HLO cast operations. It has been tested by lowering the dynamic reshape cast operation to LLVM.

CPU codegen

  • Started to explore what operations are missing in the Vector dialect to support sparse computations:
    • The gather/scatter operations were added to the Vector dialect
    • A reference lowering to LLVM intrinsics was added to run on CPU
    • Subsequent improvements made sure this lowers to efficient code, e.g. AVX512 instructions
    • Currently benchmarking / debugging matrix-times-vector cases (a sparse SAXPY sketch showing how gather/scatter is used follows this list):
      • Sparse SDOT, Sparse SAXPY, Sparse SAXPY (jagged diagonal)
    • A few other useful primitives were identified, still TBD
  • Continued matmul on AArch64 study
    • Near-peak matmul performance for all matrices whose sizes are a multiple of tile_size, including small matrices.
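
A rough illustration of why gather and scatter matter here, in plain C++ (illustrative only; the actual work targets the Vector dialect and LLVM intrinsics): sparse SAXPY updates y at indirect indices, which is exactly a gather/compute/scatter pattern:

```cpp
#include <cstdint>
#include <vector>

// Sparse SAXPY: y += a * x, where x is stored in compressed form as
// (value, index) pairs. A vectorized version gathers y[xIdx[i]] for a whole
// vector of lanes, does the multiply-add, and scatters the results back;
// the scalar loop below is the semantics those vector operations implement.
void sparseSaxpy(float a, const std::vector<float> &xVal,
                 const std::vector<int32_t> &xIdx, std::vector<float> &y) {
  for (size_t i = 0; i < xVal.size(); ++i) {
    // gather:  load y at an indirect index
    float yi = y[xIdx[i]];
    // compute: multiply-add on the gathered lane
    yi += a * xVal[i];
    // scatter: store back to the same indirect index
    y[xIdx[i]] = yi;
  }
}
```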

SPIR-V

  • Continued progress on the SPIR-V to LLVM conversion. A conversion manual has been set up to show how the different IR constructs are mapped and to track progress. New patterns were added for simple SPIR-V branches; more patches are coming to cover spv.selection, spv.loop, and load/store memory operations.
  • The standard signed remainder to SPIR-V conversion was fixed to emulate it with spv.UMod, because Vulkan does not support spv.SMod/spv.SRem with negative operands (the emulation arithmetic is sketched after this list).
  • A rationale section was added to the doc to explain why the SPIR-V conversion does not use a MemrefDescriptor the way CPU codegen does.
  • Continued work on supporting shared memory for matmul.
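
For reference, a truncated signed remainder keeps the sign of the dividend, so it can be emulated with an unsigned modulo on magnitudes plus a sign copy. A C++ sketch of the arithmetic (the actual conversion pattern is of course expressed in SPIR-V ops):

```cpp
#include <cstdint>
#include <cstdlib>

// Emulate signed remainder (sign follows the dividend, like spv.SRem or
// C's %) using only an unsigned modulo, mirroring the spv.UMod-based fix.
int32_t signedRemViaUMod(int32_t a, int32_t b) {
  // Widen before abs so that INT32_MIN does not overflow.
  uint32_t ua = static_cast<uint32_t>(std::abs(static_cast<int64_t>(a)));
  uint32_t ub = static_cast<uint32_t>(std::abs(static_cast<int64_t>(b)));
  uint32_t r = ua % ub; // unsigned modulo on the magnitudes
  // Copy the sign of the dividend onto the magnitude.
  return a < 0 ? -static_cast<int32_t>(r) : static_cast<int32_t>(r);
}
```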

Other

In the Ecosystem

IREE: Compiler/runtime for ML models focused on resource-constrained deployments

TensorFlow

  • XLA-CPU: Added support for generating matrix-vector multiplies through Linalg. Performance results look positive; correctness issues with transposed inputs still need to be fixed (a sketch of the computation follows this list).
  • XLA-GPU: moving on with refactoring the backend to operate on the LMHLO dialect:
    • Further cleanups on the XLA/GPU backend
      • Not using XLA-based llvm_ir::AliasAnalysis anymore.
      • Nested computations don’t depend on BufferAssignment anymore.
    • LHLO::SortOp ported and passing unit tests.
    • LHLO ops should carry names for debugging.
    • Nested FusionOp (de)serialization support for the current MHLO.
    • Added tests with non-identity layouts.
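
For context, a hypothetical sketch of the computation such a kernel performs, including the transposed-input variant where the correctness issues show up (plain C++, illustrative only):

```cpp
#include <vector>

// y = A * x (or y = A^T * x) for a row-major m x n matrix A; y must have
// length m (or n when transposed). This is the computation a generated
// matrix-vector multiply kernel implements.
void matvec(const std::vector<float> &A, const std::vector<float> &x,
            std::vector<float> &y, int m, int n, bool transposeA) {
  for (int i = 0; i < (transposeA ? n : m); ++i) {
    float acc = 0.0f;
    for (int j = 0; j < (transposeA ? m : n); ++j)
      acc += (transposeA ? A[j * n + i] : A[i * n + j]) * x[j];
    y[i] = acc;
  }
}
```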

CIRCT: Circuit IR Compilers and Tools aka ‘MLIR for hardware’

  • The conversion from standard to handshake was updated to generate a single-block region, for compatibility with the upstreamed version of Graph Regions
  • An initial lowering from the handshake dialect into FIRRTL now exists
  • The handshake dialect gained a simple buffer insertion pass to break potential combinational loops after RTL generation
  • FIRRTL gained better support for bit extract operations
  • The FIRRTL to RTL transformation is now a Pass rather than a pattern-based legalization, because the type conversions were too messy
  • LLHD gained a bunch of basic folding optimizations, particularly for variadic operations

Recent Talks
