MLIR News, 20th edition (11/14/2020)

See the previous published edition.

Welcome to the twentieth issue of the MLIR (bi)Weekly, a newsletter (published on Friday) covering developments in MLIR and related projects in the ecosystem. MLIR (bi)Weekly is brought to you by a collective effort of contributors, and we welcome your contributions!

Highlights

MLIR Core

Infrastructure

  • The FuncOp textual syntax is changing slightly; see PSA: Change for FuncOp syntax
  • FunctionLike ops gained helpers for erasing arguments and results
  • Operation locations are now printed by default using aliases, and are printed after the module instead of before (similarly to how LLVM metadata is printed).
  • DenseElementsAttr::getValues now supports custom native C++ floating point types, allowing for frameworks to use custom half/bfloat16 types when applicable.
  • The nesting behavior of the pass manager is now configurable and defaults to explicit nesting.
  • The walk methods now support walking Regions and Blocks.
  • Progress continues on the C API and the Python bindings.
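
To illustrate what walking Regions and Blocks (in addition to Operations) means, here is a minimal sketch on a toy nested IR. It is not the MLIR API: the `Operation`, `Region` and `Block` classes, the `walk` helper and the `toy.*` op names are all hypothetical stand-ins, and visiting children before their parent is a choice made for this sketch.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Block:
    # A block holds an ordered list of operations (toy model).
    operations: List["Operation"] = field(default_factory=list)


@dataclass
class Region:
    # A region holds an ordered list of blocks (toy model).
    blocks: List[Block] = field(default_factory=list)


@dataclass
class Operation:
    # An operation has a name and may own nested regions (toy model).
    name: str
    regions: List[Region] = field(default_factory=list)


def walk(op, kind, callback):
    """Call `callback` on every object of type `kind` nested under `op`.

    `kind` is one of Operation, Region or Block. Nested objects are
    visited before the objects that contain them.
    """
    for region in op.regions:
        for block in region.blocks:
            for nested in block.operations:
                walk(nested, kind, callback)
            if kind is Block:
                callback(block)
        if kind is Region:
            callback(region)
    if kind is Operation:
        callback(op)


def build_example():
    # toy.module { toy.loop { toy.inner } }
    inner = Operation("toy.inner")
    loop = Operation("toy.loop", [Region([Block([inner])])])
    return Operation("toy.module", [Region([Block([loop])])])


def collect_op_names(root):
    names = []
    walk(root, Operation, lambda op: names.append(op.name))
    return names


def count(root, kind):
    seen = []
    walk(root, kind, seen.append)
    return len(seen)
```

In this toy model the same traversal serves all three kinds, and the caller only chooses which kind of object the callback receives.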

Optimizations and Code Generation

  • Bufferization has been refactored significantly. It now smoothly supports gradual bufferization and is more ergonomic to use.
  • Memref casting operations have been upstreamed from the lhlo dialect into the standard dialect.
  • The async dialect has progressed further. On the CPU side, support for lowering scf.parallel loops to asynchronous execution has been prototyped and is under review. On the GPU side, a lowering to CUDA and streams is implemented and under review.
  • A first functional version of sparse compilation in MLIR is under review.

SPIR-V

  • The SPIR-V dialect now supports spv.Vector{Insert|Extract}Dynamic. Patterns have been added to convert from the vector dialect to them.
  • The Vector16 capability is now supported to allow using 8/16-element vectors.
  • More decorations are supported in (de)serialization.

In the Ecosystem

mlir-npcomp: Prototype for compiling numerical Python programs

  • milestone: First end-to-end execution from PyTorch frontend to the reference backend. (simple program), (patch).
  • milestone: npcomp now uses 100% upstream passes for bufferization!
  • Almost all of the TCP dialect has been replaced with std elementwise ops on tensors + linalg named ops on tensors.
  • Over 700 lines have been removed now that std.global_memref and tensor constant bufferization have landed upstream.

TensorFlow / MLIR-HLO

TensorFlow Kernel CodeGen:

  • The generated host-side code for unary operations has landed and we are investigating the remaining performance gaps. With the planned optimizations landed, we are on par with Eigen for some operations but 10-15% slower on others. This needs further investigation.
  • We added support for error reporting by lowering the assert operation in MLIR core to TensorFlow framework specific code. This allows propagation of errors due to, e.g., constraints from the shape dialect, to the runtime. A practical use is to report failed broadcasts.
  • Binary operations produce the correct result but are significantly slower due to missed fusions of broadcasts. We have expanded our (small) fusion heuristic. Furthermore, we are currently reshuffling our lowering pipeline to the final anticipated form so that we can make better use of information from higher-level dialects (like shape). This should bring us closer in performance.
  • We have added more element-wise kernels and plan to launch them once the generated host-side code has competitive performance.
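
As an illustration of the kind of error a failed broadcast produces at runtime, here is a minimal sketch of a broadcast shape check. It assumes NumPy-style broadcasting rules (shapes aligned from the trailing dimension, with size-1 dimensions stretching), which the items above do not spell out. `BroadcastError` and `broadcast_shapes` are hypothetical names and are not part of TensorFlow or MLIR.

```python
from typing import Sequence, Tuple


class BroadcastError(ValueError):
    """Raised when two shapes cannot be broadcast together."""


def broadcast_shapes(lhs: Sequence[int], rhs: Sequence[int]) -> Tuple[int, ...]:
    """Return the broadcast result shape or raise BroadcastError.

    Assumes NumPy-style rules: the shorter shape is padded with leading
    1s, and two dimensions are compatible if they are equal or if either
    of them is 1.
    """
    rank = max(len(lhs), len(rhs))
    padded_lhs = (1,) * (rank - len(lhs)) + tuple(lhs)
    padded_rhs = (1,) * (rank - len(rhs)) + tuple(rhs)

    result = []
    for axis, (a, b) in enumerate(zip(padded_lhs, padded_rhs)):
        if a == b or b == 1:
            result.append(a)
        elif a == 1:
            result.append(b)
        else:
            # This is the failure that would be reported to the runtime.
            raise BroadcastError(
                f"cannot broadcast {tuple(lhs)} with {tuple(rhs)}: "
                f"dimension {axis} has sizes {a} and {b}"
            )
    return tuple(result)
```

In a generated kernel the equivalent check would be expressed as a shape constraint guarded by an assert rather than as a Python exception; the sketch only shows the condition being checked and the information a useful error message could carry.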

CIRCT: Circuit IR Compilers and Tools aka ‘MLIR for hardware’

  • The first commits for a Verilog simulator DPI interface mechanism have landed. This will allow regressions against reference simulators, like Verilator.
  • An initial C API has landed.

Recent Talks

  • 2020-11-05: COMET: Domain Specific Compilation for Computational Chemistry ; slides - recording