MLIR News, 56th edition (27th September 2023)

Welcome to the 56th issue of the MLIR Newsletter covering developments in MLIR, and related projects in the ecosystem. We welcome your contributions (contact: …). Click here to see previous editions.

Highlights and Ecosystem

  • 2023 US LLVM Dev Meeting Oct 10th to 12th [Program].

  • Triton Developer Conference: a number of interesting presentations, including: (a) The Triton Compiler: Past, Present and Future - Phil Tillet (OpenAI); (b) Hopper support in Triton - Nvidia; (c) Bringing Triton to AMD GPUs; (d) Intel XPU Backend for Triton - Google; (e) Vectorization of Triton Kernels for Qualcomm Hexagon; (f) Triton for MTIA; (g) Triton IR for high-performance fusions in XLA; (h) Triton for All - Microsoft; (i) PyTorch 2.0 and TorchInductor - Meta; (j) Pallas: A JAX Kernel Language - Google; (k) Grouped GEMMs in Triton - Nvidia.

  • Open MLIR Meeting 9/28/2023: [RFC] Sharding Framework Design for Device Mesh

  • MLIR C/C++ Frontend Working Group [Mon, Sep 25th]

  • LLVM Weekly [508th Issue].

MLIR Commits

  • Jakub Kuderski, “This patch extends matchPattern to support matching over Attributes. The primary use case is constant folders and canon patterns, where matching Attributes is preferred over Values/Operation * as it doesn’t require re-folding ops.” [Diff].

  • Matthias Springer, “One-Shot Bufferize no longer deallocates buffers, so deallocationFn can be removed. Note: There is a bufferization.dealloc_tensor op that now always bufferizes to memref.dealloc. This op will be phased out soon.” [Diff].

  • Diego Caballero, “[mlir][Vector] Add support for Value indices to vector.extract/insert. vector.extract/insert ops only support constant indices. This PR is extending them so that arbitrary values can be used instead.” [Diff].

  • Ingo Müller, “This PR adds a new transform op that replaces memref.allocas with memref.get_globals to newly inserted memref.globals. This is useful, for example, for allocations that should reside in the shared memory of a GPU, which have to be declared as globals.” [Diff].

  • Ingo Müller, “structured.masked_vectorize => structured.vectorize. This reflects the fact that since recently the op can also handle the unmasked case.” [Diff].

  • Martin Erhart, “This commit removes the deallocation capabilities of one-shot-bufferization. One-shot-bufferization should never deallocate any memrefs as this should be entirely handled by the ownership-based-buffer-deallocation pass going forward. This means the allow-return-allocs pass option will default to true now, create-deallocs defaults to false and they, as well as the escape attribute indicating whether a memref escapes the current region, will be removed.” [Diff].

  • Nicolas Vasilache, “This revision adds a rewrite for sequences of vector bitcast(trunci) to use a more efficient sequence of vector operations comprising shuffle and bitwise ops. The rewrite performs a simple enumeration of each of the bits in the result vector and determines its provenance in the pre-trunci vector. The enumeration is used to generate the proper sequence of shuffle, andi, ori followed by an optional final trunci/extui. Such patterns appear naturally when writing quantization / dequantization functionality with the vector dialect.” [Diff].

  • Matthias Springer, “This revision adds support for empty tensor elimination to bufferization.materialize_in_destination by implementing the SubsetInsertionOpInterface.” [Diff].

  • Aart Bik, “A bufferization.alloc_tensor can be directly replaced with tensor.empty since these are more or less semantically equivalent. The latter is considered a bit more “pure” with respect to SSA semantics.” [Diff].

  • Victor Perez, “Change SingleBlock::{insert,push_back} to avoid inserting the argument operation after the block’s terminator. This allows removing SingleBlockImplicitTerminator’s functions with the same name.” [Diff].

  • Martin Erhart, “Define a pipeline for buffer de-allocation. Since ownership based buffer deallocation requires a few passes to be run in a somewhat fixed sequence, it makes sense to have a pipeline for convenience (and to reduce the number of transform ops to represent default deallocation).” [Diff].

MLIR RFC Discussions

  • Open MLIR Meeting 9/28/2023: [RFC] Sharding Framework Design for Device Mesh

  • A. Zinenko, “linalg.generic is a structuring concept that represents loops and memory accesses, the actual computation happens in its body that can contain any operation applicable to a valid tensor/memref element type. It is possible to have tensors of signed integers, and downstream dialects may well have operations processing those that can go inside a generic body. Named Linalg ops have predefined bodies using arith and therefore use signless integers. You may be confusing signless with unsigned. See MLIR Rationale - MLIR. In particular, MaxSIOp operates on signless integers. That’s why it has ‘S’ in its name indicating that the most significant bit is interpreted as a sign by this operation as opposed to MaxUIOp that interprets it as actually most significant bit.”

  • M. Springer, “It’s worth mentioning that the DestinationStyleOpInterface guarantees that inits and results have the same dynamic shape. I added support in ValueBoundsConstraintSet for that recently.”

MLIR Ecosystem

Useful Links