MLIR News, 56th edition (27th September 2023)

Welcome to the 56th issue of the MLIR Newsletter, covering developments in MLIR and related projects in the ecosystem. We welcome your contributions (contact: javed.absar@gmail.com). Click here to see previous editions.

Highlights and Ecosystem

  • 2023 US LLVM Dev Meeting Oct 10th to 12th [Program].

  • Triton Developer Conference: a number of interesting presentations, including (a) The Triton Compiler: Past, Present and Future - Phil Tillet (OpenAI); (b) Hopper support in Triton - Nvidia; (c) Bringing Triton to AMD GPUs; (d) Intel XPU Backend for Triton - Google; (e) Vectorization of Triton Kernels for Qualcomm Hexagon; (f) Triton for MTIA; (g) Triton IR for high-performance fusions in XLA; (h) Triton for All - Microsoft; (i) PyTorch 2.0 and TorchInductor - Meta; (j) Pallas: A JAX Kernel Language - Google; (k) Grouped GEMMs in Triton - Nvidia.

  • Open MLIR Meeting 9/28/2023: [RFC] Sharding Framework Design for Device Mesh

  • MLIR C/C++ Frontend Working Group [Mon, Sep 25th]

  • LLVM Weekly [508th Issue].

MLIR Commits

  • Jakub Kuderski, “This patch extends matchPattern to support matching over Attributes. The primary use case is constant folders and canonicalization patterns, where matching Attributes is preferred over Values/Operation * as it doesn’t require re-folding ops.” [Diff].

  • Matthias Springer, “One-Shot Bufferize no longer deallocates buffers, so deallocationFn can be removed. Note: there is a bufferization.dealloc_tensor op that now always bufferizes to memref.dealloc. This op will be phased out soon.” [Diff].
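    A minimal sketch of the behaviour described above, with illustrative value names and types:

        // a dealloc_tensor on a tensor value ...
        bufferization.dealloc_tensor %t : tensor<8xf32>
        // ... now always bufferizes to a plain dealloc of the underlying buffer:
        memref.dealloc %buf : memref<8xf32>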

  • Diego Caballero, “[mlir][Vector] Add support for Value indices to vector.extract/insert. vector.extract/insert ops currently only support constant indices; this PR extends them so that arbitrary SSA values can be used instead.” [Diff].
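    A rough sketch of what the extended ops accept (types and exact assembly syntax are illustrative):

        // %i is an arbitrary SSA value of type index, not just a constant
        %e = vector.extract %v[%i] : vector<8xf32>
        %w = vector.insert %e, %v[%i] : f32 into vector<8xf32>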

  • Ingo Muller, “This PR adds a new transform op that replaces memref.alloca ops with memref.get_global ops referring to newly inserted memref.global ops. This is useful, for example, for allocations that should reside in the shared memory of a GPU, which have to be declared as globals.” [Diff].
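    Roughly, the rewrite turns an alloca into a reference to a newly created global; a sketch with illustrative names and types:

        // before
        %0 = memref.alloca() : memref<16xf32>
        // after
        memref.global "private" @shared_buf : memref<16xf32> = uninitialized
        %0 = memref.get_global @shared_buf : memref<16xf32>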

  • Ingo Muller, “structured.masked_vectorize => structured.vectorize. This reflects the fact that the op has recently gained support for the unmasked case as well.” [Diff].
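    In transform scripts this is a pure rename; a hedged sketch (operand handle and vector sizes are illustrative):

        // previously: transform.structured.masked_vectorize %matmul vector_sizes [8, 16, 4] : !transform.any_op
        transform.structured.vectorize %matmul vector_sizes [8, 16, 4] : !transform.any_op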

  • Martin Erhart, “This commit removes the deallocation capabilities of one-shot-bufferization. One-shot-bufferization should never deallocate any memrefs, as this should be handled entirely by the ownership-based-buffer-deallocation pass going forward. This means that the allow-return-allocs pass option now defaults to true and create-deallocs defaults to false; both options, as well as the escape attribute indicating whether a memref escapes the current region, will be removed.” [Diff].

  • Nicolas Vasilache, “This revision adds a rewrite for sequences of vector bitcast(trunci) to use a more efficient sequence of vector operations comprising shuffle and bitwise ops. The rewrite performs a simple enumeration of each of the bits in the result vector and determines its provenance in the pre-trunci vector. The enumeration is used to generate the proper sequence of shuffle, andi, ori followed by an optional final trunci/extui. Such patterns appear naturally when writing quantization / dequantization functionality with the vector dialect.” [Diff].
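    The kind of input sequence targeted by the rewrite, sketched with illustrative types:

        // truncate to a sub-byte element type, then reinterpret the packed bits as bytes
        %0 = arith.trunci %a : vector<8xi32> to vector<8xi4>
        %1 = vector.bitcast %0 : vector<8xi4> to vector<4xi8>
        // the pattern re-expresses this as vector.shuffle plus arith.andi / arith.ori,
        // with an optional final trunci/extui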

  • Matthias Springer, “This revision adds support for empty tensor elimination to bufferization.materialize_in_destination by implementing the SubsetInsertionOpInterface.” [Diff].

  • Aart Bik, “A bufferization.alloc_tensor can be directly replaced with tensor.empty since these are more or less semantically equivalent. The latter is considered a bit more “pure” with respect to SSA semantics.” [Diff].
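    For example (illustrative shapes):

        // before
        %0 = bufferization.alloc_tensor() : tensor<8x16xf32>
        // after
        %0 = tensor.empty() : tensor<8x16xf32>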

  • Victor Perez, “Change SingleBlock::{insert,push_back} to avoid inserting the argument operation after the block’s terminator. This allows removing SingleBlockImplicitTerminator’s functions with the same name.” [Diff].

  • Martin Erhart, “Define a pipeline for buffer deallocation. Since ownership-based buffer deallocation requires a few passes to be run in a somewhat fixed sequence, it makes sense to have a pipeline for convenience (and to reduce the number of transform ops to represent default deallocation).” [Diff].

MLIR RFC Discussions

  • Open MLIR Meeting 9/28/2023: [RFC] Sharding Framework Design for Device Mesh

  • A. Zinenko, “linalg.generic is a structuring concept that represents loops and memory accesses; the actual computation happens in its body, which can contain any operation applicable to a valid tensor/memref element type. It is possible to have tensors of signed integers, and downstream dialects may well have operations processing those that can go inside a generic body. Named Linalg ops have predefined bodies using arith and therefore use signless integers. You may be confusing signless with unsigned; see the MLIR Rationale. In particular, MaxSIOp operates on signless integers. That’s why it has “S” in its name, indicating that the most significant bit is interpreted as a sign by this operation, as opposed to MaxUIOp, which interprets it as an ordinary most significant bit.”
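    As a small illustration of the signless point: an i32 value carries no sign of its own, and only the operation fixes how the bits are interpreted:

        // both operands are signless i32; the op name decides the interpretation
        %smax = arith.maxsi %a, %b : i32   // most significant bit read as a sign bit
        %umax = arith.maxui %a, %b : i32   // most significant bit read as magnitude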

  • M. Springer, “It’s worth mentioning that the DestinationStyleOpInterface guarantees that inits and results have the same dynamic shape. I added support for that in ValueBoundsConstraintSet recently.”
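    A small illustration of that destination-passing-style guarantee (op and types are illustrative):

        // the result %r is tied to the init operand %init and is guaranteed to
        // have the same dynamic shape (? x ? here) as %init
        %r = linalg.fill ins(%cst : f32) outs(%init : tensor<?x?xf32>) -> tensor<?x?xf32>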

MLIR Ecosystem

Useful Links
