Welcome to the 56th issue of the MLIR Newsletter, covering developments in MLIR and related projects in the ecosystem. We welcome your contributions (contact: javed.absar@gmail.com). Click here to see previous editions.
Highlights and Ecosystem
-
2023 US LLVM Dev Meeting Oct 10th to 12th [Program].
-
Triton Developer Conference: A number of interesting presentations: (a) The Triton Compiler: Past, Present and Future - Phil Tillet (OpenAI); (b) Hopper support in Triton - Nvidia; (c) Bringing Triton to AMD GPUs; (d) Intel XPU Backend for Triton - Google; (e) Vectorization of Triton Kernels for Qualcomm Hexagon; (f) Triton for MTIA; (g) Triton IR for high-performance fusions in XLA; (h) Triton for All - Microsoft; (i) PyTorch 2.0 and TorchInductor - Meta; (j) Pallas: A JAX Kernel Language - Google; (k) Grouped GEMMs in Triton - Nvidia.
-
Open MLIR Meeting 9/28/2023: [RFC] Sharding Framework Design for Device Mesh
-
LLVM Weekly [508th Issue].
MLIR Commits
-
Jakub Kuderski, “This patch extends matchPattern to support matching over Attributes. The primary use case is constant folders and canon patterns, where matching Attributes is preferred over Values/Operation * as it doesn’t require re-folding ops.” [Diff].
-
Matthias Springer, “One-Shot Bufferize no longer deallocates buffers, so deallocationFn can be removed. Note: There is a bufferization.dealloc_tensor op that now always bufferizes to memref.dealloc. This op will be phased out soon. [Diff]”.
-
Diego Caballero, “[mlir][Vector] Add support for Value indices to vector.extract/insert. vector.extract/insert ops only support constant indices. This PR is extending them so that arbitrary values can be used instead.” [Diff].
-
Ingo Muller, “This PR adds a new transform op that replaces memref.allocas with memref.get_globals to newly inserted memref.globals. This is useful, for example, for allocations that should reside in the shared memory of a GPU, which have to be declared as globals.” [Diff].
-
Ingo Muller, “structured.masked_vectorize => structured.vectorize. This reflects the fact that since recently the op can also handle the unmasked case.” [Diff].
-
Martin Erhart, “This commit removes the deallocation capabilities of one-shot-bufferization. One-shot-bufferization should never deallocate any memrefs as this should be entirely handled by the ownership-based-buffer-deallocation pass going forward. This means the allow-return-allocs pass option will default to true now, create-deallocs defaults to false and they, as well as the escape attribute indicating whether a memref escapes the current region, will be removed.” [Diff].
-
Nicolas Vasilache, “This revision adds a rewrite for sequences of vector bitcast(trunci) to use a more efficient sequence of vector operations comprising shuffle and bitwise ops. The rewrite performs a simple enumeration of each of the bits in the result vector and determines its provenance in the pre-trunci vector. The enumeration is used to generate the proper sequence of shuffle, andi, ori followed by an optional final trunci/extui. Such patterns appear naturally when writing quantization / dequantization functionality with the vector dialect.” [Diff].
-
Matthias Springer, “This revision adds support for empty tensor elimination to “bufferization.materialize_in_destination” by implementing the SubsetInsertionOpInterface” [Diff].
-
Aart Bik, “A bufferization.alloc_tensor can be directly replaced with tensor.empty since these are more or less semantically equivalent. The latter is considered a bit more “pure” with respect to SSA semantics.” [Diff].
-
Victor Perez, “Change SingleBlock::{insert,push_back} to avoid inserting the argument operation after the block’s terminator. This allows removing SingleBlockImplicitTerminator’s functions with the same name.”, [Diff].
-
Martin Erhart, “Define a pipeline for buffer de-allocation. Since ownership based buffer deallocation requires a few passes to be run in a somewhat fixed sequence, it makes sense to have a pipeline for convenience (and to reduce the number of transform ops to represent default deallocation).”, [Diff].
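The bit-enumeration step described in Nicolas Vasilache's bitcast(trunci) entry above can be illustrated with a small Python model. This is only a sketch of the enumeration idea, not the actual MLIR rewrite: the function names are hypothetical, a little-endian element layout is assumed (element 0 occupies the lowest bits), and the grouping of bits into shuffle/andi/ori sequences is not modeled.

```python
def bit_provenance(num_src, narrow_bits, result_bits):
    """For every result element, list (source_element, source_bit) per bit, LSB first.

    Models trunci to `narrow_bits` followed by a bitcast to `result_bits`-wide
    elements. Assumes little-endian packing (hypothetical helper, not an MLIR API).
    """
    total = num_src * narrow_bits
    assert total % result_bits == 0, "bitcast must preserve the total bit width"
    table = []
    for r in range(total // result_bits):
        bits = []
        for b in range(result_bits):
            flat = r * result_bits + b  # position in the flattened bit string
            bits.append((flat // narrow_bits, flat % narrow_bits))
        table.append(bits)
    return table


def reference(src, narrow_bits, result_bits):
    """Direct model: truncate each element, pack into one integer, then re-slice."""
    mask = (1 << narrow_bits) - 1
    flat = 0
    for i, v in enumerate(src):
        flat |= (v & mask) << (i * narrow_bits)
    count = len(src) * narrow_bits // result_bits
    out_mask = (1 << result_bits) - 1
    return [(flat >> (r * result_bits)) & out_mask for r in range(count)]


def via_provenance(src, narrow_bits, result_bits):
    """Rebuild each result element bit by bit from the provenance table."""
    out = []
    for bits in bit_provenance(len(src), narrow_bits, result_bits):
        value = 0
        for b, (elem, src_bit) in enumerate(bits):
            value |= ((src[elem] >> src_bit) & 1) << b
        out.append(value)
    return out


# Eight wide elements truncated to 4 bits each, then viewed as four 8-bit elements.
src = [0xA1, 0xB2, 0xC3, 0xD4, 0xE5, 0xF6, 0x17, 0x28]
assert via_provenance(src, 4, 8) == reference(src, 4, 8)
print([hex(v) for v in via_provenance(src, 4, 8)])
```

Because every result bit comes from exactly one bit of one pre-truncation element, the table is a fixed permutation. That is what makes a shuffle-plus-bitwise lowering plausible in the first place.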
MLIR RFC Discussions
-
Open MLIR Meeting 9/28/2023: [RFC] Sharding Framework Design for Device Mesh
-
A. Zinenko, "linalg.generic is a structuring concept that represents loops and memory accesses, the actual computation happens in its body that can contain any operation applicable to a valid tensor/memref element type. It is possible to have tensors of signed integers, and downstream dialects may well have operations processing those that can go inside a generic body. Named Linalg ops have predefined bodies using arith and therefore use signless integers. You may be confusing signless with unsigned. See MLIR Rationale - MLIR. In particular, MaxSIOp operates on signless integers. That’s why it has “S” in its name indicating that the most significant bit is interpreted as a sign by this operation as opposed to MaxUIOp that interprets it as actually most significant bit.".
-
M. Springer, "it’s worth mentioning that the DestinationStyleOpInterface guarantees that inits and results have the same dynamic shape. I added support in ValueBoundsConstraintSet for that recently".
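A. Zinenko's point above about signless integers can be made concrete with a short Python model: the stored bit pattern carries no sign, and the operation decides how to read the most significant bit. The helper names below are hypothetical and only mimic the behavior described for MaxSIOp and MaxUIOp; they are not MLIR APIs.

```python
def as_signed(bits, width):
    """Read a `width`-bit pattern as a two's-complement value."""
    bits &= (1 << width) - 1
    return bits - (1 << width) if bits >> (width - 1) else bits


def max_si(a, b, width=8):
    """Signed max: the top bit is a sign bit. Returns the winning bit pattern."""
    mask = (1 << width) - 1
    a, b = a & mask, b & mask
    return a if as_signed(a, width) >= as_signed(b, width) else b


def max_ui(a, b, width=8):
    """Unsigned max: the top bit is just the most significant bit."""
    mask = (1 << width) - 1
    return max(a & mask, b & mask)


# The same two 8-bit patterns give different answers under each reading.
a, b = 0xFF, 0x01
print(hex(max_si(a, b)))  # 0x1, because 0xFF is read as -1
print(hex(max_ui(a, b)))  # 0xff, because 0xFF is read as 255
```

The operands have the same type in both calls. Only the operation differs, which is the distinction between signless and unsigned made in the quote.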
MLIR Ecosystem
Useful Links