MLIR News, 55th edition (13th September 2023)

Welcome to the 55th issue of the MLIR Newsletter covering developments in MLIR, and related projects in the ecosystem. We welcome your contributions (contact: javed.absar@gmail.com). Click here to see previous editions.

Highlights and Ecosystem

MLIR Commits

  • Mehdi fixed some affine ops to properly declare their inherent affine map attribute, making them more consistent with Properties. [click here for diff].

  • Daniil Dudkin: This patch [click here for diff] is part of a larger initiative aimed at fixing floating-point max and min operations in MLIR: [RFC] Fix floating-point `max` and `min` operations in MLIR.

  • Stefan added handling of pointer attributes (noalias, nonnull, readonly, writeonly, dereferenceable, dereferenceable_or_null) for GPUs. [click here].
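    In LLVM-dialect IR, such attributes appear as argument attributes on pointer parameters. A minimal sketch (the function name and pointer layout are illustrative assumptions, not taken from the patch):

    ```mlir
    // Sketch: pointer argument attributes on an LLVM-dialect function declaration.
    // @kernel and the argument layout are hypothetical.
    llvm.func @kernel(!llvm.ptr {llvm.noalias, llvm.readonly},
                      !llvm.ptr {llvm.nonnull, llvm.writeonly})
    ```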

  • Mahesh moved the linalg.fill + linalg.pack folding pattern into the fill canonicalization patterns. [click here for diff].
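    The idea behind the fold can be sketched as follows (shapes, tile sizes, and op spelling are illustrative assumptions):

    ```mlir
    // Sketch: filling a tensor and then packing the filled result ...
    %filled = linalg.fill ins(%cst : f32) outs(%init : tensor<8x8xf32>) -> tensor<8x8xf32>
    %packed = linalg.pack %filled inner_dims_pos = [0, 1] inner_tiles = [4, 4]
        into %dest : tensor<8x8xf32> -> tensor<2x2x4x4xf32>
    // ... is equivalent to a single fill of the packed destination,
    // since every element holds the same value:
    %folded = linalg.fill ins(%cst : f32) outs(%dest : tensor<2x2x4x4xf32>) -> tensor<2x2x4x4xf32>
    ```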

  • This [diff] by Amy Wang enables canonicalization to fold away unnecessary tensor.dim ops, which in turn enables folding away other operations, as can be seen in conv_tensors_dynamic, where affine.min operations were folded away.
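    The basic shape of such a fold, as a sketch (types are illustrative):

    ```mlir
    // Sketch: querying a statically-known dimension ...
    %c0 = arith.constant 0 : index
    %d0 = tensor.dim %t, %c0 : tensor<4x?xf32>
    // ... folds to a constant, since dimension 0 is static:
    %c4 = arith.constant 4 : index
    ```

    Once %d0 becomes a constant, downstream ops such as affine.min that consumed it can fold in turn.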

  • This [diff] from Vinicius adds support for the zeroinitializer constant to the LLVM dialect. It’s meant to simplify zero-initialization of aggregate types in MLIR, although it can also be used with non-aggregate types.
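    A minimal sketch of the usage, assuming the op is spelled llvm.mlir.zero (the spelling and types here are assumptions, not quoted from the patch):

    ```mlir
    // Sketch: zero-initialize an aggregate in a single op ...
    %zero_struct = llvm.mlir.zero : !llvm.struct<(i32, f32, !llvm.array<4 x i8>)>
    // ... and the same op works for non-aggregate types:
    %zero_i32 = llvm.mlir.zero : i32
    ```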

  • Matthias landed a [diff] which provides a default interface implementation for all ops that implement the DestinationStyleOpInterface. Each result value of such an op is tied to an operand, and the two have the same type.

  • This [linalg patch] allows supplying an optional memory space for the promoted buffer.

  • In this [commit] by Matthias Springer: scf.forall ops without shared outputs (i.e., fully bufferized ops) are lowered to scf.parallel. scf.forall ops are typically lowered by an earlier pass depending on the execution target; e.g., there are optimized lowerings for GPU execution. This new lowering is for completeness (convert-scf-to-cf can now lower all SCF loop constructs) and provides a simple CPU lowering strategy for testing purposes. scf.parallel is currently lowered to scf.for, which executes sequentially. The scf.parallel lowering could be improved in the future to run on multiple threads.
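    A sketch of what this lowering looks like (loop body and types are illustrative assumptions):

    ```mlir
    // Sketch: a fully bufferized scf.forall (no shared_outs) ...
    scf.forall (%i) in (%n) {
      memref.store %v, %buf[%i] : memref<?xf32>
    }
    // ... lowers to the equivalent scf.parallel:
    %c0 = arith.constant 0 : index
    %c1 = arith.constant 1 : index
    scf.parallel (%i) = (%c0) to (%n) step (%c1) {
      memref.store %v, %buf[%i] : memref<?xf32>
    }
    ```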

  • This [alloc-to-alloca conversion for memref] from Alex Zinenko introduces a simple conversion of a memref.alloc/dealloc pair into an alloca in the same scope, exposed both as a transform op and as a pattern. Allocas typically lower to stack allocations, as opposed to alloc/dealloc, which lower to significantly more expensive malloc/free calls. In addition, this can be combined with hoisting allocations out of loops to further improve performance.
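    The transformation can be sketched as follows (shape is illustrative):

    ```mlir
    // Sketch: an alloc/dealloc pair confined to a single scope ...
    %a = memref.alloc() : memref<16xf32>
    // ... uses of %a ...
    memref.dealloc %a : memref<16xf32>

    // ... becomes a stack allocation, freed automatically at end of scope:
    %b = memref.alloca() : memref<16xf32>
    // ... uses of %b ...
    ```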

  • Nicholas Vasilache - [commit] Extract the data layout string attribute setting as a separate module pass. FuncToLLVM uses the data layout string attribute in 3 different ways:
    – LowerToLLVMOptions options(&getContext(), getAnalysis<DataLayoutAnalysis>().getAtOrAbove(m));
    – options.dataLayout = llvm::DataLayout(this->dataLayout);
    – m->setAttr(…, this->dataLayout);
    The 3rd way is unrelated to the other 2 and occurs after conversion, making it confusing. This revision separates this post-hoc module annotation functionality into its own pass. The convert-func-to-llvm pass loses its data-layout option and instead recovers it from the llvm.data_layout attribute attached to the module, when present. In the future, LowerToLLVMOptions options(&getContext(), getAnalysis<DataLayoutAnalysis>().getAtOrAbove(m)) and options.dataLayout = llvm::DataLayout(dataLayout); should be unified.

MLIR RFC Discussions

  • In the ConversionTarget, why do we have both addLegalDialect and addIllegalDialect? Can’t you infer one from the other? Whatever is not marked legal could be treated as illegal, right? Why mark something illegal explicitly? — No: there is also an “unknown” legality (see Dialect Conversion - MLIR), and its effect differs depending on the mode of conversion, as mentioned there.

  • Q: “… the difference between canonicalize and sccp. As I have seen, both will use folders and constant materializers to replace ops with constants.” — Answer: “SCCP is using the dataflow framework to do control flow analysis; it can infer that something is a constant from this analysis. Canonicalization is a very local transformation that eagerly turns values into constants and tries to iterate greedily.”
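    The difference can be sketched with a small example (illustrative IR, not from the thread):

    ```mlir
    // Sketch: %x is not locally a constant, but SCCP's dataflow analysis
    // sees that every predecessor passes the value 1 into ^merge,
    // so it can replace %x with the constant 1. A purely local fold
    // cannot see through the block argument.
    func.func @f(%cond: i1) -> i32 {
      %c1 = arith.constant 1 : i32
      cf.cond_br %cond, ^then, ^else
    ^then:
      cf.br ^merge(%c1 : i32)
    ^else:
      cf.br ^merge(%c1 : i32)
    ^merge(%x: i32):
      return %x : i32
    }
    ```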

  • Questions on bufferization. Some answers from Matthias:
    – Bufferization only looks at ops that have a tensor operand or a tensor result.
    – to_tensor and to_memref are used internally to connect bufferized IR with not-yet-bufferized IR, somewhat like unrealized_conversion_cast, but specifically for memref->tensor and tensor->memref conversions. These conversion ops can also survive bufferization in the case of partial bufferization. These ops don’t work with other types, and various other parts of the code base also assume tensor/memref types. “I was looking at generalizing this to arbitrary ‘buffer’ types (not just memref) at some point, but didn’t have a use case for it.”
    – The analysis maintains alias sets and equivalence sets. These are sets of tensor SSA values. “There is no tensor SSA value here. Maybe we could put the entire !dialect.struct value in there.”
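  The materialization ops mentioned above look like this in IR (a sketch; types are illustrative, and the exact op syntax has varied across MLIR versions):

    ```mlir
    // Sketch: stitching bufferized and not-yet-bufferized IR together
    // during partial bufferization.
    %t  = bufferization.to_tensor %m  : memref<8xf32>   // memref -> tensor
    %m2 = bufferization.to_memref %t2 : memref<8xf32>   // tensor -> memref
    ```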

MLIR Ecosystem

Useful Links
