Linalg ops prevent affine loop fusion

Hi all,

I was looking into supporting a dual compilation flow where some Linalg ops are converted to affine loops (which then get fused where possible) while others remain structured Linalg ops for a different compilation method. I discovered that the presence of a structured Linalg op in a block prevents loop fusion from happening.

For example, loop fusion won’t happen for this IR:

func.func @test(%arg0: memref<5xi8>, %arg1: memref<5xi8>) {
  %c15_i8 = arith.constant 15 : i8
  %alloc = memref.alloc() : memref<5xi8>
  affine.for %arg2 = 0 to 5 {
    affine.store %c15_i8, %alloc[%arg2] : memref<5xi8>
  }
  affine.for %arg2 = 0 to 5 {
    %0 = affine.load %alloc[%arg2] : memref<5xi8>
    %1 = affine.load %arg0[%arg2] : memref<5xi8>
    %2 = arith.muli %0, %1 : i8
    affine.store %2, %alloc[%arg2] : memref<5xi8>
  }

  linalg.elemwise_binary ins(%alloc, %alloc: memref<5xi8>, memref<5xi8>) outs(%arg1: memref<5xi8>)

  return
}

Whereas if I replace linalg.elemwise_binary with, say, memref.copy, fusion works as expected.
It seems MemRefDependenceGraph treats a LinalgOp as an “unknown region-holding” operation, so the graph fails to construct: llvm-project/mlir/lib/Dialect/Affine/Analysis/Utils.cpp at 2b06ceebb5b7482d3c7a497f553d37bef4b40d1e · llvm/llvm-project · GitHub
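For reference, with memref.copy in place of the Linalg op, affine loop fusion merges the two loops into something like the following (a sketch; the exact output may differ, e.g., fusion may also forward the store and privatize the buffer):

```mlir
affine.for %arg2 = 0 to 5 {
  affine.store %c15_i8, %alloc[%arg2] : memref<5xi8>
  %0 = affine.load %alloc[%arg2] : memref<5xi8>
  %1 = affine.load %arg0[%arg2] : memref<5xi8>
  %2 = arith.muli %0, %1 : i8
  affine.store %2, %alloc[%arg2] : memref<5xi8>
}
```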

Would it be reasonable to consider LinalgOp to be “region-holding ops with a well-defined control flow” and handle it similarly to RegionBranchOpInterface?

That seems like a good idea to me.

The presence of other region-holding ops elsewhere in the Block shouldn’t prevent affine fusion in parts where fusion is possible, assuming those other ops implement the right interfaces and memory effects.

Do you mean that Linalg operators should implement RegionBranchOpInterface?
If so, it’s not clear to me what that would look like, since conceptually control flow never really enters the region (I’m not sure why Linalg operators aren’t IsolatedFromAbove in the first place, given their specifics and limitations, tbh).
Something like this?

void getSuccessorRegions(RegionBranchPoint point,
                         SmallVectorImpl<RegionSuccessor> &regions) {
  // Control flow never enters the region: from the parent, branch
  // directly to the op's results.
  if (point.isParent())
    regions.emplace_back();
}

No, they should not implement those interfaces. We explicitly removed that because the flow of values along control flow edges cannot be expressed with that interface. We either need to relax RegionBranchOpInterface so it doesn’t assume a one-to-one value mapping from operands to block arguments, and from yielded values to op results; or define a separate interface for affine fusion that could defer to RegionBranchOpInterface when available. I’d be in favor of the former, but it requires careful consideration and probably an RFC.

I don’t think being IsolatedFromAbove provides much value for Linalg ops and it’s an unnecessary requirement. This is a digression though.

There might be a simpler option for you here: you can lower the Linalg ops to affine loops first, and then run fusion.

Yes, I think a basic interface is missing in MLIR that, if available, could be made use of in the affine MDG used by fusion. It could be called “LocalEffects” and would be implemented by imperative ops that have no effects on state outside of what’s directly available through their operands (e.g., they can’t access a memref.global, can’t make a call to another function that could potentially do so, can’t perform a synchronization/wait on other pending memory operations, etc.), including through the ops in their regions. LocalEffectsOpInterface could be attached to such ops: typically region-holding ops, but also others that are otherwise not Pure yet still have only local effects (including potentially memory read and write effects on one or more of their operands).

Being able to identify such “local”/“direct-operand-effecting” operations is, in general, useful for several analyses and optimizations. In general, such ops aren’t meant to use arbitrary/unknown types, i.e., they use only int/fp/tensor and similar value/POD types and memrefs, and have read/write effects only on their memref-typed operands, if at all.
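To make the criteria concrete, a few illustrative cases (hypothetical snippets; which ops would actually get the interface is up for discussion):

```mlir
// Local effects: reads/writes only the memory of its operands.
memref.copy %src, %dst : memref<5xi8> to memref<5xi8>

// Not local: touches state reachable through a symbol, not an operand.
%g = memref.get_global @some_global : memref<5xi8>

// Not local: the callee may access arbitrary memory.
func.call @unknown(%dst) : (memref<5xi8>) -> ()
```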

@bondhugula,

  • Should LocalEffects be a trait or an interface? If you envision this as an interface, what methods should it provide?
  • Could this property be integrated into the existing MemoryEffectOpInterface, rather than being represented by a separate trait/interface? E.g., add a method isLocal which returns false by default and can be overridden for operators that satisfy the criteria you described.

MemoryEffectOpInterface works by “statically” associating effects with the op’s operands, results, block args, or symbol refs. The “local effects” property may sometimes not apply value-wise but pertain to the op as a whole, i.e., an op that does not implement a memory effects interface may still want to specify that property. So, extending mlir::EffectInstance may not be sufficient.

On the trait vs. interface question, the latter is the recommended one, being more flexible in encoding information.

@bondhugula,
I’ve implemented the interface based on your description. Some tests are failing because I haven’t yet marked all applicable operators with LocalEffectsOpInterface. Once I get confirmation that the interface changes look good, I’ll add it to more ops. Please take a look:

I don’t quite understand why our current side-effect infra isn’t enough and why a new interface is needed actually. I can’t follow your description @bondhugula, can you elaborate with some examples maybe?

An op can potentially have effects on memory that are not reflected in or even connected to its SSA operands. These are not captured in the declared memory effects (e.g., the op may have zero operands, or only float/int operands, or no memory effects on its operands itself). The op will simply not be marked pure/speculatable, and a utility can’t distinguish it from other non-pure ops which only have “local effects”.

Consider the case where the semantics of an op is to read an operand memref and use that to read other memory (one or more levels of indirection). If such semantics aren’t captured directly via the op’s operands or those of its region’s ops, memory effects won’t tell a utility/pass whether other memrefs are potentially read/written (outside of the op’s operands).
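For instance, consider a hypothetical op (name, syntax, and types assumed) whose semantics involve one level of indirection:

```mlir
// Reads %indices, then reads *other* memory at those locations; that
// second read is not reflected in any SSA operand, so the declared
// memory effects can't describe it.
%v = "abc.gather_indirect"(%indices) : (memref<8xindex>) -> vector<8xi32>
```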

The op will simply not be marked pure/speculatable, and a utility can’t distinguish it from other non-pure ops which only have “local effects”.

I think we already have “local effects” through recursive effects / speculation.

The op will implement recursive speculation, just like linalg does: llvm-project/mlir/lib/Dialect/Linalg/IR/LinalgOps.cpp at main · llvm/llvm-project · GitHub

For effects in a region, an operation should implement the RecursiveMemoryEffects trait: llvm-project/mlir/include/mlir/Interfaces/SideEffectInterfaces.td at main · llvm/llvm-project · GitHub or explicitly list out the memory effects in its regions.

I think we already have “local effects” through recursive effects / speculation.

No, recursive side effects don’t help with describing “local effects” and are orthogonal to the latter. An op with “local effects” as described above may or may not have any regions at all, and its local or non-local effects are independent of the ops contained in its region. I think I had already provided an example. But to make it clearer, consider:

%r = abc.wait %handle : memref<1xi32>

The above may declare a memory read effect on %handle, i.e., it reads the memory of the handle. However, if the semantics of the op are to read the element of %handle and, based on that information, finish all associated memory writes/reads that were earlier issued on other memory (not reflected in the operands, but available through %handle or otherwise encoded in the op’s semantics), then the op has “non-local effects” per the proposed interface. This is just one among the numerous examples I can think of. In fact, one may have zero-operand ops like “abc.flush” with such non-local effects. These ops will not implement RecursiveMemoryEffects, and in the latter case, have no memory effects at all. They are all simply non-pure.

In contrast, if there are ops that simply read/write their operands’ memory (e.g., memref.copy, affine.load/store, memref.load/store, and the numerous other ops in upstream dialects) without anything else non-local, these are ops with “local effects”. They are non-pure as well, but of a different kind.

The new interface being proposed allows utilities/analyses to distinguish between this class of ops and ones like the “abc.wait” I mentioned above. Otherwise, there is nothing in MLIR, direct or indirect, that would allow one to distinguish between these two classes. That ability can be pretty powerful and broadly applicable.

I think MemoryEffectsOpInterface can express non-local effects. Side effects do not necessarily have to be associated with an operand.

You can use this constructor: EffectInstance<EffectT>::EffectInstance(EffectT *effect, Resource *resource = DefaultResource::get())

E.g., we use this functionality in the transform dialect to indicate that a transform op may modify the payload, which is not an operand of the op. As another example, vector.print has MemoryEffects<[MemWrite]> (not associated to any operand) to model the fact that something is printed to stdout.

I would specify the memory effects as follows:

  • MemRead side effect on the operand.
  • MemRead side effect on the operation. (Encodes the fact that there are additional memory reads.)

Yes, it can be used to express non-local effects, but it doesn’t address the requirement to distinguish between the two classes of ops I mentioned above. Ops that have such non-local effects aren’t required to create abstract resources and model effects on such resources: e.g., gpu.wait async doesn’t implement any such effects, and neither does gpu.barrier. Requiring ops to model all such resources and declare such effects may not be practical; there is also no clear resource or existing effect that would fit. The “local effects” interface is a way to summarily say that such effects exist.
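For illustration, the two upstream ops cited above, shown schematically (gpu.barrier can only appear inside a GPU kernel body, and gpu.wait belongs in host code):

```mlir
// Host side: waits on previously issued asynchronous work; declares no
// memory effects and models no abstract resources.
gpu.wait [%token]

// Device side: synchronizes work items and their pending memory accesses.
gpu.barrier
```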

MemRead side effect on the operation. (Encodes the fact that there are additional memory reads.)

This is trying to force-fit or piggyback things, which may lead to other issues. E.g., saying that there is a “read” side effect on the operation is meaningless; it’s an effect of the operation, and one can’t mix imperative things and resources. The effect itself is on some other resource that you’d have to model (introduce), and it’ll have to be abstract: again, consider the barrier or wait example.

Yes, I meant “of the operation”.

There is also DefaultResource for cases where you just want to say that there’s a side effect, but don’t want to specify the exact resource.


Yes, but that doesn’t distinguish between a local and a global effect (e.g., an op can allocate some local memory, run, and free it as part of its internal semantics). So, all the problems that I mentioned above still exist despite what the side effects interface provides.

The meaning of “local effect” in other words is that the underlying resource can possibly only be accessed by this op and only this op outside of the effects on the operands/results. The current design in MLIR doesn’t capture this info.
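As a sketch of the allocate-run-free case mentioned above (generic syntax, hypothetical op):

```mlir
// Still only "local" effects: the scratch buffer is allocated and freed
// entirely within the op; no state outside the op is touched, beyond the
// declared effects on %in and %out.
"abc.compute"(%in, %out) ({
  %scratch = memref.alloc() : memref<16xi8>
  // ... compute using %scratch ...
  memref.dealloc %scratch : memref<16xi8>
  "abc.yield"() : () -> ()
}) : (memref<16xi8>, memref<16xi8>) -> ()
```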

Saying “effect of the operation” is redundant! If it’s being declared on it, which other operation could it possibly be talking about?! :) It obviously has to be.

I don’t understand the exact definition of “local effect” and “global effect” yet.

In case of a memory operation, what is the underlying resource? Do we model the entire main memory as a single resource? In that case, are the effects of the following ops all global effects? (Given that other ops can also access main memory.)

  • memref.alloc
  • memref.dealloc
  • memref.load
  • memref.store
  • a hypothetical op that allocates some local memory, runs, and frees it as part of its internal semantics

These all have “local” effects, since the effects are directly on their operands or results, i.e., on the memory “resources” that those operands/results are, and there are no other effects.

EDIT: I realize the way I defined it ignored the effects that were already declared on operands/results. Updated that.