How should non-control-flow region-bearing ops be used with by dataflow analyses?

j2kun · May 20, 2025, 6:23am

Suppose I have a region-bearing op that has a single region with a ReturnLike terminator that corresponds to the region-bearing op’s results.

Example: affine.for with an iter arg is the region-bearing op, affine.yield is the return-like terminator, and the yielded value corresponds directly with the op result. [edit: affine.for has the control flow interfaces, so I’m not sure which upstream ops this would apply to]

In a sparse data-flow analysis, I want the lattice value of the terminator’s operands to pass to the results of the region-bearing op, but as far as I can tell that is not possible today. I can’t even add a custom transfer function for the terminator, because ops with no results do not have their transfer functions called (cf. llvm-project/mlir/lib/Analysis/DataFlow/SparseAnalysis.cpp at 6181f4f89d022ebf33ed8a449655347eb1b9b6c0 · llvm/llvm-project · GitHub)

I’d like to add support for this upstream, but I am not familiar with the design considerations here so I want to ask if:

There’s a way to do this that I don’t see (the closest I see is visitNonControlFlowArgumentsImpl, but this is run at the start of the block before the terminator has analysis results ready).
If there’s any particular reason why this situation should not be supported in the sparse analysis by default.

ftynse · May 21, 2025, 8:39am

I expect the behavior you describe to already be the default, is it not? The main caveat is, similarly to the rest of MLIR, is that the parent operation must be handling the terminators rather than doing something special for the terminator itself.

For the affine.for example, it should happen in the handling of the affine.for itself which does have results. Normally it should implement RegionBranchOpInterface, triggering this logic: llvm-project/mlir/lib/Analysis/DataFlow/SparseAnalysis.cpp at 6181f4f89d022ebf33ed8a449655347eb1b9b6c0 · llvm/llvm-project · GitHub. This interface describes how data flows between terminator operands and parent results, or arguments of another region if the control flows there.

If you have an operation that doesn’t have control flow expressible via that interface, you should be able to propagate the lattices when handling the region-containing operation itself, including the connection between operation operands/results and region arguments/terminator operands. Once you update the lattices for region arguments/terminator operands and trigger propagation (propagateIfChanged, this not being automatic is a frequently hit problem), I’d expect the operations from the region to be added to the processing queue. My team has only done this for memory-related lattices on linalg.generic, so we didn’t need to enter the body, double-check that it works.

j2kun · May 21, 2025, 5:22pm

The affine.for example was poorly chosen in my example. My op in question does not implement RegionBranchOpInterface because there is no branching, and in that case there is no default behavior. But it appears this is mandatory for any region-holding op. I’ll give that a shot.

If you have an operation that doesn’t have control flow expressible via that interface, you should be able to propagate the lattices when handling the region-containing operation itself, including the connection between operation operands/results and region arguments/terminator operands.

This part I still don’t understand, because this is what I was trying to do. How does one establish the connection between terminator operands and region-holding op results? When visiting the region-holding op, the lattices for the ops in the region (including the terminator) may not be initialized yet. It doesn’t make sense to me to process the terminator at that step. Instead I propagate the lattices to the block arguments, the ops in the region get processed, and then when it visits the terminator this line (llvm-project/mlir/lib/Analysis/DataFlow/SparseAnalysis.cpp at 6181f4f89d022ebf33ed8a449655347eb1b9b6c0 · llvm/llvm-project · GitHub) causes it to end early and then it appears I’m stuck.

makslevental · May 21, 2025, 8:21pm

I vaguely remembering running into that early-bail too and being frustrated by it - but I don’t remember how I avoided it!

Trying to solve your problem de novo, is there anything preventing you from doing (in the transfer function for the region-bearing op):

auto termPoint = getProgramPointAfter(terminator);
AbstractSparseLattice *resState = getLatticeElement(op->getResult(x));
setToEntryState(resState);
DataFlowAnalysis::addDependency(resState, termPoint);

(I believe the “point” after the terminator is the right one but you might need to experiment).

Alternatively, I tried removing the early-bail just now and none of the upstream tests failed; looking ahead at removing the check, i.e., “breaking the invariant that resultLattices is non-empty”, I would be highly surprised if there’s any generic code anywhere that somehow assumes SmallVector<AbstractSparseLattice *> resultLattices(op->getNumResults()); is fixed-length (i.e, doesn’t iterate through the vector and instead tries to immediately access resultLattices[0]). Actually I’m fairly certain no one does because there are only two methods that are passed resultLattices, visitRegionSuccessors and visitExternalCall and Triton is the only project currently overriding visitRegionSuccessors; and you (heir) and Enzyme are the only projects overriding visitExternalCall (and you both are iterating).

j2kun · May 21, 2025, 8:50pm

termPoint = getProgramPointAfter(terminator);
AbstractSparseLattice *resState = getLatticeElement(op->getResult(x));
setToEntryState(resState);
DataFlowAnalysis::addDependency(resState, termPoint);

That seems like the thing I was missing. Ok now I have two principled ways to try to solve this. Thanks!

ftynse · May 22, 2025, 9:45am

makslevental:

auto termPoint = getProgramPointAfter(terminator);
AbstractSparseLattice *resState = getLatticeElement(op->getResult(x));
setToEntryState(resState);
DataFlowAnalysis::addDependency(resState, termPoint);

There’s also getLatticeElementFor that automatically sets up the dependency.

Relatedly, one thing I had found annoying is the confusion on the purpose of setToEntryState. This function is also being called when the framework hits an unanalyzable case to set the lattice to the terminal state. The entry and terminal states are equal in one kind of analyses (starting at the lattice top and meeting to go down the lattice; both the initial and the terminal state should be lattice top) and should be different in another (starting at the lattice bottom and joining to go up the lattice; the terminal state should still be lattice top). In Enzyme, IIRC, we ended up having explicit setup in analysis initialization / on first visit to set up the initial state and treating setToEntryState as setToTerminalState for the purposes of pessimizing the lattices in unanalyzable situations.

Topic		Replies	Views
Why scf.forall op doesn't have RegionBranchOp interface? MLIR	5	409	May 24, 2023
Structured Control Flow is Not Necessarily Regions MLIR	18	1801	April 16, 2020
[RFC] Structural and Reachable Control Flow Edges in `RegionBranchOpInterface` MLIR	20	266	April 22, 2026
SparseDataFlowAnalysis don't visit linalg.generic operation MLIR	4	222	September 13, 2023
[RFC?] Region control flow interfaces should encode "region not executed" correctly MLIR	5	353	July 18, 2023

How should non-control-flow region-bearing ops be used with by dataflow analyses?

Related topics