Suppose I have a region-bearing op that has a single region with a ReturnLike terminator that corresponds to the region-bearing op’s results.
Example: affine.for with an iter arg is the region-bearing op, affine.yield is the return-like terminator, and the yielded value corresponds directly with the op result. [edit: affine.for has the control flow interfaces, so I’m not sure which upstream ops this would apply to]
I’d like to add support for this upstream, but I am not familiar with the design considerations here so I want to ask if:
There’s a way to do this that I don’t see (the closest I see is visitNonControlFlowArgumentsImpl, but this is run at the start of the block before the terminator has analysis results ready).
If there’s any particular reason why this situation should not be supported in the sparse analysis by default.
I expect the behavior you describe to already be the default, is it not? The main caveat is, similarly to the rest of MLIR, is that the parent operation must be handling the terminators rather than doing something special for the terminator itself.
If you have an operation that doesn’t have control flow expressible via that interface, you should be able to propagate the lattices when handling the region-containing operation itself, including the connection between operation operands/results and region arguments/terminator operands. Once you update the lattices for region arguments/terminator operands and trigger propagation (propagateIfChanged, this not being automatic is a frequently hit problem), I’d expect the operations from the region to be added to the processing queue. My team has only done this for memory-related lattices on linalg.generic, so we didn’t need to enter the body, double-check that it works.
The affine.for example was poorly chosen in my example. My op in question does not implement RegionBranchOpInterface because there is no branching, and in that case there is no default behavior. But it appears this is mandatory for any region-holding op. I’ll give that a shot.
If you have an operation that doesn’t have control flow expressible via that interface, you should be able to propagate the lattices when handling the region-containing operation itself, including the connection between operation operands/results and region arguments/terminator operands.
This part I still don’t understand, because this is what I was trying to do. How does one establish the connection between terminator operands and region-holding op results? When visiting the region-holding op, the lattices for the ops in the region (including the terminator) may not be initialized yet. It doesn’t make sense to me to process the terminator at that step. Instead I propagate the lattices to the block arguments, the ops in the region get processed, and then when it visits the terminator this line (llvm-project/mlir/lib/Analysis/DataFlow/SparseAnalysis.cpp at 6181f4f89d022ebf33ed8a449655347eb1b9b6c0 · llvm/llvm-project · GitHub) causes it to end early and then it appears I’m stuck.
(I believe the “point” after the terminator is the right one but you might need to experiment).
Alternatively, I tried removing the early-bail just now and none of the upstream tests failed; looking ahead at removing the check, i.e., “breaking the invariant that resultLattices is non-empty”, I would be highly surprised if there’s any generic code anywhere that somehow assumes SmallVector<AbstractSparseLattice *> resultLattices(op->getNumResults()); is fixed-length (i.e, doesn’t iterate through the vector and instead tries to immediately access resultLattices[0]). Actually I’m fairly certain no one does because there are only two methods that are passed resultLattices, visitRegionSuccessors and visitExternalCall and Triton is the only project currently overriding visitRegionSuccessors; and you (heir) and Enzyme are the only projects overriding visitExternalCall (and you both are iterating).
There’s also getLatticeElementFor that automatically sets up the dependency.
Relatedly, one thing I had found annoying is the confusion on the purpose of setToEntryState. This function is also being called when the framework hits an unanalyzable case to set the lattice to the terminal state. The entry and terminal states are equal in one kind of analyses (starting at the lattice top and meeting to go down the lattice; both the initial and the terminal state should be lattice top) and should be different in another (starting at the lattice bottom and joining to go up the lattice; the terminal state should still be lattice top). In Enzyme, IIRC, we ended up having explicit setup in analysis initialization / on first visit to set up the initial state and treating setToEntryState as setToTerminalState for the purposes of pessimizing the lattices in unanalyzable situations.