Problem
Currently, we have the following wording in RegionBranchOpInterface
:
Only a region, i.e. a valid
index
, may use the parent operation as a successor.
I believe this is insufficient and incorrect. Specifically, this implies that the control is always transferred to a region before it is transferred back to the parent op. Since the region can only transfer the control back to the parent op from a terminator, this transitively implies that the first block of any region with a RegionBranchOpInterface
is always executed until the terminator can transfer the control flow back. This is trivially false for any conditional-like operation, e.g.
%false = arith.constant false
scf.if %false {
"this.is.not.executed"() : () -> ()
}
I observed this specifically in dataflow analyses that become incorrect due to this interpretation. For example:
// %cond is not a constant so we cannot constant-propagate it
store 0, %ptr
scf.if %cond {
// may or may not be executed
store 1, %ptr
}
// Abstract interpretation via forward dataflow indicates
// the value stored in %ptr to _always_ be "1" here, but should
// instead be a union of "0" and "1".
// Can be seen with -test-last-writer that indicates "store 1"
// as last writer but not "store 0".
load %ptr
But I suspect there are other analyses and/or passes that are affected, e.g., bufferization.
Proposal
I propose to remove the wording above and specifically allow the operation to indicate itself as a successor, thus signaling that it may not necessarily transfer control to any of its regions. By extension, region-carrying operations with no own side effects that don’t transfer control to their regions given specific values of operands are trivially dead and can be DCE’d.
This fixes dataflow transparently, but other passes using this interface may need to be adapted.
Alternatives
We can use the values provided by getRegionInvocationBounds
and check if a region can be executed 0
times. If so, assume the control can be transferred to any of the control flow successors of this region. In the worst case, this requires computing a transitive closure on the graph of control flow transfers between regions. The worst case is also the default one as getRegionInvocationBounds
has the default implementation indicating an “unknown” number of executions, that could include 0
times.