What’s the right way to deal with infinite loops during analyses and canonicalizations?
Is it safe to fold them away? Doing so would change the program behavior from “does not terminate” to “terminates”. Moreover, the loop body may contain side-effecting ops, and those side effects are getting lost.
We currently assume in various places in the code base that infinite loops can be removed, which came as a surprise to me. Some examples:
constantTripCount: this function returns 0 if the step size is 0. (I expected “infinite”.)
Most recently, this issue came up on this PR (Arith/int-range-narrowing.mlir test case).
Different case, but somewhat related: Is it safe to remove all operations that precede a ub.unreachable? I had a PR that added such a canonicalization and @mehdi_amini approved it, but I did not merge that part in the end because something seemed fishy to me. These were @mehdi_amini’s comments:
We should TODO that:
* this assumes we don't have calls that are "no return".
* this assumes that loops terminate.
* this assumes that nothing interrupts the control-flow (which is fine until early-exit is added).
Maybe we should have a trait "AlwaysForwardProgress" on operations to allow this kind of transformations?
This seems to be a common design question in systems languages such as C, C++, and Rust, which is why LLVM supports both: treating infinite loops as UB (and in practice dead), or treating them as legal constructs, based on metadata on the loop: LLVM Language Reference Manual — LLVM 23.0.0git documentation
What LLVM (and those languages) don’t do, though, is consider a loop containing side effects to be UB/dead code. The test cases you linked above have side effects in their bodies; IMO they should not be removed, and their trip count should be considered unknown.
Whether to remove non-side-effecting loops, I am less sure about. The benefit of removing them doesn’t seem very obvious to me, which makes me err on the side of allowing non-terminating loops. If someone does find a benefit, we could also add a mustprogress attribute on scf.for to guarantee termination in the scf.for logic.
For reference, this is what the llvm.loop.mustprogress documentation says:
The llvm.loop.mustprogress metadata indicates that this loop is required to
terminate, unwind, or interact with the environment in an observable way e.g.
via a volatile memory access, I/O, or other synchronization. If such a loop is
not found to interact with the environment in an observable way, the loop may
be removed. This corresponds to the mustprogress function attribute.
So to summarize, following the LLVM design here would mean:
* Adding a new mustprogress attribute to loop ops such as scf.for and scf.while. (And also exposing it via the LoopLikeOpInterface.)
* Applying the following rules during canonicalization:
  * Infinite loop with mustprogress = false: do nothing. (I.e., do not remove the loop.)
  * Infinite loop with mustprogress = true and side-effecting ops in the body: do nothing.
  * Infinite loop with mustprogress = true and no side-effecting ops in the body: remove the loop and replace its results with ub.poison.
It would make sense to me to follow the same design in MLIR. Any concerns? I would even be fine with mustprogress = 1 being the default value.
As a first step, I’d like to fix the trip count calculation, which is obviously wrong: it returns “0” for infinite loops.
The issue goes beyond loops: in MLIR any op can internally model a loop nest, which is why I suggested a trait (or an interface) in the review.
Defaulting to “must progress” for MLIR makes sense to me: it should be the vast majority of ops and only ops that may not terminate should implement the interface.
“Forward control-flow reasoning” is only valid in SSACFG regions. Inside a graph region it cannot be applied, but we can still reason about unreachable and no_return operations via their data dependencies, i.e., it can also be a property propagated along SSA def-use edges.
According to the recent clarification on side-effects, all non-dataflow things that affect the compiled output program need to be modeled as side-effects. In contrast to my old argument, that means we can always detect "meta-"operations that don’t “execute” but still impact code-gen via side-effects. However, they’re still never part of forward control flow reasoning.
Although I implemented it badly, I think it still stands to reason that an unreachable operation may still produce a value at compile time, i.e., it can only be removed once it is known that it will never fold. In my old attempt, I only marked ops as no_return when they were unreachable, all their operands matched m_Constant, and op.fold(...) failed.
There is still an open discussion on the RFC: what should the default value for mustProgress be?
This is the current implementation in the PR:
* Op does not implement the ExecutionProgressOpInterface: we have no information, mustProgress = false.
* The default implementation of the ExecutionProgressOpInterface is mustProgress = true.
Reasons for this implementation approach:
* Safety: If we don’t know for sure, we have to treat operations conservatively. mustProgress = true may trigger certain transformations. (E.g., what are we going to do with unregistered operations? Assuming mustProgress = true may be wrong; assuming mustProgress = false does not add any information and is never wrong.)
* Consistency with LLVM: mustprogress = false is the default in LLVM.
Open questions:
* @mehdi_amini: What does this mean for post-dominance?
* @mehdi_amini: mustProgress = false by default makes the common case overly expensive.
* @kuhar: How do we deal with nested operations, where the mustProgress property of the enclosing and the inner operations do not match?
Re post-dominance: operations may be moved around based on post-dominance. In the example below, B post-dominates A.
```
print("A")
infinite_loop // no side effects, infinite looping not statically detectable
print("B")
```
Would it be incorrect to reorder the program as follows?
```
print("A")
print("B")
infinite_loop // no side effects, infinite looping not statically detectable
```
I’m trying to understand how LLVM handles this case. It looks like the post-dominator tree is built based only on the CFG, without taking mustprogress into account.
* The cost of this (checking an interface): we can alleviate it in the future by stealing a bit on the Operation class if it shows up on profiles.
* The way to land this in the ecosystem: the lack of progress is likely a “niche” situation. It seems more aligned with user expectations that the default is that operations behave “normally”, that is, control is transferred to the next operation in the block (in SSACFG regions, of course), and that only the rare operations that may not progress bear the burden of declaring it.
For nested operations, I would just mimic what we do for side-effects: do you see any issue with this?
I agree, but the moment you have an unregistered operation, you don’t have any information about the op. We have to be conservative. The conservative choice is mustProgress = false.
I see the following alternatives:
1. If the ExecutionProgressOpInterface is not implemented, mustProgress = false. Otherwise, mustProgress is defined by the interface.
2. If the operation is unregistered, mustProgress = false. If the operation is registered and the ExecutionProgressOpInterface is not implemented, mustProgress = true. Otherwise, mustProgress is defined by the interface.
Do you see any other alternatives?
Re Option 2: I believe this would be the first time that we treat unregistered operations in a special way. (As opposed to registered operations that do not implement any trait or interface.)
What does that mean concretely? A trait like RecursiveMemoryEffects, but for mustProgress? I have the same question: what’s the default behavior for an op that does not implement any interface or trait?
One possibility: in the presence of regions, the default value of mustProgress (when the ExecutionProgressOpInterface is not implemented) is “false” if at least one operation in a region has mustProgress = false. Otherwise, mustProgress = true.
This sounds great to me. If checking hasTrait is faster (either now or in the future), we could also add a trait that implements the interface and always returns true for mustProgress, and return true early in the implementation.
This is very subjective, but I think it’s not super niche. Any loop op that can create infinite loops, plus any call op (or call-like op), may potentially not progress. Besides, regardless of how niche a case is, I think optimizations should generally be enabled by users by implementing interfaces or adding traits, rather than disabled once someone has nailed down the semantics or run across a miscompile.
Regarding churn for landing, we could have Pure also imply mustprogress, the way it was done for speculatability. That should cover the majority of relevant cases.
I don’t think that is an issue; the outer one “wins”, so to say. Effectively, what the mustProgress attribute on scf.for does is say “if it weren’t so, it’d be undefined behaviour, ergo it has to progress”, therefore it can always return true for mustProgress. An inner loop may or may not be marked as such (it could be, as an optimization), but not having the attribute set is always safe.
+1. I’d expect that in ML-like applications mustProgress is a reasonable default, but for a general-purpose compiler framework, that’s overly optimistic in the presence of function calls. Even in graphics (e.g., through the SPIR-V dialect), killing a thread or demoting it to a helper is fairly common.
This makes me think that having mustProgress as the default for loops / ifs is probably not safe except for linalg/affine. Patterns like:
```c
void maybeQuit() {
  if (buttonPressed) exit(0);
  ...
}

while (true) {
  ...
  maybeQuit();
}
```
are fairly common. I’d think it’d be safer to start with a no-progress assumption and only set it if, recursively, all the ops within the region are known to mustProgress / willreturn (do we also need this as a function attribute?).
I could see an argument that it’s on the emitter/frontend to set mustProgress = false, but I think this would end up always being set to false and then enabled by analysis, so we might as well default to that in the generic code path.
Sure, but now take the ratio of “op-with-infinite-loop” and “call” ops to the rest of the ops in the ecosystem: I expect this ratio to be tiny, hence “niche”.
(it is also not “every call”, but only calls in dialects/languages which permit a lack of progress)
That would address my concern partially, because then I am concerned about the efficiency of the implementation. This may need to be more “core” and steal bits somewhere on Operation.
I’m keen on having the presence of some mandatory traits indicate the well-behavedness of operations for specific concerns in all contexts. By mandatory I mean that their absence also carries information for registered ops. Maybe we should think about disallowing built-in interfaces (e.g., control flow) from having external models? Then their absence would also provide information, and fundamental op properties could never depend on the set of loaded dialects.
I am, however, a bit concerned with the existing traits not quite reflecting the actual guarantees that passes checking for them intend to use. In the past, I’ve done a little survey of ops and passes in the wild to determine what is tacitly assumed and what guarantee was actually needed. The short summary comes out to this:
* MetaLike: Has structural validity constraints or non-CF, non-DF semantics.
  * I.e., ops you usually can’t modify unless your transform knows their concrete type or interface.
  * E.g.: builtin.module, func.func, …
* FunctionLike: DF validity is exclusively type-based.
  * I.e., verification depends only on the types, not the sources, of operands. One guarantee you need for outlining, inlining, strength reduction around it, …
  * E.g.: arith.add, … (most things actually, but not affine.apply, for example)
* ContinuationLike: Must receive control.
  * I.e., must be in an SSACFG block / CF construct. Necessary to allow control-flow reasoning.
  * E.g.: func.return, all SSACFG IsTerminator ops, …
* DelimitedContinuationLike: Is ContinuationLike, and may yield control.
  * I.e., can appear anywhere in linear CF. Necessary to perform code motion.
  * E.g.: func.call, none of the IsTerminator ops, …
* RoutineLike: FunctionLike and DelimitedContinuationLike.
  * What I believe most people currently expect to find in SSACFG blocks, except terminators.
A few more that don’t matter here.
Am I correct in understanding that MustProgress is meant to imply DelimitedContinuationLike, i.e., that only operations from which you are able to locally receive control (e.g., you can put a ContinuationLike after them) make progress? Because I would also consider undelimited continuations that prevent the current context from “returning”, but still terminate the program, to be MustProgress for the purposes of the earlier optimizations.
Edit: Maybe I should give a concrete example. Let’s say your dialect has some sort of exception system, and you want to be able to throw from within an scf.if. You’ll need an op that is not a terminator but also isn’t a delimited continuation. If you have a full-blown effect system, you might even make progress in the same function later.
* Registered operation that does not implement ExecutionProgressOpInterface: mustProgress = true.
* Registered operation that implements ExecutionProgressOpInterface: mustProgress is determined by the interface method.
I’d still prefer mustProgress = false for Case 2, but given that there’s precedent for special treatment of unregistered ops (mightHaveTrait), I can live with that.
Open question 1: Do we need special treatment for call ops in any shape or form? E.g., when calling a function that may not progress but there’s no annotation on the call op.
Observation 2: give this a sanity check. The scf.for might not progress. However, the scf.execute_region is guaranteed to progress. (And there’s no way to specify that scf.execute_region might not progress, because it doesn’t have a mustProgress attribute. Adding such an attribute to every single region op seems like overkill…)
IMO, it would be correct to canonicalize the scf.for to mustProgress=true.
IMO, according to the current definition of “must progress” this would be valid: foo.Kill has an observable side effect, meaning that the loop also has an observable side effect. After executing foo.Kill, the loop stops executing.
Operations that “must progress” are required to return normally (control flow reaches the next operation) or interact with the environment in an observable way (e.g., volatile memory access, I/O, synchronization, or program termination).
OK, so I think in the LLVM vernacular, foo.Kill would also be mustprogress but not willreturn? If so, I think the valid optimization over the sample loop would be to turn it into:
Maybe I’m being way too obtuse about this, but I don’t quite understand it yet. In LLVM, it makes sense to me, but in MLIR I’m not following.
How would you interpret break and continue, for example? Are they simply not required to progress? And if so, is it reasonable that they should prevent DCE as mentioned above?