Infinite loops with no side effects


This email picks up a thread that, to my knowledge, was last discussed here:

In brief, infinite loops containing no side effects produce undefined behavior in C++ (and in C in some cases); however, in other languages they have fully defined behavior. In a few places, LLVM’s optimizer currently assumes that infinite loops eventually terminate, and in practice it will sometimes delete them. There is currently no clean way for front-ends of languages where this assumption isn’t valid to opt out of this behavior.
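To make the C case concrete, here is a minimal sketch (the function names are ours, for illustration only). Under C11 §6.8.5p6, an iteration statement whose controlling expression is not a constant expression, and which performs no I/O, volatile accesses, or synchronization or atomic operations, may be assumed by the implementation to terminate. The Collatz loop below qualifies, even though for `n == 0` it never terminates. A loop with a constant controlling expression, such as `for (;;) {}`, is outside that rule in C, whereas the C++ forward-progress rule (as of C++23) has no such carve-out.

```c
#include <stdint.h>

/* Counts Collatz steps from n down to 1.
 * `n != 1` is not a constant expression, and the loop performs no I/O,
 * volatile access, or atomic operation, so under C11 6.8.5p6 the compiler
 * may assume this loop terminates, even though for n == 0 it never does
 * (0 maps to 0 forever). */
unsigned collatz_steps(uint64_t n)
{
    unsigned steps = 0;
    while (n != 1) {
        n = (n % 2 == 0) ? n / 2 : 3 * n + 1;
        steps++;
    }
    return steps;
}

/* Constant controlling expression: C11's termination assumption does not
 * apply, so a C compiler must treat this as a genuine infinite loop. */
void spin_forever(void)
{
    for (;;) {
    }
}
```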

This is the subject of a long-standing LLVM bug:

I wrote a patch implementing Chandler’s idea from that thread, @llvm.sideeffect, a new intrinsic that is a no-op except that it tells the optimizer to behave as if side effects were present:

Similar results can be achieved with empty inline asm statements; however, they tend to inhibit optimizations. The patch above allows all of the major optimizations to work in the presence of @llvm.sideeffect.
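For reference, the empty-inline-asm workaround looks roughly like this in C, using the GCC/Clang extended-asm syntax (the function name is illustrative). A volatile asm statement must be kept by the compiler, so a loop containing one is no longer side-effect free and cannot be deleted; the cost is that the asm is opaque to the optimizer, which is where the lost optimizations come from.

```c
#include <stdint.h>

/* The same Collatz loop, with an empty volatile asm statement
 * (GCC/Clang extension) in the body. The compiler must keep the asm, so it
 * can no longer treat the loop as side-effect free and delete it.
 * No "memory" clobber is given, so memory optimizations around it are not
 * additionally blocked. */
unsigned collatz_steps_guarded(uint64_t n)
{
    unsigned steps = 0;
    while (n != 1) {
        __asm__ __volatile__("");
        n = (n % 2 == 0) ? n / 2 : 3 * n + 1;
        steps++;
    }
    return steps;
}
```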

One of the concerns raised is that, if a front-end can’t determine when they aren’t needed, it would have to emit a lot of these intrinsics: potentially one in every loop, one in every function (because opportunistic tail-call optimization can turn recursion into a loop), and one in front of every label reachable by goto or similar. This is indeed a downside. The patch mitigates it by making sure that the major optimization passes aren’t pessimized by the intrinsic.
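The three placements can be illustrated at the C level with a hypothetical `SIDE_EFFECT_MARKER()` macro standing in for the intrinsic. Since @llvm.sideeffect is something a front-end emits into IR, the macro here simply expands to the empty volatile asm workaround; the macro and function names are made up for this sketch.

```c
#include <stdint.h>

/* Hypothetical stand-in for a front-end-emitted @llvm.sideeffect call. */
#define SIDE_EFFECT_MARKER() __asm__ __volatile__("")

/* 1. In every loop body. */
uint64_t sum_below(uint64_t n)
{
    uint64_t sum = 0;
    for (uint64_t i = 0; i < n; i++) {
        SIDE_EFFECT_MARKER();
        sum += i;
    }
    return sum;
}

/* 2. In every function: tail-call optimization can turn this recursion
 *    into a loop, so the function body needs a marker too. */
uint64_t gcd(uint64_t a, uint64_t b)
{
    SIDE_EFFECT_MARKER();
    return b == 0 ? a : gcd(b, a % b);
}

/* 3. In front of every label that a backward goto can reach. */
unsigned count_down(unsigned n)
{
    unsigned steps = 0;
top:
    SIDE_EFFECT_MARKER();
    if (n == 0)
        return steps;
    n--;
    steps++;
    goto top;
}
```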

Of the alternatives I’ve read, the most promising is Reid’s proposal here:

to make infinite loops defined by default and to add a “known to be productive” attribute to functions. It would be a more complex change, and it could require changes in out-of-tree codebases. It would also be suboptimal in some cross-language inlining cases. However, it would solve the problem in a much less cluttered way. I’m willing to implement the LLVM portion of this if there’s consensus that it’s a better approach.



The two proposals have interesting upsides and downsides:

Making it a property of the function makes a lot of sense because we don’t need the power to say that loops A and B have different termination guarantees in the same function. Also, as you mention, it makes the front-end’s job a lot easier because it just slaps the attribute down and calls it a day.

I think the problem with the attribute (which is solved by the intrinsic) is that we would have to come up with a very careful set of semantics for CFGs to make sure that hoisting wouldn’t “break” things like:

while (always_true_at_runtime)
    ;
*(volatile int *)NULL = 0;

doesn’t turn into:

*(volatile int *)NULL = 0;
while (always_true_at_runtime)
    ;

The intrinsic gives us a way of reifying the side effect in a semantically obvious way.

In the same way that calls which are not nounwind have an implicit abnormal edge heading out of the function, this intrinsic basically makes it possible to add abnormal edges to the CFG without having to retrofit LLVM to really have abnormal edges.

I think that we should move forward with this approach (as may be obvious given that I’ve okay’d the patch). It’s a lightweight solution, at least on LLVM’s side of things, and it does not prevent other solutions later.

Regarding front-ends having to emit many of these intrinsics: this is a valid concern; however, I expect that most programs from higher-level languages will have well-structured loops, and it will be straightforward to emit the intrinsics.

Regarding the function attribute: the problem is that this is not a function-level property; it is a per-loop property. This is even true in C. In C, we would need to mark loops that have source-level-constant controlling conditions, and only those loops, as allowed to be infinite. So maybe we could use loop-level metadata, but that seems hard to place and preserve for unstructured loops (and, arguably, those matter more in C/C++ than in other languages). -Hal
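The per-loop nature shows up within a single C function. In this sketch (names are illustrative), loop A has a non-constant controlling expression and no side effects, so C11 §6.8.5p6 lets the compiler assume it terminates; loop B has the constant controlling expression `1`, so the assumption does not apply to it. A single function-level attribute cannot describe both loops.

```c
#include <stdint.h>

/* Splits x into its trailing-zero count and its population count.
 * Precondition: x != 0 (otherwise loop A never terminates). */
void trailing_zeros_and_popcount(uint64_t x, unsigned *tz, unsigned *ones)
{
    unsigned zeros = 0;
    /* Loop A: non-constant controlling expression and no I/O, volatile or
     * atomic operations, so C11 6.8.5p6 lets the compiler assume it
     * terminates (for x == 0 it would not). */
    while (x % 2 == 0) {
        x /= 2;
        zeros++;
    }

    unsigned count = 0;
    /* Loop B: constant controlling expression, so the termination
     * assumption does not apply; only the break ends the loop. */
    while (1) {
        if (x == 0)
            break;
        count += (unsigned)(x & 1u);
        x >>= 1;
    }

    *tz = zeros;
    *ones = count;
}
```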

Personally, I don’t like the side effect intrinsic. It will pollute all the IR generated by non-C frontends. What most of these frontends really want is just a switch to disable a targeted set of optimizations.

One thing I like about the function attribute idea is that it’s conservatively correct to discard it when doing cross-language inlining. It just becomes something that C-family frontends need to remember to add to enable their special-case language rules, rather than something that non-C languages need to think about. Similar to the ‘access’, builtin vs nonbuiltin discussion happening in parallel, the attribute enables the optimization, rather than inhibiting it.
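The conservative-merge argument can be written down as a tiny model. Everything here is hypothetical (a `bool` standing in for the proposed “known to be productive” attribute, not an LLVM API): after inlining, the single function-level flag has to be valid for the whole merged body, so it survives only if both functions had it. Dropping it when a non-C caller inlines a C callee is the conservatively correct case described above; dropping it when a C caller inlines a non-C callee is the suboptimal cross-language inlining case mentioned earlier in the thread.

```c
#include <stdbool.h>

/* Hypothetical per-function flag modelling the proposed
 * "known to be productive" attribute. */
struct fn_attrs {
    const char *name;
    bool assume_progress;
};

/* Returns the flag the caller may keep after the callee is inlined into it.
 * The flag covers the whole merged body, so it must hold for both. */
bool assume_progress_after_inlining(const struct fn_attrs *caller,
                                    const struct fn_attrs *callee)
{
    return caller->assume_progress && callee->assume_progress;
}
```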

Understood. I also don’t like the fact that it will clutter the IR in many cases. As I said below, a function attribute is insufficient. It needs to be something we can mark per loop. This is needed to correctly model C. The sideeffect intrinsic is the best proposal I’ve seen so far. -Hal

Maybe we should do both? If the intrinsic is a special case, that seems
fine. It's cheap.

Okay. -Hal

As I understand it, part of the function attribute proposal is to change
the default semantics of LLVM IR to have defined behavior on infinite
loops, and then add an attribute opting into potential-UB. So if we do
that, then the role of @llvm.sideeffect becomes a little subtle -- it'd be
a way for a frontend for a language like C to opt into potential-UB for a
function, but then opt out for individual loops in that function. Is that
what you're proposing? If so, I'm ok taking that route.


That’s my understanding of the proposal. -Hal
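The combined scheme can be sketched as a decision function, again as a hypothetical model rather than any LLVM API: by default a side-effect-free loop must be preserved; a function-level attribute opts the whole function into the termination assumption; and an @llvm.sideeffect call in a particular loop opts that loop back out.

```c
#include <stdbool.h>

/* Hypothetical model; none of these names are LLVM APIs. */
struct loop_model {
    bool has_side_effects;     /* I/O, volatile or atomic operations, ... */
    bool has_sideeffect_call;  /* body contains a call to @llvm.sideeffect */
};

struct fn_model {
    bool assume_progress;      /* proposed function-level attribute */
};

/* May the optimizer delete this loop when its results are unused? */
bool may_delete_loop(const struct fn_model *fn, const struct loop_model *loop)
{
    if (loop->has_side_effects)
        return false;          /* real side effects always keep a loop */
    if (!fn->assume_progress)
        return false;          /* proposed default: infinite loops are defined */
    if (loop->has_sideeffect_call)
        return false;          /* per-loop opt-out in an opted-in function */
    return true;
}
```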