Currently the optimization can easily lead to situations in which a function containing an infinite loop without side effects has its whole body removed, so that execution falls through into a subsequent function. Developers run into this problem in the wild; they generally find the behavior surprising and hard to debug, and they worry about potential security issues around such an aggressive optimization.
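A reduced, hypothetical example of the pattern (function names are made up):

```cpp
#include <cstdio>

// A loop with no side effects: no I/O, no volatile accesses, no atomics.
// Under C++'s forward-progress assumption the optimizer may assume it
// terminates; since it never does, the whole body of spin() can be deleted.
void spin() {
    while (true) {
    }
}

// If spin() is emitted with no body and no return instruction, execution can
// fall through into whatever function the linker happens to place next.
void unrelated() {
    std::puts("this was never supposed to run");
}

int main() {
    spin();
}
```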
I am aware that in C++ this is currently undefined behavior and that, unlike C11, we have no exception for loops whose controlling expression is a constant expression. I am also aware that once a program invokes undefined behavior we can have no expectations, and I am aware of N1528 (“Why undefined behavior for infinite loops?”) and the general reasons why we want to optimize around the assumption of forward progress.
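For reference, the C11 carve-out distinguishes cases like these (shown with C++ syntax, which is identical here; names are made up):

```cpp
// C11 (6.8.5p6) only lets the implementation assume termination for loops
// whose controlling expression is not a constant expression (and whose body
// does no I/O, volatile, atomic, or synchronization operations).

void spin_forever() {
    // Constant controlling expression: C11 says this may loop forever.
    // C++ currently has no such carve-out, so this is still UB there.
    while (1) {
    }
}

void wait_for(volatile bool* flag) {
    // Fine in both C and C++: volatile accesses count as forward progress,
    // so the loop cannot be assumed to terminate and cannot be removed.
    while (!*flag) {
    }
}

void maybe_hangs(bool keep_going) {
    // Non-constant controlling expression with no side effects: both C11 and
    // C++ allow the optimizer to assume this terminates.
    while (keep_going) {
    }
}
```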
My question is not about whether this should be UB or whether we should optimize around UB; it is narrower. Can we keep the benefit of optimizing around the forward-progress assumption, yet adjust the implementation to detect the trivial cases and produce a diagnostic, or perhaps trap instead? I am open to any alternative that improves the developer experience and helps our users catch and fix bugs more easily.
The “empty function” thing isn’t really specific to infinite loops; you can trigger it with other forms of undefined behavior. Adding a trap instruction to completely empty functions is a really minor size cost; such functions should be very rare. (I don’t think anyone is really against it, just nobody has proposed a patch.)
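In effect (just a source-level sketch; the real change would be in code generation, with __builtin_trap() standing in for the emitted trap instruction), an emptied function would become:

```cpp
// Instead of emitting a zero-byte body that falls through into the next
// function, emit a single trapping instruction. __builtin_trap() lowers to
// e.g. ud2 on x86.
void emptied_function() {
    __builtin_trap();
}
```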
Adding a trap instruction after calls which the compiler thinks don’t return (in IR, a “call” followed by an “unreachable”) would be a more significant size cost; calls like that are pretty common in some kinds of code. For example, throwing an exception produces code like this. We’d want numbers before we turn this on by default.
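For instance (a minimal sketch; the exact runtime call depends on the target ABI):

```cpp
#include <stdexcept>

// On Itanium-ABI targets a throw like this lowers to a call into the C++
// runtime followed by an `unreachable` in LLVM IR, since the call only
// "returns" by unwinding. Putting a trap after every such call site is the
// size cost in question.
[[noreturn]] void raise_error(const char* what) {
    throw std::runtime_error(what);
}
```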
For infinite loops specifically, it probably makes sense to adopt the C11 behavior in C++; it would be nice if the C++ committee could officially bless that approach, though.
Compile-time diagnostics for undefined behavior are tricky. It’s hard to detect that a loop is non-trivially infinite from the clang AST. And generally, the feedback we get is that people don’t want diagnostics that depend on compiler optimizations. Such warnings show up inconsistently (so builds break depending on the target/optimization level/etc.). And it’s easy to end up with unfixable false positives (e.g. if some code is unreachable due to jump threading).
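A hypothetical sketch of the inconsistency problem (names and numbers made up):

```cpp
// The loop below is only infinite because the caller passes step == 0, which
// the optimizer can only discover via inlining and constant propagation. A
// warning driven by that discovery appears or disappears with optimization
// level and inlining heuristics, and it points at a line whose "fix" is
// really in the caller.
static int count_down(int from, int step) {
    int n = 0;
    while (from > 0) {  // never terminates when step == 0 and from > 0
        from -= step;
        ++n;
    }
    return n;
}

int example() {
    return count_down(10, 0);
}
```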
> The “empty function” thing isn’t really specific to infinite loops; you can trigger it with other forms of undefined behavior. Adding a trap instruction to completely empty functions is a really minor size cost; such functions should be very rare. (I don’t think anyone is really against it, just nobody has proposed a patch.)
I think adding a trap to completely empty functions sounds like a good compromise that would address many of the concerns, and I would love to see someone put up a patch to do that. This is not my wheelhouse, but if you think it would be a simple PR, I would be willing to take a try at it if someone were willing to guide me.
> Adding a trap instruction after calls which the compiler thinks don’t return (in IR, a “call” followed by an “unreachable”) would be a more significant size cost; calls like that are pretty common in some kinds of code. For example, throwing an exception produces code like this. We’d want numbers before we turn this on by default.
It would be nice if someone would attempt to do this work but it sounds non-trivial.
> For infinite loops specifically, it probably makes sense to adopt the C11 behavior in C++; it would be nice if the C++ committee could officially bless that approach, though.
@jfb is writing a paper to attempt to address this in C++ and hopefully it will be well received.
> Compile-time diagnostics for undefined behavior are tricky. It’s hard to detect that a loop is non-trivially infinite from the clang AST. And generally, the feedback we get is that people don’t want diagnostics that depend on compiler optimizations. Such warnings show up inconsistently (so builds break depending on the target/optimization level/etc.). And it’s easy to end up with unfixable false positives (e.g. if some code is unreachable due to jump threading).
I did look at this a bit; there are a couple of related features already in. For MSVC there is something that, I think, puts a nop after empty functions to address some limitation there, and there is also a “trap on unreachable” mode that I believe is enabled by default on macOS and PS targets. That sounds worse than it is: most unreachables get eliminated by optimizations anyway, so it’s not a huge cost.
Maybe trap-on-unreachable is what we should just enable by default, or even make the only behavior, if someone wants to measure and confirm that the cost is not too bad, which I don’t think it is.
When I was looking at this while trying to address the way zero-length functions break DWARFv4, I think I got stuck in perfect-being-the-enemy-of-the-good: I wanted to put a trap only at the end of functions that don’t already end with some other unconditional branch (maybe even a call to a noreturn function? Not sure), and that got complicated because some targets (AMDGPU?) couldn’t readily emit a trap in their backends (something about needing certain properties on the function)… I suspect throwing all that out and just doing “trap on unreachable” is probably fine.
We use traps on AMDGPU all the time for debugging (@dblaikie).
OpenMP offload doesn’t want to have a trap after UB / before an unreachable because it generally pessimizes optimizations. That said, as long as we have configuration methods (as @nikic mentions), we should be fine.
PS needed a nop/trap after noreturn calls so that, if such a call is the last thing in a function and there is no padding before the next function, the “return address” does not point into the next function. (I argued that we should fix the stack walkers to subtract 1 from the return address, but it was deemed infeasible to find and fix them all. The size cost in this specific case is small: an average of 1/function-alignment bytes per function in this situation.) Similar reasoning applies to noreturn calls in the middle of a function, if you’re looking to do addr2line kinds of operations. A completely empty function is a different cause with the same effect.
Putting an unreachable after a provably infinite loop would be yet one more cause, but I’d think the mechanisms are all in place to make it work.
A wrinkle on this is when the call to the never-returning function is converted into a direct branch. In that case, the return address seen inside the never-returning callee no longer points back at the function that branched to it, but at that function’s caller. That elides the last entry from the stack trace; at least it doesn’t make the trace point at an entirely unrelated function, which is what the other cases do.
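To make that concrete, here is a sketch (names are made up; the exact code depends on target and options):

```cpp
[[noreturn]] void report_fatal(const char* msg);  // assumed noreturn callee

void fail(const char* msg) {
    // If this is emitted as a "call", the pushed return address is the byte
    // just past the call, i.e. the end of fail(); with no padding, that is
    // the first byte of next_function(), so a naive stack walker blames the
    // wrong function unless a nop/trap follows the call.
    //
    // If it is instead emitted as a plain jump (the wrinkle above), no return
    // address is pushed, so inside report_fatal() the saved return address
    // belongs to fail()'s caller and fail() simply drops out of the trace.
    report_fatal(msg);
}

void next_function() {  // may be laid out immediately after fail()
}
```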