This is somewhat of a continuation of some previous discussions regarding the inefficiency of Clang/LLVM’s current mechanism for implementing C++ “noexcept”:
Feb 2017: [llvm-dev] help me understand how nounwind attribute on functions works?
Dec 2020: [llvm-dev] Catching exceptions while unwinding through -fno-exceptions code
Problem Statement
Currently, adding “noexcept” to a C++ function is relatively cheap (in both binary size and performance of the compiled code) in GCC, but not in Clang. I’d like to fix this – a function annotated “noexcept” should in no case generate worse code than a function not so marked.
This is only feasible to fix because the behavior required of noexcept is quite relaxed compared to a normal “catch” block. In particular, the runtime is not required to unwind the stack and call destructors prior to calling std::terminate – yet may do so if it wishes.
E.g. in this example program, ~Foo is allowed to be called 0, 1, or 2 times, depending on your compiler/runtime’s implementation, or even on whether functions are inlined.
#include <stdio.h>
struct Foo { ~Foo() { puts("~Foo"); } };
static void a() { Foo f; throw 1; }
void b() noexcept { Foo g; a(); }
int main() { b(); }
However, Clang effectively implements “noexcept” as if it were a “catch (...)” surrounding the body of the function, and does not take advantage of the flexibility. (The catch-like behavior was required by C++ for “throw()” prior to C++17, but when noexcept was introduced in C++11, it intentionally did not have that requirement.)
I already committed D113620 (“Skip exception cleanups when the innermost scope is EHTerminateScope.”) to stop emitting EH cleanups when the current scope is a noexcept termination. This was an easy change, and reduces the amount of landingpad code generated along the path to termination, but doesn’t fully solve the problem.
Background
In the underlying EH metadata for the Itanium-ABI GCC LSDA format (used by e.g. __gxx_personality_v0), every function is associated with a call-site map. This map associates a range of instruction-pointers to the address of a landingpad, and an action table. The landingpad IR instruction is nearly a direct encoding of that action table: whether the address should be invoked during unwind as a “cleanup”, and a set of types for which this block should be the catch handler target.
However, there’s one further sort of action supported in the underlying format, which LLVM doesn’t yet expose in IR: termination. Termination is represented by an instruction-pointer being omitted from the callsite table, and indicates to the personality that std::terminate should be called. This is how GCC implements noexcept today, and this is what I propose to add to LLVM IR.
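To make the lookup rule concrete, here is a minimal sketch of how a personality routine could classify an instruction pointer against a call-site table. The `CallSite` struct, the `Decision` enum, and `classify` are hypothetical names for illustration only; real personalities decode a packed LSDA and also consult the action table for type matching, which this model leaves out. It also assumes the usual convention that a record with no landing pad means "nothing to do here, keep unwinding".

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified model of one LSDA call-site record.
struct CallSite {
  std::uintptr_t start;       // first covered instruction address
  std::uintptr_t length;      // number of covered bytes
  std::uintptr_t landingPad;  // 0 means "no landing pad for this range"
};

enum class Decision { Terminate, ContinueUnwind, EnterLandingPad };

// An IP with no call-site record means "terminate"; a record without a
// landing pad means "keep unwinding"; otherwise control goes to the pad.
Decision classify(const std::vector<CallSite>& table, std::uintptr_t ip) {
  for (const CallSite& cs : table) {
    assert(cs.length > 0 && "call-site ranges must be non-empty");
    if (ip >= cs.start && ip - cs.start < cs.length)
      return cs.landingPad == 0 ? Decision::ContinueUnwind
                                : Decision::EnterLandingPad;
  }
  return Decision::Terminate;  // IP omitted from the table
}
```

In this model, a call marked unwindabort is simply one whose address range has no record in the table.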
There is a complication with this representation, however: given a noexcept function with a try/catch that catches some types, you still need to terminate for any others. But, you cannot omit the call from the callsite table – you need a callsite mapping to specify how to handle the types you do intend to catch. E.g.:
void fn() noexcept {
try { a(); } catch (int) {}
}
For this case, GCC emits an action table which specifies that it catches int, and is marked as having a cleanup (in order to handle any other type passing through). The landingpad code then manually calls std::terminate for non-matching types. This is not ideal, since you don’t get the benefit of not unwinding the stack before terminating mentioned above, but is acceptable. Per comment 6 in above GCC bug, we could consider addressing that in the future.
Note that there’s also an additional side-benefit for debuggability, coincidentally requested in https://llvm.org/PR53062, after I’d already started working on this. By distinguishing “catch” from “terminate”, it enables the unwinder to terminate before unwinding the stack, similarly to what happens if you throw an exception without any matching catch in scope. That allows a diagnostic backtrace to pinpoint the error location. And, in fact, this proposal will cause that to just work with libcxxabi’s unwinder. Unfortunately, libstdc++'s unwinder actually defers termination until phase2 of the unwind, so the stack will be partially unwound if there are any intervening cleanups. That problem also affects GCC-compiled programs, see https://gcc.gnu.org/PR55918.
LLVM IR Proposal
I propose to introduce a new IR keyword for call instructions: “unwindabort”.
We will likewise need to add “unwindabort” to other unwinding instructions: “resume” and “catchswitch”. (But possibly not “cleanupret”, see below.)
Examples:
resume unwindabort { i8*, i32 } %exn
catchswitch within %parent [label %handler1] unwindabort
You might ask: doesn’t resume unconditionally unwind, and therefore doesn’t resume unwindabort unconditionally abort? Yes, that’s correct. It will. An alternative would be to convert resume to call @__cxa_begin_catch; call @_ZSt9terminatev. However, synthesizing such a call in the backend is ugly (and has been a sticking-point in some previous conversations).
Thus, this proposal uses resume, not just at the IR level but all the way through lowering: resume unwindabort is ultimately lowered to “call unwindabort @_Unwind_Resume”. This lets the personality function handle termination the same way it otherwise would.
Name/Syntax Bikeshedding:
Some rationale for some of the somewhat-arbitrary choices I’ve made:
- I have made “unwindabort” an explicit part of the syntax for call, rather than an attribute. This seems appropriate, given that it needs to also be added to other non-call instructions, and that it is not usable on function definitions/declarations. It would be a viable alternative to use a function-attribute – but, only for call, not on resume or catchswitch.
- I placed the “unwindabort” token after “call” and before fast-math-flags. This seems reasonable, since that’s where all the other non-attribute syntax goes on the call instruction (e.g. “tail call unwindabort ninf preserve_allcc zeroext addrspace(1) i8* @foo(...) #3”).
- I placed the token at the end for catchswitch, because that’s where the other “unwind to” phrases went.
- There are other possible names instead of “unwindabort”, e.g. abortonunwind, terminateonunwind, stopunwind, endunwind. I didn’t like those as much.
Rejected Alternative: Pattern Matching
In previous discussions, it has been proposed that we could, in the backend, pattern-match a landing-pad which only calls std::terminate, and instead of generating code, generate the correct LSDA metadata.
However, today, we don’t actually have enough information in IR to be able to fully implement the desired semantics with pattern-matching. When we’re inlining a function with EH cleanups into a noexcept function, we are allowed to (and want to) drop the cleanups from the inlined body.
For this to be workable, we’d need to be able to effectively distinguish (at the IR level)
void foo() noexcept { xxx(); }
from
void foo() { try { xxx(); } catch (...) { __clang_call_terminate(); } }
In the former case, we can drop the exceptional cleanups within xxx() when inlining xxx into foo. In the latter case, we must not. However, with today’s IR, these end up looking effectively equivalent.
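The distinction can be sketched as a toy rule. This is a hypothetical model, not LLVM's inliner: `CalleeUnwind`, `CallerForm`, `Inlined`, and `afterInlining` are names invented here to describe what an unwinding call inside the inlined body of xxx may do once it sits inside foo.

```cpp
// What a call inside the callee did on unwind, before inlining.
enum class CalleeUnwind {
  Propagate,             // plain call, exception leaves the callee
  CleanupThenPropagate,  // runs local cleanups, then leaves the callee
  Abort                  // already terminates on unwind
};

// The two source forms of the caller that look alike in today's IR.
enum class CallerForm {
  NoexceptBoundary,  // void foo() noexcept { xxx(); }
  ExplicitCatchAll   // void foo() { try { xxx(); } catch (...) { ... } }
};

// What the same call does after being inlined into the caller.
enum class Inlined { Abort, EnterCatchAll, CleanupThenEnterCatchAll };

Inlined afterInlining(CalleeUnwind callee, CallerForm caller) {
  if (callee == CalleeUnwind::Abort) return Inlined::Abort;
  // At a noexcept boundary the cleanups are optional, so they may be dropped.
  if (caller == CallerForm::NoexceptBoundary) return Inlined::Abort;
  // With a real catch (...), cleanups are observable and must still run.
  return callee == CalleeUnwind::CleanupThenPropagate
             ? Inlined::CleanupThenEnterCatchAll
             : Inlined::EnterCatchAll;
}
```

The second branch is the one that pattern matching cannot justify, because the IR does not record which `CallerForm` produced the terminate landing pad.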
Thus, pattern matching the existing IR isn’t a complete answer. We want something new in IR to indicate the allowed difference in semantics.
Rejected Alternative: Add unwindabort to invoke
Instead of extending “call”, we could extend invoke – e.g. “invoke void @x to label X unwindabort” – replacing “unwind label”. I don’t prefer this, however – I don’t think it makes anything simpler. The purpose of “invoke” is to describe the possibility of exceptional local flow control, in order to ensure that the proper local variables are accessible (and remain live!) in the landing-pad. That is unnecessary for unwindabort. All the handling for “call” is already aware that calls can either return normally, or unwind out of the function. The introduction of “unwindabort” doesn’t fundamentally change that – it just adjusts the action taken upon unwinding. Introducing this to “invoke”, however, would mean teaching everywhere to deal with a special kind of invoke which doesn’t have an exceptional destination block.
This will be much more complicated, with no gain I can see.
Rejected Alternative: Add unwindabort to landingpad (but not call)
If we did consider implementing unwindabort on landingpad (e.g. as per the “future extension” section below), it raises the question of whether we could simply use that instead of call unwindabort. That is:
invoke void @a to label %invoke.cont unwind label %lpad
lpad:
landingpad { i8*, i32 } unwindabort
unreachable
While that should work, I think we do actually want “call unwindabort” instead.
- “Call” is more amenable to optimizations than “invoke”, as mentioned before.
- Having this as the canonical form is weird, since the landingpad is unreachable.
- If we wanted to support this without the ABI extension, we’d only be able to actually support an “unwindabort” when it’s by itself without any other landingpad clauses. That’s weird.
Clang-side
As the support for this feature depends on both exception-handling style, and personality function, Clang will keep the ability to generate its existing explicit-catch code, and use call unwindabort only when it’s known to work.
I have verified that it works for at least the __gxx_personality_v0, __gxx_personality_sj0, and __objc_personality_v0 personalities in the DWARF EH scheme. I’m pretty sure this can work for all of the “popular” ones (and plan to verify before enabling for those), but some of the more oddball ones (e.g. non-Apple ObjC, Windows non-C++ SEH) probably cannot support it, and will use the existing codegen indefinitely.
Other EH schemes:
SJLJ EH:
The setjmp/longjmp EH mechanism (with the __gxx_personality_sj0 personality) uses a very similar – but not quite identical – table format to the normal Itanium unwind info. However, the LSDA callsite mapping doesn’t embed IP ranges. Instead, it contains a mapping from “callsite index” to action. The proper callsite index is stored to a special memory location before each call, and retrieved by the personality function. Landing-pads are indexed starting with 1; the callsite-indices 0 and -1 indicate special behavior: 0 means “call std::terminate”, and -1 means “propagate to caller”. LLVM currently doesn’t use 0. The unwindabort feature maps precisely to that. This is easy.
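The index convention can be written down as a small decoder. This is an illustrative sketch only: `SjLjAction`, `SjLjDecision`, and `decodeCallSiteIndex` are hypothetical names, and the real personality reads the index from the function context rather than taking it as a parameter.

```cpp
#include <cassert>
#include <cstddef>

enum class SjLjAction { Terminate, PropagateToCaller, LandingPad };

struct SjLjDecision {
  SjLjAction action;
  long landingPadIndex;  // 1-based; meaningful only for SjLjAction::LandingPad
};

// Interprets the callsite index stored before a call:
//   0  -> call std::terminate (what unwindabort would emit)
//  -1  -> propagate to caller
//  >=1 -> the landing pad with that index
SjLjDecision decodeCallSiteIndex(long index, std::size_t numLandingPads) {
  if (index == 0) return {SjLjAction::Terminate, 0};
  if (index == -1) return {SjLjAction::PropagateToCaller, 0};
  assert(index >= 1 && static_cast<std::size_t>(index) <= numLandingPads &&
         "callsite index out of range");
  return {SjLjAction::LandingPad, index};
}
```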
Windows EH:
The Windows EH handling in LLVM IR is significantly different than Itanium EH handling – instead of “landingpad” and “resume” IR, it uses a set of “funclet” IR instructions. The proposal will also work there, with some slight additional work.
Two of those additional instructions can unwind to the caller frame: catchswitch and cleanupret. Each of those could have the “unwindabort” keyword added.
However, adding “cleanupret unwindabort” is not really necessary, because the entire cleanup may simply be removed instead. With the funclet IR, we always know that a cleanuppad is a cleanup, and can always associate it with its cleanupret. Thus, anytime we’d want to create a “cleanupret unwindabort” instruction, we can instead propagate the unwindabort into everything unwinding into the cleanuppad, and then delete the cleanuppad. (Although, if it turns out to be convenient to allow for a temporary “cleanupret unwindabort” to be created, so that such propagation can be deferred to a separate pass, we could do that too.)
Catchswitch has an “unwind” edge, which indicates where to continue unwinding to, if none of the catch clauses match. This unwind-edge is also required to be the same as the unwind-edge used by any other invoke within the body of a catchswitch, unless it’s creating a new nested try/catch block. (There can be only a single “upwards” unwind target.) We do need to add “unwindabort” there.
Sidenote: the cleanuppad has some interesting semantics. In Itanium unwinding, you are forbidden to unwind out of a cleanup – any nested throws must be caught and handled. However, the code must ensure that by itself (today in Clang: via invoke to a catch-all landingpad that calls terminate; after these changes: via marking any such calls unwindabort). In Windows EH, by contrast, that happens automatically: all attempts to unwind out of a cleanuppad other than via “cleanupret” implicitly act as “unwindabort”!
At the low level:
MS’s __CxxFrameHandler3 personality function reads a flag bit in the per-function metadata to determine that a function is nounwind. If the bit is set, any attempt to unwind past the frame will instead terminate. This is not an exact match for the proposed semantics, as it’s a per-function flag, rather than per-callsite. Unfortunately, per-function is insufficient in the face of inlining a noexcept function into a not-noexcept function. This is true even for MSVC. What they do – and what we can also do – is to use the function flag only when no callsite can unwind to caller. If anything might unwind to the caller, the calls which should terminate are assigned to a state entry in the cleanup map which points directly to the external symbol “__std_terminate” as the cleanup handler (as if that was a cleanuppad funclet, rather than an external function!).
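The choice between the two encodings can be sketched as a per-frame decision. This is a simplified model under stated assumptions: `SiteKind`, `TerminateStrategy`, and `chooseStrategy` are hypothetical names, and the real lowering works on state numbers in the function's EH tables rather than on a flat list of call sites.

```cpp
#include <vector>

// How each call site in the frame behaves if an exception unwinds through it.
enum class SiteKind {
  CannotUnwind,       // callee is known not to throw
  UnwindsToLocalPad,  // handled by a funclet in this frame
  UnwindsToCaller,    // exception leaves the frame
  UnwindAbort         // must terminate instead of unwinding
};

enum class TerminateStrategy {
  None,                // no call site needs terminate-on-unwind
  FunctionFlag,        // set the per-function no-unwind bit
  TerminateStateEntry  // aborting calls get a state whose "cleanup" is __std_terminate
};

TerminateStrategy chooseStrategy(const std::vector<SiteKind>& sites) {
  bool anyAbort = false;
  bool anyToCaller = false;
  for (SiteKind k : sites) {
    anyAbort |= (k == SiteKind::UnwindAbort);
    anyToCaller |= (k == SiteKind::UnwindsToCaller);
  }
  if (!anyAbort) return TerminateStrategy::None;
  // The per-function flag is only safe when nothing may unwind to the caller.
  return anyToCaller ? TerminateStrategy::TerminateStateEntry
                     : TerminateStrategy::FunctionFlag;
}
```

The mixed case is what inlining a noexcept function into a not-noexcept one produces: some calls abort while others still propagate.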
Frames with the Structured Exception Handling (SEH) personality function, or compilations with SEH-interop enabled potentially pose a set of additional issues. Clang does not handle SEH-interop mode well today (e.g. it completely ignores noexcept!). I plan to defer worrying about either of these, falling back to the existing handling.
WebAssembly:
I haven’t looked into it much. It looks like it’s significantly different than anything else – including Windows – and it currently has 3 different unwinding schemes (enable-emscripten-cxx-exceptions, enable-emscripten-sjlj, and wasm-enable-eh). Needs investigation, but should be fine with the fallback in the meantime.
Other DWARF-style EH personalities:
There are some personalities used for C code (e.g. __gcc_personality_v0), which currently don’t support anything except cleanups. In these, “landingpad … catch” doesn’t work at all. And “call unwindabort” will also not work: the personality interprets a missing callsite as “continue unwind”, rather than “terminate”. As these routines are only used to implement __attribute__((cleanup)) for C (not C++) code which enables -fexceptions, this is not really an issue. It would only matter if we add a noexcept mechanism for C code in the future.
Further ideas
Potential future extension: Add a new “unwindabort” landingpad clause
As discussed previously, in the Itanium ABI, we cannot currently represent “catch some types, otherwise terminate” in the LSDA metadata. It seems worthwhile considering how that might show up in IR if the underlying capability is added to the ABI.
I’ll just say, however, that it’s not really clear to me that actually making such an ABI change is worth the churn and compatibility issues. It would allow for better stacktraces in this edge-case, but there isn’t really potential for code-size savings, because the cleanups must be emitted for the “real” catch clauses in any case. Thus, this entire section is something I do not currently expect will actually be implemented.
However, if such an ABI extension were to occur, here’s how I propose to utilize it.
Add “unwindabort” as a fourth kind of clause to “landingpad” (beside “cleanup”, “catch”, and “filter”). This would lower to the new kind of action in the LSDA, and would indicate that if none of the previous catch or filter clauses caught the exception, the program should terminate in the handler-search phase, without beginning to unwind – and neither continuing to search upwards, nor transferring control to the landingpad.
Now the previous example
void fn() noexcept {
try { a(); } catch (int) {}
}
can be turned into:
invoke void @a to label %invoke.cont unwind label %lpad
lpad:
landingpad { i8*, i32 } catch i8* @_ZTIi unwindabort
...check type matches/handle int catch...
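The intended search-phase behavior of such a clause can be sketched as follows. This is a hypothetical model of an ABI extension that does not exist: `Clauses`, `SearchResult`, and `searchPhase` are invented names, types are compared by name rather than via type_info, and filter clauses are left out for brevity.

```cpp
#include <string>
#include <vector>

// Simplified view of one landingpad's clauses.
struct Clauses {
  std::vector<std::string> catchTypes;  // e.g. {"int"} for catch (int)
  bool hasCleanup;                      // cleanups never stop the search phase
  bool unwindAbort;                     // the proposed (hypothetical) clause
};

enum class SearchResult { HandlerFound, Terminate, ContinueSearch };

// Phase-1 (handler search) decision for a single frame.
SearchResult searchPhase(const Clauses& c, const std::string& thrownType) {
  for (const std::string& t : c.catchTypes)
    if (t == thrownType) return SearchResult::HandlerFound;
  // No catch matched: terminate here, before any unwinding has happened.
  if (c.unwindAbort) return SearchResult::Terminate;
  return SearchResult::ContinueSearch;
}
```

For the fn example, a thrown int finds the handler, while any other type terminates during the search phase with the stack still intact.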
Sharing empty LSDA across functions
After this proposal, many functions could have unique copies of the exact same LSDA:
GCC_except_table0:
.Lexception0:
.byte 255 # @LPStart Encoding = omit
.byte 255 # @TType Encoding = omit
.byte 1 # Call site Encoding = uleb128
.uleb128 .Lcst_end0-.Lcst_begin0
.Lcst_begin0:
.Lcst_end0:
.p2align 2
It might be useful to emit this “empty” table only a single time, rather than once per function.
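One way to picture this is a pool that interns tables by their encoded bytes. The sketch below is illustrative only: `LsdaPool` and the `shared_lsda_` label prefix are hypothetical, and it ignores how an assembler or linker would actually merge the sections.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hands out one label per distinct LSDA byte sequence.
class LsdaPool {
 public:
  // Returns the label for these bytes, creating a new table on first use.
  std::string intern(const std::vector<std::uint8_t>& bytes) {
    auto it = labels_.find(bytes);
    if (it != labels_.end()) return it->second;
    std::string label = "shared_lsda_" + std::to_string(labels_.size());
    labels_.emplace(bytes, label);
    return label;
  }

  // Number of distinct tables that would be emitted.
  std::size_t size() const { return labels_.size(); }

 private:
  std::map<std::vector<std::uint8_t>, std::string> labels_;
};
```

The empty table shown above encodes as the four bytes 0xFF, 0xFF, 0x01, 0x00 (before alignment padding), so every function using it would receive the same label.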
Improving mixing of -fno-exceptions and -fexceptions.
Once we have a mechanism which makes C++ “noexcept” cheap, it would be interesting to explore a mode where disabling exceptions means “unwinding through no-exceptions frame is guaranteed to abort”, rather than “undefined behavior if unwinding through a no-exceptions frame”. You might call it -fno-exceptions=noexcept, or something along those lines, as the effect would be to cause every decl to be marked noexcept. This could allow the mixing of exceptions/no-exceptions modes within a single binary to be significantly safer – potentially without discarding the performance/codesize advantages of the existing -fno-exceptions mode. If we actually implement it by adding noexcept to the Decl, it also may be feasible to allow a #pragma to enable/disable exceptions within a TU – or, to import a module built with exceptions enabled and use it from a TU with exceptions disabled (and vice versa).