RFC: Add "call unwindabort" to LLVM IR

This is somewhat of a continuation of some previous discussions regarding the inefficiency of Clang/LLVM’s current mechanism for implementing C++ “noexcept”:
Feb 2017: [llvm-dev] help me understand how nounwind attribute on functions works?
Dec 2020: [llvm-dev] Catching exceptions while unwinding through -fno-exceptions code

Problem Statement

Currently, adding “noexcept” to a C++ function is relatively cheap (in both binary size and performance of the compiled code) in GCC, but not in Clang. I’d like to fix this – a function annotated “noexcept” should in no case generate worse code than a function not so marked.

This is only feasible to fix because the behavior required of noexcept is quite relaxed compared to a normal “catch” block. In particular, the runtime is not required to unwind the stack and call destructors prior to calling std::terminate – yet may do so if it wishes.

E.g. in this example program, ~Foo is allowed to be called 0, 1, or 2 times, depending on your compiler/runtime’s implementation, or even on whether functions are inlined.

  #include <stdio.h>
  struct Foo { ~Foo() { puts("~Foo"); } };
  static void a() { Foo f; throw 1; }
  void b() noexcept { Foo g; a(); }
  int main() { b(); }

However, Clang effectively implements “noexcept” as if it were a “catch (...)” surrounding the body of the function, and does not take advantage of the flexibility. (The catch-like behavior was required by C++ for “throw()” prior to C++17, but when noexcept was introduced in C++11, it intentionally did not have that requirement.)

I already committed D113620 (“Skip exception cleanups when the innermost scope is EHTerminateScope”) to stop emitting EH cleanups when the current scope is a noexcept termination. This was an easy change, and reduces the amount of landingpad code generated along the path to termination, but doesn’t fully solve the problem.

Background

In the underlying EH metadata for the Itanium-ABI GCC LSDA format (used by e.g. __gxx_personality_v0), every function is associated with a call-site map. This map associates a range of instruction-pointers with the address of a landingpad, and an action table. The landingpad IR instruction is nearly a direct encoding of that action table: whether the address should be invoked during unwind as a “cleanup”, and a set of types for which this block should be the catch handler target.

However, there’s one further sort of action supported in the underlying format, which LLVM doesn’t yet expose in IR: termination. Termination is represented by an instruction-pointer being omitted from the callsite table, and indicates to the personality that std::terminate should be called. This is how GCC implements noexcept today, and this is what I propose to add to LLVM IR.
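As a rough illustration of the lookup described above, here is a small C++ model of how a personality routine might classify an instruction pointer. The names `CallSite`, `Action`, and `classify` are hypothetical and exist only for this sketch; the real LSDA is a compact byte encoding, not a vector of structs. Treating a record without a landing pad as “continue unwinding” reflects my understanding of the format rather than anything stated above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified model of one LSDA call-site record.
struct CallSite {
  std::uintptr_t start;       // first covered instruction address
  std::uintptr_t length;      // number of covered bytes
  std::uintptr_t landingPad;  // 0 models "no landing pad for this range"
};

enum class Action { Terminate, ContinueUnwind, EnterLandingPad };

// Classify an instruction pointer: an IP that no record covers is the
// "omitted from the call-site table" case, which means std::terminate.
Action classify(const std::vector<CallSite>& table, std::uintptr_t ip) {
  for (const CallSite& cs : table) {
    if (ip >= cs.start && ip - cs.start < cs.length) {
      return cs.landingPad != 0 ? Action::EnterLandingPad
                                : Action::ContinueUnwind;
    }
  }
  return Action::Terminate;
}

// Usage example with made-up addresses.
void demoClassify() {
  std::vector<CallSite> table = {{0x100, 0x10, 0x200}, {0x120, 0x8, 0}};
  assert(classify(table, 0x104) == Action::EnterLandingPad);
  assert(classify(table, 0x110) == Action::Terminate);  // gap in the table
}
```

In this model, a “call unwindabort” would simply be a call whose return address falls into a gap between records.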

There is a complication with this representation, however: given a noexcept function with a try/catch that catches some types, you still need to terminate for any others. But, you cannot omit the call from the callsite table – you need a callsite mapping to specify how to handle the types you do intend to catch. E.g.:

void fn() noexcept {
  try { a(); } catch (int) {}
}

For this case, GCC emits an action table which specifies that it catches int, and is marked as having a cleanup (in order to handle any other type passing through). The landingpad code then manually calls std::terminate for non-matching types. This is not ideal, since you don’t get the benefit of not unwinding the stack before terminating mentioned above, but is acceptable. Per comment 6 in above GCC bug, we could consider addressing that in the future.

Note that there’s also an additional side-benefit for debuggability, coincidentally requested in https://llvm.org/PR53062, after I’d already started working on this. By distinguishing “catch” from “terminate”, it enables the unwinder to terminate before unwinding the stack, similarly to what happens if you throw an exception without any matching catch in scope. That allows a diagnostic backtrace to pinpoint the error location. And, in fact, this proposal will cause that to just work with libcxxabi’s unwinder. Unfortunately, libstdc++'s unwinder actually defers termination until phase2 of the unwind, so the stack will be partially unwound if there are any intervening cleanups. That problem also affects GCC-compiled programs, see https://gcc.gnu.org/PR55918.

LLVM IR Proposal

I propose to introduce a new IR keyword for call instructions: “unwindabort”.

We will likewise need to add “unwindabort” to other unwinding instructions: “resume” and “catchswitch”. (But possibly not “cleanupret”, see below.)

Examples:

  resume unwindabort { i8*, i32 } %exn
  catchswitch within %parent [label %handler1] unwindabort

You might ask: doesn’t resume unconditionally unwind, and therefore doesn’t resume unwindabort unconditionally abort? Yes, that’s correct. It will. An alternative would be to convert resume to call @__cxa_begin_catch; call @_ZSt9terminatev. However, synthesizing such a call in the backend is ugly (and, has been a sticking-point in some previous conversations).

Thus, this proposal uses resume – not just at the IR level, but all the way through lowering – resume unwindabort is ultimately lowered to “call unwindabort @_Unwind_Resume”. This lets the personality function handle termination the same way it otherwise would.

Name/Syntax Bikeshedding:

Some rationale for some of the somewhat-arbitrary choices I’ve made:

  • I have made “unwindabort” an explicit part of the syntax for call, rather than an attribute. This seems appropriate, given that it needs to also be added to other non-call instructions, and that it is not usable on function definitions/declarations. It would be a viable alternative to use a function-attribute – but, only for call, not on resume or catchswitch.
  • I placed the “unwindabort” token after “call” and before fast-math-flags. This seems reasonable, since that’s where all the other non-attribute syntax goes on the call instruction (e.g. “tail call unwindabort ninf preserve_allcc zeroext addrspace(1) i8* @foo(...) #3”).
  • I placed the token at the end for catchswitch, because that’s where the other “unwind to” phrases went.
  • There are other possible names instead of “unwindabort”, e.g. abortonunwind, terminateonunwind, stopunwind, endunwind. I didn’t like those as much.

Rejected Alternative: Pattern Matching

In previous discussions, it has been proposed that we could, in the backend, pattern-match a landing-pad which only calls std::terminate, and instead of generating code, generate the correct LSDA metadata.

However, today, we don’t actually have enough information in IR to be able to fully implement the desired semantics with pattern-matching. When we’re inlining a function with EH cleanups into a noexcept function, we are allowed to (and want to) drop the cleanups from the inlined body.

For this to be workable, we’d need to be able to effectively distinguish (at the IR level)
void foo() noexcept { xxx(); }
from
void foo() { try { xxx(); } catch (...) { __clang_call_terminate(); } }

In the former case, we can drop the exceptional cleanups within xxx() when inlining xxx into foo. In the latter case, we must not. However, with today’s IR, these end up looking effectively equivalent.

Thus, pattern matching the existing IR isn’t a complete answer. We want something new in IR to indicate the allowed difference in semantics.

Rejected Alternative: Add unwindabort to invoke

Instead of extending “call”, we could extend invoke – e.g. “invoke void @x to label X unwindabort” – replacing “unwind label”. I don’t prefer this, however – I don’t think it makes anything simpler. The purpose of “invoke” is to describe the possibility of exceptional local flow control, in order to ensure that the proper local variables are accessible (and remain live!) in the landing-pad. That is unnecessary for unwindabort. All the handling for “call” is already aware that calls can either return normally, or unwind out of the function. The introduction of “unwindabort” doesn’t fundamentally change that – it just adjusts the action taken upon unwinding. Introducing this to “invoke”, however, would mean teaching everywhere to deal with a special kind of invoke which doesn’t have an exceptional destination block.

This will be much more complicated, with no gain I can see.

Rejected Alternative: Add unwindabort to landingpad (but not call)

If we did consider implementing unwindabort on landingpad (e.g. as per the “future extension” section below), it raises the question of whether we could simply use that instead of call unwindabort. That is:

   invoke void @a to label %invoke.cont unwind label %lpad
  lpad:
   landingpad { i8*, i32 } unwindabort
   unreachable

While that should work, I think we do actually want “call unwindabort” instead.

  1. “Call” is more amenable to optimizations than “invoke”, as mentioned before.
  2. Having this as the canonical form is weird, since the landingpad is unreachable.
  3. If we wanted to support this without the ABI extension, we’d only be able to actually support an “unwindabort” when it’s by itself without any other landingpad clauses. That’s weird.

Clang-side

As the support for this feature depends on both exception-handling style, and personality function, Clang will keep the ability to generate its existing explicit-catch code, and use call unwindabort only when it’s known to work.

I have verified that it works for at least the __gxx_personality_v0, __gxx_personality_sj0, and __objc_personality_v0 personalities in the dwarf EH scheme. I’m pretty sure this can work for all of the “popular” ones (and plan to verify before enabling for those), but some of the more oddball ones (e.g. non-Apple ObjC, Windows non-C++ SEH) probably cannot support it, and will use the existing codegen indefinitely.

Other EH schemes:

SJLJ EH:

The setjmp/longjmp EH mechanism (with the __gxx_personality_sj0 personality) uses a very similar – but not quite identical – table format to the normal Itanium unwind info. However, the LSDA callsite mapping doesn’t embed IP ranges. Instead, it contains a mapping from “callsite index” to action. The proper callsite index is stored to a special memory location before each call, and retrieved by the personality function. Landing-pads are indexed starting with 1; the callsite-indices 0 and -1 indicate special behavior: 0 means “call std::terminate”, and -1 means “propagate to caller”. LLVM currently doesn’t use 0. The unwindabort feature maps precisely to that, so this case is easy.
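The index convention above can be sketched in C++ as follows. `SjLjAction`, `SjLjDecision`, and `decodeCallSiteIndex` are hypothetical names for this illustration, not identifiers from any unwinder. Indices below -1 are not described in the text, so the sketch treats them as a precondition violation.

```cpp
#include <cassert>
#include <cstddef>

enum class SjLjAction { Terminate, PropagateToCaller, EnterLandingPad };

struct SjLjDecision {
  SjLjAction action;
  std::size_t landingPadSlot;  // zero-based; meaningful only for EnterLandingPad
};

// Decode the callsite index that was stored before the call.
SjLjDecision decodeCallSiteIndex(long index) {
  if (index == 0) return {SjLjAction::Terminate, 0};           // unwindabort
  if (index == -1) return {SjLjAction::PropagateToCaller, 0};  // plain call
  assert(index > 0 && "indices below -1 are not described in the text");
  // Landing pads are numbered from 1, so index N selects slot N-1.
  return {SjLjAction::EnterLandingPad, static_cast<std::size_t>(index - 1)};
}
```

Under this model, lowering a “call unwindabort” amounts to storing 0 as the callsite index before the call.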

Windows EH:

The Windows EH handling in LLVM IR is significantly different than Itanium EH handling – instead of “landingpad” and “resume” IR, it uses a set of “funclet” IR instructions. The proposal will also work there, with some slight additional work.

Two of those additional instructions can unwind to the caller frame: catchswitch and cleanupret. Each of those could have the “unwindabort” keyword added.

However, adding “cleanupret unwindabort” is not really necessary, because the entire cleanup may simply be removed instead. With the funclet IR, we always know that a cleanuppad is a cleanup, and can always associate it with its cleanupret. Thus, anytime we’d want to create a “cleanupret unwindabort” instruction, we can instead propagate the unwindabort into everything unwinding into the cleanuppad, then delete the cleanuppad. (Although, if it turns out to be convenient to allow for a temporary “cleanupret unwindabort” to be created, so that such propagation can be deferred to a separate pass, we could do that too.)

Catchswitch has an “unwind” edge, which indicates where to continue unwinding to, if none of the catch clauses match. This unwind-edge is also required to be the same as the unwind-edge used by any other invoke within the body of a catchswitch, unless it’s creating a new nested try/catch block. (There can be only a single “upwards” unwind target.) We do need to add “unwindabort” there.

Sidenote: the cleanuppad has some interesting semantics. In Itanium unwinding, you are forbidden to unwind out of a cleanup – any nested throws must be caught and handled. However, the code must ensure that by itself (today in Clang: via invoke to a catch-all landingpad that calls terminate; after these changes: via marking any such calls unwindabort). In Windows EH, by contrast, that happens automatically: all attempts to unwind out of a cleanuppad other than via “cleanupret” implicitly act as “unwindabort”!

At the low level:
MS’s __CxxFrameHandler3 personality function reads a flag bit in the per-function metadata to determine that a function is nounwind. If the bit is set, any attempt to unwind past the frame will instead terminate. This is not an exact match for the proposed semantics, as it’s a per-function flag, rather than per-callsite. Unfortunately, per-function is insufficient in the face of inlining a noexcept function into a not-noexcept function. This is true even for MSVC. What they do – and what we can also do – is to use the function flag only when no callsite can unwind to caller. If anything might unwind to the caller, the calls which should terminate are assigned to a state entry in the cleanup map which points directly to the external symbol “__std_terminate” as the cleanup handler (as if that was a cleanuppad funclet, rather than an external function!).
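The choice between the per-function flag and the per-state entry can be sketched as a small decision function. This is only a model of the rule described above; `UnwindKind`, `TerminateStrategy`, and `chooseStrategy` are hypothetical names, and the `None` case (no aborting call at all) is an addition of the sketch.

```cpp
#include <cassert>
#include <vector>

// Hypothetical summary of where an exception leaving one call site goes.
enum class UnwindKind { ToLocalHandler, ToCaller, Abort };

enum class TerminateStrategy {
  None,          // nothing in the function needs to terminate
  FunctionFlag,  // set the per-function "nounwind" bit
  PerStateEntry  // aborting calls get a state whose cleanup is __std_terminate
};

TerminateStrategy chooseStrategy(const std::vector<UnwindKind>& sites) {
  bool anyAbort = false;
  bool anyToCaller = false;
  for (UnwindKind kind : sites) {
    if (kind == UnwindKind::Abort) anyAbort = true;
    if (kind == UnwindKind::ToCaller) anyToCaller = true;
  }
  if (!anyAbort) return TerminateStrategy::None;
  // The function-wide flag is only usable when no call site may still
  // unwind to the caller, e.g. after inlining into a non-noexcept function.
  return anyToCaller ? TerminateStrategy::PerStateEntry
                     : TerminateStrategy::FunctionFlag;
}

// Usage example: a noexcept body inlined next to an ordinary call.
void demoChooseStrategy() {
  std::vector<UnwindKind> mixed = {UnwindKind::Abort, UnwindKind::ToCaller};
  assert(chooseStrategy(mixed) == TerminateStrategy::PerStateEntry);
}
```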

Frames with the Structured Exception Handling (SEH) personality function, or compilations with SEH-interop enabled potentially pose a set of additional issues. Clang does not handle SEH-interop mode well today (e.g. it completely ignores noexcept!). I plan to defer worrying about either of these, falling back to the existing handling.

WebAssembly:

I haven’t looked into it much. It looks like it’s significantly different than anything else – including Windows – and it currently has 3 different unwinding schemes (enable-emscripten-cxx-exceptions, enable-emscripten-sjlj, and wasm-enable-eh). Needs investigation, but should be fine with the fallback in the meantime.

Other DWARF-style EH personalities:

There are some personalities used for C code (e.g. __gcc_personality_v0), which currently don’t support anything except cleanups. In these, “landingpad … catch” doesn’t work at all. And “call unwindabort” will also not work: the personality interprets a missing callsite as “continue unwind”, rather than “terminate”. As these routines are only used to implement __attribute__((cleanup)) for C (not C++) code which enables -fexceptions, this is not really an issue. It would only matter if we add a noexcept mechanism for C code in the future.

Further ideas

Potential future extension: Add a new “unwindabort” landingpad clause

As discussed previously, in the Itanium ABI, we cannot currently represent “catch some types, otherwise terminate” in the LSDA metadata. It seems worthwhile considering how that might show up in IR if the underlying capability is added to the ABI.

I’ll just say, however, that it’s not really clear to me that actually making such an ABI change is worth the churn and compatibility issues. It would allow for better stacktraces in this edge-case, but there isn’t really potential for code-size savings, because the cleanups must be emitted for the “real” catch clauses in any case. Thus, this entire section is something I do not currently expect will actually be implemented.

However, if such an ABI extension were to occur, here’s how I propose to utilize it.

Add “unwindabort” as a fourth kind of clause to “landingpad” (beside “cleanup”, “catch”, and “filter”). This would lower to the new kind of action in the LSDA, and would indicate that if none of the previous catch or filter clauses caught the exception, the program should terminate in the handler-search phase, without beginning to unwind – and neither continuing to search upwards, nor transferring control to the landingpad.

Now the previous example

  void fn() noexcept {
    try { a(); } catch (int) {}
  }

can be turned into:

   invoke void @a to label %invoke.cont unwind label %lpad
 lpad:
   landingpad { i8*, i32 } catch i8* @_ZTIi unwindabort
   ...check type matches/handle int catch...
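To make the intended search-phase behavior of such a clause concrete, here is a deliberately simplified C++ model. It ignores filter clauses, cleanups, and catch-all entries, and compares type names as plain strings; `LandingPadModel`, `SearchResult`, and `searchFrame` are hypothetical names for this sketch only.

```cpp
#include <cassert>
#include <string>
#include <vector>

enum class SearchResult { HandlerFound, Terminate, KeepSearching };

// Simplified stand-in for a landingpad's clause list.
struct LandingPadModel {
  std::vector<std::string> catchTypes;  // e.g. "_ZTIi" for int
  bool unwindAbort;                     // the proposed fourth clause kind
};

// What the handler-search phase would decide for this frame.
SearchResult searchFrame(const LandingPadModel& pad,
                         const std::string& thrownType) {
  for (const std::string& type : pad.catchTypes) {
    if (type == thrownType) return SearchResult::HandlerFound;
  }
  // No catch matched: terminate without unwinding if the clause is
  // present, otherwise keep searching in the caller's frame.
  return pad.unwindAbort ? SearchResult::Terminate
                         : SearchResult::KeepSearching;
}

// Usage example mirroring fn() above: catch int, otherwise abort.
void demoSearchFrame() {
  LandingPadModel pad;
  pad.catchTypes.push_back("_ZTIi");
  pad.unwindAbort = true;
  assert(searchFrame(pad, "_ZTIi") == SearchResult::HandlerFound);
  assert(searchFrame(pad, "_ZTId") == SearchResult::Terminate);
}
```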

Sharing empty LSDA across functions

After this proposal, many functions could have unique copies of the exact same LSDA:

GCC_except_table0:
.Lexception0:
        .byte   255                             # @LPStart Encoding = omit
        .byte   255                             # @TType Encoding = omit
        .byte   1                               # Call site Encoding = uleb128
        .uleb128 .Lcst_end0-.Lcst_begin0
.Lcst_begin0:
.Lcst_end0:
        .p2align        2

It might be useful to emit this “empty” table only a single time, rather than once per function.
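One way to picture this is a pool that hands out a single label per distinct table content. The following C++ sketch is only an illustration of the idea, assuming tables can be compared byte-for-byte; `LsdaPool`, `intern`, and the generated label names are hypothetical and not part of LLVM. The four bytes in the usage example correspond to the listing above (255, 255, 1, and a zero-length call-site table).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

using TableBytes = std::vector<std::uint8_t>;

// Hypothetical pool that emits each distinct LSDA byte sequence once.
class LsdaPool {
 public:
  // Returns the label of an identical, already-recorded table,
  // or records the table under a fresh label.
  std::string intern(const TableBytes& bytes) {
    auto it = labels_.find(bytes);
    if (it != labels_.end()) return it->second;
    std::string label = "GCC_except_table" + std::to_string(labels_.size());
    labels_.emplace(bytes, label);
    return label;
  }

  std::size_t emittedCount() const { return labels_.size(); }

 private:
  std::map<TableBytes, std::string> labels_;
};

// Usage example: two functions with the "empty" table share one copy.
void demoLsdaPool() {
  LsdaPool pool;
  TableBytes empty = {0xff, 0xff, 0x01, 0x00};
  std::string first = pool.intern(empty);
  std::string second = pool.intern(empty);
  assert(first == second);
  assert(pool.emittedCount() == 1);
}
```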

Improving mixing of -fno-exceptions and -fexceptions.

Once we have a mechanism which makes C++ “noexcept” cheap, it would be interesting to explore a mode where disabling exceptions means “unwinding through no-exceptions frame is guaranteed to abort”, rather than “undefined behavior if unwinding through a no-exceptions frame”. You might call it -fno-exceptions=noexcept, or something along those lines, as the effect would be to cause every decl to be marked noexcept. This could allow the mixing of exceptions/no-exceptions modes within a single binary to be significantly safer – potentially without discarding the performance/codesize advantages of the existing -fno-exceptions mode. If we actually implement it by adding noexcept to the Decl, it also may be feasible to allow a #pragma to enable/disable exceptions within a TU – or, to import a module built with exceptions enabled and use it from a TU with exceptions disabled (and vice versa).

Nice writeup! This is definitely an area where we can take advantage of the liberal language in the C++ standard.

To make sure we’re on the same page, here is what I think the proposal would do, using an example program (on Compiler Explorer):

define dso_local void @_Z3foov() #0 personality i8* bitcast (i32 (...)* @__gxx_personality_v0 to i8*) {
  %1 = alloca %struct.A, align 1
  %2 = alloca i8*, align 8
  %3 = alloca i32, align 4
; Replace with call
;  invoke void @_Z8canThrowv()
;          to label %4 unwind label %5
  call unwindabort void @_Z8canThrowv()
  br label %4

4:                                                ; preds = %0
  call void @_ZN1AD1Ev(%struct.A* noundef nonnull align 1 dereferenceable(1) %1) #4
  ret void

; Destructor call is completely removed, terminate call encoded as hole in call-site table
;5:                                                ; preds = %0
;  %6 = landingpad { i8*, i32 }
;          catch i8* null
;  %7 = extractvalue { i8*, i32 } %6, 0
;  store i8* %7, i8** %2, align 8
;  %8 = extractvalue { i8*, i32 } %6, 1
;  store i32 %8, i32* %3, align 4
;  call void @_ZN1AD1Ev(%struct.A* noundef nonnull align 1 dereferenceable(1) %1) #4
;  br label %9
;
;9:                                                ; preds = %5
;  %10 = load i8*, i8** %2, align 8
;  call void @__clang_call_terminate(i8* %10) #5
;  unreachable
}

I purposefully pulled from Clang-14, before your change in D113620 (“Skip exception cleanups when the innermost scope is EHTerminateScope”), since one of the major improvements of unwindabort is that it persists through inlining.

call unwindabort is required so that we can track which call-site entries to omit, but it’s not clear to me why this annotation is needed on resume and catchswitch. Are they just to handle this case:

void fn() noexcept {
  try { a(); } catch (int) {}
}

Where resume (for the Itanium C++ ABI) and catchswitch (for the MSVC ABI) allow termination regardless of clean-up state?

I’m wondering if you’ve started working on the implementation. Full-disclosure: I have an intern this summer and his planned first project was to pattern match the terminate case to be purely in LSDA to save the relocation + call instruction. I came up with this plan right around the time you sent up this RFC… but I only saw this now when @smeenai saw my plan and pointed me to this.

I think your solution solves this issue in a more effective way which would make doing the pattern matching obsolete. Given that, I’m happy to lend the resources if you want to cooperate on the work.

Finally, is there a specific motivation that prompted this improvement? I think the most measurable gain will be in final binary size with the performance impact being fairly minimal if measurable at all. This could also improve compile time by stripping away dead landing pads very early in the compilation pipeline.

MSVC is also on __CxxFrameHandler4 now, though LLVM doesn’t support it yet. That being said, the noexcept bit per function still exists, and for scoped noexcept I actually implemented a similar mechanism where, instead of encoding __std_terminate, a more compact integer is generated to indicate that termination should be done by the runtime.

__CxxFrameHandler3 is likely never getting deprecated though, so this isn’t a major concern.

As far as SEH I would just not worry about it. There’s no concept of noexcept in SEH and C++EH interop mode is defined by how C++EH is implemented on top of SEH inside windows. If specific behavior is desired it’ll have to be specifically implemented at which point the starting point doesn’t matter much.

Thanks, this seems like a good idea to me. I see resume unwindabort as being the key innovation that makes it all work. This wasn’t something I had ever considered, and I found the desire to avoid synthesizing calls to std::terminate in the backend to be pretty compelling.

I think the last time I thought hard about this problem, I (and perhaps others) had the attitude that LLVM IR is pure and good, and C++ exceptions are inefficient and nobody should use them. Somehow I convinced myself that C++ language designers should have designed an EH mechanism that makes LLVM’s life easy, which really approaches the problem backwards. C++ came first and is a standard, and LLVM IR should accordingly be designed to support a high quality C++ EH implementation. Your proposal represents a major step in that direction.