RFC: Add "call unwindabort" to LLVM IR

This continues some previous discussions regarding the inefficiency of Clang/LLVM’s current mechanism for implementing C++ “noexcept”:
Feb 2017: [llvm-dev] help me understand how nounwind attribute on functions works?
Dec 2020: [llvm-dev] Catching exceptions while unwinding through -fno-exceptions code

Problem Statement

Currently, adding “noexcept” to a C++ function is relatively cheap (in both binary size and performance of the compiled code) in GCC, but not in Clang. I’d like to fix this: a function annotated “noexcept” should in no case generate worse code than a function not so marked.

This is only feasible to fix because the behavior required of noexcept is quite relaxed compared to a normal “catch” block. In particular, the runtime is not required to unwind the stack and call destructors prior to calling std::terminate – yet may do so if it wishes.

E.g. in this example program, ~Foo is allowed to be called 0, 1, or 2 times, depending on your compiler/runtime’s implementation, or even on whether functions are inlined.

  #include <stdio.h>
  struct Foo { ~Foo() { puts("~Foo"); } };
  static void a() { Foo f; throw 1; }
  void b() noexcept { Foo g; a(); }
  int main() { b(); }

However, Clang effectively implements “noexcept” as if it were a “catch (...)” surrounding the body of the function, and does not take advantage of the flexibility. (The catch-like behavior was required by C++ for “throw()” prior to C++17, but when noexcept was introduced in C++11, it intentionally did not have that requirement.)

I already committed D113620 (“Skip exception cleanups when the innermost scope is EHTerminateScope”) to stop emitting EH cleanups when the current scope is a noexcept termination. This was an easy change, and reduces the amount of landingpad code generated along the path to termination, but doesn’t fully solve the problem.

Background

In the underlying EH metadata for the Itanium-ABI GCC LSDA format (used by e.g. __gxx_personality_v0), every function is associated with a call-site map. This map associates a range of instruction-pointers with the address of a landingpad and an action table. The landingpad IR instruction is nearly a direct encoding of that action table: whether the address should be invoked during unwind as a “cleanup”, and a set of types for which this block should be the catch handler target.

However, there’s one further sort of action supported in the underlying format, which LLVM doesn’t yet expose in IR: termination. Termination is represented by an instruction-pointer being omitted from the callsite table, and indicates to the personality that std::terminate should be called. This is how GCC implements noexcept today, and this is what I propose to add to LLVM IR.
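As a rough illustration of the lookup rule just described, here is a toy C++ model (not the real personality code). The struct layout, the names `CallSite`, `lookupAction` and `kExampleTable`, and the addresses are all hypothetical; the model also ignores the action table and type matching. It relies on one detail of the GCC LSDA format not spelled out above: an entry that covers the call but has a landing pad of 0 means "keep unwinding".

```cpp
#include <cstddef>
#include <cstdint>

// Toy model of one call-site table entry. Field names are illustrative and do
// not mirror the real LSDA byte encoding.
struct CallSite {
    std::uintptr_t start;       // first instruction address covered
    std::uintptr_t length;      // number of bytes covered
    std::uintptr_t landingPad;  // 0 means "covered, but no landing pad"
};

enum class Action { EnterLandingPad, ContinueUnwind, Terminate };

// What a C++ personality would decide for a call at `ip`, ignoring the action
// table and type matching for simplicity.
constexpr Action lookupAction(const CallSite* table, std::size_t count,
                              std::uintptr_t ip) {
    for (std::size_t i = 0; i < count; ++i) {
        const CallSite& cs = table[i];
        if (ip >= cs.start && ip - cs.start < cs.length)
            return cs.landingPad != 0 ? Action::EnterLandingPad
                                      : Action::ContinueUnwind;
    }
    // No entry covers `ip`: the call site was omitted, which means terminate.
    return Action::Terminate;
}

// Hypothetical table: one call with a landing pad, one that just propagates.
constexpr CallSite kExampleTable[] = {
    {0x100, 0x10, 0x200},
    {0x110, 0x10, 0},
};
```

In this model, a `call unwindabort` is simply a call whose address falls in none of the table's ranges.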

There is a complication with this representation, however: given a noexcept function with a try/catch that catches some types, you still need to terminate for any others. But, you cannot omit the call from the callsite table – you need a callsite mapping to specify how to handle the types you do intend to catch. E.g.:

void fn() noexcept {
  try { a(); } catch (int) {}
}

For this case, GCC emits an action table which specifies that it catches int, and is marked as having a cleanup (in order to handle any other type passing through). The landingpad code then manually calls std::terminate for non-matching types. This is not ideal, since you don’t get the benefit of not unwinding the stack before terminating mentioned above, but is acceptable. Per comment 6 in above GCC bug, we could consider addressing that in the future.

Note that there’s also an additional side-benefit for debuggability, coincidentally requested in https://llvm.org/PR53062, after I’d already started working on this. By distinguishing “catch” from “terminate”, it enables the unwinder to terminate before unwinding the stack, similarly to what happens if you throw an exception without any matching catch in scope. That allows a diagnostic backtrace to pinpoint the error location. And, in fact, this proposal will cause that to just work with libcxxabi’s unwinder. Unfortunately, libstdc++'s unwinder actually defers termination until phase2 of the unwind, so the stack will be partially unwound if there are any intervening cleanups. That problem also affects GCC-compiled programs, see https://gcc.gnu.org/PR55918.

LLVM IR Proposal

I propose to introduce a new IR keyword for call instructions: “unwindabort”.

We will likewise need to add “unwindabort” to other unwinding instructions: “resume” and “catchswitch”. (But possibly not “cleanupret”, see below.)

Examples:

  resume unwindabort { i8*, i32 } %exn
  catchswitch within %parent [label %handler1] unwindabort

You might ask: doesn’t resume unconditionally unwind, and therefore doesn’t resume unwindabort unconditionally abort? Yes, that’s correct. It will. An alternative would be to convert resume to call @__cxa_begin_catch; call @_ZSt9terminatev. However, synthesizing such a call in the backend is ugly (and, has been a sticking-point in some previous conversations).

Thus, this proposal uses resume – not just at the IR level, but all the way through lowering – resume unwindabort is ultimately lowered to “call unwindabort @_Unwind_Resume”. This lets the personality function handle termination the same way it otherwise would.

Name/Syntax Bikeshedding:

Some rationale for some of the somewhat-arbitrary choices I’ve made:

  • I have made “unwindabort” an explicit part of the syntax for call, rather than an attribute. This seems appropriate, given that it needs to also be added to other non-call instructions, and that it is not usable on function definitions/declarations. It would be a viable alternative to use a function-attribute – but, only for call, not on resume or catchswitch.
  • I placed the “unwindabort” token after “call” and before fast-math-flags. This seems reasonable, since that’s where all the other non-attribute syntax goes on the call instruction (e.g. “tail call unwindabort ninf preserve_allcc zeroext addrspace(1) i8* @foo(...) #3”).
  • I placed the token at the end for catchswitch, because that’s where the other “unwind to” phrases went.
  • There are other possible names instead of “unwindabort”, e.g. abortonunwind, terminateonunwind, stopunwind, endunwind. I didn’t like those as much.

Rejected Alternative: Pattern Matching

In previous discussions, it has been proposed that we could, in the backend, pattern-match a landing-pad which only calls std::terminate, and instead of generating code, generate the correct LSDA metadata.

However, today, we don’t actually have enough information in IR to be able to fully implement the desired semantics with pattern-matching. When we’re inlining a function with EH cleanups into a noexcept function, we are allowed to (and want to) drop the cleanups from the inlined body.

For this to be workable, we’d need to be able to effectively distinguish (at the IR level)
void foo() noexcept { xxx(); }
from
void foo() { try { xxx(); } catch (...) { __clang_call_terminate(); } }

In the former case, we can drop the exceptional cleanups within xxx() when inlining xxx into foo. In the latter case, we must not. However, with today’s IR, these end up looking effectively equivalent.

Thus, pattern matching the existing IR isn’t a complete answer. We want something new in IR to indicate the allowed difference in semantics.
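To make the distinction concrete, here is a toy C++ model of what an inliner could do with each form, assuming the caller's call site carries the proposed `call unwindabort` marker in the noexcept case. The enum and function names are hypothetical, and the model covers only plain calls and cleanup-only invokes in the callee.

```cpp
// How the call to xxx() appears in the caller foo().
enum class CallerSite { CallUnwindAbort, InvokeToCatchAllTerminate };

// A call site inside the body of xxx() that is being inlined.
enum class CalleeSite { PlainCall, InvokeWithCleanup };

// The form that call site takes once inlined into foo().
enum class InlinedForm { CallUnwindAbort, InvokeToCallerPad, InvokeWithCleanup };

constexpr InlinedForm inlineSite(CallerSite caller, CalleeSite callee) {
    return caller == CallerSite::CallUnwindAbort
               // noexcept boundary: cleanups in xxx() may be dropped entirely.
               ? InlinedForm::CallUnwindAbort
               // explicit catch (...): cleanups must still run before the handler.
               : (callee == CalleeSite::PlainCall
                      ? InlinedForm::InvokeToCallerPad
                      : InlinedForm::InvokeWithCleanup);
}
```

The point of the sketch is only that the two caller forms lead to different inlined code, so the IR has to record which one it is.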

Rejected Alternative: Add unwindabort to invoke

Instead of extending “call”, we could extend invoke – e.g. “invoke void @x to label X unwindabort” – replacing “unwind label”. I don’t prefer this, however – I don’t think it makes anything simpler. The purpose of “invoke” is to describe the possibility of exceptional local flow control, in order to ensure that the proper local variables are accessible (and remain live!) in the landing-pad. That is unnecessary for unwindabort. All the handling for “call” is already aware that calls can either return normally, or unwind out of the function. The introduction of “unwindabort” doesn’t fundamentally change that – it just adjusts the action taken upon unwinding. Introducing this to “invoke”, however, would mean teaching everywhere to deal with a special kind of invoke which doesn’t have an exceptional destination block.

This will be much more complicated, with no gain I can see.

Rejected Alternative: Add unwindabort to landingpad (but not call)

If we did consider implementing unwindabort on landingpad (e.g. as per the “future extension” section below), it raises the question of whether we could simply use that instead of call unwindabort. That is:

   invoke void @a to label %invoke.cont unwind label %lpad
  lpad:
   landingpad { i8*, i32 } unwindabort
   unreachable

While that should work, I think we do actually want “call unwindabort” instead.

  1. “Call” is more amenable to optimizations than “invoke”, as mentioned before.
  2. Having this as the canonical form is weird, since the landingpad is unreachable.
  3. If we wanted to support this without the ABI extension, we’d only be able to actually support an “unwindabort” when it’s by itself without any other landingpad clauses. That’s weird.

Clang-side

As the support for this feature depends on both exception-handling style, and personality function, Clang will keep the ability to generate its existing explicit-catch code, and use call unwindabort only when it’s known to work.

I have verified that it works for at least the __gxx_personality_v0, __gxx_personality_sj0, and __objc_personality_v0 personalities in the dwarf EH scheme. I’m pretty sure this can work for all of the “popular” ones (and plan to verify before enabling for those), but some of the more oddball ones (e.g. non-Apple ObjC, Windows non-C++ SEH) probably cannot support it, and will use the existing codegen indefinitely.

Other EH schemes:

SJLJ EH:

The setjmp/longjmp EH mechanism (with __gxx_personality_sj0 personality) uses a very similar – but not quite identical – table format to the normal Itanium unwind info. However, the LSDA callsite mapping doesn’t embed IP ranges. Instead, it contains a mapping from “callsite index” to action. The proper callsite index is stored to a special memory location before each call, and retrieved by the personality function. Landing-pads are indexed starting with 1; the callsite-indices 0 and -1 indicate special behavior: 0 means “call std::terminate”, and -1 means “propagate to caller”. LLVM currently doesn’t use 0. The unwindabort feature maps precisely to that. This is easy.
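The index convention above can be written down as a small C++ sketch. The type and function names are hypothetical and this is not the actual personality code. The text does not say what indices below -1 mean, so treating them as "terminate" here is purely an assumption.

```cpp
enum class SjLjAction { Terminate, PropagateToCaller, EnterLandingPad };

struct SjLjDecision {
    SjLjAction action;
    int landingPad;  // 1-based landing-pad index; 0 when not applicable
};

// Interpret the call-site index that was stored before the call.
//   >= 1 : landing pad with that index
//   -1   : propagate to caller
//   0    : call std::terminate (the slot `unwindabort` would use)
//   < -1 : not described in the text; treated as terminate (assumption)
constexpr SjLjDecision classifyCallSiteIndex(int index) {
    return index >= 1    ? SjLjDecision{SjLjAction::EnterLandingPad, index}
           : index == -1 ? SjLjDecision{SjLjAction::PropagateToCaller, 0}
                         : SjLjDecision{SjLjAction::Terminate, 0};
}
```

Under this convention, lowering `call unwindabort` for SJLJ would amount to storing 0 as the call-site index before the call.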

Windows EH:

The Windows EH handling in LLVM IR is significantly different than Itanium EH handling – instead of “landingpad” and “resume” IR, it uses a set of “funclet” IR instructions. The proposal will also work there, with some slight additional work.

Two of those additional instructions can unwind to the caller frame: catchswitch and cleanupret. Each of those could have the “unwindabort” keyword added.

However, adding “cleanupret unwindabort” is not really necessary, because the entire cleanup may simply be removed instead. With the funclet IR, we always know that a cleanuppad is a cleanup, and can always associate it with its cleanupret. Thus, anytime we’d want to create a “cleanupret unwindabort” instruction, instead we can propagate the unwindabort into everything unwinding into the cleanuppad, then delete the cleanuppad. (Although, if it turns out to be convenient to allow for a temporary “cleanupret unwindabort” to be created, so that such propagation can be deferred to a separate pass, we could do that too).

Catchswitch has an “unwind” edge, which indicates where to continue unwinding to, if none of the catch clauses match. This unwind-edge is also required to be the same as the unwind-edge used by any other invoke within the body of a catchswitch, unless it’s creating a new nested try/catch block. (There can be only a single “upwards” unwind target.) We do need to add “unwindabort” there.

Sidenote: the cleanuppad has some interesting semantics. In Itanium unwinding, you are forbidden to unwind out of a cleanup – any nested throws must be caught and handled. However, the code must ensure that by itself (today in Clang: via invoke to a catch-all landingpad that calls terminate. After these changes: via marking any such calls unwindabort). However, in Windows EH, that happens automatically: all attempts to unwind out of a cleanuppad other than via “cleanupret” implicitly act as “unwindabort”!

At the low level:
MS’s __CxxFrameHandler3 personality function reads a flag bit in the per-function metadata to determine that a function is nounwind. If the bit is set, any attempt to unwind past the frame will instead terminate. This is not an exact match for the proposed semantics, as it’s a per-function flag, rather than per-callsite. Unfortunately, per-function is insufficient in the face of inlining a noexcept function into a not-noexcept function. This is true even for MSVC. What they do – and what we can also do – is to use the function flag only when no callsite can unwind to caller. If anything might unwind to the caller, the calls which should terminate are assigned to a state entry in the cleanup map which points directly to the external symbol “__std_terminate” as the cleanup handler (as if that was a cleanuppad funclet, rather than an external function!).
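The choice between the two encodings can be sketched as a toy C++ decision function. All names are hypothetical and nothing here reflects the real table layout; it only models the rule "use the function flag when no call can unwind to the caller, otherwise route terminating calls to a `__std_terminate` state entry".

```cpp
#include <cstddef>

// What should happen if a given call in the frame unwinds.
enum class OnUnwind { ToCaller, Terminate, LocalHandler };

struct FrameEncoding {
    bool useFunctionFlag;             // set the per-function nounwind bit
    std::size_t terminateStateCalls;  // calls routed to a __std_terminate state entry
};

constexpr FrameEncoding chooseEncoding(const OnUnwind* calls, std::size_t count) {
    bool anyToCaller = false;
    std::size_t terminating = 0;
    for (std::size_t i = 0; i < count; ++i) {
        if (calls[i] == OnUnwind::ToCaller) anyToCaller = true;
        if (calls[i] == OnUnwind::Terminate) ++terminating;
    }
    // Nothing may escape the frame: the cheap per-function flag is enough.
    if (!anyToCaller) return FrameEncoding{true, 0};
    // Otherwise each terminating call needs the per-call-site state entry.
    return FrameEncoding{false, terminating};
}

// Hypothetical frames: a plain noexcept body, and a noexcept function
// inlined into a caller that may itself unwind.
constexpr OnUnwind kNoexceptBody[] = {OnUnwind::Terminate, OnUnwind::Terminate};
constexpr OnUnwind kAfterInlining[] = {OnUnwind::ToCaller, OnUnwind::Terminate};
```

The second example frame is the inlining case mentioned above, where the per-function flag alone would be wrong.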

Frames with the Structured Exception Handling (SEH) personality function, or compilations with SEH-interop enabled potentially pose a set of additional issues. Clang does not handle SEH-interop mode well today (e.g. it completely ignores noexcept!). I plan to defer worrying about either of these, falling back to the existing handling.

WebAssembly:

I haven’t looked into it much. It looks like it’s significantly different than anything else – including Windows – and it currently has 3 different unwinding schemes (enable-emscripten-cxx-exceptions, enable-emscripten-sjlj, and wasm-enable-eh). Needs investigation, but should be fine with the fallback in the meantime.

Other DWARF-style EH personalities:

There are some personalities used for C code (e.g. __gcc_personality_v0), which currently don’t support anything except cleanups. In these, “landingpad … catch” doesn’t work at all. And “call unwindabort” will also not work: the personality interprets a missing callsite as “continue unwind”, rather than “terminate”. As these routines are only used to implement __attribute__((cleanup)) for C (not C++) code which enables -fexceptions, this is not really an issue. It would only matter if we add a noexcept mechanism for C code in the future.

Further ideas

Potential future extension: Add a new “unwindabort” landingpad clause

As discussed previously, in the Itanium ABI, we cannot currently represent “catch some types, otherwise terminate” in the LSDA metadata. It seems worthwhile considering how that might show up in IR if the underlying capability is added to the ABI.

I’ll just say, however, that it’s not really clear to me that actually making such an ABI change is worth the churn and compatibility issues. It would allow for better stacktraces in this edge-case, but there isn’t really potential for code-size savings, because the cleanups must be emitted for the “real” catch clauses in any case. Thus, this entire section is something I do not currently expect will actually be implemented.

However, if such an ABI extension were to occur, here’s how I propose to utilize it.

Add “unwindabort” as a fourth kind of clause to “landingpad” (beside “cleanup”, “catch”, and “filter”). This would lower to the new kind of action in the LSDA, and would indicate that if none of the previous catch or filter clauses caught the exception, the program should terminate in the handler-search phase, without beginning to unwind – and neither continuing to search upwards, nor transferring control to the landingpad.

Now the previous example

  void fn() noexcept {
    try { a(); } catch (int) {}
  }

can be turned into:

   invoke void @a to label %invoke.cont unwind label %lpad
 lpad:
   landingpad { i8*, i32 } catch i8* @_ZTIi unwindabort
   ...check type matches/handle int catch...
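
The handler-search behavior intended for such a clause can be sketched as a toy C++ model. Integer type ids stand in for typeinfo pointers, and every name here is hypothetical, since the underlying ABI extension does not exist.

```cpp
#include <cstddef>

enum class SearchResult { HandlerFound, TerminateNow, KeepSearching };

struct LandingPadClauses {
    const int* catchTypes;   // hypothetical integer type ids
    std::size_t catchCount;
    bool unwindAbort;        // the proposed trailing clause
};

// Search-phase decision for one frame: match a catch clause if possible;
// otherwise terminate immediately if the pad has an unwindabort clause.
constexpr SearchResult searchPhase(const LandingPadClauses& lp, int thrown) {
    for (std::size_t i = 0; i < lp.catchCount; ++i)
        if (lp.catchTypes[i] == thrown) return SearchResult::HandlerFound;
    return lp.unwindAbort ? SearchResult::TerminateNow
                          : SearchResult::KeepSearching;
}

// Model of fn() above: catches int, terminates on anything else.
constexpr int kIntType = 1, kDoubleType = 2;
constexpr int kCatchInt[] = {kIntType};
constexpr LandingPadClauses kFnPad = {kCatchInt, 1, true};
```

With today's encoding, the non-matching case instead falls through to a cleanup that calls std::terminate during phase 2, after the stack has been partially unwound.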

Sharing empty LSDA across functions

After this proposal, many functions could have unique copies of the exact same LSDA:

GCC_except_table0:
.Lexception0:
        .byte   255                             # @LPStart Encoding = omit
        .byte   255                             # @TType Encoding = omit
        .byte   1                               # Call site Encoding = uleb128
        .uleb128 .Lcst_end0-.Lcst_begin0
.Lcst_begin0:
.Lcst_end0:
        .p2align        2

It might be useful to emit this “empty” table only a single time, rather than once per function.

Improving mixing of -fno-exceptions and -fexceptions.

Once we have a mechanism which makes C++ “noexcept” cheap, it would be interesting to explore a mode where disabling exceptions means “unwinding through no-exceptions frame is guaranteed to abort”, rather than “undefined behavior if unwinding through a no-exceptions frame”. You might call it -fno-exceptions=noexcept, or something along those lines, as the effect would be to cause every decl to be marked noexcept. This could allow the mixing of exceptions/no-exceptions modes within a single binary to be significantly safer – potentially without discarding the performance/codesize advantages of the existing -fno-exceptions mode. If we actually implement it by adding noexcept to the Decl, it also may be feasible to allow a #pragma to enable/disable exceptions within a TU – or, to import a module built with exceptions enabled and use it from a TU with exceptions disabled (and vice versa).

Nice writeup! This is definitely an area where we can take advantage of the liberal language in the C++ standard.

To make sure we’re on the same page, here is what I think the proposal would do, using an example program (on Compiler Explorer):

define dso_local void @_Z3foov() #0 personality i8* bitcast (i32 (...)* @__gxx_personality_v0 to i8*) {
  %1 = alloca %struct.A, align 1
  %2 = alloca i8*, align 8
  %3 = alloca i32, align 4
; Replace the invoke with a call:
;  invoke void @_Z8canThrowv()
;          to label %4 unwind label %5
  call unwindabort void @_Z8canThrowv()
  br label %4

4:                                                ; preds = %0
  call void @_ZN1AD1Ev(%struct.A* noundef nonnull align 1 dereferenceable(1) %1) #4
  ret void

; Destructor call is completely removed; terminate call encoded as a hole in the call-site table
;5:                                                ; preds = %0
;  %6 = landingpad { i8*, i32 }
;          catch i8* null
;  %7 = extractvalue { i8*, i32 } %6, 0
;  store i8* %7, i8** %2, align 8
;  %8 = extractvalue { i8*, i32 } %6, 1
;  store i32 %8, i32* %3, align 4
;  call void @_ZN1AD1Ev(%struct.A* noundef nonnull align 1 dereferenceable(1) %1) #4
;  br label %9
;
;9:                                                ; preds = %5
;  %10 = load i8*, i8** %2, align 8
;  call void @__clang_call_terminate(i8* %10) #5
;  unreachable
}

I purposefully pulled from Clang-14 before your change in D113620 (“Skip exception cleanups when the innermost scope is EHTerminateScope”), since one of the major improvements unwindabort has is that it persists through inlining.

call unwindabort is required so that we can track which call-site entries to omit but it’s not clear to me why this annotation is needed on resume and catchswitch. Are they just to handle this case:

void fn() noexcept {
  try { a(); } catch (int) {}
}

Where resume (for Itanium C++ ABI) and catchswitch (for MSVC ABI) allows termination regardless of clean-up state?

I’m wondering if you’ve started working on the implementation. Full-disclosure: I have an intern this summer and his planned first project was to pattern match the terminate case to be purely in LSDA to save the relocation + call instruction. I came up with this plan right around the time you sent up this RFC… but I only saw this now when @smeenai saw my plan and pointed me to this.

I think your solution solves this issue in a more effective way which would make doing the pattern matching obsolete. Given that, I’m happy to lend the resources if you want to cooperate on the work.

Finally, is there a specific motivation that prompted this improvement? I think the most measurable gain will be in final binary size with the performance impact being fairly minimal if measurable at all. This could also improve compile time by stripping away dead landing pads very early in the compilation pipeline.

MSVC is also on __CxxFrameHandler4 now, though LLVM doesn’t support it yet. That being said, the noexcept bit per function still exists, and for scoped noexcept I actually implemented a similar mechanism where, instead of encoding __std_terminate, a more compact integer is generated to indicate that termination should be done by the runtime.

__CxxFrameHandler3 is likely never getting deprecated, though, so this isn’t a major concern.

As far as SEH I would just not worry about it. There’s no concept of noexcept in SEH and C++EH interop mode is defined by how C++EH is implemented on top of SEH inside windows. If specific behavior is desired it’ll have to be specifically implemented at which point the starting point doesn’t matter much.

Thanks, this seems like a good idea to me. I see resume unwindabort as being the key innovation that makes it all work. This wasn’t something I had ever considered, and I found the desire to avoid synthesizing calls to std::terminate in the backend to be pretty compelling.

I think the last time I thought hard about this problem, I (and perhaps others) had the attitude that LLVM IR is pure and good, and C++ exceptions are inefficient and nobody should use them. Somehow I convinced myself that C++ language designers should have designed an EH mechanism that makes LLVM’s life easy, which really approaches the problem backwards. C++ came first and is a standard, and LLVM IR should accordingly be designed to support a high quality C++ EH implementation. Your proposal represents a major step in that direction.

Any update to share on this work?

@jyknight mentioned this work in “Why doesn't `unwind_phase2` skip the cleanup frame after an `_Unwind_Resume`?” (post #3), so it’s definitely still on his radar.

I’m very happy someone is looking at taking advantage of this; it’s been a silly suboptimality in our code-generation for a long time. Note that you don’t need noexcept to take advantage of this, because the same properties apply to destructors during unwind, although, of course, in C++11 and beyond that rule usually has no added effect because of the implicit noexcept-ness of destructors.

It seems to me that the landingpad-based alternative (“Add unwindabort to landingpad”) is the preferable design, although I would tweak it slightly to say that the landing pad ought to contain the call to std::terminate:

  invoke void @a to label %invoke.cont unwind label %lpad
lpad:
  landingpad { i8*, i32 } terminate
  call noreturn void @__clang_call_terminate()
  unreachable

This would address your second and third objections to this design, because the landing pad can now actually be correctly chained to from another landing pad, e.g. if we need to inline into this invoke when the callee contains an invoke with non-exhaustive catches. You could use the same resume terminate trick instead of __clang_call_terminate if you want.

The only pattern-matching this would require is that we’d want to make sure we didn’t actually emit the landing pad (or the blocks it dominates) if we were able to apply the special-case representation to all uses of the landing pad. And if the personality function ever gains the ability to encode termination as an action, we’d be able to take advantage of it without further changes to the representation.

The main thing you’d lose vs. your attribute approach is that blocks containing calls within noexcept contexts would still be split.

I’ve uploaded an initial patch series:

There’s still some more work to do, but I think this is most of the way there, for itanium unwind dwarf eh (other EH modes are all TODO).

There is one potential issue with the unwindabort attribute.
If it is accidentally lost as a result of a transformation, this will change the behavior of the program. If unwindabort is lost, the call becomes potentially throwing. An exception that should have caused program termination will propagate further and can be caught by a catch further up the call stack.
It may make sense to disable generation of unwindabort calls by default.

It’s not an attribute; attributes can be dropped in general. It’s a semantically significant marker: dropping it, unless we can deduce that the callee never unwinds, would be a miscompile in the first place.

Right, that’s why one has to be careful when e.g. duplicating such calls.

I’m interested in seeing this new IR feature go forward, and I don’t see major objections to the design or lots of review activity on the posted patches. If anyone else wants to see this happen, voicing your approval would help move this forward.

The patches are marked as WIP. For me this means they are not ready for review.

My main concern here is that it must be tested thoroughly before enabling it by default. Losing an unwindabort on a call may change the behavior of the program, and bugs of this kind are extremely hard (read: almost impossible) to track down.

Other than that, I think this is a nice size optimization.

I was really excited to see the patches go up. I haven’t gotten a lot of time to play with them yet, unfortunately; my initial experiments actually showed size increases in various places that I haven’t been able to dig into (plus a coroutine-related crash that’s blocking full app measurements). I’m hoping to have some more time over the next few weeks to experiment with this more thoroughly and provide feedback.

I’d also like to voice a lot of support for seeing this move forward. Indeed, the libc++ PSTL turns any exception thrown from a “thread” into a call to std::terminate() (the Standard specifies that). The best way to do this is to make the function that invokes the user code be noexcept, however this loses track of the location where the exception was originally thrown with the way Clang currently implements noexcept.

Long story short, this means that folks using the PSTL will get extremely un-actionable crash reports whenever their program throws inside a PSTL algorithm. There will be absolutely no information explaining where the exception was thrown in the first place, they will only know that the program terminated due to an exception of type X being thrown. It would be really sweet to improve this, unfortunately the backend changes are not something I’m qualified to review.

I commented above that I thought it would be better to just do this in landing pads rather than having an attribute on calls. I’m not sure if that discussion was continued elsewhere (which could certainly have happened, and maybe even included me — it was long enough ago to have slipped completely from my mind).

Does this mean that we would have lost the context of where the original exception was thrown at the point where we call std::terminate()? If so, that would be really unfortunate: as I explained in my post above about PSTL, it is useful to be able to terminate the program while still having access to the original place where the exception was thrown from the terminate handler (so that this terminate handler can e.g. call backtrace and print it). This is a concrete problem for PSTL but also for libdispatch and potentially any other library that wants to turn exception throws into a program termination. (Technically, we could also store the backtrace in the exception object all the time but that’s quite a change and I’m not sure it can even be done with the ABI constraints)

If not, then I didn’t understand what you meant by “split” but I’d have no objection to this being implemented with landing pads. Basically I think the only important thing for the library is the ability to get the backtrace from the terminate handler as explained above.

No. This is just about the LLVM IR design and should have no impact on what gets emitted into the LSDA or what happens at runtime. In the proposal, you can put unwindabort on a call instruction in IR, which would effectively just be a shorthand for an invoke instruction leading to a landing pad that only has an unwindabort action (except that the proposal does not add the ability to express that). Because the instruction remains a call, it is not a terminator and so the basic block does not have to be “split”: you can have call unwindabort in the middle of a block.

@jyknight libc++abi’s __gxx_personality_v0 eagerly terminates when it sees things in the LSDA that don’t make sense, which gives me hope that the ABI extension to support terminate handlers as more general EH actions (rather than by just omitting coverage for a specific PC) might be relatively painless, i.e. just formally stating that some otherwise-ill-formed bit of LSDA (e.g. an out-of-bounds index in the types table) is the canonical representation for a terminate action, in a way that would work on existing runtimes without changes. Can you check if there’s something that would work for this across both libc++abi and libsupc++?

I think this is actually important in either representation. Inlining an invoke into a call unwindabort has to synthesize a landing pad for resumes in the callee to continue to, and it seems much more tolerable if that landing pad can just be unreachable because we can rely on the personality function to never actually reach it. I guess the call unwindabort @_Unwind_Resume idea works, but the unreachable thing seems a whole lot cleaner.

Can you check if there’s something that would work for this across both libc++abi and libsupc++ ?

If I am reading them correctly, I think having a null landing pad in the call site table section of the LSDA causes both implementations to call terminate. This doesn’t appear to be documented behavior in the Itanium spec, but that said, it’s hard to see what else an implementation could do there. Seems like a good candidate to me.