RFC: New EH representation for MSVC compatibility

After a long tale of sorrow and woe, my colleagues and I stand here before you defeated. The Itanium EH representation is not amenable to implementing MSVC-compatible exceptions. We need a new representation that preserves information about how try-catch blocks are nested.

WinEH background

I like the way this sorts out with regard to funclet code generation. It feels very natural for Windows EH, though obviously not as natural for non-Windows targets and I think it is likely to block some optimizations that are currently possible with those targets.

If the unwind label is missing, then control leaves the function after the EH action is completed. If a function is inlined, EH blocks with missing unwind labels are wired up to the unwind label used by the inlined call site.

Is this saying that a “missing” unwind label corresponds to telling the runtime to continue the search at the next frame?

Your example looks wrong in this regard, unless I’m misunderstanding it. It looks like any exceptions that aren’t caught in that function will lead to a terminate call.

Invokes that are reached after a catchblock without following any unwind edges must transitively unwind to the first catchend block that the catchblock unwinds to.

I’m not sure I understand this correctly. In particular, I’m confused about the roles of resume and catchend.

%val = cleanupblock <valty> unwind label %nextaction

Why isn’t this a terminator? It seems like it performs the same sort of role as catchblock, except presumably it is always entered. I suppose that’s probably the answer to my question, but it strikes me as an ambiguity in the scheme. The catchblock instruction is more or less a conditional branch whereas the cleanupblock is more like a label with a hint as to an unconditional branch that will happen later. And I guess that’s another thing that bothers me – a resume instruction at the end of a catch implementation means something subtly different than a resume instruction at the end of a cleanup implementation.

Hi,

Newbie here. This must be a dumb question, but it's not something I can understand from reading the design documents and RFCs.

Why don't we write and use our own personality function, and then we avoid all these restrictions on the interval tables? On Windows, we would still have to catch exceptions with SEH, of course. But SEH should be language-independent, in the sense that it concerns only unwinding for the "low level" parts of the ABI, like restoring non-volatile registers. It doesn't seem to make sense that LLVM, being a language-independent IR, should concern itself with personality functions specific to Microsoft's C++ run-time library.

I understand we want to link compatibility with object code from Visual Studio, but I didn't think that the personality-specific unwind tables were actually an ABI "artifact". I mean, let's say you compile a function in Visual Studio, the result is a function with some mangled symbol that other object code can make references through the linker. But the other object code never incestuously refers to the unwind tables of the function it calls, right?

I'm speaking from the point of view of an implementor of a new programming language who wants to interoperate with C++. I've already got code that can decode the Microsoft RTTI info coming from C++ exceptions, and the pointers to those can be extracted with SEH GetExceptionPointers, etc. To support catching exceptions in my language, or at least calling cleanups while unwinding, I really don't care about the C++ personality function. After all my language might not even have the concept of (nested) try/catch blocks. The custom personality function does have to be supplied as part of the run-time, but as a frontend implementor I'm prepared to have to write a run-time anyway.

Steve

Right, doing our own personality function is possible, but still has half
the challenge of using __CxxFrameHandler3, and introduces a new runtime
dependency that isn't there currently. Given that it won't save that much
work, I'd rather not introduce a dependency that wasn't there before.

The reason it's still hard is that you have to split the main function up
into more than one subfunction. The exception object is allocated in the
frame of the function calling _CxxThrowException, and it has to stay alive until
control leaves the catch block receiving the exception. This is different
from Itanium, where the exception object is constructed in heap memory and
the pointer is saved in TLS. If this were not the case, we'd use the
__gxx_personality_v0-style landingpad approach and make a new personality
variant that understands MS RTTI.

We could try to do all this outlining in Clang, but that blocks a lot of
LLVM optimizations. Any object with a destructor (std::string) is now
escaped into the funclet that calls the destructor, and simple
transformations (SROA) require interprocedural analysis. This affects the
code on the normal code path and not just the exceptional path. While EH
constructs like try / catch are fairly rare in C++, destructor cleanups are
very, very common, and I'd rather not pessimize so much code.

We already have something like what you describe in the form of mingw support. It runs on Windows and handles exceptions using a DWARF-style personality function and (I think) an LLVM-provided implementation of the libc++abi library.

What this doesn't do is provide interoperability with MSVC-compiled objects. For instance, you can't throw an exception from MSVC-compiled code and catch it with clang/LLVM-compiled code or vice versa. With the (too fragile) implementation we have in place right now you can do that (at least in cases that don't break for other reasons), and we want to be able to continue that capability with a new, more robust, solution.

-Andy

Are you guys talking specifically about Win32 EH here? AFAIK, Win64
EH works with gcc-style personality routines just fine.

No, we're talking about 32- and 64-bit programs. The goal is specifically to get these programs to work with the MSVC runtime. If the MSVC runtime starts handling an exception (for instance, within a library compiled with MSVC) it is only going to dispatch that exception to a handler that it is able to recognize.

If all you are interested in is handling exceptions within a closed system, then there are certainly a lot of ways it can be accomplished. It's the desire for MSVC compatibility that constrains the implementation.

Right, doing our own personality function is possible, but still has half the challenge of using __CxxFrameHandler3, and introduces a new runtime dependency that isn't there currently. Given that it won't save that much work, I'd rather not introduce a dependency that wasn't there before.

The reason it's still hard is that you have to split the main function up into more than one subfunction.

I see, but I thought you are able to outline cleanup code already?

And that the hiccup you are encountering is because __CxxFrameHandler3 requires unwind tables with properly-ordered state transitions? The compiler SEH personality (__C_specific_handler) doesn't have that, right? If you could manage __try, __finally already, doesn't that provide the solution?

Let me be precise. Let's take your example with the "ambiguous IR lowering":

  void test1() {
    // EH state = -1
    try {
      // EH state = 0
      try {
        // EH state = 1
        throw 1;
      } catch (...) {
        // EH state = 2
        throw;
      }
      // EH state = 0
    } catch (...) {
      // EH state = 3
    }
    // EH state = -1
  }

If I were "lowering" to compiler SEH, I would do something like this:

  __try {
    // [0]
    // [1]
    __try {
      // [2]
      throw 1;
      // [3]
    } __except( MyCxxFilter1() ) {
      // [4]
      throw;
      // [5]
    }
    // [6]
    // [7]
  } __except( MyCxxFilter2() ) {
    // [8]
    // [9]
  }
  // [10]
  // [11]

My scope tables for __C_specific_handler look like this:

  BeginAddress  EndAddress  HandlerAddress  JumpTarget
  [0]           [1]         MyCxxFilter2    [8]
  [2]           [3]         MyCxxFilter1    [4]
  [4]           [5]         MyCxxFilter2    [8]
  [6]           [7]         MyCxxFilter2    [8]

I'm "cheating" in that I can look at the source code,
but again, you can already lower __try, __except using
__C_specific_handler. There are no state transitions encoded
in the compiler SEH scope table so they aren't an issue...?
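
To make the table above concrete, here is a minimal sketch of the kind of lookup a scope-table-driven handler performs. It is a simplified model under stated assumptions, not the real __C_specific_handler: `ScopeEntry`, `test1ScopeTable` and `findScope` are hypothetical names, the "addresses" are the bracketed label numbers from the listing, and each row is treated as a half-open range [begin, end).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified model of one scope-table row. Addresses are the
// bracketed label numbers from the listing, not real code addresses.
struct ScopeEntry {
  int begin;           // first label covered by the row (inclusive)
  int end;             // label that closes the row (exclusive)
  std::string filter;  // name of the filter the handler would consult
  int jumpTarget;      // label where control resumes if the filter accepts
};

// The four rows of the table, in the order they were written.
std::vector<ScopeEntry> test1ScopeTable() {
  return {
      {0, 1, "MyCxxFilter2", 8},
      {2, 3, "MyCxxFilter1", 4},
      {4, 5, "MyCxxFilter2", 8},
      {6, 7, "MyCxxFilter2", 8},
  };
}

// Return the first row whose [begin, end) range contains pc, or nullptr if
// the faulting address is not covered by any row.
const ScopeEntry *findScope(const std::vector<ScopeEntry> &table, int pc) {
  for (const ScopeEntry &entry : table)
    if (pc >= entry.begin && pc < entry.end)
      return &entry;
  return nullptr;
}
```

In this model, a throw at label [2] finds the MyCxxFilter1 row and would resume at [4], and the rethrow at [4] finds a MyCxxFilter2 row and would resume at [8]. Nothing in the lookup depends on an EH state number, only on which range contains the faulting address.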

Now there is a subtle problem in my "lowering" in that
there may be local objects with destructors, that have to
be lowered to __try, __finally. Microsoft's compiler SEH,
and __C_specific_handler, does not allow a __try block
to have both __except and __finally following. That's why
I suggest writing a personality function to replace
__C_specific_handler, one that does allow __finally + __except.

The exception object is allocated in the frame of the function calling _CxxThrowException, and it has to stay alive until control leaves the catch block receiving the exception.
This is different from Itanium, where the exception object is constructed in heap memory and the pointer is saved in TLS. If this were not the case, we'd use the __gxx_personality_v0-style landingpad approach and make a new personality variant that understands MS RTTI.

I'm surprised, I really want to check this myself later this week. I always thought that MSVCRT always copied your exception object because I have always seen it invoking the copy constructor on throw X. (It was a pain in my case because I didn't really want my exception objects to be copyable, only movable, and at least VS 2010 still insisted that I implement a copy constructor.)

Furthermore, the "catch info" in the MS ABI does have a field for the destructor that the catch block has to call. It's not theoretical, I've got code that calls that function pointer so I can properly catch C++ exceptions from a SEH handler. Though I might be mistaken in that the field points to just an in-place destructor and not a deleting destructor.

Also, I thought the stack is already completely unwound when you reach the beginning of the catch block (but not a __finally block, i.e. the cleanup code). At least, that's the impression I get from reading reverse-engineered source code for the personality functions and the Windows API RtlUnwindEx.

We could try to do all this outlining in Clang, but that blocks a lot of LLVM optimizations. Any object with a destructor (std::string) is now escaped into the funclet that calls the destructor, and simple transformations (SROA) require interprocedural analysis. This affects the code on the normal code path and not just the exceptional path. While EH constructs like try / catch are fairly rare in C++, destructor cleanups are very, very common, and I'd rather not pessimize so much code.

Right, but __CxxFrameHandler3 already forces you to outline destructor cleanups into funclets. So if you wanted to stop doing that you have to write your own personality function right?

What I am saying is, if you can design the personality function so that it works naturally with LLVM IR --- which can't see the source-level scopes --- that seems a whole lot less work versus:

* Changing the existing Itanium-based EH model in LLVM
* Incurring the wrath of people who like the Itanium model
* Having to maintain backwards compatibility or provide an upgrade path

Also, I think, if we want to eventually support trapped operations (some kind of invoke div_with_trap mentioned in another thread), wouldn't it be way easier to implement and optimize if the personality function can be designed in the right way?

Steve

I like the way this sorts out with regard to funclet code generation.
It feels very natural for Windows EH, though obviously not as natural for
non-Windows targets and I think it is likely to block some optimizations
that are currently possible with those targets.

Right, it will block some of today's optimizations by default. I'm OK with
this because we can add those optimizations back by checking if the
personality is Itanium-family (sjlj, arm, or dwarf), and optimizing EH
codepaths is not usually performance critical.

> If the unwind label is missing, then control leaves the function after
the EH action is completed. If a function is inlined, EH blocks with
missing unwind labels are wired up to the unwind label used by the inlined
call site.

Is this saying that a “missing” unwind label corresponds to telling the
runtime to continue the search at the next frame?

Yep. For the C++ data structure it would simply be a missing or null
operand.

Your example looks wrong in this regard, unless I’m misunderstanding it.
It looks like any exceptions that aren’t caught in that function will lead
to a terminate call.

Well, those are the intended semantics of noexcept, unless I'm mistaken.
And the inliner *should* wire up the unwind edge of the terminateblock to
the unwind edge of the inlined invoke instruction, because it's natural to
lower terminateblock to a catch-all plus termination call block. I wanted
to express that as data, though, so that in the common case that the
noexcept function is not inlined, we can simply flip the "noexcept" bit in
the EH info. There's a similar optimization we can do for Itanium that we
miss today.

Invokes that are reached after a catchblock without following any unwind
edges must transitively unwind to the first catchend block that the
catchblock unwinds to.

I’m not sure I understand this correctly. In particular, I’m confused
about the roles of resume and catchend.

catchendblock is really there to support figuring out which calls were
inside the catch scope. resume has two roles: moving to the next EH action
after a cleanup, and transitioning from the catch block back to normal
control flow. Some of my coworkers said it should be split into two
instructions for each purpose, and I could go either way.

> %val = cleanupblock <valty> unwind label %nextaction

Why isn’t this a terminator? It seems like it performs the same sort of
role as catchblock, except presumably it is always entered. I suppose
that’s probably the answer to my question, but it strikes me as an
ambiguity in the scheme. The catchblock instruction is more or less a
conditional branch whereas the cleanupblock is more like a label with a
hint as to an unconditional branch that will happen later. And I guess
that’s another thing that bothers me -- a resume instruction at the end of
a catch implementation means something subtly different than a resume
instruction at the end of a cleanup implementation.

Yeah, reusing the resume instruction for both these things might not be
good. I liked not having to add more terminator instructions, though. I
think most optimizations will not care about the differences between the
two kinds of resume. For CFG formation purposes, it either has one
successor or none, and that's enough for most users.

I felt that cleanupblock should not be a terminator because it keeps the IR
more concise. The smaller an IR construct is, the more people seem to
understand it, so I tried to go with that.

Hi,

Thanks for sending this out. We’re looking forward to seeing this come about, since we need funclet separation for LLILC as well (and I have cycles to spend on it, if that would be helpful).

Some questions about the new proposal:

  • Do the new forms of resume have any implied read/write side-effects, or do they work just like a branch? In particular, I’m wondering what prevents reordering a call across a resume. Is this just something that code motion optimizations are expected to check for explicitly to avoid introducing UB per the “Executing such an invoke [or call] that does not transitively unwind to the correct catchend block has undefined behavior” rule?

  • Does LLVM already have other examples of terminators that are glued to the top of their basic blocks, or will these be the first? I ask because it’s a somewhat nonstandard thing (a block in the CFG that can’t have instructions added to it) that any code placement algorithms (PRE, PGO probe insertion, Phi elimination, RA spill/copy placement, etc.) may need to be adjusted for. The adjustments aren’t terrible (conceptually it’s no worse than having unsplittable edges from each of the block’s preds to each of its succs), but it’s something to be aware of.

  • Since this will require auditing any code with special processing of resume instructions to make sure it handles the new resume forms correctly, I wonder if it might be helpful to give resume (or the new forms of it) a different name, since then it would be immediately clear which code has/hasn’t been updated to the new model.

  • Is the idea that a resume (of the sort that resumes normal execution) ends only one catch/cleanup, or that it can end any number of them? Did you consider having it end a single one, and giving it a source that references (in a non-flow-edge-inducing way) the related catchend? If you did that, then:

      • The code to find a funclet region could terminate with confidence when it reaches this sort of resume, and

      • Resumes which exit different catches would have different sources and thus couldn’t be merged, reducing the need to undo tail-merging with code duplication in EH preparation (by blocking the tail-merging in the first place)

  • What is the plan for cleanup/__finally code that may be executed on either normal paths or EH paths? One could imagine a number of options here:

      • require the IR producer to duplicate code for EH vs non-EH paths

      • duplicate code for EH vs non-EH paths during EH preparation

      • use resume to exit these even on the non-EH paths; code doesn’t need to be duplicated (but could and often would be as an optimization for hot/non-EH paths), and normal paths could call the funclet at the end of the day

and it isn’t clear to me which you’re suggesting. Requiring duplication can worst-case quadratically expand the code (in that if you have n levels of cleanup-inside-cleanup-inside-cleanup-…, and each cleanup has k code bytes outside the next-inner cleanup, after duplication you’ll have kn + k(n-1) + … or O(kn^2) bytes total [compared to kn before duplication]), which I’d think could potentially be a problem in pathological inputs.
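
The growth estimate above can be checked with a few lines of arithmetic. This is only a sketch of the sum as stated (k bytes per level, n nested levels); the function names are hypothetical and the closed form k*n*(n+1)/2 is just the standard arithmetic-series identity applied to k*n + k*(n-1) + … + k.

```cpp
#include <cassert>
#include <cstdint>

// Bytes of cleanup code before duplication: n levels of k bytes each.
std::uint64_t originalBytes(std::uint64_t k, std::uint64_t n) {
  return k * n;
}

// Bytes after duplication under the estimate in the text:
// k*n + k*(n-1) + ... + k*1, summed term by term.
std::uint64_t duplicatedBytes(std::uint64_t k, std::uint64_t n) {
  std::uint64_t total = 0;
  for (std::uint64_t level = 1; level <= n; ++level)
    total += k * level;
  return total;
}

// The same sum in closed form: k * n * (n + 1) / 2.
std::uint64_t duplicatedBytesClosedForm(std::uint64_t k, std::uint64_t n) {
  return k * n * (n + 1) / 2;
}
```

For example, with k = 16 and n = 4 the original size is 64 bytes and the duplicated size is 160 bytes; with k = 10 and n = 100 it is 1000 bytes versus 50500 bytes, which is where the quadratic term starts to matter.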

Thanks

-Joseph

I skimmed the source code in libgcc of that personality function. It's rather tricky in that it threads the SEH personality function through an existing Itanium-style personality function. I agree completely that code might not be interoperable with MSVC, though I can't tell for sure. But I wasn't thinking of threading an Itanium-style personality. I was thinking of a personality that still adhered to SEH semantics as much as possible but lifted the restrictions of __CxxFrameHandler3 that block what you're doing so far.

Even the problem that Reid mentioned about _CxxThrowException putting the exception object in the wrong place, I think, can be worked around with a new personality. The personality just has to copy the exception object into the stack frame of the function with the catch block (i.e. "landing pad" in Itanium parlance) before RtlUnwindEx transfers control to the landing pad. Obviously, you have to pre-allocate for the size of the exception object, I guess, in your WinEHPrepare pass. Obviously __CxxFrameHandler3 does not do that but we could.

Steve

Should have thought more before opening my mouth :)
Scratch that because it won't work with rethrows unless I get to patch the address of the in-flight exception object. Damn global variables.

Right, doing our own personality function is possible, but still has half
the challenge of using __CxxFrameHandler3, and introduces a new runtime
dependency that isn't there currently. Given that it won't save that much
work, I'd rather not introduce a dependency that wasn't there before.

The reason it's still hard is that you have to split the main function up
into more than one subfunction.

I see, but I thought you are able to outline cleanup code already?

We can, but frankly it's unreliable. The new representation should help
make the job easier.

And that the hiccup you are encountering is because __CxxFrameHandler3
requires unwind tables with properly-ordered state transitions? The
compiler SEH personality (__C_specific_handler) doesn't have that, right? If
you could manage __try, __finally already, doesn't that provide the
solution?

Right, __CxxFrameHandler3 is a lot more constraining than
__C_specific_handler. The SEH personality doesn't let you rethrow
exceptions, so once you catch the exception you're done, you're in the
parent function. My understanding is that C++ works by having an active
catch handler on the stack.

Let me be precise. Let's take your example with the "ambiguous IR
lowering":

I snipped the example, but in general, yes, I agree we could do another
personality with a less restrictive table format. I'm still not convinced
it's worth it.

The exception object is allocated in the frame of the function calling
_CxxThrowException, and it has to stay alive until control leaves the catch
block receiving the exception.
This is different from Itanium, where the exception object is constructed
in heap memory and the pointer is saved in TLS. If this were not the case,
we'd use the __gxx_personality_v0-style landingpad approach and make a new
personality variant that understands MS RTTI.

I'm surprised, I really want to check this myself later this week. I
always thought that MSVCRT always copied your exception object because I
have always seen it invoking the copy constructor on throw X. (It was a
pain in my case because I didn't really want my exception objects to be
copyable, only movable, and at least VS 2010 still insisted that I
implement a copy constructor.)

Right, the type does have to be copyable. I think it's supposed to be
copied as part of the throw-expression, but if not, then it has to go fill
out the CatchableType tables, which have copy constructors in them. Anyway,
I might be wrong about where precisely the exception lives in memory, but
I'm sure the catches are outlined to support rethrow.

Furthermore, the "catch info" in the MS ABI does have a field for the
destructor that the catch block has to call. It's not theoretical, I've got
code that calls that function pointer so I can properly catch C++
exceptions from a SEH handler. Though I might be mistaken in that the field
points to just an in-place destructor and not a deleting destructor.

Yep.

Also, I thought the stack is already completely unwound when you reach
the beginning of the catch block (but not a __finally block, i.e. the
cleanup code). At least, that's the impression I get from reading
reverse-engineered source code for the personality functions and the
Windows API RtlUnwindEx.

For __try / __except, yes, the stack is unwound at the point of the
__except. For try / catch, the stack unwinds after you leave the catch body
by fallthrough, goto, break, continue, return or whatever else you like,
because after that point you cannot rethrow anymore.
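
The rethrow constraint can be observed at the source level in portable C++. The sketch below is only an illustration of the language rule, not of how any particular runtime implements it; the function names are hypothetical. A helper called from inside a catch body can still rethrow the in-flight exception, which is why the exception has to stay alive until control leaves the catch body.

```cpp
#include <cassert>
#include <exception>
#include <stdexcept>
#include <string>

// Called from inside a catch body: rethrows whatever exception is currently
// being handled and reports what it was. Only safe while a handler is active.
std::string describeCurrentException() {
  try {
    throw;  // rethrow the exception that is currently being handled
  } catch (const std::runtime_error &e) {
    return std::string("runtime_error: ") + e.what();
  } catch (...) {
    return "unknown";
  }
}

// The exception thrown here is still alive while the catch body runs, so the
// helper (a separate function, deeper on the stack) can rethrow it.
std::string catchAndDescribe() {
  try {
    throw std::runtime_error("boom");
  } catch (...) {
    return describeCurrentException();
  }
}

// Once control has left the catch body there is no exception being handled.
bool exceptionActiveAfterCatch() {
  try {
    throw 1;
  } catch (...) {
  }
  if (std::current_exception())
    return true;
  return false;
}
```

Here `catchAndDescribe()` yields "runtime_error: boom", and `exceptionActiveAfterCatch()` yields false because nothing is left to rethrow after the handler has exited.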

We could try to do all this outlining in Clang, but that blocks a lot of
LLVM optimizations. Any object with a destructor (std::string) is now
escaped into the funclet that calls the destructor, and simple
transformations (SROA) require interprocedural analysis. This affects the
code on the normal code path and not just the exceptional path. While EH
constructs like try / catch are fairly rare in C++, destructor cleanups are
very, very common, and I'd rather not pessimize so much code.

Right, but __CxxFrameHandler3 already forces you to outline destructor
cleanups into funclets. So if you wanted to stop doing that you have to
write your own personality function right?

No, I believe if we want to be ABI compatible, we need to outline at
least destructor cleanups, regardless of what personality we use.

What I am saying is, if you can design the personality function so that it
works naturally with LLVM IR --- which can't see the source-level scopes
--- that seems a whole lot less work versus:

* Changing the existing Itanium-based EH model in LLVM
* Incurring the wrath of people who like the Itanium model
* Having to maintain backwards compatibility or provide an upgrade path

So, the nice thing about this design is that there are no scopes in normal
control flow. The scoping is all built into the EH blocks, which most
optimization passes don't care about. If you do a quick search through
lib/Transforms, you'll see there are very few passes that operate on
LandingPadInst and ResumeInst. Changing these instructions is actually
relatively cheap, if we can agree on the new semantics.

Also, I think, if we want to eventually support trapped operations (some
kind of invoke div_with_trap mentioned in another thread), wouldn't it be
way easier to implement and optimize if the personality function can be
designed in the right way?

Right, asynch exceptions are definitely something that users keep asking
for, so I'd like to see it done right if we want to do it at all. I think
this change is separable, though. Asynch exceptions have a lot more to do
with how you represent the potentially trapping operations (BB unwind
labels, lots of invoked-intrinsics, more instructions) than how you
represent the things to do on exception.

Thanks for taking a look!

Hi,

Thanks for sending this out. We're looking forward to seeing this come
about, since we need funclet separation for LLILC as well (and I have
cycles to spend on it, if that would be helpful).

Some questions about the new proposal:

- Do the new forms of resume have any implied read/write side-effects, or
do they work just like a branch? In particular, I'm wondering what
prevents reordering a call across a resume. Is this just something that
code motion optimizations are expected to check for explicitly to avoid
introducing UB per the "Executing such an invoke [or call] that does not
transitively unwind to the correct catchend block has undefined behavior"
rule?

Yes, crossing a resume from a catchblock ends the lifetime of the exception
object, so I'd say that's a "writes escaped memory" constraint. That said,
a resume after a cleanupblock doesn't, but I'm not sure it's worth having
this kind of fine-grained analysis. I'm OK teaching SimplifyCFG to combine
cleanupblocks and leaving it at that.

- Does LLVM already have other examples of terminators that are glued to
the top of their basic blocks, or will these be the first? I ask because
it's a somewhat nonstandard thing (a block in the CFG that can't have
instructions added to it) that any code placement algorithms (PRE, PGO
probe insertion, Phi elimination, RA spill/copy placement, etc.) may need
to be adjusted for. The adjustments aren't terrible (conceptually it's no
worse than having unsplittable edges from each of the block's preds to each
of its succs), but it's something to be aware of.

No, LLVM doesn't have anything like this yet. It does have unsplittable
critical edges, which can come from indirectbr and the unwind edge of an
invoke. I don't think it'll be too hard to teach transforms how to deal
with one more, but maybe that's unrealistic youthful optimism. :)

- Since this will require auditing any code with special processing of
resume instructions to make sure it handles the new resume forms correctly,
I wonder if it might be helpful to give resume (or the new forms of it) a
different name, since then it would be immediately clear which code
has/hasn't been updated to the new model.

There aren't that many references to ResumeInst across LLVM, so I'm not too
scared. I'm not married to reusing 'resume'; other candidate names include
'unwind' and 'continue', and I'd like more ideas.

- Is the idea that a resume (of the sort that resumes normal execution)
ends only one catch/cleanup, or that it can end any number of them? Did
you consider having it end a single one, and giving it a source that
references (in a non-flow-edge-inducing way) the related catchend? If you
did that, then:

+ The code to find a funclet region could terminate with confidence when
it reaches this sort of resume, and

+ Resumes which exit different catches would have different sources and
thus couldn't be merged, reducing the need to undo tail-merging with code
duplication in EH preparation (by blocking the tail-merging in the first
place)

We already have something like this for cleanupblocks because the resume
target and unwind label of the cleanupblock must match. It isn't as strong
as having a reference to the catchblock itself, because tail merging could
kick in like you mention. Undoing this would be and currently is the job of
WinEHPrepare. I guess I felt like the extra representational complexity
wasn't worth the confidence that it would buy us.

- What is the plan for cleanup/__finally code that may be executed on
either normal paths or EH paths? One could imagine a number of options
here:

+ require the IR producer to duplicate code for EH vs non-EH paths

+ duplicate code for EH vs non-EH paths during EH preparation

+ use resume to exit these even on the non-EH paths; code doesn't need to
be duplicated (but could and often would be as an optimization for
hot/non-EH paths), and normal paths could call the funclet at the end of
the day

and it isn't clear to me which you're suggesting. Requiring duplication
can worst-case quadratically expand the code (in that if you have n levels
of cleanup-inside-cleanup-inside-cleanup-…, and each cleanup has k code
bytes outside the next-inner cleanup, after duplication you'll have k*n +
k*(n-1) + … or O(k*n^2) bytes total [compared to k*n before duplication]),
which I'd think could potentially be a problem in pathological inputs.

I want to have separate normal and exceptional codepaths, but at -O0 all
the cleanup work should be bundled up in a function that gets called from
both those paths.

Today, for C++ destructors, we emit two calls to the destructor: one on the
normal path and one on the EH path. For __finally, we outline the finally
body early in clang and emit two calls to it as before, but passing in the
frameaddress as an argument. I think this is a great place to be. It keeps
our -O0 code size small, simplifies the implementation, and allows us to
inline one or both call sites if we think it's profitable.
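
As a rough source-level picture of the "one cleanup body, two call sites" arrangement described above, consider the sketch below. It is an analogy written in plain C++, not what Clang actually emits; `cleanupBody` and `runWithCleanup` are hypothetical names, and the boolean parameter merely mirrors the idea of telling the cleanup which path reached it.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for an outlined cleanup body (a destructor call or a
// __finally block). It records which path reached it.
void cleanupBody(bool abnormalTermination, std::vector<std::string> &log) {
  log.push_back(abnormalTermination ? "cleanup (EH path)"
                                    : "cleanup (normal path)");
}

// One shared cleanup body, called from two places: once on the exceptional
// path and once on the normal path.
void runWithCleanup(bool shouldThrow, std::vector<std::string> &log) {
  try {
    if (shouldThrow)
      throw std::runtime_error("boom");
    log.push_back("body finished");
  } catch (...) {
    cleanupBody(/*abnormalTermination=*/true, log);  // EH-path call site
    throw;                                           // keep unwinding
  }
  cleanupBody(/*abnormalTermination=*/false, log);   // normal-path call site
}
```

Because the cleanup is an ordinary function with two callers, an optimizer is free to inline it into either call site or leave it shared, which matches the code-size argument made above.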

Hi Reid,

Right, __CxxFrameHandler3 is a lot more constraining than __C_specific_handler. The SEH personality doesn't let you rethrow exceptions, so once you catch the exception you're done, you're in the parent function. My understanding is that C++ works by having an active catch handler on the stack.

Okay, I checked the Wine source code for __CxxFrameHandler3. I stand corrected.

While we are on the topic of Windows EH, I'd like to know your (and others', of course) thoughts on the following. It's my wishlist as a frontend implementor :)

- Win32 (x86) frame-based SEH

For __CxxFrameHandler3, since destructors and catch blocks execute as funclets while the throwing function's stack frame is still active, it's not going to be a problem right?

But for __C_specific_handler, I see a potential issue versus x86-64, in that RtlUnwind can't restore non-volatile registers, so registers are garbage when control is transferred to the landing pad. When I read the Itanium ABI documentation, it says that landing pads do get non-volatile registers restored, so I guess that's probably the working assumption of LLVM.

__C_specific_handler's registration frame saves down EBP, but no other registers, even ESP. If we use dynamic alloca or frame pointer omission, we are dead in the water, right?

- Writing one's own personality functions

This makes a lot of sense if one is implementing a different language than C++ that has exceptions, and is prepared to provide their own run-time support.

Say, if the language supports resuming from exceptions, or can query type information in more flexible ways than C++'s std::type_info matching. Does it really make sense for the backend, LLVM, to hard-code knowledge about the language-specific data area (LSDA)? Even in the Itanium ABI it's explicitly stated that the personality is specific to the source language, yet multiple personalities can interoperate in the same program. Ideally, I would prefer the backend to take control of everything to do with arranging the landing pads, branches within landing pads, and so on, but NOT the language-dependent exception matching.

Taken to the extreme, LLVM would have to expose tables that the LLVM client would have to translate to their own formats, like the garbage collection "unwind" tables. If that's too complicated, at least it would be nice to supply custom filter functions for catch clauses. This is inspired by SEH filters, obviously, but we might devise a slightly more portable version.

Even for C++ I actually wouldn't mind being able to arbitrarily replace the personality, and/or the runtime functions for throwing and resuming. In my C++ source code I always throw exceptions wrapped in a macro, because I want to instrument all my throw statements. In particular, I can construct a reliable stack trace on the spot with RtlVirtualUnwind (or walking the EBP chain on x86). It would be a nice bonus if we could implement this kind of instrumentation with Clang. Encouragement to switch from MSVC :)
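As a rough illustration of the macro-wrapped throw described above, here is a minimal portable sketch. The names THROW, throw_instrumented, ThrowSite and last_throw_site are hypothetical, and the sketch only records file and line; the stack walk with RtlVirtualUnwind is left as a comment because it is platform specific.

```cpp
#include <cassert>
#include <stdexcept>
#include <utility>

// Hypothetical record of where the most recent instrumented throw happened.
struct ThrowSite {
  const char* file = nullptr;
  int line = 0;
};

// One record per thread, so concurrent throws do not overwrite each other.
inline ThrowSite& last_throw_site() {
  static thread_local ThrowSite site;
  return site;
}

// Records the throw site, then throws the exception object.
// A real implementation could capture a stack trace at this point instead.
template <typename E>
[[noreturn]] void throw_instrumented(E&& e, const char* file, int line) {
  ThrowSite& site = last_throw_site();
  site.file = file;
  site.line = line;
  throw std::forward<E>(e);
}

// Every throw in user code goes through this macro (an assumption of the sketch).
#define THROW(expr) throw_instrumented((expr), __FILE__, __LINE__)

// Example user function that throws through the macro.
int parse_positive(int v) {
  if (v <= 0) THROW(std::invalid_argument("value must be positive"));
  return v;
}
```

Catching std::invalid_argument from parse_positive(-1) and then reading last_throw_site() gives the file and line of the THROW that fired. Replacing the runtime's throw function, as suggested above, would make the macro unnecessary.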

Steve

> Hi Reid,
>
> Right, __CxxFrameHandler3 is a lot more constraining than
> __C_specific_handler. The SEH personality doesn't let you rethrow
> exceptions, so once you catch the exception you're done, you're in the
> parent function. My understanding is that C++ works by having an active
> catch handler on the stack.
>
> Okay, I checked the Wine source code for __CxxFrameHandler3. I stand
> corrected.
>
> While we are on the topic of Windows EH, I'd like to know your (and others',
> of course) thoughts on the following. It's my wishlist as a frontend
> implementor :)
>
> - Win32 (x86) frame-based SEH
>
> For __CxxFrameHandler3, since destructors and catch blocks execute as
> funclets while the throwing function's stack frame is still active, it's
> not going to be a problem, right?

My understanding is that __CxxFrameHandler3 does something like the
following:

for (void (*Cleanup)(bool, void*) : Cleanups) {
  __try {
    Cleanup(/*AbnormalTermination=*/true, EstablisherFrame);
  } __except(1) {
    std::terminate(); // can't rethrow
  }
}
__try {
  CallCatchBlock();
} __except(__CxxDetectRethrow(), EXCEPTION_CONTINUE_SEARCH) {
}

So I guess it's not really that the catch block has an active frame, and
more that __CxxFrameHandler3 is there saying "hey, I saw a rethrow
exception go by during phase 1, here's what that exception was supposed to
be".

> But for __C_specific_handler, I see a potential issue versus x86-64, in
> that RtlUnwind can't restore non-volatile registers, so registers are
> garbage when control is transferred to the landing pad. When I read the
> Itanium ABI documentation, it says that landing pads do get non-volatile
> registers restored, so I guess that's probably the working assumption of
> LLVM.

That's pretty frustrating, given that the xdata unwinder already knows
where the non-volatile registers are saved. Anyway, I think it can be
overcome in the backend with the right register allocation constraints.

> __C_specific_handler's registration frame saves down EBP, but no other
> registers, even ESP. If we use dynamic alloca or frame pointer omission, we
> are dead in the water, right?

Are you sure the unwinder doesn't restore RSP? Anyway, the address of a
dynamic alloca can easily be spilled to the stack and reloaded.

> - Writing one's own personality functions
>
> This makes a lot of sense if one is implementing a different language than
> C++ that has exceptions, and is prepared to provide their own run-time
> support.
>
> Say, if the language supports resuming from exceptions, or can query type
> information in more flexible ways than C++'s std::type_info matching. Does
> it really make sense for the backend, LLVM, to hard-code knowledge about
> the language-specific data area (LSDA)? Even in the Itanium ABI it's
> explicitly stated that the personality is specific to the source language,
> yet multiple personalities can interoperate in the same program. Ideally, I
> would prefer the backend to take control of everything to do with arranging
> the landing pads, branches within landing pads, and so on, but NOT the
> language-dependent exception matching.
>
> Taken to the extreme, LLVM would have to expose tables that the LLVM
> client would have to translate to their own formats, like the garbage
> collection "unwind" tables. If that's too complicated at least it would be
> nice to supply custom filter functions for catch clauses. Inspired by SEH
> filters obviously, but we might devise a slightly more portable version.

I think LLVM has to know about the table format and landingpad PC values,
because that's its business. The RTTI data, though, is completely between
the frontend and the EH personality. I could imagine a personality that
uses an Itanium LSDA, but the RTTI pointers are really pointers to
functions that get called during phase 1 to implement SEH filters. The new
representation will actually allow you to pass more data here to support
passing in "adjectives" as required for MSVC, but LLVM will have to know
where to put it in the table and there's no way to avoid that.
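To make the idea of filter functions in the RTTI slot concrete, here is a toy model in C++. It is only a sketch under assumptions: ToyException, CatchEntry and find_handler are hypothetical names, and the table is a plain array rather than a real LSDA. The point is that the table layout stays fixed while the matching logic lives in frontend-supplied functions.

```cpp
#include <cassert>
#include <cstddef>

// Toy stand-in for an in-flight exception; real runtimes carry far more.
struct ToyException {
  int code;
};

// In this model the "type info" slot of each catch clause holds a filter
// function instead of an RTTI pointer.
using Filter = bool (*)(const ToyException&);

struct CatchEntry {
  Filter filter;   // called during the search phase (phase 1)
  int handler_id;  // stands in for the landing pad or handler address
};

// Hypothetical phase-1 search: returns the first handler whose filter accepts
// the exception, or -1 to tell the unwinder to keep searching outer frames.
int find_handler(const CatchEntry* table, std::size_t n, const ToyException& e) {
  for (std::size_t i = 0; i < n; ++i) {
    if (table[i].filter(e)) return table[i].handler_id;
  }
  return -1;
}

// Two example filters a frontend might supply.
bool is_io_error(const ToyException& e) { return e.code >= 100 && e.code < 200; }
bool is_any(const ToyException&) { return true; }
```

With a table of {is_io_error, 1} followed by {is_any, 2}, an exception with code 150 selects handler 1 and any other code selects handler 2. A table containing only the first entry returns -1 for other codes, which corresponds to continuing the search in the caller.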

I hadn’t noticed the “noexcept” specifier in your example. That clears up part of my concerns, but I still have some problems.

With regard to the multiple meanings of ‘resume’, I am more concerned about developers reading and understanding the IR than about passes operating on it. Apart from making it harder to debug problems related to control flow at resume instructions, I think this makes it more likely that code which mishandles it will be introduced down the road. If I’m reading things correctly, a resume instruction in your proposal could mean:

a) We’re done handling this exception, continue normal execution at this label.

b) We’re done handling this exception, continue execution in an enclosing catch handler at this label.

c) We’re done executing this termination handler, check the catch handler at this label to see if it can handle the current exception.

d) We’re done executing this termination handler, now execute the termination handler at this label.

e) We’re done executing this termination handler, continue handling the exception in the runtime.

I suppose (a) and (b) are more or less the same and it doesn’t entirely matter whether the destination is in normal code or exception code. In practical terms (c) and (d) may be the same also, but logically, in terms of how the runtime works, they are different. I’m pretty sure there’s a gap in my understanding of your proposal because I don’t understand how (e) is represented at all.

As an exercise, I tried to work through the IR that would be produced in the non-optimized case for the following code:

void test() {
  try {
    Obj o1;
    try {
      f();
    } catch (int) {}
    Obj o2;
    try {
      g();
    } catch (int) {}
    h();
  } catch (int) {}
}

Here’s what I came up with:

define void @foo() personality i32 (...)* @__CxxFrameHandler3 {
  %e.addr = alloca i32
  invoke void @f(i32 1)
    to label %cont1 unwind label %cleanup.Obj

cont1:
  invoke void @g(i32 2)
    to label %cont2 unwind label %cleanup.Obj.1

cont2:
  invoke void @h(i32 2)
    to label %cont3 unwind label %cleanup.Obj.2

cont3:
  call void @~Obj()
  call void @~Obj()
  br label %return

return:
  ret void

cleanup.Obj:
  cleanupblock unwind label %maycatch.int
  call void @~Obj()
  resume label %maycatch.int

maycatch.int:
  catchblock void [i8* @typeid.int, i32 7, i32* %e.addr]
    to label %catch.int unwind label %catchend

catch.int:
  resume label %cont1

catchend:
  resume

cleanup.Obj.1:
  cleanupblock unwind label %maycatch.int.1
  call void @~Obj()
  call void @~Obj()
  resume label %maycatch.int.1

maycatch.int.1:
  catchblock void [i8* @typeid.int, i32 7, i32* %e.addr]
    to label %catch.int.1 unwind label %catchend.1

catch.int.1:
  resume label %cont2

catchend.1:
  resume

cleanup.Obj.2:
  cleanupblock unwind label %maycatch.int.2
  call void @~Obj()
  call void @~Obj()
  resume label %maycatch.int.2

maycatch.int.2:
  catchblock void [i8* @typeid.int, i32 7, i32* %e.addr]
    to label %catch.int.2 unwind label %catchend.2

catch.int.2:
  resume label %return

catchend.2:
  resume
}

I don’t know if I got that right, but it seems to me that there are a couple of problems with this. Most obviously, there is a good bit of duplicated code here (which the optimization passes will probably want to combine).

More significantly, though, it doesn’t correctly describe what happens if a non-int exception is thrown in any of the called functions. For instance, if a non-int exception is thrown from g() and is caught somewhere further down the stack, the runtime should call a termination handler that destructs o2 and then call a termination handler that destructs o1. However, my IR doesn’t describe a termination handler that destructs just o2, and I don’t know how I could get it to do so within the scheme that you have proposed.

Do you have a way to handle this case that I haven’t perceived?
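For reference, the source-level behavior that the IR has to reproduce can be checked in plain C++. The sketch below is an illustration only: the named Obj, the events log, the stub bodies of f, g and h, and the run helper are assumptions added for the test. It throws a non-int exception from g() and records the order in which the destructors run; standard C++ unwinding destroys o2 first and then o1 before the exception leaves test().

```cpp
#include <cassert>
#include <string>
#include <vector>

std::vector<std::string> events;  // records the order of destructor calls

// Obj with a name so each destructor call can be told apart (an assumption).
struct Obj {
  explicit Obj(const char* n) : name(n) {}
  ~Obj() { events.push_back(std::string("~") + name); }
  const char* name;
};

void f() {}
void g() { throw 1.5; }  // a non-int exception
void h() {}

// Same shape as the function discussed above.
void test() {
  try {
    Obj o1("o1");
    try {
      f();
    } catch (int) {}
    Obj o2("o2");
    try {
      g();
    } catch (int) {}
    h();
  } catch (int) {}
}

// Runs test() and reports whether the exception escaped it.
bool run() {
  events.clear();
  try {
    test();
  } catch (double) {
    return true;
  }
  return false;
}
```

After run(), the log holds the destructor of o2 followed by the destructor of o1, and none of the three catch (int) clauses has run. Any IR scheme needs a cleanup that covers only o2 for the state around g().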

In a mostly unrelated matter, have you thought about what needs to be done to prevent catchblock blocks from being combined? For example, suppose you have code that looks like this:

void test() {
  try {
    f();
  } catch (int) {
    x();
    y();
    z();
  }
  try {
    g();
  } catch (...) {
  }
  try {
    h();
  } catch (int) {
    x();
    y();
    z();
  }
}

I think it’s very likely that, if we don’t do anything to prevent it, the IR generated for this will be indistinguishable from the IR generated for this:

void test() {
  try {
    f();
    try {
      g();
    } catch (...) {
    }
    h();
  } catch (int) {
    x();
    y();
    z();
  }
}

In this case that might be OK, but theoretically the calls to f() and h() should get different states and there are almost certainly cases where failing to recognize that will cause problems. What’s more, the same basic pattern arises for this case:

void test() {
  try {
    f();
  } catch (int) {
    x();
    y();
    z();
  }
  try {
    g();
  } catch (float) {
  }
  try {
    h();
  } catch (int) {
    x();
    y();
    z();
  }
}

But in this case, if we get the state numbering wrong, an int exception from g() could end up being incorrectly caught by the xyz handler.
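The state-numbering hazard can be modeled with a small lookup table. This is a simplified sketch, not the real MSVC xdata layout: TryRange, lookup, the handler ids and the assignment of states 0, 1 and 2 to f(), g() and h() are all assumptions made for illustration.

```cpp
#include <cassert>
#include <vector>

enum class Ty { Int, Float };

// Simplified try-range entry; NOT the real MSVC xdata layout.
struct TryRange {
  int low;      // first call-site state covered by the try
  int high;     // last call-site state covered by the try (inclusive)
  Ty caught;    // type named by the catch clause
  int handler;  // id standing in for the handler funclet
};

const int kXyzHandler = 1;    // the shared x(); y(); z(); body
const int kFloatHandler = 2;  // the empty catch (float) body

// Entries are ordered innermost first. Returns the handler id for an exception
// of type `thrown` raised at `state`, or -1 if it escapes the function.
int lookup(const std::vector<TryRange>& table, int state, Ty thrown) {
  for (const TryRange& r : table) {
    if (r.low <= state && state <= r.high && r.caught == thrown) return r.handler;
  }
  return -1;
}

// Correct table for three sequential try blocks: f() = 0, g() = 1, h() = 2.
std::vector<TryRange> sequential_table() {
  return {{0, 0, Ty::Int, kXyzHandler},
          {1, 1, Ty::Float, kFloatHandler},
          {2, 2, Ty::Int, kXyzHandler}};
}

// What results if the two int handlers are merged into one enclosing range.
std::vector<TryRange> merged_table() {
  return {{1, 1, Ty::Float, kFloatHandler},
          {0, 2, Ty::Int, kXyzHandler}};
}
```

With the sequential table, an int exception at state 1 (the call to g()) finds no handler and leaves the function. With the merged table the same lookup returns the xyz handler, which is the miscompile described above.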

BTW, finding cases like this is the primary reason that I’ve been trying to push my current in-flight patch onto the sinking ship that is our current implementation. I mentioned to you before that the test suite I’m using passes with my proposed patch, but that’s only true with optimizations disabled. With optimizations turned on I’m seeing all kinds of fun things like similar handlers being combined and common instructions being hoisted above a shared(!) eh_begincatch call in if-else paired handlers. I don’t know if it will be worth trying to fix these problems, but seeing them in action has been very instructive.

-Andy

> optimizing EH codepaths is not usually performance critical.

Leaving aside the rest of the thread, I feel the need to refute this point in isolation. I’ve found that optimizing (usually simplifying and eliminating) exception paths ends up being extremely important for my workloads. Failing to optimize exception paths sufficiently tends to indirectly hurt things like inlining for example. Any design which starts with the assumption that optimizing exception paths isn’t important is going to be extremely problematic for me.

That’s interesting.

I wasn’t thinking about performance so much as code size in my original comment. I’ve been looking at IR recently where code from multiple exception handlers was combined while still maintaining the basic control flow of the EH code. This kind of optimization is wreaking havoc for our current MSVC compatible EH implementation (hence the redesign), but I guess the Itanium ABI scheme doesn’t have a problem with it.

I suppose that is closely related to your concerns about inlining, I just hadn’t made the connection.

In theory the funclets should be able to share code blocks without any problem. The entry and exit points are the critical parts that make them funclets. I’m just not sure how we can get the optimization passes to recognize this fact while still meeting the MSVC runtime constraints. Reid’s proposal of separate catch blocks should help with that, but I’m still not sure we’ll want to use this representation for targets that don’t need it.

> optimizing EH codepaths is not usually performance critical.

>> Leaving aside the rest of the thread, I feel the need to refute this
>> point in isolation. I've found that optimizing (usually simplifying and
>> eliminating) exception paths ends up being *extremely* important for my
>> workloads. Failing to optimize exception paths sufficiently tends to
>> indirectly hurt things like inlining for example. Any design which starts
>> with the assumption that optimizing exception paths isn't important is
>> going to be extremely problematic for me.

Overall, the reason we've gone down this path is to support stronger
analysis of EH paths, but I always think about it in terms of supporting
simplification of the normal control flow path. Consider unique_ptr:

void f() {
  std::unique_ptr<int> p(new int(42));
  g(p.get());
}

This representation should support removing the heap allocation here by
inlining the destructor on the normal path and EH path and promoting the
heap allocation to a stack allocation. If our representation required early
outlining, this would not be possible, or at least it would require
inter-procedural analysis.
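Written out by hand, the transformation described here might look like the following sketch. The stub g, the seen variable, and the names f_inlined and f_promoted are assumptions for illustration. Whether an optimizer actually performs the last step depends on it proving that the pointer does not outlive the call to g.

```cpp
#include <cassert>
#include <memory>

int seen = 0;                  // hypothetical observer so the call has a visible effect
void g(int* p) { seen = *p; }  // stand-in for the callee; does not keep the pointer

// Original form from the discussion.
void f() {
  std::unique_ptr<int> p(new int(42));
  g(p.get());
}

// After inlining the destructor on both the normal path and the EH path.
void f_inlined() {
  int* p = new int(42);
  try {
    g(p);
  } catch (...) {
    delete p;  // EH path cleanup
    throw;
  }
  delete p;  // normal path cleanup
}

// After promoting the heap allocation to a stack slot. Both deletes become
// dead, so the EH cleanup disappears entirely.
void f_promoted() {
  int slot = 42;
  g(&slot);
}
```

All three functions pass 42 to g. The middle form is only analyzable within one function if the cleanup code stays in the parent function's IR, which is the argument against early outlining.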

> That’s interesting.
>
> I wasn’t thinking about performance so much as code size in my original
> comment. I’ve been looking at IR recently where code from multiple
> exception handlers was combined while still maintaining the basic control
> flow of the EH code. This kind of optimization is wreaking havoc for our
> current MSVC compatible EH implementation (hence the redesign), but I guess
> the Itanium ABI scheme doesn’t have a problem with it.
>
> I suppose that is closely related to your concerns about inlining, I just
> hadn’t made the connection.
>
> In theory the funclets should be able to share code blocks without any
> problem. The entry and exit points are the critical parts that make them
> funclets. I’m just not sure how we can get the optimization passes to
> recognize this fact while still meeting the MSVC runtime constraints.
> Reid’s proposal of separate catch blocks should help with that, but I’m
> still not sure we’ll want to use this representation for targets that don’t
> need it.

I think sharing code between funclets would require some extreme gymnastics
to generate the right pdata and xdata, but I suppose it's not too different
from what MSVC 2015 requires for coroutines.