RFC: Native Windows C++ exception handling

I am working on adding support for C++ exception handling when compiling for a native Windows target (that is a target with “MSVC” specified as the environment). Because of differences between how the native Windows runtime handles exceptions and the Itanium-based model used by current LLVM exception handling code, I believe this will require some extensions to the LLVM IR, though I’m trying to leverage the existing mechanisms as much as possible.

I’ll discuss this below in more detail, but the summary is that I’m going to propose an extension to the syntax of the landing pad instruction to enable landing pad clauses to be outlined as external functions, and I’d like to introduce two new intrsinsics, llvm.eh.begin.catch and llvm.eh.end.catch, to replace calls to the libc++abi __cxa_begin_catch and __cxa_end_catch functions.

Currently, LLVM supports 64-bit Windows exception handling for MinGW targets using a custom personality function and the libc++abi library. There are also LLVM clients, such as ldc, that provide Windows exception handling similar to what I am proposing by providing their own custom personality function. However, what I would like is to support Windows C++ exception handling using the __CxxFrameHandler3 function provided by the native Windows runtime library.

Some of the primary challenges in supporting native Windows C++ exception handling are:

  1. Catch and unwind handlers are called in a different frame context than the original function in which they are defined.

  2. Windows exception handling is state driven rather than landing pad based. The compiler must generate a table for each function mapping IP addresses within that function to the EH state at that address. When an exception is thrown the runtime uses this table to determine which unwind and catch handlers should be invoked.

  3. Windows catch and unwind handling is implemented using a series of calls to discrete handlers rather than a jump to a landing pad which uses runtime decisions to reach all relevant handler blocks as is done in LLVM’s existing implementations. LLVM’s current landing pad structure frequently results in in catch handling blocks and cleanup blocks which are shared by multiple landing pads. Windows expects each catch handler and unwind handler to be defined in a single location. The runtime then determines which handlers should be called based on the EH state when an exception is thrown and makes a series of calls when multiple handlers are needed.

The first challenge is relatively easy to address. The Microsoft C++ compiler creates a psuedo-function for handlers which it embeds in the body of the parent function, but for LLVM I would like to try simply outlining the handler bodies into fully external functions. The task of outlining the handler code is somewhat straightforward and can be done with the existing IR. However, I need a way to link the landing pads from the parent function to the outlined handlers. I propose doing this by extending the syntax of the landing pad instruction to allow the address of an outlined handler to be attached to catch and cleanup clauses.

The current syntax for landingpad is:

= landingpad personality <pers_fn> +

= landingpad personality <pers_fn> cleanup *

:= catch

:= filter

I’d like to change that to:

= landingpad personality <pers_fn> +

= landingpad personality <pers_fn> cleanup [at handler] *

:= catch [at handler]

:= filter

Outlined handlers will reference frame variables from the parent function using the llvm.frameallocate and llvm.framerecover functions. Any frame variable which must be referenced from a catch or cleanup handler will be moved into a block allocated by llvm.frameallocate. When the handlers are called, the parent function’s frame pointer is passed as the second argument to the call. The handlers will use this frame pointer to find the frame allocation block from the parent function. The frame allocation block will also contain space for an exception state variable and an exception object pointer. These values are maintained by the runtime library.

Current LLVM landing blocks use calls to __Cxa_begin_catch to get a pointer to the object associated with the exception. This function is provided by the libc++abi library and is specific to the personality function being used. I would like to introduce a new intrinsic (llvm.eh.being.catch) which accomplishes the same result in a personality-function independent way. For consistency, I also propose introducing llvm.eh.end.catch to replace calls to __cxa_end_catch.

I am attaching several examples showing the outlining transformation I am proposing. Note that for simplicity I’ve used Linux type information in these examples, but the final implementation will need to use Microsoft-style RTTI. I believe clang already has support for that.

The ‘simple.ll’ example shows a function with a single catch-all handler. The ‘catch-type.ll’ example shows a function which catches a specific type of exception. The ‘min-unwind.ll’ example shows a function which has no exception handlers but which requires an unwind handler. The ‘nested.ll’ example shows a function which has nested try blocks.

The nested example illustrates the challenge mentioned above with regard to inter-mingled handlers. I think I know how I will accomplish the outlining shown in that example and generate the state tables needed by the __CxxFrameHandler3 personality function, but I’m going to skip discussion of the details for now.

However, I do want to at least open discussion of the problem of EH state handling. The native Windows C++ exception handling essentially needs an EH state assigned to each basic block. I have an idea for how I might be able to infer the EH states based on the targets of invoke instructions. I think I can make this work in a way that will produce correct results for synchronous C++ exception handling. However, I don’t think I can get it to map exactly to the actual C++ scopes in the original source code. For this reason, assuming we would like to support asynchronous C++ exception handling at some future time, I think it may be preferable to have the EH states embedded by the front end, possibly as metadata. I haven’t thought through all of the possible problems here, and I am open to suggestions.

catch-type.ll (3.14 KB)

min-unwind.ll (3.79 KB)

nested.ll (12.3 KB)

simple.ll (2.06 KB)

First, thanks for the help in this area. :slight_smile:

I agree there’s a big conceptual mismatch between the table-driven EH strategy MSVC uses control-driven landing pad strategy used in the Itanium C++ EH document. We need some way to represent these tables in IR, or we ask the backend for too much.

I think your IR extension is insufficient. I’m looking at the ‘nested.ll’ case, and I think I see a problem:

void test() {
try {
Outer outer;
try {
Inner inner;
do_inner_thing();
} catch (int) {
handle_int();
}
} catch (float) {
handle_float();

}
keep_going();
}

This is the landing pad for the do_inner_thing() invoke:

lpad3: ; preds = %invoke.cont2
%8 = landingpad { i8*, i32 } personality i8* bitcast (i32 (…)* @__CxxFrameHandler3 to i8*)
cleanup at @_Z4testv.unwind.1 ; ~Inner
catch i8* bitcast (i8** @_ZTIi to i8*) at @_Z4testv.catch.1 ; handle_int()
catch i8* bitcast (i8** @_ZTIf to i8*) at @_Z4testv.catch.0 ; handle_float()

The @_Z4testv.unwind.1 helper just calls ~Inner(), but not ~Outer.

I think we’d need something more complex like this:

lpad3: ; preds = %invoke.cont2
%8 = landingpad { i8*, i32 } personality i8* bitcast (i32 (…)* @__CxxFrameHandler3 to i8*)
cleanup at @_Z4testv.unwind.1 ; ~Inner
catch i8* bitcast (i8** @_ZTIi to i8*) at @_Z4testv.catch.1 ; handle_int(), returns to code before non-exceptional call to ~Outer
cleanup at @_Z4testv.unwind.0 ; ~Outer

catch i8* bitcast (i8** @_ZTIf to i8*) at @_Z4testv.catch.0 ; handle_float()

Maybe we could make these changes, but once we do that it’s not clear to me that we should be using the same landing pad instruction anymore.

Even if we did make these changes, we’ll have landing pads like:

invoke @do_inner_thing() … unwind label %lpad1
invoke @~Inner() … unwind label %lpad2

invoke @~Outer() … unwind label %lpad3

lpad1:
landingpad …
cleanup at ~Inner
catch @“typeinfo for int” at @handle_int
cleanup at ~Outer

catch @“typeinfo for float” at @handle_float

lpad2:
landingpad …
catch @“typeinfo for int” at @handle_int
cleanup at ~Outer

catch @“typeinfo for float” at @handle_float

lpad3:
landingpad …
catch @“typeinfo for float” at @handle_float

This is still pretty far away from the tables that _CxxFrameHandler3 wants.

I’m starting to think that what we really want to do is build an internal LLVM global constant representing the action table at preparation time. I don’t fully understand the .xdata that MSVC emits for _CxxFrameHandler3 yet, but I suspect that 90% of it can be prepared ahead of time without using PC values. All landing pads can be assigned a state number, and we can insert a call to an intrinsic (@llvm.eh.state(i32 %n) maybe?) in each landingpad. After that, we can truncate the block with unreachable and prune away any unreachable blocks. Now all the backend has to do is fill in a table mapping PC to state.

What do you think? One downside is that ConstantStructs are kind of expensive.

Hi Reid,

Thanks for the input.

You wrote:

The @_Z4testv.unwind.1 helper just calls ~Inner(), but not ~Outer.

That’s actually intentional. The thing to keep in mind is that all of the landing pads are going to be effectively removed by the time the final object image is generated. They are just there to facilitate the table generation, and in the __CxxFrameHandler3 case they don’t mean quite the same thing that they mean elsewhere. (Maybe that’s an argument for using a different construct.)

There’s also a good chance that I’m being inconsistent in how I chose to represent catch clauses versus cleanup clauses.

In the case you refer to, if an exception thrown by do_inner_thing() is caught by the int exception handler at lpad3, then we only want ~Iinner() to be called. If it is caught instead by the float handler, that will result in a different state transition and the call to ~Outer() will be made by the lpad1 cleanup.

Having actually written that explanation now, I see the confusion inherent in my original proposal. I think it will help if I explain the handling of that case a bit more in terms of the .xdata tables that will eventually need to be generated, and maybe then we can converge on a reasonable solution.

First, we need to assign EH states to all of the code. Ideally that would end up looking something like this:

void test() {

// eh_state = -1

try {

// eh_state = 0

Outer outer;

// eh_state = 1

try {

// eh_state = 2

Inner inner;

// eh_state = 3

do_inner_thing();

// eh_state = 2 (because we’re going to destruct inner here)

} catch (int) {

// eh_state = 4

handle_int();

}

// eh_state = 0 (because we’re going to destruct outer here)

} catch (float) {

// eh_state = 5

handle_float();

}

// eh_state = -1;

keep_going();

}

Basically, the EH state needs to change any time we enter a new scope or construct a new object that needs to be destructed and catch handlers get their own state. Then things get peeled away as the conditions that created the state go away.

I don’t think I have enough information in my proposed extension to distinguish between eh_state=0 and eh_state=-1, but I think that’s OK. I’m going to proceeded with my explanation based on the values above since that’s the ideal case.

Now for the .xdata, we need:

  1. A table that maps EH states to unwind handlers.

  2. A table that maps catch handlers to the EH states they cover.

  3. A table that maps IP addresses to EH states.

I think the third table is self-evident from my annotated code above.

The unwind table will look something like this:

State → Handler → Next State

For the sake of putting a more-or-less concrete alternative on the table, here’s a possible way that we could structure the IR generated by the front end that would allow us to create the necessary .xdata tables and outline the handlers. I haven’t thought this through as thoroughly as I did my previous proposal, but I think it is basically sound and it only requires a bunch of new intrinsics. Something additional would be needed to represent the form of the .xdata after outlining, but that’s an easy detail to work out if this seems like a preferable approach.

void test()

{

try {

Outer outer;

try {

Inner inner;

do_inner_thing();

} catch (int) {

handle_int();

}

} catch (float) {

handle_float();

}

keep_going();

}

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

; IR to be generated by clang

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

; Function Attrs: uwtable

define void @_Z4testv() #0 {

entry:

%outer = alloca %class.Outer, align 1

%inner = alloca %class.Inner, align 1

call void @llvm.eh.setehstate(i32 0)

call void @_ZN5OuterC1Ev(%class.Outer* %outer)

call void @llvm.eh.setehstate(i32 1)

call void @llvm.eh.setehstate(i32 2)

call void @_ZN5InnerC1Ev(%class.Inner* %inner)

call void @llvm.eh.setehstate(i32 3)

call void @_Z14do_inner_thingv()

call void @llvm.eh.setehstate(i32 2)

call void @_ZN5InnerD1Ev(%class.Inner* %inner)

br label %catch.2.dest

catch.1.dest:

call void @llvm.eh.setehstate(i32 1)

call void @_ZN5OuterD1Ev(%class.Outer* %outer)

br label %catch.0.dest

catch.2.dest:

call void @llvm.eh.setehstate(i32 0)

call void @llvm.eh.setehstate(i32 -1)

call void @_Z10keep_goingv()

%2 = call i1 @llvm.eh.exceptionstate.placeholder()

br i1 %2, label %eh.normal, label %unwind.handlers

eh.normal:

ret void

unwind.handlers:

%unwind.state = call i32 @llvm.eh.getunwindstate()

br label %unwind.dispatch

unwind.dispatch:

%4 = icmp eq i32 %unwind.state, 1

br i1 %4, label %unwind.handler.1, label %unwind.dispatch.1

unwind.handler.1:

call void @_ZN5OuterD1Ev(%class.Outer* %outer)

unreached ; This is an arbitrary terminator to help with outlining.

unwind.dispatch:

%5 = icmp eq i32 %unwind.state, 3

br i1 %5, label %unwind.handler.3, label %catch.handlers

unwind.handler.3:

call void @_ZN5InnerD1Ev(%class.Inner* %inner)

unreached ; This is an arbitrary terminator to help with outlining.

catch.handlers:

%catch.state = call i32 @llvm.eh.getcatchstate()

br label %catch.dispatch

catch.dispatch.1:

%6 = icmp sge i32 %catch.state, 2

br i1 %6, label %catch.1.lower.true, label %catch.dispatch.2

catch.1.lower.true

%7 = icmp sle i32 %catch.state, 3

br i1 %7, label %catch.1.state.matches, label %catch.dispatch.2

catch.1.state.matches:

%8 = call i32 @llvm.eh.typeid.for(i8* bitcast (i8** @_ZTIi to i8*))

%9 = call i32 @llvm.eh.getcatchselector()

%10 = icmp eq %8, %9

br i1 %10, label %catch.1.handler, label %catch.dispatch.2

catch.1.handler:

call void @llvm.eh.setehstate(i32 4)

call void @_Z10handle_intv()

br catch.1.dest

catch.dispatch.2:

%12 = icmp sge i32 %catch.state, 0

br i1 %12, label %catch.2.lower.true, label %catch.nomatch

catch.2.lower.true

%13 = icmp sle i32 %catch.state, 4

br i1 %13, label %catch.2.state.matches, label %catch.nomatch

catch.2.state.matches:

%14 = call i32 @llvm.eh.typeid.for(i8* bitcast (i8** @_ZTIf to i8*))

%15 = call i32 @llvm.eh.getcatchselector()

%16 = icmp eq %14, %15

br i1 %16, label %catch.2.handler, label %catch.nonmatch

catch.2.handler:

call void @llvm.eh.setehstate(i32 5)

call void @_Z12handle_floatv()

br catch.2.dest

catch.nomatch:

unreached

}

I was thinking about this last night, and I came up with a third alternative which I think looks very promising. It’s basically a re-working of the previous alternative to use the landingpad concept rather than arbitrary fake logic, but it uses a single landing pad for the entire function that mimics the logic of the personality function to dispatch unwinding calls and catch handlers.

I believe this is consistent with the semantics and spirit of the existing landingpad mechanism, and it still has the properties that allow the required .xdata information to be easily extracted from the IR. It will require a new representation after the handlers have been outlined, but I have an idea for that which I’ll send out later today.

If we go this way, the inlining code will need to be taught to merge the landingpad of the inlined function, but I think that will be pretty easy.

So, here it is:

void test()

{

try {

Outer outer;

try {

Inner inner;

do_inner_thing();

} catch (int) {

handle_int();

}

} catch (float) {

handle_float();

}

keep_going();

}

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

; Original

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

; Function Attrs: uwtable

define void @_Z4testv() #0 {

entry:

%outer = alloca %class.Outer, align 1

%inner = alloca %class.Inner, align 1

call void @llvm.eh.setehstate(i32 0)

invoke void @_ZN5OuterC1Ev(%class.Outer* %outer)

to label %invoke.cont unwind label %lpad

invoke.cont:

call void @llvm.eh.setehstate(i32 1)

call void @llvm.eh.setehstate(i32 2)

invoke void @_ZN5InnerC1Ev(%class.Inner* %inner)

to label %invoke.cont1 unwind label %lpad

invoke.cont.1:

call void @llvm.eh.setehstate(i32 3)

invoke void @_Z14do_inner_thingv()

to label %invoke.cont2 unwind label %lpad

invoke.cont2:

call void @llvm.eh.setehstate(i32 2)

invoke void @_ZN5InnerD1Ev(%class.Inner* %inner)

to label %invoke.cont3 unwind label %lpad

invoke.cont3:

call void @llvm.eh.setehstate(i32 1)

invoke void @_ZN5OuterD1Ev(%class.Outer* %outer)

to label %invoke.cont4 unwind label %lpad

invoke.cont4:

call void @llvm.eh.setehstate(i32 0)

call void @llvm.eh.setehstate(i32 -1)

call void @_Z10keep_goingv()

ret void

lpad:

%eh.vals = landingpad { i8*, i32 } personality i8* bitcast (i32 (…)* @__CxxFrameHandler3 to i8*)

cleanup

catch i8* bitcast (i8** @_ZTIi to i8*)

catch i8* bitcast (i8** @_ZTIf to i8*)

%eh.ptrs = extractvalue { i8*, i32 } %eh.vals, 0

%eh.sel = extractvalue { i8*, i32 } %eh.vals, 1

br label %unwind.handlers

unwind.handlers:

%unwind.state = call i32 @llvm.eh.getunwindstate(%eh.ptrs)

br label %unwind.dispatch

unwind.dispatch:

%4 = icmp eq i32 %unwind.state, i32 1

br i1 %4, label %unwind.handler.1, label %unwind.dispatch.1

unwind.handler.1:

call void @_ZN5OuterD1Ev(%class.Outer* %outer)

resume { i8*, i32 } %eh.vals

unwind.dispatch:

%5 = icmp eq i32 %unwind.state, i32 3

br i1 %5, label %unwind.handler.3, label %catch.handlers

unwind.handler.3:

call void @_ZN5InnerD1Ev(%class.Inner* %inner)

resume { i8*, i32 } %eh.vals

catch.handlers:

%catch.state = call i32 @llvm.eh.getcatchstate(i8* %eh.ptrs)

br label %catch.dispatch

catch.dispatch.1:

%6 = icmp sge i32 %catch.state, i32 2

br i1 %6, label %catch.1.lower.true, label %catch.dispatch.2

catch.1.lower.true

%7 = icmp sle i32 %catch.state, i32 3

br i1 %7, label %catch.1.state.matches, label %catch.dispatch.2

catch.1.state.matches:

%sel.1 = call i32 @llvm.eh.typeid.for(i8* bitcast (i8** @_ZTIi to i8*))

%matches.1 = icmp eq i32 %sel.1, i32 %eh.sel

br i1 %matches.1, label %catch.1.handler, label %catch.dispatch.2

catch.1.handler:

call void @llvm.eh.setehstate(i32 4)

call void @_Z10handle_intv()

br label %invoke.cont3

catch.dispatch.2:

%8 = icmp sge i32 %catch.state, i32 0

br i1 %8, label %catch.2.lower.true, label %catch.nomatch

catch.2.lower.true

%9 = icmp sle i32 %catch.state, i32 4

br i1 %9, label %catch.2.state.matches, label %catch.nomatch

catch.2.state.matches:

%sel.2 = call i32 @llvm.eh.typeid.for(i8* bitcast (i8** @_ZTIf to i8*))

%matches.2 = icmp eq i32 %sel.2, i32 %eh.sel

br i1 %matches.2, label %catch.2.handler, label %catch.nonmatch

catch.2.handler:

call void @llvm.eh.setehstate(i32 5)

call void @_Z12handle_floatv()

br label %invoke.cont4

catch.nomatch:

resume { i8*, i32 } %eh.vals

}

My original reply got stuck in llvmdev moderation (it hit the 100K limit!), so I’m resending without reply context.

Thanks, Reid. These are good points.

So I guess that does take us back to something more like my original proposal.

I like your suggestion of having some kind of “eh.actions” intrinsic to represent the outlining rather than the extension to landingpad that I had proposed. I was just working on something like that in conjunction with my second alternative idea.

What I’d really like is to have the “eh.actions” intrinsic take a shape that makes it really easy to construct the .xdata table directly from these calls. As I think I mentioned in my original post, I think I have any idea for how to reconstruct functionally correct eh states (at least for synchronous EH purposes) from the invoke and landingpad instructions. I would like to continue, as in my original proposal, limiting the unwind representations to those that are unique to a given landing pad. I think with enough documentation I can make that seem sensible.

I’ll start working on a revised proposal. Let me know if you have any more solid ideas.

-Andy

Thanks, Reid. These are good points.

So I guess that does take us back to something more like my original
proposal.

I like your suggestion of having some kind of “eh.actions” intrinsic to
represent the outlining rather than the extension to landingpad that I had
proposed. I was just working on something like that in conjunction with my
second alternative idea.

Great!

What I’d really like is to have the “eh.actions” intrinsic take a shape
that makes it really easy to construct the .xdata table directly from these
calls. As I think I mentioned in my original post, I think I have any idea
for how to reconstruct functionally correct eh states (at least for
synchronous EH purposes) from the invoke and landingpad instructions. I
would like to continue, as in my original proposal, limiting the unwind
representations to those that are unique to a given landing pad. I think
with enough documentation I can make that seem sensible.

My thinking is that the "eh.actions" list can be transformed into a compact
xdata table later, after we've done machine basic block layout.

I think the algorithm will be something like

1. Input: already laid out MachineFunction
2. Call EHStreamer::computeCallSiteTable to populate a LandingPadInfo
vector sorted by ascending PC values
4. Iterate the LandingPadInfos, comparing the action list of each landing
pad with the previous landing pad, assuming an empty action list at
function start and end.
5. Model the action list as a stack, and compute the common suffix of the
landing pad action lists
6. Each new action pushed represents a new EH state number
7. Pushing a cleanup action adds a state table entry that transitions from
the current EH state to the previous with a cleanup handler
8. Pushing a catch action adds a new unwind table entry with an open range
from the current EH state to an unknown EH state. The state after catching
is... ???
9. Popping a catch action closes an open unwind table range

So, I think the action list is at least not totally crazy. =)

I’ll start working on a revised proposal. Let me know if you have any more

Hi Reid,

I’ve worked through a few scenarios, and I think this is converging. I’m attaching a new example, which extends the one we’ve been working through to handle a few more cases.

I wasn’t sure what you intended the first i32 argument in an llvm.eh.actions entry to be. I’ve been using it as a place to store the eh state that I’m associating with the action, but that’s kind of circular because I’m using the information in these tables to calculate that value. I’ll be able to do this calculation in the MSVCEHPrepare pass before these instructions are added, so I can put it there (it’s nice for readability), but the later pass that generates the tables (assuming we leave that in a later pass) will need information not represented in the action table (at least as I’ve been using it). I’m not 100% sure that this is necessary, but it seemed like the way things ought to be. I’ll experiment more before anything gets set in stone.

In addition to the IP-to-state table that we’ve talked about, we also need an unwind table (which has each state, the state it transitions to when it expires and the unwind handler, if any) and a catch handler table (which has the range of states that can throw to the handlers, the state of the handlers and a map of types handled to their handlers). I’ve got examples of these in the attached example.

So I now have a firm plan for how to compute these tables from the outlined IR.

I think the algorithm you proposed for computing eh states doesn’t quite work. In particular, if multiple catch handlers get added by the same landing pad we’ll want to start by assuming that they have the same state. If they end up getting popped at different times then we’ll need to update the eh state of the one that gets popped first. Unfortunately my example doesn’t cover this case, but I worked through it and my new algorithm (based on your but slightly tweaked) works for that case. Also, unwind handlers get discrete states (they happen when a transition crosses the state, but in the .xdata tables they are represented with a single state). Catch handlers, on the other hand, do get a range.

Anyway, here’s what I’m doing (by hand so far, I don’t have it coded yet).

  1. Start with an empty stack of actions and an empty master list of eh state.

  2. Visit each landing pad in succession, processing the actions at that landing pad back to front.

  3. Pop actions from the current stack that aren’t in the current landing pad’s action table

a. If a catch is popped that had been assumed to have the same state as a catch that isn’t being popped

its state number and all state numbers above it need to be incremented

b. When a catch is popped, the next available eh_state is assigned to its handler

  1. As an action is pushed to the current stack, it is assigned an eh_state in the master list

a. If the action was an unwind or if it was a cacth after an unwind, the next available eh_state is incremented

b. If the action was a catch following a catch that was also just added, it gets the same eh_state as the previous catch

  1. When all landing pads have been handled, the remaining actions are popped and processed as above.

The “next” state for each eh_state can also be computed during the above process by observing the state of the action on the top of the current action stack when the action associated with the state is popped. In the case of catch handlers, I think the next state will always be the same as the next state of the corresponding catch action.

So, I think it makes sense to compute the unwind and catch tables during the MSVCEHPrepare pass, but I wasn’t sure how best to preserve the information once it was computed. Is it reasonable to stick this in metadata?

We can keep fine tuning this if you like, but I think it’s looking solid enough that I’m going to start revising my outlining patch to produce the results in the attached example.

-Andy

nested-2.ll (19.1 KB)

Nice. This seems like a good starting design for some patches.

I’m not sure exactly how step 3.a works. Does it stem from step 4.b which assigns the same eh_state to two catch handlers? It seems like these rules would come into play in these examples:

// ‘catch int’ and ‘catch float’ have the same incoming eh_state range
try { f(); }
catch (int) { g(); }
catch (float) { h(); }

// ‘catch float’ covers more than ‘catch int’, has different eh_state range, we need to do step 3.a when we reach ‘invoke void @g()’
try {
try { f(); }
catch (int) { g(); }
} catch (float) { h(); }

Is that the idea?

So, I think it makes sense to compute the unwind and catch tables during the MSVCEHPrepare pass, but I wasn’t sure how best to preserve the information once it was computed. Is it reasonable to stick this in metadata?

I guess we can do the EH state assignment in the preparation pass, so long as it’s resilient to basic block reordering. A good source program to play with would be:

extern int global;
void g();
void h();

void i();

void f() {
g();
try {
h();
if (__builtin_unlikely(global))
i();
h();
} catch (int) {
}
g();
}

The LLVM IR will preserve source order, but block placement will see the branch hint and rearrange to:

callq g
callq h

testl global(%rip)
jne .L1
.L2:
callq h

callq g
.L1:
callq i
jmp .L2

Perhaps we can assign landing pads EH states during preparation by placing a call to something like @llvm.eh.state there, or making it part of the @llvm.eh.actions call? Then the ip2state table will assign the .L1 block the same state as the range over the h() calls.

I'm not sure exactly how step 3.a works. Does it stem from step 4.b which assigns the same eh_state to two catch handlers? It seems like these rules would come into play in these examples:

Is that the idea?

Yes. That’s the case I’m trying to handle with the 3a fixup. In the final implementation I hope to come up with something cleaner (maybe looking ahead when the state is assigned in the first place), but the result will be the same.

I guess we can do the EH state assignment in the preparation pass, so long as it's resilient to basic block reordering.

I think the EH state assignment (that is, the actual numbering) depends on control flow rather than physical block ordering, so I expect that the prepare pass will be the right place to do it. Then, as you say, the IP to State table will be constructed later.

I guess we can work out the details while reviewing patches, but I’m happy to use some combination of intrinsics as you suggest.

Hi,

Sorry if I’m late to the party. I’m curious whether you need to be concerned about code being moved into the landing pad area with this approach (i.e. whether “real code” might wind up in the eh.actions call’s block which [if I’m following correctly] won’t actually get executed at runtime).

For example, given something like

int foo (int a, int b) {

int x;

try {

x = a + b;

maybe_throw();

return 0;

} catch (int) {

return x;

} catch (float) {

return x + 1;

}

}

Do you need to worry that something like partial deadcode elimination could want to move the definition of x down into the landing pad code (but not all the way down into the catch handlers)?

Or conversely, given something like

int foo(int a, int b) {

try {

maybe_throw();

return 0;

} catch (int) {

return a + b;

} catch (float) {

return (a + b) + 1;

}

}

Do you need to worry that something like very busy expressions could want to hoist the “a + b” computation up into the landing pad code (but not all the way up into the try block)?

Should those sorts of code motion be legal? If not, what’s preventing them, and if so, how will the moved code be executed at runtime?

Thanks

-Joseph

These are exactly the sorts of code transformations we want to allow by delaying the outlining until later. By keeping such code inlined in the parent function until after optimization, we enable a lot of core optimizations like SROA. For example, we should be able to completely eliminate wrappers like unique_ptr that would otherwise stay around due to the pointer escaped to the destructor call that gets executed on exception.

It’s an interesting problem though. If an instruction is in the landing pad block but not inside a begincatch/endcatch pair it will be interpreted as cleanup code. I think that is OK, but it’s something we’ll need to be aware of.

For reference, Joseph’s first scenario will look like this:

; Function Attrs: uwtable

define i32 @_Z3fooii(i32 %a, i32 %b) #0 {

entry:

<…snip…>

store i32 %a, i32* %a.addr, align 4

store i32 %b, i32* %b.addr, align 4

invoke void @_Z11maybe_throwv()

to label %invoke.cont unwind label %lpad

lpad: ; preds = %entry

%4 = landingpad { i8*, i32 } personality i8* bitcast (i32 (…)* @__CxxFrameHandler3 to i8*)

catch i8* bitcast (i8** @_ZTIi to i8*)

catch i8* bitcast (i8** @_ZTIf to i8*)

%5 = extractvalue { i8*, i32 } %4, 0

store i8* %5, i8** %exn.slot

%6 = extractvalue { i8*, i32 } %4, 1

store i32 %6, i32* %ehselector.slot

%2 = load i32* %a.addr, align 4

%3 = load i32* %b.addr, align 4

%add = add nsw i32 %2, %3

store i32 %add, i32* %x, align 4

br label %catch.dispatch

<…snip…>

}

So in this scenario, even though the landingpad doesn’t have a cleanup clause, we’ll need to add one (better, the pass the sunk the add operation should have added one).

But assuming we do move the add instructions into a cleanup handler, this should just work.

-Andy

Ah, ok. So if the outliner sees non-dispatch code in the landing pad area, it can find/create somewhere to put it and an appropriate eh.actions annotation to get an EH table generated that will ensure it gets executed appropriately at run-time (in this example, perform the add before invoking either handler); is that more or less the idea? That makes sense, thanks.

I have the same question about the post-outlining IR. To change the example to one where the bait won’t get outlined, suppose you had

int foo(int a, int b) {

try {

try {

maybe_throw();

return 0;

} catch (int) {

// some code here that gets outlined

}

L1:

return a + b;

} catch (float) {

// some other code here that also gets outlined

}

L2:

return (a + b) + 1;

}

and suppose that nothing gets moved around before outlining. Then, after outlining, the landingpad will be followed by an eh.actions call and then an indirect branch that targets L1 and L2, correct?

Do we need to worry that a late codesize optimization might want to merge the adds by hoisting them up above the indirect branch? If that happened, wouldn’t it get skipped over if an exception were raised?

Thanks

-Joseph

Ah, ok. So if the outliner sees non-dispatch code in the landing pad
area, it can find/create somewhere to put it and an appropriate eh.actions
annotation to get an EH table generated that will ensure it gets executed
appropriately at run-time (in this example, perform the add before invoking
either handler); is that more or less the idea? That makes sense, thanks.

Yep. In the worst case, we could model code before landing pad dispatch as
a cleanup handler, but I think the most common transforms are easily
undone. Consider your example where a + b gets hoisted before the catch
dispatch. Adds have no side effects, so we can freely sink them back down
into the catch handler once we start outlining.

Things that are hard to move, like loads and stores to unknown memory
locations, cannot be hoisted over the llvm.eh.begincatch() call in the
first place. It should act as a memory barrier.

I have the same question about the post-outlining IR. To change the
example to one where the bait won't get outlined, suppose you had

int foo(int a, int b) {

  try {

    try {

      maybe_throw();

      return 0;

    } catch (int) {

      // some code here that gets outlined

    }

    L1:

    return a + b;

  } catch (float) {

    // some other code here that also gets outlined

  }

  L2:

  return (a + b) + 1;

}

and suppose that nothing gets moved around before outlining. Then, after
outlining, the landingpad will be followed by an eh.actions call and then
an indirect branch that targets L1 and L2, correct?

Do we need to worry that a late codesize optimization might want to merge
the adds by hoisting them up above the indirect branch? If that happened,
wouldn't it get skipped over if an exception were raised?

We'd have to hoist a + b to somewhere that dominates L1 and L2. I think the
only BB in your program that dominates is the entry block. In the IR, the
fake 'indirectbr' instruction after the call to @llvm.eh.actions helps keep
the CFG conservatively correct.

We’d have to hoist a + b to somewhere that dominates L1 and L2. I think the only BB in your program that dominates is the entry block

I don’t follow. What path do you see from entry to either L1 or L2 that doesn’t pass through the indirectbr? In order to reach either L1 or L2, the call to maybe_throw() must raise an exception (else we’d return 0 from foo), the exception must be caught by one of the two handlers (else we’d unwind out of foo), and one of the outlined handlers must have executed and returned. Don’t those conditions correspond to the path from entry to the indirectbr?

To clarify, I’m not trying to assert there’s a problem; I’m new to LLVM and trying to understand the model. I’ve worked before on a compiler that similarly used stand-in code to model control flow effected by the runtime/unwinder, and we had these issues with code motion. Our system was closed enough that we could just have the affected optimizations check for the relevant opcodes (we were doing the equivalent of using a special exitlandingpad terminator instead of indirectbr) and avoid pushing code across them; I don’t know if a similar approach is appropriate in LLVM, or if there is (or should be) a way to annotate the block to indicate that it’s one of these cases and new code inserted there would get skipped over at run-time, or if it’s ok to just assume that those sort of optimizations don’t run after EH Preparation, or what. Even if I am misunderstanding the shape of the flow graph in my example as you suggest, is there no cause for concern that some post-outlining pass might try to insert code into that block? Like maybe some sort of late profile instrumentation? It’s great if there’s not; I’m just trying to understand why not.

Thanks

-Joseph

You're right, we would hoist the add to someplace before the notional
indirectbr, but that needs to work anyway. After looking more closely at
the IR, we would probably hoist into the landingpad instead of into the
entry block. Here's the IR I have in my head after optimization and
hoisting but before outlining:

define i32 @foo(i32 %a, i32 %b) {
entry:
  invoke void @maybe_throw()
      to label %ret unwind label %lpad
ret:
  ret i32 0 ; The return case

lpad:
  %ehvals = landingpad { i8*, i32 } personality ...
      catch ... ; typeinfo for int
      catch ... ; typeinfo for float
  %sum = add i32 %a, %b ; hoisted out from catch_int and catch_float via
CSE or something
  ... ; dispatch on the selector, effectively doing a switch

catch_int:
  ret i32 %sum

catch_float:
  %s1 = add i32 %sum, 1
  ret i32 %s1

... ; resume
}

We have a couple options during EH preparation:
1. Outline the add into a cleanup, store the result into the
frameallocation, and access it in the catch handler
2. Sink the add back into the handlers, which we can do because it has no
side effects (better)
3. Sink the add past the catch handlers and past the notional indirectbr
into the blocks referenced by the handler return result

Number 3 is probably the best, and it would look like:

define i8* @int_handler(i8*, i8*) {
  ret i8* basicblockaddr(@func, %catch_int)
}
define i8* @float_handler(i8*, i8*) {
  ret i8* basicblockaddr(@func, %catch_float)
}
define i32 @foo(i32 %a, i32 %b) {
entry:
  invoke void @maybe_throw()
      to label %ret unwind label %lpad
ret:
  ret i32 0 ; The return case

lpad:
  %ehvals = landingpad { i8*, i32 } personality ...
      catch ... ; typeinfo for int
      catch ... ; typeinfo for float
  %rejoin_at = call i8* @llvm.eh.actions(... @int_handler, ...
@float_handler, ...)
  indirectbr i8* %rejoin_at, [ label %catch_int, label %catch_float ]

catch_int:
  %s1 = add i32 %a, %b
  ret i32 %s1

catch_float:
  %s2 = add i32 %a, %b
  %s3 = add i32 %s2, 1
  ret i32 %s3
}

But even if we aren't smart enough to do #3, which could implement #1 by
additional outlining. It's ugly though. =/

In practice, I don't think this situation will occur very often because the
catch blocks look like this:

catch_int:
  call void @llvm.eh.begincatch(i8* %ehptr)
  ... ; user code
  call void @llvm.eh.endcatch()
  ... ; rejoin normal control

Anything we can hoist out of "user code" here can usually be sunk back in.
If it's not trivial like a hoisted store to an unescaped internal global,
then we'll have to emit something that looks like a destructor cleanup.

Yes, it seems like WinEHPrepare should always be able to move landing pad code to somewhere appropriate. Even if there’s some involved code to handle the cases that force #1, it’s great that the complexity is contained in WinEHPrepare.

I’m still confused about the apparent lack of constraints after WinEHPrepare. Can we simply require/assume that WinEHPrepare be run after any passes that may move/insert code into landing pads? Is that documented somewhere?

For a concrete example, consider your snippet showing what the IR would look like after outlining with #3. What’s to stop a later pass from redoing the hoist?

Thanks

-Joseph

Yeah, this is kind of fuzzy. EH preparation just happens very late, after
all the interesting passes (sroa, inlining, etc) have run. There are some
passes that run afterwards, but they are typically lowering passes, and
won't do this kind of hoisting.