Inlining and exception handling in LLVM and GCC

The poor interaction between exception handling and inlining in LLVM is one of
the main motivations for the new exception handling models proposed recently.
Here I give my analysis of the origin of the problem in the hope of clarifying
the situation.

Soon after DWARF exception handling was implemented in LLVM, I noticed that some
programs would fail when compiled at -O3, for example the following:

#include <stdio.h>

struct X { ~X() { printf("Running destructor!\n"); } };

void foo() {
   struct X x;
   throw 1;
}

int main() {
   try {
     foo();
   } catch (int) {
     printf("Caught exception!\n");
   }
   return 0;
}

When the exception is thrown in foo, first the destructor for "x" should be run
outputting "Running destructor!", then the exception should be caught in main,
and "Caught exception!" should be output from the handler. This worked fine
up to -O2, but at -O3 instead the output was:
   terminate called after throwing an instance of 'int'
   Aborted
The difference between -O2 and -O3 was due to inlining: at -O3 foo was being
inlined into main, and for some reason this was causing the C++ runtime to
terminate the program. To explain why, let me first describe how the GCC
inliner deals with exception handling.

What GCC does


Your analysis coincides with the analysis I made when implementing EH
in clang. I decided, however, that in the short term, given the brokenness
of the IR, I would rather ignore the known bad consequences of emitting
cleanups as catch-alls than deal with the uncertainties of whether codegen
could consistently clean up after our broken inliner. I haven’t yet regretted
this choice.

Why does the inliner work the way it does? The logic behind it is plausible
but wrong, and goes something like this. An invoke instruction does not let
exceptions unwind through it (instead control branches to the landing pad).
So when inlining it into another function, there is no need to do anything
with it: it is not a throwing instruction. This logic assumes that inlining
does not affect whether an exception is unwound or not. It is true that if
an exception is unwound then the code produced by the LLVM inliner will
function correctly. But what if it is not unwound in the first place because
whoever is doing the unwinding looks up the call stack and bases its decision
as to whether to unwind (or potentially even what type of exception it unwinds)
on what actions it sees there? Then the whole LLVM approach falls apart. And
that’s exactly what happens in the example program.

Right. Another way of thinking about this is that the LLVM inliner generally
assumes that the call itself is not semantically relevant. Unfortunately,
while that’s often true in practice, there are actually two major semantic effects
tied to call boundaries.

The first is that function return implicitly releases stack-allocated memory;
the pathological case here is a recursive function that calls a helper with
huge stack variables, where inlining the helper makes max recursion
depth plummet. Currently the inliner makes some effort to mitigate this
impact, but mostly by sharing allocas between different inlined functions.

The second is that inlining changes the behavior of anything that wants to
manually walk the call stack, and while most stack walkers don’t rely on any
frame-specific semantics, the personality functions called by libunwind do.
And unfortunately, the inliner currently makes no effort at all to compensate
for breaking these call-specific semantics, even though they’re arguably
way more important in practice than the frame-size issue.

John.

Hi Duncan,

Amazing post, thank you very much!

Now I have a clear mental image of the whole problem and can see where
you were coming from to propose such a change.

Bill's recent exception handling proposal has rules which (if I understand
them right) ensure that you can always find the actions for an invoke without
any trouble (the actions would be listed in a dispatch instruction that is
not allowed to be moved out of the landing pad). Thus with this proposal too
there would be no problem in teaching the inliner to do the right thing.

If I got it right, you're saying that both proposals are similar in
that they both fix the inlining problem and both specify uniquely the
landing pads for every invoke.

So, in view of their equivalence, I think Bill's proposal is better
for two reasons:

  1. It eases the future elimination of invoke, or at least, the
treatment of current instruction-level exception (as in Java) in a
cleaner way.
  2. It reinforces the idea of having one personality function for
each EH table (ie. per function), especially when inlining code from
different paradigms (if that's possible).

However, your proposal is better on two other counts:

  1. If we do want to support EH tables with multiple personalities in
the future.
  2. It's less invasive and closer to the problem it's meant to fix in
the first place. So it'll be faster and easier to do that way.

It's more of a design decision, IMO.

We'd still have problems — cleanups would have to chain with
_Unwind_Resume_or_Rethrow, because most of the platforms we support rely
on correct compiler use of _Unwind_Resume — but it might be worthwhile.

John.

Hi John,

if I changed the inliner so it does its best to find the eh.selector for an
invoke, and propagates the information onto the things it inlines, would clang
make use of it (i.e. output cleanups rather than turning them into catch-alls)?

We'd still have problems — cleanups would have to chain with
_Unwind_Resume_or_Rethrow, because most of the platforms we support rely
on correct compiler use of _Unwind_Resume — but it might be worthwhile.

OK, I'll see if I can come up with something reasonable (probably next week)
that we can at least play around with.

Ciao, Duncan.

Hi Renato,

Bill's recent exception handling proposal has rules which (if I understand
them right) ensure that you can always find the actions for an invoke without
any trouble (the actions would be listed in a dispatch instruction that is
not allowed to be moved out of the landing pad). Thus with this proposal too
there would be no problem in teaching the inliner to do the right thing.

If I got it right, you're saying that both proposals are similar in
that they both fix the inlining problem and both specify uniquely the
landing pads for every invoke.

yes, they both make the inlining problem fixable because it is always possible
to find the catch info associated with an invoke (or basic block). In fact any
proposal that has this property would do as far as fixing inlining goes.

In the case of Bill's proposal this property holds because of special rules
about what you are allowed to do with landing pads: you are not allowed to move
a dispatch instruction out of a landing pad etc. These rules could actually
be applied to eh.selector without any IR changes, providing much the same effect
without a new instruction. However the reason for not doing this is that it
then makes some optimizations impossible, for example when you inline a call
to _Unwind_Resume through an invoke you would like to turn it into a branch to
the landing pad, but this would result in eh.selector calls that fail the rules.
It's not clear to me if the dispatch instruction has this problem too, I need
to ask Bill about it.
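For concreteness, the optimization I have in mind looks roughly like this, in pseudo-IR of the sort used elsewhere in the thread (my own sketch, names illustrative):

```llvm
; before: an inlined cleanup resumes unwinding through an enclosing invoke
invoke void @_Unwind_Resume(i8* %exn)
        to label %never unwind label %outer.lpad

; desired after simplification: the exception is known to reach %outer.lpad,
; so the call can become a plain branch
br label %outer.lpad

; ...but now %outer.lpad is reached by an ordinary branch, so the
; eh.selector call inside it is no longer tied to any invoke's unwind edge,
; which is exactly what the proposed landing-pad-only rules forbid.
```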

In fact Bill's proposal mostly seems to be about replacing the current explicit
sequence of comparisons that llvm-gcc uses to find which handler to run (I guess
everyone who has looked at LLVM IR with exception handling in it has seen all
the calls to eh.typeid.for, the comparisons and the branching that I'm talking
about here) with a more compact representation that is easier to analyse. As
such, from my point of view it doesn't really have much to do with the inlining
issue, which is not to say it doesn't have merit on other grounds.

An interesting point is that Bill's dispatch instruction is quite analogous to
GCC's gimple_eh_dispatch statement. This statement represents the transfer of
control to the appropriate catch handler for an exception region, and is at a
fairly high level of abstraction. It takes an exception handling try region as
an argument (it can also have an "allowed exceptions" region as an argument,
but there's no added value in discussing that). Since the list of associated
catches is stored in the region info, there is no need for it to explicitly
list tries and where to branch to, while Bill is obliged to put the info that
GCC stores in the region directly into his dispatch instruction.

Once GCC has finished running the inliner it runs the lower_eh_dispatch pass,
which removes all gimple_eh_dispatch statements by turning them into explicit
control flow: a sequence of comparisons and branches (exactly like the code
llvm-gcc/clang produce right now, comparing the selector value with the result
of eh.typeid.for calls). Since my proposal doesn't change this aspect of LLVM,
you can consider that before this pass GCC's representation is like Bill's and
after it is like in my proposal.
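For readers who haven't seen it, the explicit comparison sequence being described looks roughly like this in the current (2.x-era) IR; this is a sketch from memory with illustrative names, not verbatim compiler output:

```llvm
lpad:
  %exn = call i8* @llvm.eh.exception()
  %sel = call i32 (i8*, i8*, ...)* @llvm.eh.selector(i8* %exn,
            i8* bitcast (i32 (...)* @__gxx_personality_v0 to i8*),
            i8* bitcast (i8** @_ZTIi to i8*), i8* null)
  %id = call i32 @llvm.eh.typeid.for(i8* bitcast (i8** @_ZTIi to i8*))
  %match = icmp eq i32 %sel, %id
  br i1 %match, label %catch.int, label %resume
```

GCC's gimple_eh_dispatch statement is lowered into this same shape by lower_eh_dispatch after inlining.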

So, in view of their equivalence, I think Bill's proposal is better
for two reasons:

   1. It eases the future elimination of invoke, or at least, the
treatment of current instruction-level exception (as in Java) in a
cleaner way.

I don't see what is cleaner about it, except that it is overall at a higher
level of abstraction (see above).

   2. It reinforces the idea of having one personality function for
each EH table (ie. per function), especially when inlining code from
different paradigms (if that's possible).

According to Bill's proposal each dispatch instruction can specify a different
personality function, so it's just the same as my proposal in this respect.

However, your proposal is better on two other counts:

   1. If we do want to support EH tables with multiple personalities in
the future.

As mentioned above, Bill's proposal also supports multiple personalities per
function.

   2. It's less invasive and closer to the problem it's meant to fix in
the first place. So it'll be faster and easier to do that way.

It's more of a design decision, IMO.

Exactly, and this brings up the question of what criteria should be used to
decide which proposal is best. Presumably, all else being equal, whichever
one makes it easiest to optimize the IR. I haven't yet thought about the
advantages and disadvantages of each proposal in this respect.

Ciao, Duncan.

Hi Duncan,

I see the similarities between GCC's approach and Bill's proposal; makes sense.

1. It eases the future elimination of invoke, or at least, the
treatment of current instruction-level exception (as in Java) in a
cleaner way.

I don't see what is cleaner about it, except that it is overall at a higher
level of abstraction (see above).

If I got it right, his proposal had the unwind path in the basic block
(rather than in the invoke), so any instructions in that basic block
(including simple calls, without the nounwind attribute) would use
that as the landing pad.

That would make invokes obsolete.

2. It reinforces the idea of having one personality function for
each EH table (ie. per function), especially when inlining code from
different paradigms (if that's possible).

According to Bill's proposal each dispatch instruction can specify a
different
personality function, so it's just the same as my proposal in this respect.

I confess that I didn't read all the emails in all threads, so I have
gotten some things wrong. I thought there could be only one dispatch
per function, but now that you mentioned, it doesn't really make
sense.

cheers,
--renato

Hi Renato,

   1. It eases the future elimination of invoke, or at least, the
treatment of current instruction-level exception (as in Java) in a
cleaner way.

I don't see what is cleaner about it, except that it is overall at a higher
level of abstraction (see above).

If I got it right, his proposal had the unwind path in the basic block
(rather than in the invoke), so any instructions in that basic block
(including simple calls, without the nounwind attribute) would use
that as the landing pad.

That would make invokes obsolete.

I think you took my proposal too literally :slight_smile: It's irrelevant to the idea of it
whether catch info is attached to an invoke or to a basic block. I showed it
attached to invokes to make the proposal more concrete. If we are going to
change the IR then we should try to change it in such a way that we don't have
to change it again when we remove the invoke instruction. This means that if I
start coding my proposal then almost certainly I will attach the info to basic
blocks. In fact it makes sense to do so even before we remove the invoke
instruction: in the short term we would still have invoke, but the catch info
for the invoke would be attached to the basic block containing the invoke,
which is equivalent to attaching it to the invoke. One day we will remove
invoke, and the info will already be in the right spot (on the basic block).

Ciao, Duncan.

I see, in that case, it makes more sense to attach all the information
to the callee block rather than force the artificial association of a
dispatch instruction with a particular basic block.

If I'm not yet as lost as I think I am, that's your proposal, right?

Sorry for being so redundant; it's just that I stopped reading the thread
after a while. I still think it's an important proposal and we should get it
right this time.

Sorry, I meant "caller" block.

--renato

Hi Renato,

I do not know EH or details of these proposals. But do not forget, basic blocks are merged/split/deleted, instructions are added/removed/moved/copied/replaced.

Indeed. Having the information in the invoke kinda alleviates this
problem, but removing the invoke syntax will bring that back.

One way to merge two basic blocks is to split them into several smaller
basic blocks with the same properties and then branch to and from the new
sub-block instead of actually moving the instructions inside the
merged block.

I don't know the effects of block fragmentation in LLVM, but that
would definitely increase it. But having EH info in every instruction
is also too much... :confused:

cheers,
--renato

I like the idea of the landing pad being associated with the basic block. It seems to me that the branch to the landing pad should be viewed as occurring at the beginning of the “earliest” block to branch to that landing pad. No assignments that occur in any block that unwinds to a particular landing pad are valid in that landing pad or any subsequent blocks. Other than that, standard SSA rules apply.

With this view, it would be legal to split or combine consecutive blocks linked to the same landing pad. It would also be legal to move instructions around within a control flow of blocks linked to the same landing pad. To go any further would require that a pass make changes to the landing pads to preserve the validity of relevant assignments.

It would only be legal to specify a landing pad block as the target of an unwind edge. If cleanups need to be chained, they cannot be in the landing pad directly, and must instead be branched to from the landing pad. Given this important distinction it may make sense to make landing pad blocks special in the IR to facilitate detecting misuse (both programmatically and visually).

The following pseudo-IR tries to demonstrate the validity of assignments:

main:
    %a = no_fail_instruction

test: unwind to lpad
    %r = possible_failing_comparison %a, 0
    br %r == 0, label true_block, label false_block

true_block: unwind to lpad
    %c = no_fail_instruction
    %d = potentially_failing_instruction %c
    br label either_block

false_block: unwind to lpad
    %e = no_fail_instruction
    %f = potentially_failing_instruction %e
    br label either_block

either_block: unwind to lpad
    %g = phi label true_block %d, false_block %f
    %h = no_fail_instruction %g
    %i = potentially_failing_instruction %h
    ret %i

lpad: ; only %a is valid in this landing pad
    cleanup %a
    ret -1

In this sequence, during the normal flow, everything works as expected. With the landing pad, dominance is calculated differently and only %a is valid, as it is the only assignment guaranteed to occur before any block that unwinds to the landing pad.

–Nathan

p.s. I apologize if this was the intended behavior all along, I have been trying to follow this discussion as best as I can.

Hi Nathan,

I like the idea of the landing pad being associated with the basic block. It
seems to me that the branch to the landing pad should be viewed as occurring at
the beginning of the "earliest" block to branch to that landing pad. No
assignments that occur in any block that unwinds to a particular landing pad are
valid in that landing pad or any subsequent blocks. Other than that, standard
SSA rules apply.

yup - dominance rules of this kind were already discussed in a bunch of emails
earlier.

Ciao,

Duncan.

In the case of Bill’s proposal this property holds because of special rules
about what you are allowed to do with landing pads: you are not allowed to move
a dispatch instruction out of a landing pad etc.

I hope I didn’t misstate this, but you can move it out of a landing pad; however, the landing pad must dominate the dispatch. For instance, code like this:

lpad: landingpad
call void @a_cleanup_function()
dispatch …

may inline the @a_cleanup_function() call, which may result in the dispatch no longer being in the landing pad.

These rules could actually
be applied to eh.selector without any IR changes, providing much the same effect
without a new instruction. However the reason for not doing this is that it
then makes some optimizations impossible, for example when you inline a call
to _Unwind_Resume through an invoke you would like to turn it into a branch to
the landing pad, but this would result in eh.selector calls that fail the rules.
It’s not clear to me if the dispatch instruction has this problem too, I need
to ask Bill about it.

In fact Bill’s proposal mostly seems to be about replacing the current explicit
sequence of comparisons that llvm-gcc uses to find which handler to run (I guess
everyone who has looked at LLVM IR with exception handling in it has seen all
the calls to eh.typeid.for, the comparisons and the branching that I’m talking
about here) with a more compact representation that is easier to analyse. As
such, from my point of view it doesn’t really have much to do with the inlining
issue, which is not to say it doesn’t have merit on other grounds.

The compactness of information and ease of analysis are what I think of as the key advantages of my proposal. Also, it doesn’t rely upon intrinsics, but builds the EH into the IR explicitly.

An interesting point is that Bill’s dispatch instruction is quite analogous to
GCC’s gimple_eh_dispatch statement. This statement represents the transfer of
control to the appropriate catch handler for an exception region, and is at a
fairly high level of abstraction. It takes an exception handling try region as
an argument (it can also have an “allowed exceptions” region as an argument,
but there’s no added value in discussing that). Since the list of associated
catches is stored in the region info, there is no need for it to explicitly
list tries and where to branch to, while Bill is obliged to put the info that
GCC stores in the region directly into his dispatch instruction.

Once GCC has finished running the inliner it runs the lower_eh_dispatch pass,
which removes all gimple_eh_dispatch statements by turning them into explicit
control flow: a sequence of comparisons and branches (exactly like the code
llvm-gcc/clang produce right now, comparing the selector value with the result
of eh.typeid.for calls). Since my proposal doesn’t change this aspect of LLVM,
you can consider that before this pass GCC’s representation is like Bill’s and
after it is like in my proposal.

Interesting. . . How do they handle LTO with exception handling if the gimple_eh_dispatch statements are already lowered?

So, in view of their equivalence, I think Bill’s proposal is better

for two reasons:

  1. It eases the future elimination of invoke, or at least, the

treatment of current instruction-level exception (as in Java) in a

cleaner way.

I don’t see what is cleaner about it, except that it is overall at a higher
level of abstraction (see above).

I also think that both of our proposals are flexible enough to handle the future elimination of invokes and support of basic-block throws (as you explained in a later email).

  1. It reinforces the idea of having one personality function for

each EH table (ie. per function), especially when inlining code from

different paradigms (if that’s possible).

According to Bill’s proposal each dispatch instruction can specify a different
personality function, so it’s just the same as my proposal in this respect.

Yeah. Multiple personality functions per function is something Chris has a mild interest in. The devil is in the details, of course.

However, your proposal is better on two other counts:

  1. If we do want to support EH tables with multiple personalities in

the future.

As mentioned above, Bill’s proposal also supports multiple personalities per
function.

  1. It’s less invasive and closer to the problem it’s meant to fix in

the first place. So it’ll be faster and easier to do that way.

It’s more of a design decision, IMO.

Exactly, and this brings up the question of what criteria should be used to
decide which proposal is best. Presumably, all else being equal, whichever
one makes it easiest to optimize the IR. I haven’t yet thought about the
advantages and disadvantages of each proposal in this respect.

I have my own opinions, of course. But I think there are two criteria which should be met:

  1. Firstly, it must work and work well. That is, it must solve all of our current EH problems (it’s slow, it doesn’t follow any ABI, it’s fragile, it requires expensive transformations before code-gen), and

  2. It must be flexible enough for future changes.

Here are some of the things I would like to have:

  1. Limited use of intrinsics.
    Reasoning: Intrinsics are indistinguishable from calls, and as such are harder to work with than IR instructions.

  2. EH metadata shouldn’t be encoded in the CFG if at all possible.
    Reasoning: The IR is meant for executable instructions. As such, transformations don’t know about or care about any EH metadata there.

  3. The front-ends should be able to emit the EH data easily.
    Reasoning: The front-end guys are just down the hall from me and I sit with my back to the door. :slight_smile:

-bw