RFC: Exception Handling Proposal Revised

This is a revision of the second exception handling proposal I sent out. You can see it here:

  http://lists.cs.uiuc.edu/pipermail/llvmdev/2010-November/036484.html

After much discussion, there are some changes to the proposal – some significant and some minor. One major point, this proposal does not address the issue of catching an exception thrown from a non-invoke instruction. However if done properly, it should form the foundation for that.

General Model

Okay, but how do we find it?

Looks good, though!

John.

• A landing pad must have exactly one dispatch instruction associated with it,
and it must dominate that instruction.

Okay, but how do we find it?

We can modify the dispatch to point back to the landing pad it's associated with. Something like:

lpad: landingpad
  dispatch region label %lpad ...

or similar.

Looks good, though!

Thanks!

-bw

Okay. So passes (like codegen's EH preparation) that want to track what landing pads go to what dispatches will just have to walk the basic blocks and look for blocks terminated in "dispatch"? That seems mostly reasonable.

One problem I foresee is that it's possible for a dispatch block to become unreachable from its landing pad. If that block is then deleted, we'd lose information about what's supposed to unwind there. This could happen if, e.g., someone had a noreturn destructor. In languages that usefully allow throws from EH destructors (i.e. Ada) I can imagine uses for this, and regardless it's well-formed code that shouldn't cause the world to explode.

John.

The way I understood Duncan's description of the Ada model, even an unconditionally throwing Ada constructor isn't exactly noreturn - or rather, every destructor is surrounded by its own try-catch.
Same in D, btw, which also allows throwing destructors. There, a throw with an in-flight exception appends the new exception to the in-flight chain of exceptions and keeps unwinding.

Sebastian

The unwind edge from an invoke instruction jumps to a landing pad. That landing pad contains code which performs optional cleanups, and then determines which catch handler to call (if any). If no catch handlers are applicable, the exception resumes propagation either to the next enclosing region or out of the function.

Hi Bill,

Looking good already! :wink:

ch.int:

shouldn't the catch handlers have "landingpad" attributes?

;; C's d'tor
c.dtor:

and the cleanups?

yikes:
call void @_ZSt9terminatev() noreturn nounwind
unreachable

and yikes?

Can we standardize "yikes" as the official terminate handler name? :smiley:

Well? What do you think? Pretty cool, eh? :slight_smile:

Yup. I think it's much more clear than the current scheme, at least
for C++ Itanium ABI (AFAICS).

Can anyone try an Ada (or any other language) example to see if it
makes sense for them, too?

cheers,
--renato

I'm not sure why this isn't supposed to be equivalent to noreturn; the point is that the normal edge of the function isn't reachable. I mentioned Ada EH just because it's more obviously useful there, not because it's a special case that can only occur there. For example, you could duplicate this with a destructor that runs an infinite loop or calls exit(). The representation needs to be robust against this.

John.

The unwind edge from an invoke instruction jumps to a landing pad. That landing pad contains code which performs optional cleanups, and then determines which catch handler to call (if any). If no catch handlers are applicable, the exception resumes propagation either to the next enclosing region or out of the function.

Hi Bill,

Looking good already! :wink:

Thanks!

ch.int:

shouldn't the catch handlers have "landingpad" attributes?

I don't think they need it.

;; C's d'tor
c.dtor:

and the cleanups?

Nor these. Basically, I want the basic block that's labeled a "landing pad" to be jumped to by only a dispatch resume or unwind edge of invoke. We could do this with the c.dtor and ch.int here, but it would mean inserting useless "cleanup dispatches" that only resume to the block (see onto.catch.handlers for an example).

yikes:
call void @_ZSt9terminatev() noreturn nounwind
unreachable

and yikes?

Can we standardize "yikes" as the official terminate handler name? :smiley:

I wish! :smiley:

Well? What do you think? Pretty cool, eh? :slight_smile:

Yup. I think it's much more clear than the current scheme, at least
for C++ Itanium ABI (AFAICS).

Can anyone try an Ada (or any other language) example to see if it
makes sense for them, too?

*Looks at Duncan*

-bw

Fair enough. Do you think it would be sufficient to refuse to delete a dispatch instruction unless the landing pad is also deleted? (That's just off the top o' my head...)

-bw

That might be annoying to enforce. On the other hand, it might be less annoying than anything we could do with the landingpad attribute.

John.

Can anyone try an Ada (or any other language) example to see if it
makes sense for them, too?

*Looks at Duncan*

I plan to look at this tonight.

Ciao, Duncan.

Hi Bill,

All cleanups come from invoke's unwinds (from foo's call) and all
catch handlers come from the main catch dispatch, so for that rule,
they all should have it. Or that's what I understood... :wink:

Also, now I noticed a few things in your IR, could be just typos...
(ie. I'm not being an ass, just checking I got it right) :wink:

1. Your BB flow seem odd in the try block. You call foo() four times
and then call the Ctors, rather than interpolating them as in the
source code (otherwise, all calls to foo should unwind to the catch
area, not cleanups).

2. In catch.handlers, the resume label is to %onto.catch.handlers, right?

3. In onto.catch.handlers you resume to %lpad. Which basic-block is
this? Shouldn't this be a "resume_unwinding" call of some sort (if Q2
is right)?

4. Your catch handlers return void, shouldn't then branch to %return
instead? In this case, a front-end would probably call that label a
"%finally" and that would branch to %return, or return directly.

Which brings me to my final question: In languages where the finally
HAS to run, like Java, would that be a simple cleanup regions or do we
need something special from catch's dispatch's resume AND %finally
block?

cheers,
--renato

Nor these. Basically, I want the basic block that's labeled a "landing pad" to be jumped to by only a dispatch resume or unwind edge of invoke. We could do this with the c.dtor and ch.int here, but it would mean inserting useless "cleanup dispatches" that only resume to the block (see onto.catch.handlers for an example).

Hi Bill,

All cleanups come from invoke's unwinds (from foo's call) and all
catch handlers come from the main catch dispatch, so for that rule,
they all should have it. Or that's what I understood... :wink:

Also, now I noticed a few things in your IR, could be just typos...
(ie. I'm not being an ass, just checking I got it right) :wink:

I think I can answer a couple of those...

1. Your BB flow seem odd in the try block. You call foo() four times
and then call the Ctors, rather than interpolating them as in the
source code (otherwise, all calls to foo should unwind to the catch
area, not cleanups).

Those are destructor calls for the case where none of the foo()s
throw. There are no ctor calls because the classes have trivial
constructors, which are no-ops.

4. Your catch handlers return void, shouldn't then branch to %return
instead? In this case, a front-end would probably call that label a
"%finally" and that would branch to %return, or return directly.

He did say it was simplified IR. Just imagine -simplify-cfg already
did some tail duplication here.

Which brings me to my final question: In languages where the finally
HAS to run, like Java, would that be a simple cleanup regions or do we
need something special from catch's dispatch's resume AND %finally
block?

Presumably that would just be encoded as a catch-all to the finally
code followed by some encoding of a rethrow (e.g. an "unwind resume"
to the next enclosing region, an 'unwind' instruction or some form of
"resume_unwind()" call).
If there were also catches, they'd probably need to explicitly execute
the finally code (or if they again throw, have their unwind edges
point to a dispatch with a catch-all to the finally code).

Those are destructor calls for the case where none of the foo()s
throw. There are no ctor calls because the classes have trivial
constructors, which are no-ops.

Indeed! Thanks! :wink:

If there were also catches, they'd probably need to explicitly execute
the finally code (or if they again throw, have their unwind edges
point to a dispatch with a catch-all to the finally code).

The problem is that, the code has to be run regardless of throwing or
not, but the exit of the finally block change depending on its
predecessor...

So either you have two identical blocks (like the clean-up code) or
you have a special way of defining the exit condition based on the
predecessor (like PHI nodes for EH, which is horrible!).

Example pseudo-IR:

try:
  invoke func() to label %finally unwind to %catch

catch:
  dispatch resume to %lpad
    catches [ ... ]

catch1:
  ...
  br %finally

lpad:
  br %finally (??)
  or cleanup code & continue unwinding?

finally:
  cleanup code
  return (or continue unwinding?)

cheers,
--renato

If there were also catches, they'd probably need to explicitly execute
the finally code (or if they again throw, have their unwind edges
point to a dispatch with a catch-all to the finally code).

The problem is that, the code has to be run regardless of throwing or
not, but the exit of the finally block change depending on its
predecessor...

So either you have two identical blocks (like the clean-up code) or

IIRC that's how LDC deals with it (D has 'finally' blocks).

you have a special way of defining the exit condition based on the
predecessor (like PHI nodes for EH, which is horrible!).

You could do this too. I don't think there's any need for a special
way to define it though. Just use a switch on a phi. If you use
"unwind resume" for rethrowing you may still need to duplicate the
code once per EH region though, to make sure it's dominated by the
landing pad.

Jump threading should be able to clean this up just fine by
duplicating the finally code if it considers doing so profitable
enough (i.e. if the duplicated code is small enough). And if not,
apparently not duplicating was the right thing to do :).

You could even use an indirect branch on a phi of blockaddresses, but
I wouldn't recommend it. And not just because jump threading can't
handle it. indirectbr is gross, and should probably be avoided unless
the program you're compiling explicitly asks for it (with computed
gotos).

Hi Bill,

General Model

The unwind edge from an invoke instruction jumps to a landing pad. That landing pad contains code which performs optional cleanups, and then determines which catch handler to call (if any). If no catch handlers are applicable, the exception resumes propagation either to the next enclosing region or out of the function.

I was immediately struck by the fact that you describe cleanups as being run
before dispatching to handlers. In Ada handlers are run first and any cleanups
are run afterwards. As far as I know it is exactly the same in C++. Take your
example. I rewrite it with more explicit scopes:

void bar() {
   try {
     foo();
     { // new scope
       A a;
       foo();
       { // new scope
... etc...
       } // end scope
     } // end scope
   } catch (int i) {
   ... etc ...
}

Here cleanups for "a", "b" etc are only run before the handlers are executed
because they are in inner scopes, and so are cleaned by unwinding before the
"try" scope is reached. Consider the following example instead:

void bar() {
   try {
     A a;
     foo();
   } catch (int i) {
   ... etc ...
}

If you check the assembler you will see that the destructor for "a" is run
before branching to the handler.

I will comment on the rest later, once I have absorbed it.

Ciao,

Duncan.

Hi Bill, this proposal seems strangely complicated. I don't see the advantage
of the dispatch instruction over just attaching the information to each invoke.
Right now you have

   invoke void @_Z3foov()
     to label %invcont unwind label %catch.handlers

catch.handlers: landingpad
   dispatch resume to label %...
     catches [
       %struct.__fundamental_type_info_pseudo* @_ZTIi, label %ch.int
       %struct.__pointer_type_info_pseudo* @_ZTIPKc, label %ch.str
     ]
   catchall [i8* null, label % ch.ca]

Why not combine this into:

   invoke void @_Z3foov()
     to label %invcont unwind resume to label %...
     catches [
       %struct.__fundamental_type_info_pseudo* @_ZTIi, label %ch.int
       %struct.__pointer_type_info_pseudo* @_ZTIPKc, label %ch.str
     ]
   catchall [i8* null, label % ch.ca]

? That would get rid of the "landingpad" basic block attribute, and rules about
what can branch to landing pads, that these cannot be split etc. Your rules
mean that you can uniquely associate a dispatch instruction with each invoke
unwind edge, so why not directly attach the dispatch information to the edge,
i.e. to the invoke?

Syntax:

   dispatch resume to label<resumedest>

How do you say: if nothing is matched, keep unwinding out of the function?
Actually I don't see why <resumedest> is useful at all, since you can always
place a complete set of all handlers into one dispatch, so redispatching is
not needed. That said, it might be handy for reducing code duplication.

     catches [
       <type> <val>, label<dest1>
       ...
     ]
     catchall [<type> <val>, label<dest> ]
     personality [<type> <value>]
     filters [
       <type> <val>,<type> <val>, ...

You also need labels for filters. I know it may not look like it, but
if you inline a function with a filter into a function with handlers
then the filter ends up being nested inside outer handlers, resulting
in the need for a destination. In fact inlining also means that you
can't just list filters after catches, you need to consider maybe
having some catches, then a filter, then some other catches etc. Or
perhaps you want to take care of it by chaining dispatches via the
resumedest?

     ]

The `catches', `catchall', and `filters' clauses are optional. If neither `catches' nor `catchall' is specified, then the landing pad is implicitly a cleanup.

You may have cleanups to run even if there are handlers, how do you distinguish
between "have handlers, plus a cleanup to run in any case" and "have handlers,
no cleanup needs to be run if handlers don't match, just unwind straight
through"?

• The `filters' clause lists the types of exceptions which may be thrown by the
   region.

What is "a region"?

onto.catch.handlers:
   dispatch resume to %lpad

What is %lpad?

As a general remark, I don't see how this is superior to gcc's region scheme,
which could be adapted to LLVM by (in essence) attaching to an invoke the list
of all regions (and their info like list of catches) that contain the invoke.
If you do that then you get something rather close to your proposal, only the
dispatch instructions have become part of the invokes, you don't need the funny
landing pad special semantics etc.

Ciao,

Duncan.

Nor these. Basically, I want the basic block that’s labeled a “landing pad” to be jumped to by only a dispatch resume or unwind edge of invoke. We could do this with the c.dtor and ch.int here, but it would mean inserting useless “cleanup dispatches” that only resume to the block (see onto.catch.handlers for an example).

Hi Bill,

All cleanups come from invoke’s unwinds (from foo’s call) and all
catch handlers come from the main catch dispatch, so for that rule,
they all should have it. Or that’s what I understood… :wink:

Also, now I noticed a few things in your IR, could be just typos…
(ie. I’m not being an ass, just checking I got it right) :wink:

  1. Your BB flow seem odd in the try block. You call foo() four times
    and then call the Ctors, rather than interpolating them as in the
    source code (otherwise, all calls to foo should unwind to the catch
    area, not cleanups).

I think you mean that it calls the Dtors on %c, %b, and %a. :slight_smile: It optimized the creation of the objects by not calling the empty constructors. bb1, bb2, and bb3 are the normal calls to the d’tors that we expect.

  1. In catch.handlers, the resume label is to %onto.catch.handlers, right?

Oh…I forgot to fill that in. It should to go %yikes.

  1. In onto.catch.handlers you resume to %lpad. Which basic-block is
    this? Shouldn’t this be a “resume_unwinding” call of some sort (if Q2
    is right)?

This was necessary to keep the invariant that only resume or unwind edges may jump to a landing pad (%lpad in this instance). Otherwise, I would have the invoke in a.dtor to directly to %lpad on the “normal” edge.

  1. Your catch handlers return void, shouldn’t then branch to %return
    instead? In this case, a front-end would probably call that label a
    “%finally” and that would branch to %return, or return directly.

They could. It’s an optimization that the back-end normally does. :slight_smile:

Which brings me to my final question: In languages where the finally
HAS to run, like Java, would that be a simple cleanup regions or do we
need something special from catch’s dispatch’s resume AND %finally
block?

I need to see how we do things today (Objective-C also has finally blocks). But I suspect that it will be a matter of having the catch handlers jumping to the finally block. In other words, I don’t suspect that we need a different mechanism to handle this.

-bw

Hi Bill,

General Model

=============

The unwind edge from an invoke instruction jumps to a landing pad. That landing pad contains code which performs optional cleanups, and then determines which catch handler to call (if any). If no catch handlers are applicable, the exception resumes propagation either to the next enclosing region or out of the function.

I was immediately struck by the fact that you describe cleanups as being run
before dispatching to handlers. In Ada handlers are run first and any cleanups
are run afterwards. As far as I know it is exactly the same in C++. Take your
example. I rewrite it with more explicit scopes:

void bar() {
try {
foo();
{ // new scope
A a;
foo();
{ // new scope
… etc…
} // end scope
} // end scope
} catch (int i) {
… etc …
}

Here cleanups for “a”, “b” etc are only run before the handlers are executed
because they are in inner scopes, and so are cleaned by unwinding before the
“try” scope is reached.

I don’t understand what you’re trying to say here. I looked at the assembly for your first bar() function and it’s exactly the same as would be emitted for one without the new scopes. Indeed, it’s semantically identical. What do you mean by “before the try scope is reached”? If an exception is thrown by something in a nested scope, any variables live at that point must have their cleanups called before exiting the try block. And that occurs before the catch handlers are called.

Consider the following example instead:

void bar() {
try {
A a;
foo();
} catch (int i) {
… etc …
}

If you check the assembler you will see that the destructor for “a” is run
before branching to the handler.

Correct. How is this a contradiction of what I wrote? I’m confused.

-bw

Hi Bill,

...

I don't understand what you're trying to say here. I looked at the assembly for
your first bar() function and it's exactly the same as would be emitted for one
without the new scopes. Indeed, it's semantically identical. What do you mean by
"before the try scope is reached"? If an exception is thrown by something in a
nested scope, any variables live at that point must have their cleanups called
before exiting the try block. And that occurs before the catch handlers are called.

Consider the following example instead:

void bar() {
try {
A a;
foo();
} catch (int i) {
... etc ...
}

If you check the assembler you will see that the destructor for "a" is run
before branching to the handler.

Correct. How is this a contradiction of what I wrote? I'm confused.

I got myself confused. Indeed it seems that C++ behaves the way you describe.
However in Ada it is the other way round: cleanups are run after handlers. For
example in Ada it would be possible to use the variable "a" inside one of the
handlers, so it had better not be destructed before the handler runs. Thus it
is important not to bake "cleanups run before handlers" into your proposal,
since it is C++ specific. That said, as far as I can tell you didn't bake it
in.

Ciao,

Duncan.