C++ exception handling

(Moving this discussion on list as this could be of general interest.)

My current work-in-progress implementation is attempting to map out the blocks used by a landing pad before it starts outlining. It creates a table of catch and cleanup handlers with the block at which each one starts. During outlining I intend to have another mechanism to check to see if we’ve already outlined the handler at the specified start block (which handles the case of handlers shared between landing pads) and if not starts the outlining process at that block. For cleanup handlers I’m treating any catch dispatch block (identified by looking for the eh_typeid_for+cmp+conditional branch pattern) as a stopping point for the cloning process.

Here’s a snippet of comments explaining how I am doing the block mapping.

// The landing pad sequence will have this basic shape:

//

//

//

//

//

//

//

//

// …

//

// Any of the cleanup slots may be absent. The cleanup slots may be occupied by

// any arbitrary control flow, but all paths through the cleanup code must

// eventually reach the next selector comparison and no path can skip to a

// different selector comparisons, though some paths may terminate abnormally.

// Therefore, we will use a depth first search from the start of any given

// cleanup block and stop searching when we find the next selector comparison.

//

// The catch handlers may also have any control structure, but we are only

// interested in the start of the catch handlers, so we don’t need to actually

// follow the flow of the catch handlers. The start of the catch handlers can

// be located from the compare instructions, but they can be skipped in the

// flow by following the contrary branch.

//

This has a flaw in that cleanup code could contain blocks that are shared with control flows outside the cleanup block and use phi-dependent branching. If the shared flow isn’t part of another cleanup handler this is only an efficiency problem as the non-cleanup flow will reach a landingpad or a function terminator before it hits a selector. However, if the shared block is shared with another cleanup handler, it could lead to a catch dispatch that has nothing to do with the current block.

It turns out that the cloning code handles this problem of shared blocks really well. It just clones everything and prunes the unwanted code paths based on phi resolution and subsequent instruction simplification. I suppose I could clone the entire landing pad to do the block mapping and just discard the clone when I’m done, but that seems really horrific as a way of resolving this corner case problem. What I’d like is to have a way to determine whether the successors of a block are phi-dependent and if so find a way to limit the search to paths that result from predecessors I’ve already seen. I can do this easily enough for simple phi dependence (like constant phi values and a single comparison to choose a branch target) but I haven’t tried to see if the process I have in mind work would for arbitrarily complex cases. If you have any suggestions on this (like maybe a “isSuccessorPhiDependent()” function tucked away somewhere that I haven’t noticed), I’d love to hear them.

-Andy

I think complex cases are unlikely and should be handled conservatively by
cloning. A good example of the simple case is the way we lower __finally.
Internally Clang has the invariant that it cannot double-emit local
declarations twice while emitting a function. So for this test case, we had
to make the exceptional cleanup path branch into normal path with a phi,
and then have a conditional branch back over to the exceptional path.

// C++:
void f() {
  __try { might_crash(); }
  __finally {
    mylabel: // better not emit twice!
    int r; // better not emit twice!
    r = do_cleanup();
    if (!r)
      goto mylabel;
  }
}

// Simplified IR:

define void @f() {
  %abnormal_termination = alloca i8
  invoke void @might_crash()
      to label %cont unwind label %lpad
cont:
  store i8 0, i8* %abnormal_termination
  br label %__finally
lpad:
  landingpad { i8*, i32 } personality i32 (...)* @__C_specific_handler
      cleanup
  store i8 1, i8* %abnormal_termination
  br label %__finally
__finally: ; code simplified for brevity
  %r = call i1 @do_cleanup()
  br i1 %r, label %__finally, label %done
done:
  %ab = load i8* %abnormal_termination
  %tobool = icmp eq i8 %ab, i8 0
  br i1 %tobool, label %ret, label %resume
ret:
  ret void
resume:
  resume
}

You can see at -O0 we have these loads and stores, but at -O1 we'll have a
phi-dependent branch that's really easy to analyze. It'd be good if the
preparation pass could understand this much, and no more. :slight_smile:

I’m not sure I understand what you are suggesting in terms of how this should be handled.

I agree that the cleanup code is unlikely to be complex, but obviously we still need some way to handle any cases that unlikely but possible. I hadn’t even thought about what might be in a __finally block. I was thinking about destructors being inlined. Obviously that puts a practical limit on what will be there, but not really a theoretical limit.

So I’m looking for an implementation that will handle the typical cases efficiently but still not break if the very rare case turns up. It occurs to me that maybe the depth first search I was planning is exactly wrong in this case. Given a cleanup handler that goes into a shared block and can either branch back out to a block that belongs to the landing pad or branch to a block at some arbitrary point in the normal flow, the path we’re looking for is almost certainly going to meet the selector comparison or a resume within a block or two. So a breadth-first search would handle this case better and probably is reasonable for the common case (which usually won’t have any branches at all).

There are two scenarios I have in mind here.

  1. A shared block that might branch into normal code:

lpad:

landingpad …

cleanup

catch …

br label %shared.block

shared.block:

%dst = phi i1 [1, %normal.cont], [0, %lpad]

; do something that happens in cleanup and in regular code

br i1 %dst, label %normal.cont2, label %catch.dispatch

catch.dispatch:

That sort of PHI dependence is very easy to handle, but can we count on it always being that simple? It seems like in theory there could be an arbitrary use-def chain between the PHI node and the branch condition that we may or may not be able to resolve at compile-time. I wouldn’t be happy with whoever created such an atrocity, but I didn’t feel like we could just assume it can’t happen.

If we follow the branch to “normal.cont2” it could conceivably take us through the entire CFG for the function. Obviously we want to avoid that.

  1. A block shared between unconnected landing pads

lpad1:

landingpad …

cleanup

catch …

br label %shared.block

lpad2:

landingpad …

cleanup

catch …

br label %shared.block

shared.block:

%dst = phi i1 [1, %lpad1], [0, %lpad2]

; do something that happens in both cleanup handlers

br i1 %dst, label %lpad1.catch.dispatch, label %lpad2.catch.dispatch

lpad1.catch.dispatch:

lpad2.catch.dispatch:

I haven’t tried to devise code that would lead to this, but my intuition is that this kind of sharing is more likely than the other and if it does happen it could be problematic. Again, the PHI-dependence here is easy to sort out here and it would be trivial to determine which branch we should follow. But if a case arises where we can’t figure out which predecessor leads to which successor, what I had planned just won’t work.

I’d like to believe that all PHI-dependent control flow that arises from block merging will always be simple and that if anyone tried to do something more complex someone would ask them not to, but I worry that you never know what is going to happen over the course of many transformation passes. Like, imagine:

shared.block:

%foo = phi i1 [1, %lpad1], [0, %lpad2]

%fubar = call i1 @bar(i1 %foo)

br i1 %fubar, label %dispatch1, label %dispatch2

Am I overthinking this?

-Andy

I’m not sure I understand what you are suggesting in terms of how this
should be handled.

I agree that the cleanup code is unlikely to be complex, but obviously we
still need some way to handle any cases that unlikely but possible. I
hadn’t even thought about what might be in a __finally block. I was
thinking about destructors being inlined. Obviously that puts a practical
limit on what will be there, but not really a theoretical limit.

So I’m looking for an implementation that will handle the typical cases
efficiently but still not break if the very rare case turns up. It occurs
to me that maybe the depth first search I was planning is exactly wrong in
this case. Given a cleanup handler that goes into a shared block and can
either branch back out to a block that belongs to the landing pad or branch
to a block at some arbitrary point in the normal flow, the path we’re
looking for is almost certainly going to meet the selector comparison or a
resume within a block or two. So a breadth-first search would handle this
case better and probably is reasonable for the common case (which usually
won’t have any branches at all).

There are two scenarios I have in mind here.

1) A shared block that might branch into normal code:

lpad:

  landingpad …

    cleanup

    catch …

  …

  br label %shared.block

shared.block:

  %dst = phi i1 [1, %normal.cont], [0, %lpad]

  ; do something that happens in cleanup and in regular code

  …

  br i1 %dst, label %normal.cont2, label %catch.dispatch

catch.dispatch:

That sort of PHI dependence is very easy to handle, but can we count on it
always being that simple? It seems like in theory there could be an
arbitrary use-def chain between the PHI node and the branch condition that
we may or may not be able to resolve at compile-time. I wouldn’t be happy
with whoever created such an atrocity, but I didn’t feel like we could just
assume it can’t happen.

If we follow the branch to “normal.cont2” it could conceivably take us
through the entire CFG for the function. Obviously we want to avoid that.

In the case where we have such unanalyzable code, I think outlining the
entire function is a good way of being conservatively correct. We also have
tools that help us here. For example we know that 'ret' is unreachable in a
cleanup. Cleanup code must return to either 'resume' or more EH dispatch.
We can propagate that backwards and delete unreachable code.

That said, it might be more practical to just report_fatal_error when
cleanup code doesn't rejoin EH dispatch. Outlining the whole function will
generate terrible, terrible code, and I'd rather tweak the frontend and
intervening optimizers to avoid this situation.

2) A block shared between unconnected landing pads

lpad1:

  landingpad …

    cleanup

    catch <some type> …

  …

  br label %shared.block

lpad2:

  landingpad …

    cleanup

    catch <some other type> …

  …

  br label %shared.block

shared.block:

  %dst = phi i1 [1, %lpad1], [0, %lpad2]

  ; do something that happens in both cleanup handlers

  …

  br i1 %dst, label %lpad1.catch.dispatch, label %lpad2.catch.dispatch

lpad1.catch.dispatch:

lpad2.catch.dispatch:

I haven’t tried to devise code that would lead to this, but my intuition
is that this kind of sharing is more likely than the other and if it does
happen it could be problematic. Again, the PHI-dependence here is easy to
sort out here and it would be trivial to determine which branch we should
follow. But if a case arises where we can’t figure out which predecessor
leads to which successor, what I had planned just won’t work.

I’d like to believe that all PHI-dependent control flow that arises from
block merging will always be simple and that if anyone tried to do
something more complex someone would ask them not to, but I worry that you
never know what is going to happen over the course of many transformation
passes. Like, imagine:

shared.block:

  %foo = phi i1 [1, %lpad1], [0, %lpad2]

  …

  %fubar = call i1 @bar(i1 %foo)

  …

  br i1 %fubar, label %dispatch1, label %dispatch2

Am I overthinking this?

I want to say that transforms should never do this, but it's better to be
correct in the face of bad code. =/

When I talked through this with my coworkers, everyone kind of calmed down
when we realized that cloning the whole function body was essentially a
conservatively correct outlining implementation.

OK then. I’ll proceed as I was, trusting that we can come up with something to handle the nightmare cases before it’s finished.

Thanks,

Andy