Exception Handling Problems

Exception handling in LLVM is broken. It's as simple as that.

We can simulate exception handling in most cases, but we cannot handle all
cases. (For instance, SingleSource/UnitTests/ObjC/exceptions.m in our testsuite
doesn't work on ARM at any optimization level above -O0.) And there's no
way to coerce it to work with our current EH scheme.

We don't follow the exception handling ABI:

   http://www.codesourcery.com/public/cxx-abi/abi-eh.html

This has caused problems for at least one project I know of. Also, because we
don't follow the ABI, our exception handling is slow (and people have
noticed). We call _Unwind_Resume_or_Rethrow, which is expensive and unnecessary.

Inlining is a huge problem for our current EH scheme. Its inability to properly
inline cleanups is the reason why I had to create the (very expensive) hack in
DwarfEHPrepare. And in the case of SingleSource/UnitTests/ObjC/exceptions.m, it
simply fails. The inlining code has to create "catch-alls" that throw and catch
within the same function. To see an example of this, compile this simple code
into LLVM IR:

#include <iostream>

struct A {
  ~A();
};

void bar();

void foo() __attribute__((always_inline));
void foo() {
  try {
    A a;
    bar();
  } catch (const char *c) {
    std::cout << "foo() catch value: " << c << "\n";
  }
}

int main() {
  try {
    foo();
  } catch (int i) {
    std::cout << "main() catch value: " << i << '\n';
  }
}

The resulting IR is much larger than it needs to be, has catch-alls, and is very
difficult to understand.

All of this is because the LLVM passes cannot properly reason about the
exception handling code. The EH information resides in intrinsics, which may be
located far from the `unwind' edge of the invoke they're associated with (this
is resolved directly before CodeGen). So it's not always possible for the
inlining pass, or any other pass, to have the knowledge it needs to modify the
EH code in a sensible manner.

If exception handling were to use native IR instructions, it would be easy for
inlining and other passes to understand what's going on. And they would be able
to modify the code in well-documented ways that would retain the correct EH
semantics.

For all of the trouble it's causing us, exception handling is conceptually
rather simple. A call within a section of code (called a `region', for lack of a
better term) may throw an exception. When that occurs, execution continues at
the catch handler. The existence of cleanups shouldn't complicate this. (They
execute before the catch handler code, or not at all if it's C++ and there are
no catch handlers on the stack.) All of the heavy lifting is done by external
libraries -- the personality function and libunwind.
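
To make that ordering concrete, here is a minimal, self-contained C++ sketch.
The names (Cleanup, thrower, run) and the events log are made up for
illustration and exist only so the order can be inspected. It records when the
cleanup and the catch handler run for a call that throws inside a region.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical log, used only to observe the order of events.
static std::vector<std::string> events;

struct Cleanup {
  ~Cleanup() { events.push_back("cleanup"); }
};

void thrower() { throw 42; }

void run() {
  events.clear();
  try {                        // the "region"
    Cleanup c;                 // needs a cleanup if the call throws
    thrower();                 // the throwing call
    events.push_back("not reached");
  } catch (int i) {            // the catch handler
    events.push_back("catch " + std::to_string(i));
  }
  // The cleanup ran before the handler body was entered.
  assert(events.size() == 2 && events.front() == "cleanup");
}
```

After calling run(), events holds "cleanup" followed by "catch 42": the
destructor runs during unwinding, before the handler body is entered.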

There's only one complication that I ran into when I was rewriting EH last
year. The EH information needs to be available at two places in the code for
code-gen to produce the correct EH tables. (Again, this isn't meant to be
DWARF-specific, but it needs to support it.)

* At the throwing call -- We need it here because it's the origin of the
  exception, and it has the information of where we're coming from and the
  landing pad for the region containing the call, and

* At the landing pad, but after the cleanup code -- We need it here because this
  is where we generate a "jump table" (something like a switch statement) to go
  to a specific catch block. Note that the cleanup code can be arbitrarily
  complex. This, coupled with the movement of the EH intrinsics, makes
  associating a particular set of catch blocks with a throwing call almost
  impossible (with our current scheme).
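
As a rough illustration of why both points need the same information, the
following C++ sketch models a landing pad by hand. It is not how LLVM or any
personality function is implemented; every name in it (Type, CallSite,
select_clause, landing_pad) is hypothetical. The call-site record carries the
catch clauses of the region containing the throwing call, and the landing pad
runs the cleanup first and only then switches on the selector to pick a catch
block.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the type of the thrown object.
enum class Type { CString, Int, Double };

// What the throwing call must know: the catch clauses of its region.
struct CallSite {
  std::vector<Type> catch_types;  // clause i has selector i + 1
};

// Stand-in for the personality function: map the thrown type to a selector.
// Returns 0 when no clause matches (cleanup only).
int select_clause(const CallSite &site, Type thrown) {
  for (std::size_t i = 0; i < site.catch_types.size(); ++i)
    if (site.catch_types[i] == thrown) return static_cast<int>(i) + 1;
  return 0;
}

// The landing pad: cleanup first, then the "jump table" over catch blocks.
std::vector<std::string> landing_pad(const CallSite &site, Type thrown) {
  assert(site.catch_types.size() <= 2 && "this sketch handles two clauses");
  std::vector<std::string> trace{"cleanup"};
  switch (select_clause(site, thrown)) {
  case 1: trace.push_back("catch block 1"); break;
  case 2: trace.push_back("catch block 2"); break;
  default: trace.push_back("resume unwinding"); break;
  }
  return trace;
}
```

The switch in landing_pad only makes sense if it agrees with the clause list
attached to the call site. That is the association that is hard to keep intact
once cleanup code of arbitrary complexity sits between the two.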

To summarize:

* Exception handling needs to be a first-class citizen of the LLVM IR in order
  for it to be understood and modified correctly by all passes.

* The information needed to generate correct EH tables needs to be available
  at more than just one point in the function.

-bw

Pardon the basic question, but does this apply to clang, llvm-gcc, or
both?

Thanks,
-David

Yes.

-eric

Sorry, but does your "Yes" mean both?

-David

It means anything that uses the LLVM exception handling interface. So, llvm-gcc, clang, and anything else (like Ada). :)

-eric

* Exception handling needs to be a first-class citizen of the LLVM IR in order
for it to be understood and modified correctly by all passes.

Agreed!

* The information needed to generate correct EH tables needs to be available
at more than just one point in the function.

Indeed, it needs to be consistent and reachable from multiple places,
both code and unwind blocks.

The unwind call graph must be a first-class citizen, it must be
tightly coupled with the normal flow (to allow inlining), and the
semantics must be clear, so passes won't destroy it easily.

However, since the C++ ABI is but one example on how to do EH and LLVM
is language agnostic, I'm inclined to say that this is an impossible
task.

This is not to say that it can't be done, far from it, but that it
won't be as clean as we'd hope. There are some things (like
exception handling and bitfields) that, no matter how hard you try
to refactor them, always end up dirty.

What we need is a clear set of premises (just like the ones John has
just made) that are language agnostic, and we need to follow them
wholeheartedly. We should only try to come up with a plan for the IR
once those premises have been agreed on in a document in SVN.

cheers,
--renato