order of object files at link affects exception catching

First of all, a preface: this problem was spotted while trying to
build a large C++ project which links close to 100 object files
together, plus libraries. I can't replicate this behavior in a simple
isolated test. I just want to understand whether this could be
caused by clang's compiler or linker behavior (a missed flag, or an
optimization effect/bug). The project builds and runs correctly with
GCC.

Compiler: clang-10 on OSX 10.13

The project builds fully without errors and the final binary
executable is produced. The binary starts up ok and presents a prompt.
However, any exception-based processing (for example, input errors that
are expected to show a message and continue, or catching Ctrl+C and
turning it into a message and continuing) results in an uncaught
exception and ends in abort(). Basically, libc++ calls
std::terminate(), as if the proper catch statement were missing,
although it clearly is in the code.
Somehow the exception unwind stack gets broken.

The code links a large number of objects and a few .a libraries, so I
tried to put the individual objects into another .a lib to try to
eliminate the order effects. Still, the resulting binary has the
exception catching issues.

Then I tried to craft a simple test (which does not use any of the
actual project's code) that has throw/catch and then linked it in the
same way. The results:
* when the test code is linked from the .a library (with all objects
as above), the exceptions are processed ok.
* when the test code is linked with all the objects above specified
on the command line, the exception issues are back.
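
For readers who want to reproduce this kind of check, a minimal probe might look like the sketch below. It is not the poster's actual test; the names `thrower` and `probe_catch` are made up for illustration, and `__attribute__((noinline))` assumes clang or GCC.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical probe, not taken from the project in question.
// noinline keeps the throw in its own frame, so the unwinder really has to
// walk from thrower() back into probe_catch().
__attribute__((noinline)) void thrower(const std::string& what) {
    throw std::runtime_error(what);
}

// Returns 1 if the typed handler ran, 2 if only catch (...) ran.
// If the runtime finds no handler at all, it calls std::terminate() and
// this function never returns.
int probe_catch() {
    try {
        thrower("catch-me");
    } catch (const std::runtime_error&) {
        return 1;
    } catch (...) {
        return 2;
    }
    return 0;  // not reached: thrower() always throws
}
```

Calling `probe_catch()` from main and linking the object once via the .a library and once with all the .o files listed explicitly gives the two configurations compared above. A result of 2 would point at a type_info mismatch, while an abort would point at missing or unusable unwind information.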

Obviously, the simple test code does not need any of the code from the
other objects, yet the resulting code appears somehow broken. Granted,
the linker will have to resolve all dependencies it finds on the
command line and tie them into the binary; still, none of those
functions should be executed by the sample test code.

Finally, I tried to change the order of the project's object files on
the link line and put the object file which does the actual throw right
next to the main's object file. To my surprise, the exceptions were
caught ok... But it was too soon to celebrate: exceptions thrown in
other parts of the code still were not caught properly.

So the bottom line is that the exception tables somehow get messed up
in the process of linking. I can't think of any other way to diagnose
this.

By the way, the very same code is properly linked and functioning when
using GCC default compile/link options.

I tried -fno-lto to disable link-time optimization, without success,
but that's the default anyway.

To reiterate the questions:
1. Why would order of the object files matter for correct exception processing?
2. Are there any clang options specific to such cases?

Any ideas are welcome!

Then I tried to craft a simple test (which does not use any of the
actual project’s code) that has throw/catch and then linked it in the
same way. The results:

  • when the test code is linked from the .a library (with all objects
    as above), the exceptions are processed ok.
  • when the test code is linked with all the objects above specified
    on the command line, the exception issues are back.

Only objects that contain referenced symbols get pulled in from archives, so using a .a library will tend to result in fewer objects being linked in than specifying the .o files on the command line. That might explain part of the difference you’re seeing here.

To reiterate the questions:

  1. Why would order of the object files matter for correct exception processing?

The most likely explanation is that your program contains a violation of C++'s “One Definition Rule” (ODR). Specifically, you probably have a function or class that’s defined in different ways in two different source files, and the behavior of your program depends on which one gets picked at link time. (Worse, there are ways in which we can end up picking one version from one .o file and a different version from a different .o file.) Given the symptoms, it’s possible that this is happening because part of your program is built with -fno-exceptions and part of your program is built without that flag, and the exception in question is propagating through a (perhaps inline) function that was built both ways. But that’s just a guess.

  2. Are there some clang’s options specific for such cases?

Do you still see the issue with -O0? Do you still see the issue if you explicitly add -fexceptions to every compilation?
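
One way to hunt for a translation unit that was accidentally built with -fno-exceptions is a compile-time check in a header that every source file includes. The sketch below is only an illustration: the helper name `built_with_exceptions` is invented, and it relies on clang and GCC leaving the `__cpp_exceptions` feature-test macro undefined under -fno-exceptions.

```cpp
// Hypothetical helper; the name is an assumption for illustration.
// __cpp_exceptions is the standard feature-test macro; clang and GCC leave it
// undefined when a translation unit is compiled with -fno-exceptions.
constexpr bool built_with_exceptions() {
#if defined(__cpp_exceptions)
    return true;
#else
    return false;
#endif
}

// Placed in a header that every source file includes, this turns a stray
// -fno-exceptions into a compile error instead of a run-time abort.
static_assert(built_with_exceptions(),
              "this translation unit was compiled without exception support");
```

This only catches mismatched flags in code that includes the header. Objects coming from prebuilt .a libraries would have to be checked some other way.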

Richard,

Thanks for the quick response; it gave me some directions to
investigate further, otherwise it seemed I got stuck trying to make
sense of many moving pieces in this puzzle. So, my understanding is
that generally the run-time exception handling should _not_ depend on
the order of the linkage (provided there're no violations as you
mentioned). This is unlike the familiar case of order of object files
affecting the linker's resolutions of external symbols, where the
order _does_ matter. That means what I'm seeing is rather anomalous,
not a by-design behavior.

Now, looking into the idea of an ODR violation, I realise that in
the setup I'm using, clang (installed from the pre-built
package@llvm.org) is used with '-stdlib=libc++', so the link pulls in
libc++.dylib and libc++abi.dylib. The compiler gets clang's libc++
includes and the linker resolves these from clang's /lib; however, OSX
(10.13) has its own set of these .dylibs in /usr/lib, and the system's
libSystem pulls these in (via libobjc.dylib). So the resulting binary
loads two sets of libc++ and libc++abi.

Are there any linkages between clang's supplied libc++ and the
system's libc++abi, or is it meant to use clang's libc++ exclusively?

Could this be the reason for exception breakdown? I understand that
generally there should be only one libc++abi for the whole
application, this way the type_info is common across all classes, and
thus exceptions are correctly typed. This may explain why a sample
test (try/throw/catch) works in isolation, as it may not cross from
one set of libc++abi into the other.

I'm wondering what test code I could craft that would possibly trigger
the use of both clang's and the system's libc++abi. Clearly, the simple
try/throw/catch works OK whether with or without -rpath to clang's
lib.
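
As a starting point for that kind of test, the process can ask the dynamic loader which image supplies a given runtime entry point. This is a rough sketch under assumptions: the helper name `image_providing` is made up, and it uses the POSIX `dlsym`/`dladdr` calls with `RTLD_DEFAULT`, which reports only the first definition the loader finds. It cannot prove that every caller binds to that same copy.

```cpp
#include <dlfcn.h>
#include <string>

// Hypothetical helper: returns the path of the loaded image (executable or
// shared library) that defines `symbol`, or an empty string if the dynamic
// loader cannot find it.
std::string image_providing(const char* symbol) {
    // RTLD_DEFAULT searches the images already loaded into the process.
    void* addr = dlsym(RTLD_DEFAULT, symbol);
    if (addr == nullptr) return std::string();

    // dladdr maps an address back to the image that contains it.
    Dl_info info;
    if (dladdr(addr, &info) == 0 || info.dli_fname == nullptr) {
        return std::string();
    }
    return info.dli_fname;
}
```

Printing `image_providing("__cxa_throw")` and `image_providing("__cxa_begin_catch")` from the failing binary would show whether they come from clang's lib directory or from /usr/lib. On macOS, running the program with the `DYLD_PRINT_LIBRARIES=1` environment variable set should additionally list every image that gets loaded, including a second libc++abi if there is one.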

Given the symptoms, it's possible that this is happening because part of your program is built with -fno-exceptions and part of your program is built without that flag, and an exception in question is propagating through a (perhaps inline) function that was built both ways. But that's just a guess.

I tried to rebuild the whole application with -fexceptions; still the
same symptoms. I also tried -funwind-tables. The issue is present
with -O0 too. Reading up on this, some time ago Apple was advising Xcode
users _not_ to use the -no_compact_unwind switch, as it led to a similar
issue of exceptions not getting caught. I'm not sure what exactly the
effect of that switch was, but clang does not seem to have it
and, well, exceptions are being caught in the isolated sample test.

I appreciate your input.

Are there any linkages between the clang’s supplied libc++ and
system’s libc++abi, or it’s meant to use exclusively clang’s libc++?

I’m afraid I don’t know exactly how our packages or the Mac OS X versions are configured. It’s possible there’s some mismatch here. If this is specific to the libc++ dylib that’s included in OS X, the Apple folks would probably be interested in you contacting them directly.

One other thing that sometimes goes wrong when exceptions are thrown but can’t be caught is that the type information on the throw and catch sides doesn’t match, usually because of visibility attributes (or command-line flags) causing one or both versions of the type to be considered DSO-internal. Can you catch the exception with "catch (...)"?

Could this be the reason for exception breakdown? I understand that
generally there should be only one libc++abi for the whole
application, this way the type_info is common across all classes, and
thus exceptions are correctly typed. This may explain why a sample
test (try/throw/catch) works in isolation, as it may not cross from
one set of libc++abi into the other.

If you do have two different libc++abis in the same process (maybe one statically and one dynamically linked?) then it seems plausible that exception throw/catch would break down, because they would have different ideas of what some of the key globals involved in exception throw/catch are (for example, primitive type_info objects).

Can you catch the exception with "catch (...)"?

I tried this route and added such a catch-all clause right at the throw
site. Moreover, I put an explicit throw("catch-me") there in the hope
of seeing it get caught right away. Nope, the exception is thrown
properly, but the catch (...) is not invoked. I can clearly see the
stack trace in the crash log, with the throw happening correctly, then
handed over to clang's libc++abi.1.dylib (__cxa_throw), then
proceeding into std::__terminate(), ending up in abort() from
libsystem_c.dylib. It is as if the catch clause is not there. The build
is done with explicit -fexceptions and clang's default RTTI
(that is, it's ON in this case).
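
When a catch-all handler does run, it can help to report which type the runtime thinks is in flight. The sketch below is an illustration only: `current_exception_type_name` is an invented name, and it assumes the Itanium C++ ABI entry points `abi::__cxa_current_exception_type` and `abi::__cxa_demangle` from `<cxxabi.h>`, which both libc++abi and libstdc++ provide.

```cpp
#include <cxxabi.h>
#include <cstdlib>
#include <string>
#include <typeinfo>

// Hypothetical helper: name of the exception currently being handled, or an
// empty string when no exception is being handled.
std::string current_exception_type_name() {
    const std::type_info* ti = abi::__cxa_current_exception_type();
    if (ti == nullptr) return std::string();

    // Demangle for readability; fall back to the raw name if that fails.
    int status = 0;
    char* demangled = abi::__cxa_demangle(ti->name(), nullptr, nullptr, &status);
    std::string result =
        (status == 0 && demangled != nullptr) ? demangled : ti->name();
    std::free(demangled);  // __cxa_demangle allocates with malloc
    return result;
}
```

Inside `catch (...)` this returns, for example, `int` after `throw 42;`. It may also be usable from a handler installed with `std::set_terminate`, which is the only place that still runs in the failure described here, though that depends on the runtime marking the exception as caught before it terminates.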

Which makes me believe there's something else at play in this program
that somehow disturbs the exception handling process. It's still not
clear why changing the order of the linking object files results in
correct catching of those throws; and why this happens only with this
OSX+clang mix. To be specific, I ordered the objects according to how
they appear on the crash log, with the rest following it
alphabetically just as before.

Thank you for your input! For now this helps me eliminate the
possibility of a misconfigured pre-built clang and focus more
on the program's entrails.