C++11 ABI Compatibility and Static Initialization

Hello All,

I've been trying to track down an EXC_BAD_ACCESS error (Bus error: 10)
which occurs during static initialization when linking a program using
boost to a library using boost. The problem occurred, for me, with
QuantLib, but it has been reported to occur with mlpack[1] and likely
affects other boost-using libraries as well. I have determined that
the issue occurs when a library is compiled with -std=c++11 and the
program is not (or vice versa).

Now that I have identified that the issue is C++11 ABI compatibility,
I can easily avoid the issue on my systems. However, I have brought
it to your attention in the hope that it can either be made easier to
identify or the compatibility can be improved, sparing future users the
debugging effort needed to identify the same issue.

For reference, the issue occurs when using "Apple LLVM version 5.1
(clang-503.0.38) (based on LLVM 3.4svn)" on Mac OS X 10.9.2 with Boost
1.55.0 using the following example program (please excuse the abuse of
Boost internals to make a minimal test case):

-8<--libcrash11.cpp---------------------------------------------------
#include <boost/math/special_functions/lanczos.hpp>

double libfunc() {
    return boost::math::lanczos::lanczos17m64::lanczos_sum(1.0);
}
-8<--libcrash11.cpp---------------------------------------------------

-8<--crash11.cpp------------------------------------------------------
#include <boost/math/special_functions/lanczos.hpp>

int main(int,char**) {
    boost::math::lanczos::lanczos17m64::lanczos_sum(1.0);
    return 0;
}
-8<--crash11.cpp------------------------------------------------------

Compiled as follows:
clang++ -dynamiclib -o libcrash11.dylib libcrash11.cpp
clang++ -std=c++11 -L. -lcrash11 -o crash11 crash11.cpp

The binary compiled in C++11 mode (crash11) has
double boost::math::lanczos::lanczos17m64::lanczos_sum<double>(double const&)::num
inside its __TEXT.__const section, while the library (compiled without
-std=c++11) has this symbol inside its __DATA.__data section.

Executing crash11 in a debugger shows that the program stops with
EXC_BAD_ACCESS when attempting to initialize the copy of
double boost::math::lanczos::lanczos17m64::lanczos_sum<double>(double const&)::num
inside the crash11.__TEXT.__const section (which is presumably mapped
read-only).

As a non-expert, I would guess that in C++11 mode the value of the symbol
can be determined at compile time and placed in the const section, while
in C++98 mode it must be initialized at runtime, and that when the two
are linked together the initialization is applied to both occurrences of
the symbol, causing the crash. But I may be completely off-base with
this guess.

So, my question to you is: can the situation be improved? For example,
could the crash be avoided by applying the initialization only to the
symbol in __DATA.__data, or could the linker or loader report an error
whose message points to the C++11 ABI as a potential cause of the
incompatibility? Or perhaps something else?

In either case users would be advised to not link different C++
dialects together. But I'm afraid we users are perpetually finding
ways to screw such things up (or, at least, I am).

Cheers,
Kevin

1. http://trac.research.cc.gatech.edu/fastlab/ticket/296


My initial reaction is that this is an error in Boost (which could make this distinction at configuration time instead of at compile time), or that Boost should at least detect this problem and abort with an error. Of course, if clang could detect this incompatibility, this would also be good.

-erik
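Erik's configuration-time suggestion could look roughly like the sketch below. It assumes a hypothetical header generated when the library is built, recording whether the library itself was compiled in C++11 mode; the macro names `AWESOME_BUILT_WITH_CXX11` and `AWESOME_CONSUMER_IS_CXX11` are made up for illustration. A public header then compares that value with the consumer's `__cplusplus` and stops the build with `#error` on a mismatch. This only helps libraries that ship such a generated header; a purely header-only dependency has no "built with" dialect to compare against.

```cpp
// ---- awesome_config.hpp (hypothetical; written by the library's build system) ----
// Records whether the compiled library itself was built in C++11 mode.
#ifndef AWESOME_BUILT_WITH_CXX11
#define AWESOME_BUILT_WITH_CXX11 1
#endif

// ---- libawesome.hpp (public header, included by the library and its users) ----
#if __cplusplus >= 201103L
#define AWESOME_CONSUMER_IS_CXX11 1
#else
#define AWESOME_CONSUMER_IS_CXX11 0
#endif

// Fail at compile time on a dialect mismatch instead of crashing later
// during static initialization.
#if AWESOME_CONSUMER_IS_CXX11 != AWESOME_BUILT_WITH_CXX11
#error "libawesome was built with a different C++ dialect than this translation unit"
#endif

// Reaching this point means the dialects agree.
inline bool awesome_dialect_matches() {
    return AWESOME_CONSUMER_IS_CXX11 == AWESOME_BUILT_WITH_CXX11;
}
```

With the value above, a program compiled without -std=c++11 that includes libawesome.hpp fails to compile with the `#error` message, which names the likely cause directly.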

Hi,

This is already known to the Boost developers and, in short, it won't be fixed; see the Boost mailing-list thread "[boost] C++03 and C++11 ABI compatibility for compiled libraries".
Boost won't be ABI compatible between C++03 and C++11.
I don't think Clang can do much about it, as I believe it is inherent to some changes in the standard.

The GCC wiki also has a page about the ABI changes: Cxx11AbiCompatibility.

Best,

Mehdi

Hi Erik and Mehdi,

Thanks for considering the issue! You are quite right about the
conditional use of constexpr and Boost's decision not to pursue binary
compatibility across C++ dialects. A bug report to boost to make the
error more easily identifiable may be warranted, depending on your
thoughts on the root of the issue, discussed below.

I wonder whether Boost may have been a bit of a red herring when it
comes to deciding whether there is a Clang issue (or a potential
improvement) here. So I've worked out a smaller example which does not
rely on Boost:

-8<--libawesome.hpp---------------------------------------------------
#if __has_feature(cxx_constexpr)
#define CONSTEXPR_OR_CONST constexpr
#else
#define CONSTEXPR_OR_CONST const
#endif

inline double dummy(const double* num) {
    return 1.0L;
}

inline CONSTEXPR_OR_CONST double maybe_constexpr(double v)
{
   return v;
}

template <class Awesome>
struct awesome_initializer
{
   struct init
   {
      init()
      {
         Awesome::calc_stuff();
      }
      void force_instantiate()const{}
   };
   static const init initializer;
   static void force_instantiate()
   {
      initializer.force_instantiate();
   }
};
template <class Awesome>
typename awesome_initializer<Awesome>::init const awesome_initializer<Awesome>::initializer;

struct awesome
{
   static double calc_stuff()
   {
      awesome_initializer<awesome>::force_instantiate();
      static const double num[1] = {
         maybe_constexpr(1.0)
      };
      return dummy(num);
   }
};
-8<--libawesome.hpp---------------------------------------------------

-8<--libcrash11.cpp---------------------------------------------------
#include "libawesome.hpp"

double libfunc() {
    return awesome::calc_stuff();
}
-8<--libcrash11.cpp---------------------------------------------------

-8<--crash11.cpp------------------------------------------------------
#include "libawesome.hpp"

int main(int,char**) {
    awesome::calc_stuff();
    return 0;
}
-8<--crash11.cpp------------------------------------------------------

As before, they can be compiled as follows:
clang++ -dynamiclib -o libcrash11.dylib libcrash11.cpp
clang++ -std=c++11 -L. -lcrash11 -o crash11 crash11.cpp

Also as before, executing crash11 will result in an EXC_BAD_ACCESS
error and program crash.

It is worth noting that the error can be caused by a header-only
library shared (possibly as an internal implementation detail) by the
program and a library on which it depends, even when neither the
program nor the library uses any C++11 features. Also note that only
fundamental types are passed between the program and the library (in
the example, only the double returned by the library function), making
the error all the more unexpected.

Is there anything which can be done to improve the situation in cases
like this, or is some hairy static initializer debugging just the
price that everyone has to pay for making this sort of mistake?

Thanks for considering once again,
Kevin

I would have hoped that name mangling would catch this problem and produce a linker error.

-erik
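Without compiler changes, a library can get a linker error of that kind for its own API by folding the dialect into the mangled names of its exported functions. In the hypothetical sketch below, `awesome_lib`, `dialect_cxx11`/`dialect_cxx98` and `libfunc` are illustrative names; a plain named namespace plus a using-directive is used because inline namespaces are not part of C++98.

```cpp
// Pick a namespace name that encodes the dialect of the current TU.
#if __cplusplus >= 201103L
#define AWESOME_DIALECT_NS dialect_cxx11
#else
#define AWESOME_DIALECT_NS dialect_cxx98
#endif

// ---- libawesome.hpp (public header) ----
namespace awesome_lib {
namespace AWESOME_DIALECT_NS {
// In C++11 mode this declares awesome_lib::dialect_cxx11::libfunc(),
// so the dialect becomes part of the mangled symbol name.
double libfunc();
}
// Callers keep writing awesome_lib::libfunc(); lookup follows the directive.
using namespace AWESOME_DIALECT_NS;
}

// ---- libcrash11.cpp (compiled into the library) ----
namespace awesome_lib {
namespace AWESOME_DIALECT_NS {
double libfunc() { return 1.0; }
}
}
```

A program built in C++11 mode against a library built in C++98 mode would then fail to link with an undefined reference to `awesome_lib::dialect_cxx11::libfunc()`. The check only fires if the program actually calls into the library: the crash11 example above calls nothing from libcrash11, so it would still link, and shared header-only variables such as `num` are not renamed by this trick.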

This is a known issue that we’ve discussed in the past (though I can’t find a record of the discussion right now). The issue, in the general case, is that it’s possible for a common variable (local static or static data member of a class template specialization or variable template specialization) to have static initialization in one translation unit and dynamic initialization in another.
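Two of the kinds of common variable described above can be sketched with the same switch used in libawesome.hpp. The sketch uses `__cplusplus` instead of Clang's `__has_feature(cxx_constexpr)`, and `local_static` and `holder` are illustrative names. Compiled with -std=c++11, both initializers are constant expressions, so the variables are constant-initialized; compiled as C++98, `maybe_constexpr` is an ordinary call and they are dynamically initialized. Variable templates are left out because they require C++14.

```cpp
// In C++11 mode maybe_constexpr is a constexpr function; in C++98 mode it is
// an ordinary (const-returning) inline function.
#if __cplusplus >= 201103L
#define CONSTEXPR_OR_CONST constexpr
#else
#define CONSTEXPR_OR_CONST const
#endif

inline CONSTEXPR_OR_CONST double maybe_constexpr(double v) { return v; }

// (1) A local static in an inline function: each TU that uses it emits its
//     own weak copy, and one copy is chosen at link or load time.
inline double local_static() {
    static const double x = maybe_constexpr(1.0);
    return x;
}

// (2) A static data member of a class template specialization: likewise
//     emitted in every TU that instantiates it.
template <class T>
struct holder {
    static const double value;
};

template <class T>
const double holder<T>::value = maybe_constexpr(2.0);
```

Linking a C++11 object and a C++98 object that both use `local_static()` or `holder<int>::value` produces exactly the mix of static and dynamic initialization described above.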

The approach we came up with was:

  • put the variable and its guard variable in a COMDAT together, as the ABI document suggests (LLVM doesn’t yet have the machinery for this)
  • if we constant-initialize a common variable, but we can’t prove that every other translation unit that sees it will also constant-initialize it, emit a guard variable statically initialized to the ‘already initialized’ value

… but we’ve not implemented anything like this yet.
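The effect of the proposed guard variable can be illustrated with a toy, single-threaded model. Nothing here is Clang's or the ABI's actual code: `CommonVar`, `use_dynamic` and `constant_initialized_copy` are hypothetical names, and a counter stands in for the store that would fault on a read-only page. The only ABI detail borrowed is the generic Itanium C++ ABI convention that a guard whose first byte is nonzero marks the variable as already initialized.

```cpp
// Toy model of a common variable plus its guard (hypothetical names).
struct CommonVar {
    unsigned char guard;  // nonzero = "already initialized"
    bool read_only;       // models a copy placed in __TEXT,__const
    double value;
    int faulting_writes;  // stores that would have hit a read-only page
};

// What a TU that dynamically initializes the variable (the C++98 side in
// this thread) does on first use: test the guard, run the initializer,
// then set the guard.
double use_dynamic(CommonVar& v) {
    if (v.guard == 0) {
        if (v.read_only) {
            ++v.faulting_writes;  // in a real process: EXC_BAD_ACCESS here
        } else {
            v.value = 1.0;        // the dynamic initializer's store
        }
        v.guard = 1;
    }
    return v.value;
}

// The constant-initialized copy (the C++11 side), as the dynamic code sees it
// after linking:
//   set_guard == false: today; no guard comes with this copy, so the other
//                       TU's zero-initialized guard is what gets tested.
//   set_guard == true:  the proposal; variable and guard come from the same
//                       COMDAT and the guard starts out "already initialized".
CommonVar constant_initialized_copy(bool set_guard) {
    CommonVar v;
    v.guard = set_guard ? 1 : 0;
    v.read_only = true;
    v.value = 1.0;
    v.faulting_writes = 0;
    return v;
}
```

With `set_guard` false, `use_dynamic` records one faulting store, which corresponds to the crash in this thread; with it true, the guard check skips the store and the constant value is returned unchanged.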

That explains it very well. Thanks!

Do you know if there is a bug filed for tracking progress on the
issue? I didn't see anything in Bugzilla but may not have been
searching with the right keywords.

Cheers,
Kevin

I don't know of one; please go ahead and file one. Duplicates are cheap :)

Good point, and sorry for the delay. I've now filed Bug 19491[1] to
track the issue; anyone interested can follow or comment there.

Thanks again!

Kevin

1. http://llvm.org/bugs/show_bug.cgi?id=19491