[libc++] r160604 appears to have broken libc++ on linux

I have a mostly working clang toolchain on Linux using libc++abi and
libc++. But today I pulled the latest updates to libc++ and rebuilt
it, and now any attempt to construct a std::stringstream fails.

#include <cstdlib>
#include <sstream>

int main(int argc, char* argv[]) {
    std::stringstream ss;
    return EXIT_SUCCESS;
}

This dies with an unhandled std::bad_cast exception:

terminating with uncaught exception of type std::bad_cast: std::bad_cast

Program received signal SIGABRT, Aborted.
0x00007ffff75f0475 in *__GI_raise (sig=<optimized out>) at
../nptl/sysdeps/unix/sysv/linux/raise.c:64
64 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) where
#0 0x00007ffff75f0475 in *__GI_raise (sig=<optimized out>) at
../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1 0x00007ffff75f36f0 in *__GI_abort () at abort.c:92
#2 0x00007ffff7fcd457 in abort_message (format=<optimized out>) at
../../../src/libcxxabi/src/abort_message.cpp:47
#3 0x00007ffff7fcd6c2 in default_handler (cause=0x7ffff7fe3ba3
"uncaught") at ../../../src/libcxxabi/src/cxa_default_handlers.cpp:61
#4 0x00007ffff7fcd54d in default_terminate_handler () at
../../../src/libcxxabi/src/cxa_default_handlers.cpp:81
#5 0x00007ffff7fe0676 in std::__terminate (func=0xeed9) at
../../../src/libcxxabi/src/cxa_handlers.cpp:67
#6 0x00007ffff7fdfce6 in failed_throw (exception_header=<optimized out>)
at ../../../src/libcxxabi/src/cxa_exception.cpp:147
#7 __cxa_throw (thrown_object=0x406090, tinfo=<optimized out>,
dest=<optimized out>) at
../../../src/libcxxabi/src/cxa_exception.cpp:242
#8 0x00007ffff7e8f857 in std::__1::locale::__imp::use_facet
(this=0x7ffff7faba70, id=28) at
/home/acm/Documents/Develop/externals/clang-toolchain/src/libcxx/src/locale.cpp:388
#9 0x00007ffff7e90633 in std::__1::locale::use_facet
(this=0x7fffffffe540, x=...) at
/home/acm/Documents/Develop/externals/clang-toolchain/src/libcxx/src/locale.cpp:530
#10 0x0000000000401e8a in init (this=0x7fffffffe310,
__sb=0x7fffffffe2a8) at /home/acm/opt/include/c++/v1/ios:660
#11 basic_istream (this=0x0, vtt=0x7fffffffe660, __sb=0x7fffffffe2a8,
this=0x0, vtt=0x7fffffffe660) at
/home/acm/opt/include/c++/v1/istream:294
#12 basic_ios (this=0x7ffff7e109c0, this=0x7ffff7e109c0, vtt=0x404788,
__sb=0x7fffffffe2a8, this=0x7ffff7e109c0, __wch=32767) at
/home/acm/opt/include/c++/v1/istream:1488
#13 basic_stringstream (this=0x7fffffffe290, __wch=24) at
/home/acm/opt/include/c++/v1/sstream:809
#14 main (argc=1, argv=0x7fffffffe668) at ./test.cpp:26
(gdb)

In addition to my trivial test case, many of the libc++abi and libc++
unit tests fail with similar exceptions when r160604 is applied.

If I revert libc++ back to r160594 things start working again.

The rest of toolchain was built with the following component revisions:
llvm: r160611
clang: r160613
libc++abi: r160553

It is not immediately obvious to me how r160604's noexcept and
constexpr changes to std::mutex could cause this. Valgrind didn't have
anything interesting to say. Any suggestions about where to start
looking?

Thanks,
Andrew

It isn't obvious to me either.

In locale.cpp, this throw is happening:

const locale::facet*
locale::__imp::use_facet(long id) const
{
#ifndef _LIBCPP_NO_EXCEPTIONS
    if (!has_facet(id))
        throw bad_cast();
#endif // _LIBCPP_NO_EXCEPTIONS
    return facets_[static_cast<size_t>(id)];
}

But I have no idea why you would be throwing now, with the noexcept declarations on mutex et al. from r160604, and not in r160594 without them. I'm not reproducing this on Mac OS X.

Here's has_facet:

    bool has_facet(long id) const
        {return static_cast<size_t>(id) < facets_.size() && facets_[static_cast<size_t>(id)];}

which just range-checks the vector of facet pointers (and checks that the entry is non-null). That vector should be constructed by now.
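To make the failure condition concrete, here is a hypothetical, simplified model of that lookup. It is not the real locale::__imp; FacetTable and lookup_throws are made-up names, and facet pointers are modeled as const void*. With 28 populated slots the valid ids are 0 through 27, so an id equal to the vector's size fails the range check and use_facet throws std::bad_cast.

```cpp
#include <cassert>
#include <cstddef>
#include <typeinfo>  // std::bad_cast
#include <vector>

// Hypothetical, simplified model of locale::__imp's facet lookup:
// facet pointers live in a vector indexed by the facet's id.
struct FacetTable {
    std::vector<const void*> facets_;

    bool has_facet(long id) const {
        // Range check first, then make sure the slot is populated.
        return static_cast<std::size_t>(id) < facets_.size() &&
               facets_[static_cast<std::size_t>(id)] != nullptr;
    }

    const void* use_facet(long id) const {
        if (!has_facet(id))
            throw std::bad_cast();
        return facets_[static_cast<std::size_t>(id)];
    }
};

// Returns true if looking up `id` in a table with `n` populated slots
// throws std::bad_cast.
bool lookup_throws(std::size_t n, long id) {
    static const int dummy_facet = 0;  // any non-null address will do
    FacetTable table;
    table.facets_.assign(n, &dummy_facet);
    try {
        table.use_facet(id);
        return false;
    } catch (const std::bad_cast&) {
        return true;
    }
}
```

For example, lookup_throws(28, 27) is false while lookup_throws(28, 28) is true: an off-by-one id is enough to produce exactly this exception, with no memory corruption required.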

This looks to be happening while default-constructing a locale. But default-constructing a locale should not call use_facet or has_facet. This stack trace should instead be diving into make_global and make_classic in locale.cpp, which call this constructor:

locale::__imp::__imp(size_t refs)

which does not lead to use_facet or has_facet.

So it looks like some kind of corruption is going on somewhere.

Howard

Hi Howard -

Thank you for your help investigating this. I reduced the test case a bit:

#include <istream>

int main(int argc, char* argv[]) {
    std::istream is(NULL);
    return EXIT_SUCCESS;
}

and rebuilt everything with all optimization turned off. I now have a
more understandable stack trace, and something that suggests how
r160604 is causing trouble for me.

The following stack trace is from an interactive step through of the
above program, compiled with -stdlib=libc++, and paused right before
has_facet returns false, which in turn will make use_facet throw
bad_cast. Here, 'has_facet' will return false because 'id' is not less
than facets_.size():

(gdb) up
#1 0x00007ffff7f4fa7f in std::__1::locale::__imp::use_facet
(this=0x7ffff7ff6810, id=28) at libcxx/src/locale.cpp:387
387 if (!has_facet(id))
(gdb) down
#0 std::__1::locale::__imp::has_facet (this=0x7ffff7ff6810, id=28) at
libcxx/src/locale.cpp:106
106 {return static_cast<size_t>(id) < facets_.size() &&
facets_[static_cast<size_t>(id)];}
(gdb) where
#0 std::__1::locale::__imp::has_facet (this=0x7ffff7ff6810, id=28) at
libcxx/src/locale.cpp:106
#1 0x00007ffff7f4fa7f in std::__1::locale::__imp::use_facet
(this=0x7ffff7ff6810, id=28) at libcxx/src/locale.cpp:387
#2 0x00007ffff7f50533 in std::__1::locale::use_facet
(this=0x7fffffffe428, x=...) at libcxx/src/locale.cpp:530
#3 0x00007ffff7f44a9c in std::__1::use_facet<std::__1::ctype<char> >
(__l=...) at libcxx/include/__locale:164
#4 0x00007ffff7f8c63c in std::__1::basic_ios<char,
std::__1::char_traits<char> >::widen (this=0x7fffffffe4d8, __c=32 ' ')
    at libcxx/include/ios:725
#5 0x00007ffff7f8c183 in std::__1::basic_ios<char,
std::__1::char_traits<char> >::init (this=0x7fffffffe4d8, __sb=0x0)
    at libcxx/include/ios:661
#6 0x00007ffff7f8eddf in std::__1::basic_istream<char,
std::__1::char_traits<char> >::basic_istream (this=0x7fffffffe4c8,
__sb=0x0)
    at libcxx/include/istream:294
#7 0x00000000004008d5 in main (argc=1, argv=0x7fffffffe668) at
./libc++.stringstream.crash.cpp:4
(gdb) print id
$25 = 28
(gdb) print facets_.size()
$26 = 28

So we reach the whole use_facet/has_facet call chain by way of
basic_ios<char>::widen, which basic_ios::init calls explicitly to
initialize the __fil_ member. So I think the stack trace is plausible
without any sort of corruption.

One thing I noticed while doing the above step through was that
locale::id::__get and locale::id::__init were involved in producing
the value passed to use_facet, and these functions use std::once_flag
which changed in r160604.
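The scheme described here (a per-facet id assigned lazily from a global counter, guarded by a once_flag) can be sketched with standard C++11 facilities. FacetId and next_id are illustrative names and this is not the libc++ implementation; std::atomic stands in for the __sync_add_and_fetch builtin that locale::id::__init uses.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical global counter shared by every facet id.
std::atomic<int> next_id{0};

// Hypothetical sketch of a lazily assigned facet id, loosely modeled on
// locale::id: the id is stored 1-based and handed out 0-based.
class FacetId {
public:
    long get() {
        // The first call runs init(); later calls see the flag already set
        // and skip straight to the return.
        std::call_once(flag_, [this] { init(); });
        return id_ - 1;
    }

private:
    // fetch_add returns the old value, so +1 mimics add-and-fetch.
    void init() { id_ = next_id.fetch_add(1) + 1; }

    std::once_flag flag_;
    long id_ = 0;
};
```

Each FacetId gets a distinct index on its first get() call, and repeated calls return the same index only because call_once runs init() once per flag. If a flag were somehow reset after init() had run, the next get() would draw a fresh, larger number from the counter.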

I rolled back to r160594, hacked some printfs into locale::id::__get
and locale::id::__init, and made once_flag::__state_ public:

#include <cstdio>  // printf

long
locale::id::__get()
{
    printf("XXX locale::id::__get: before call_once: this(%p), "
           "__flag_.__state_(%lu), __id_(%ld), __next_id(%d)\n",
           (void *)this, __flag_.__state_, __id_, __next_id);

    call_once(__flag_, __fake_bind(&locale::id::__init, this));

    printf("XXX locale::id::__get: after call_once: this(%p), "
           "__flag_.__state_(%lu), __id_(%ld), __next_id(%d)\n",
           (void *)this, __flag_.__state_, __id_, __next_id);

    printf("XXX locale::id::__get: this(%p), will return: %ld\n",
           (void *)this, __id_ - 1);

    return __id_ - 1;
}

void
locale::id::__init()
{
    printf("XXX locale::id::__init before increment: this(%p), "
           "__flag_.__state_(%lu), __id_(%ld), __next_id(%d)\n",
           (void *)this, __flag_.__state_, __id_, __next_id);

    __id_ = __sync_add_and_fetch(&__next_id, 1);

    printf("XXX locale::id::__init: after increment: this(%p), "
           "__flag_.__state_(%lu), __id_(%ld), __next_id(%d)\n",
           (void *)this, __flag_.__state_, __id_, __next_id);
}

With a libc++ tree based on r160594, the calls to the above functions
after 'main' starts emit the following logs:

Temporary breakpoint 1, main (argc=1, argv=0x7fffffffe668) at
./libc++.stringstream.crash.cpp:4
4 std::istream is(NULL);
(gdb) c
Continuing.
XXX locale::id::__get: before call_once: this(401e20),
__flag_.__state_(18446744073709551615), __id_(3), __next_id(28)
XXX locale::id::__get: after call_once: this(401e20),
__flag_.__state_(18446744073709551615), __id_(3), __next_id(28)
XXX locale::id::__get: this(401e20), will return: 2
[Inferior 1 (process 16448) exited normally]

If I pull the r160604 updates however, the log looks like this:

Temporary breakpoint 1, main (argc=1, argv=0x7fffffffe668) at
./libc++.stringstream.crash.cpp:4
4 std::istream is(NULL);
(gdb) c
Continuing.
XXX locale::id::__get: before call_once: this(401e30),
__flag_.__state_(0), __id_(3), __next_id(28)
XXX locale::id::__init before increment: this(401e30),
__flag_.__state_(1), __id_(3), __next_id(28)
XXX locale::id::__init: after increment: this(401e30),
__flag_.__state_(1), __id_(29), __next_id(29)
XXX locale::id::__get: after call_once: this(401e30),
__flag_.__state_(18446744073709551615), __id_(29), __next_id(29)
XXX locale::id::__get: this(401e30), will return: 28
terminating with uncaught exception of type std::bad_cast: std::bad_cast

Program received signal SIGABRT, Aborted.
0x00007ffff75f0475 in *__GI_raise (sig=<optimized out>) at
../nptl/sysdeps/unix/sysv/linux/raise.c:64
64 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.

There are many lines logged before main starts in both cases, and I
can provide those if they would be helpful. But the main difference
between r160594 and r160604 seems to be that after r160604,
__flag_.__state_ is 0 (not yet run) when __get is called after main
starts, even though __id_ was already set to 3. So call_once runs
__init again, and __next_id gets incremented when it shouldn't be.
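To see why that single reset is enough to produce the bad_cast, here is a hypothetical single-threaded replay of the two logs. FakeOnceFlag, fake_call_once, FakeLocaleId and lookup_after_main are made-up names, and fake_call_once is a simplification, not libc++'s call_once; it only reuses the state values the logs print (0, 1, and 18446744073709551615, i.e. ~0ul on 64-bit Linux).

```cpp
#include <cassert>

// Hypothetical single-threaded stand-in for once_flag. The state values
// mirror the logs: 0 = not run yet, 1 = running, ~0ul = done.
struct FakeOnceFlag {
    unsigned long state = 0;
};

template <class F>
void fake_call_once(FakeOnceFlag& flag, F f) {
    if (flag.state == 0) {
        flag.state = 1;
        f();
        flag.state = ~0ul;
    }
}

int fake_next_id = 28;  // counter value seen in both logs after main starts

// Hypothetical stand-in for locale::id: a flag plus a 1-based id.
struct FakeLocaleId {
    FakeOnceFlag flag;
    long id = 0;

    long get() {
        fake_call_once(flag, [this] { id = ++fake_next_id; });
        return id - 1;  // 0-based index into the facet vector
    }
};

// Simulates a lookup after main() for an id that was already assigned 3
// before main(). If flag_was_reset is true, the flag is zeroed while the
// id is left alone, as in the r160604 log.
long lookup_after_main(bool flag_was_reset) {
    FakeLocaleId id_obj;
    id_obj.id = 3;
    id_obj.flag.state = ~0ul;          // call_once already completed
    if (flag_was_reset)
        id_obj.flag = FakeOnceFlag{};  // state back to 0, id untouched
    return id_obj.get();
}
```

With the flag intact, the lookup returns 2 and the counter stays at 28. With the flag zeroed, the initializer runs again, the counter goes to 29, and the lookup returns 28, one past the end of the 28-entry facet vector seen in gdb. This only models the symptom; it says nothing about what zeroed the flag.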

All of this of course could be because of some deeper level corruption
in my admittedly hacked up linux libc++ stack, but if you have any
suggestions for next steps I would appreciate it.

Thanks,
Andrew

I'm working on a theory. Question: Are you building libc++ with -std=c++11?

Howard

I just double-checked, and libc++ is being built with -std=c++0x. I'm
using the CMake build for libc++, which passes that flag automatically.
Here is a representative compile line:

/home/acm/opt/bin/clang++ -Dcxx_EXPORTS -DNDEBUG
-I/home/acm/opt/include -g -fPIC
-I/home/acm/Documents/Develop/externals/clang-toolchain/src/libcxx/include
   -nostdinc++ -std=c++0x -Wall -W -Wno-unused-parameter
-Wwrite-strings -Wno-long-long -pedantic -fPIC -o
CMakeFiles/cxx.dir/__/src/bind.cpp.o -c
/home/acm/Documents/Develop/externals/clang-toolchain/src/libcxx/src/bind.cpp

My CMake and make invocation looks like this:

#!/bin/bash -ex

export CC=/home/acm/opt/bin/clang
export CXX=/home/acm/opt/bin/clang++

export CXXFLAGS="-I/home/acm/opt/include"
LINKOPTS="-L/home/acm/opt/lib -lc++abi -Wl,-z,origin -Wl,--no-undefined"

Prefix=/home/acm/opt/

( cmake -DLIT_EXECUTABLE=/home/acm/Documents/Develop/externals/clang-toolchain/src/llvm/utils/lit/lit.py \
        -DCMAKE_BUILD_TYPE=Debug \
        -DCMAKE_INSTALL_PREFIX=$Prefix \
        -DCMAKE_INSTALL_RPATH=\$ORIGIN/../lib \
        -DCMAKE_SHARED_LINKER_FLAGS="$LINKOPTS" \
         ../../../src/libcxx && \
make -j2 all VERBOSE=1 && \
make install ) \
2>&1 | tee ../libcxx.build.log

Thanks,
Andrew

Ok, thanks, and with -std=c++0x, is constexpr turned on?

#if __has_feature(cxx_constexpr)
#error has constexpr
#else
#error doesn't have constexpr
#endif

int main()
{
}

Howard


$ cat ./libc++.stringstream.crash.cpp
#include <istream>

#if __has_feature(cxx_constexpr)
#error has constexpr
#else
#error doesn't have constexpr
#endif

int main(int argc, char* argv[]) {
    std::istream is(NULL);
    return EXIT_SUCCESS;
}

$ ~/opt/bin/clang++ ./libc++.stringstream.crash.cpp -std=c++11
-stdlib=libc++ -lc++abi -I/home/acm/opt/include/c++/v1
-L/home/acm/opt/lib -Wl,-rpath,/home/acm/opt/lib -g
./libc++.stringstream.crash.cpp:4:2: error: has constexpr
#error has constexpr
^
1 error generated.

Yes, constexpr is turned on.

+cfe-dev

Ok, I'm testing a solution that looks like this:

Index: include/__locale

--- include/__locale (revision 160055)
+++ include/__locale (working copy)
@@ -119,7 +119,7 @@

     static int32_t __next_id;
public:
- _LIBCPP_INLINE_VISIBILITY id() {}
+ _LIBCPP_INLINE_VISIBILITY _LIBCPP_CONSTEXPR id() :__id_(0) {}
private:
     void __init();
     void operator=(const id&); // = delete;

Could you test that out as well? My hope is that this will force locale::id objects to be initialized at compile time (constant initialization) instead of having their constructor run during dynamic initialization.
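The effect the patch aims for can be illustrated with a hypothetical stand-in. PatchedId, probe and global_id are made-up names, and a plain unsigned long replaces the real std::once_flag member. In C++11, an object with static storage duration whose constructor is constexpr, and whose initializer is therefore a constant expression, is constant-initialized, which happens before any dynamic initialization. With a non-constexpr constructor such as the old `id() {}`, the constructor call may instead be performed during dynamic initialization, whose ordering across translation units is unspecified.

```cpp
#include <cassert>

// Hypothetical stand-in for locale::id after the patch. Every member is
// initialized by a constexpr constructor, so a namespace-scope object of
// this type can be constant-initialized instead of dynamically initialized.
class PatchedId {
public:
    constexpr PatchedId() : state_(0), id_(0) {}

    constexpr unsigned long state() const { return state_; }
    constexpr long id() const { return id_; }

    // Plays the role of call_once + __init: record the id, mark the flag done.
    void mark_initialized(long v) {
        id_ = v;
        state_ = ~0ul;
    }

private:
    unsigned long state_;  // stands in for once_flag::__state_
    long id_;              // stands in for locale::id::__id_
};

// Compile-time check that default construction is a constant expression.
constexpr PatchedId probe{};
static_assert(probe.state() == 0 && probe.id() == 0,
              "PatchedId() is usable in constant expressions");

// Non-const namespace-scope object. Its initializer is a constant
// expression, so it is constant-initialized: no constructor call for it
// runs at start-up, where it could overwrite state set by earlier code.
PatchedId global_id;
```

In C++20 and later, declaring it as `constinit PatchedId global_id;` would make the compiler reject the declaration if constant initialization were not possible.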

Thanks,
Howard

I just tried it out and my little test passes now. So this looks very
promising! I will try getting the libc++abi and libc++ unit tests
going next.

Thanks,
Andrew

Hi Howard -

Sorry for the delay getting to the tests.

After applying your patch all of the libc++abi tests and the majority
of the libc++ tests are passing again. The small set of failing libc++
tests is consistent with pre-r160604 behavior.

Thanks,
Andrew