How do _LIBCPP_DEBUG=1 "debug iterators" ever work, when linking against libc++.a?

Hi folks,

I recently added a buildkite CI step that runs libc++'s test suite with -D_LIBCPP_DEBUG=1. This turns on “debug iterators,” which is a thing in the library that maintains a collection of all known container instances and iterator instances, and updates the collection on every container-construction, container-destruction, container-modification, iterator-construction, iterator-destruction, or iterator-modification. This enables it to print an error message on any iterator-modification, iterator-comparison, or iterator-dereference that violates the iterator-invalidation guarantees of the Standard.
https://reviews.llvm.org/D100866 was the review that added that CI step.

I marked a bunch of tests as “LIBCXX-DEBUG-FIXME”, with the intent of fixing them all quickly. The patches for all-but-one of those FIXMEs are:
https://reviews.llvm.org/D101674

https://reviews.llvm.org/D101675

https://reviews.llvm.org/D101676

https://reviews.llvm.org/D101677

https://reviews.llvm.org/D101678

https://reviews.llvm.org/D101679

I’ve just tracked down the very final LIBCXX-DEBUG-FIXME. Unfortunately, it’s a doozy.

The symptom is a segfault (null pointer dereference) in
libcxx/test/std/input.output/filesystems/fs.op.funcs/fs.op.proximate/proximate.pass.cpp

It’s a segfault instead of a _LIBCPP_ASSERT-failure, because the compiled-to-binary filesystem library was not built with _LIBCPP_ASSERT enabled. Okay, no problem, I should enable debug assertions in the library for this particular CI builder. I believe this is the patch for that:

--- a/libcxx/cmake/caches/Generic-debug-iterators.cmake

+++ b/libcxx/cmake/caches/Generic-debug-iterators.cmake
@@ -1,2 +1,3 @@
set(LIBCXX_TEST_PARAMS "debug_level=1" "additional_features=LIBCXX-DEBUG-FIXME" CACHE STRING "")
set(LIBCXXABI_TEST_PARAMS "${LIBCXX_TEST_PARAMS}" CACHE STRING "")
+set(LIBCXX_ENABLE_ASSERTIONS ON CACHE BOOL "")

Now the symptom is a _LIBCPP_ASSERT-failure. The problem is that we’re move-constructing a string, and the source string was never registered with the debug-iterators library.

Reduced test case:

cat > test.cpp <<EOF
#include <string>
void test_debug_function(std::__libcpp_debug_info const&) {
std::abort();
}
int main(int, char**) {
std::__libcpp_set_debug_function(&test_debug_function);
std::string x = "1234567890123456789012345678901234";
std::string y = x; // calls into the compiled library
std::string z = std::move(y);
}
EOF
bin/clang++ test.cpp -D_LIBCPP_DEBUG=1 lib/libc++.a lib/libc++abi.a
./a.out
Abort trap: 6

The problem seems to be that
(1) libc++.a needs to copy strings sometimes, so it contains codegen for the copy constructor
(2) libc++.a is compiled with -D_LIBCPP_DEBUG=0, i.e., assertions but no debug iterators
(3) Therefore any strings created by the copy constructor inside libc++.a, don’t get registered with the debug-iterators library
(4) Therefore pretty much everything is affected by unpredictable assert-fails??
I’m amazed that only one test out of our entire test suite seems to be affected by this issue. I would think that pretty much everything in the debug-iterators CI should be failing because of this; I’m not sure how it’s working right now.

The real question is, how was this ever supposed to work? It seems like the person-who-compiled-libc++.a and the person-compiling-their-user-code-against-libc+±headers need to agree on the setting of _LIBCPP_DEBUG:— If libc++.a has _LIBCPP_DEBUG=1 and the user code doesn’t, everything explodes; and also vice versa. But this doesn’t seem realistic.

None of the offending basic_string constructors are currently marked with _LIBCPP_INLINE_VISIBILITY. I suspect that adding _LIBCPP_INLINE_VISIBILITY could help reduce the surface area of the problem even further… but based on what I said above, I don’t see how the problem ever gets to zero, unless we can somehow prevent libc++.a from ever constructing a string (or at least never letting strings-it-constructs leak out into the user’s code where _LIBCPP_DEBUG=1 might be in effect).

https://reviews.llvm.org/D48616 may be related — it added some non-inline constructors — but the review comments don’t talk about inline or visibility at all.

What am I missing here? Thoughts?

–Arthur

Gentle bump.

Quoting myself for emphasis:

It seems the issue also happens with strings returned from
std::to_string. This function is implemented in src/string.cpp.
I ran into it a while ago while implementing the std::formatter stubs.

(I actually forgot about this issue, but Arthur reminded me.)

To me it seems unintentional and unwanted. But I don't have a suggestion
how to fix it.

Cheers,
Mark de Wever

Bump again, since this just came up AGAIN in Louis’s https://reviews.llvm.org/D103960

Thanks for bumping and bringing this back to my attention, Arthur.

So the short answer for “how was this ever supposed to work” is “I’m not sure, that was before my time”.
However, I think it was originally believed that by not using the extern template instantiations in the library
when the debug mode is enabled (see logic in __config), we would basically be using definitions from the
headers only, which would see the _LIBCPP_DEBUG=X passed by the user, and correctly register whatever
iterators needed to be registered.

I suspect that as time went by and nobody used the feature, support for it declined and eventually things
stopped working sanely. I believe you could probably fix the proximate.pass.cpp test by putting the right
set of functions in the headers (as inlines), and only using the version in the library when the debug mode
isn’t in effect. However, that is very brittle IMO.

I also do know there’s been some gripes about the debug mode ever since I started getting involved in
libc++, and I also do know that Apple has never shipped the debug mode. And when I say Apple has
never shipped the debug mode, I mean it has never shipped a library that even contained support for
the debug mode - so if you were to try enabling it while linking against the system library, you would
get linker errors like:

Undefined symbols for architecture x86_64:
“std::__1::__libcpp_db::__find_c_from_i(void*) const”

I can’t speak for other vendors, but I would assume most have been shipping support for the debug
mode (i.e. the iterator database). However, I strongly suspect no user has ever tried using it seriously
(I would love to hear about it if I’m wrong).

So, to summarize, it looks like we’ve got a debug mode with a usage story that doesn’t work. What do we do?
I see a couple options:

  1. We make the debug mode a configure-time setting that vendors can use to control whether the debug mode
    is enabled for the flavor of libc++ that they build. Users can’t control that. Then, if a vendor is interested in
    shipping it, they can start shipping a version of the library with the debug mode enabled, and add a nice
    compiler option that causes Clang to link against the debug-enabled libc++ instead of the normal libc++.
    Or something along those lines.

  2. We try to tweak the debug mode with the current usage model such that it passes our test suite. As Arthur
    nicely explained though, that means we’ll be walking in a minefield since any function compiled into the library
    will not be registering its iterators, so if they leak anywhere where the debug mode is enabled, BOOM. CI might
    be enough to catch this, but that sounds brittle to me.

  3. Eric Fiselier (CC’d) had once told me about a debug mode design he’d thought about based on ASAN (I think) that
    did not require a separate build of the library. This might be a viable path forward too, and we could investigate it
    if we have more information.

  4. We rip out support for the debug mode entirely since it has very few serious users (I know of some, but not many).

Personally, I think having a debug mode is something useful, so I would rather see us fix it than get rid of it.
Unless someone else can provide additional context around how the debug mode was originally supposed to
work, I would strongly go for (1) or (3), but not (2). Solutions (1) and (3) are the ones that will empower vendors
to do what’s right for their users most easily, at least from my perspective. If we go for (2), unless I misunderstand
something fundamental, I don’t think we’ll ever be able to get the debug mode to a point where it is sufficiently solid
so that vendors are able to ship it.

If we go for (1), I can lay out the steps to get us there (and do some of them). I’ve already thought about it some.
Eric, do you have an opinion? Would you like to expand on (3)?

Louis

Well, nobody asked, but I wrote down the steps I think we can take to get to (3) anyway to offload them from my head:

A. Decouple LIBCXX_ENABLE_ASSERTIONS and _LIBCPP_DEBUG
    - Many assertions in libc++ are unrelated to iterator debugging and they don't require any state or tracking to be useful
    - Make it possible to turn assertions on/off based on a macro when using the headers

B. Move _LIBCPP_DEBUG to __config_site, and make it non-defineable by users
    - Release note required
    - All tests that do `-D_LIBCPP_DEBUG=XXX` need to be moved to `// REQUIRES: libcpp-debug=XXX` or something like that
    - _LIBCPP_DEBUG would use _LIBCPP_ASSERT, which would now be orthogonal to _LIBCPP_DEBUG (however, we could turn on _LIBCPP_ASSERTIONS by default if the debug mode is enabled)

C. Unify LIBCXX_ENABLE_DEBUG_MODE_SUPPORT (the CMake setting) and _LIBCPP_DEBUG (the macro)
    - We should provide backing support for the debug mode (i.e. LIBCXX_ENABLE_DEBUG_MODE_SUPPORT) if and only if _LIBCPP_DEBUG is enabled in that build of the library.

As a note about step (A) here, I think it would be fine if we kept using something like `__libcpp_debug_function` that can be overriden at runtime only (but perhaps with a different name). Indeed, when we call that function, we're about to basically abort the program (or at least we know the program did something very bad). So I think it's entirely fine to pay the price of calling a function that can be replaced at runtime by the user in those cases.

Louis