[RFC] Stop supporting extern instantiations with GCC

Currently, libc++ allows instantiating extern templates with both GCC and Clang, since the standard requires support for that with at least one user-provided type (e.g. extern template class std::vector<MyType>; is allowed). libc++ avoids actually instantiating anything by applying [[clang::exclude_from_explicit_instantiation]] very liberally (i.e. to every function that shouldn’t be exported from our dylib). Because of that, extern instantiations work, but are mostly useless with libc++. GCC does not have this attribute currently, so we apply always_inline on every function to achieve a similar effect. This approach has many problems, the main one being terrible code gen. Because of this, our CI takes more than twice as long with GCC as with Clang.
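To make the mechanism above concrete, here is a minimal sketch. It is not libc++'s actual code: the macro `SKETCH_NOT_IN_DYLIB` and the `Wrapper` template are hypothetical names invented for illustration. The macro picks the Clang attribute when the compiler reports it and otherwise falls back to always_inline, which is the workaround described for GCC.

```cpp
// Hypothetical macro for this sketch; libc++'s real macros are spelled differently.
#if defined(__has_attribute)
#  if __has_attribute(exclude_from_explicit_instantiation)
// Clang: members marked this way are skipped by explicit instantiations.
#    define SKETCH_NOT_IN_DYLIB __attribute__((__exclude_from_explicit_instantiation__))
#  endif
#endif
#if !defined(SKETCH_NOT_IN_DYLIB) && defined(__GNUC__)
// Fallback described in the post: force inlining so callers never need the symbol.
#  define SKETCH_NOT_IN_DYLIB __attribute__((__always_inline__))
#endif
#ifndef SKETCH_NOT_IN_DYLIB
#  define SKETCH_NOT_IN_DYLIB
#endif

template <class T>
struct Wrapper {
  explicit Wrapper(T v) : value_(v) {}

  // Implementation detail: should not become part of anyone's exported ABI.
  SKETCH_NOT_IN_DYLIB T doubled() const { return value_ + value_; }

  // Ordinary member: an explicit instantiation emits it.
  T get() const;

 private:
  T value_;
};

template <class T>
T Wrapper<T>::get() const { return value_; }

// Explicit instantiation declaration: "this is instantiated somewhere else".
extern template struct Wrapper<int>;

// Explicit instantiation definition. It would normally live in another
// translation unit or library; it is here only so the sketch links on its own.
template struct Wrapper<int>;
```

With the Clang attribute, `doubled()` is instantiated implicitly wherever it is used instead of being picked up by the explicit instantiation. With the always_inline fallback, every call site has to absorb the body, which is where the code gen and compile-time cost comes from.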

I propose to not support using extern instantiations with libc++ and GCC until/unless GCC implements [[gnu::exclude_from_explicit_instantiation]], or whatever the name will be (see 110000 – GCC should implement exclude_from_explicit_instantiation). That would allow us to drop always_inline from most functions, resulting in massively better code gen and reduced compile times. Doing that will make libc++ slightly non-conforming, but IMO that’s better than having terrible code gen and compile times.

This will not affect many users, since most people don’t externally instantiate any types, and it’s only actually problematic when instantiating a template in an ABI sensitive way (e.g. in a dylib). All the functions have hidden visibility, so there should be a linker error when someone tries to instantiate a type in an ABI sensitive way.

CC @mordante @ldionne @varconst @jwakely @fweimer-rh

The problem here is that “dropping support for extern instantiations” concretely means that users with existing and conforming code that contains extern instantiations will start getting linker errors.

The other thing we could do is avoid using hidden visibility, avoid using ABI tagging and mark only the hide-from-abi functions from classes that libc++ itself instantiates externally as always_inline on GCC. But then we would basically have no control over our ABI surface when using GCC, so we’d have to document that serious caveat and users could get broken by us changing implementation detail functions in ODR-unsafe ways. So at that point I’d argue we’re basically not supporting users on GCC regardless of what our official messaging is.

I think the only real options are:

  1. Keep the status quo with terrible code gen and compile-times
  2. Get exclude_from_explicit_instantiation on GCC
  3. Drop official support for GCC and say “whatever works works”

(3) is really bad because it’s important for libc++ to keep supporting more than one compiler: it helps with conformance, and people use this combination anyway.

(1) is also really bad because some libraries are becoming unusable due to always_inline (e.g. std::format). At some point, it might become hard to pretend that we support GCC when parts of the library like std::format don’t work with that compiler.

IMO we should just really aim to get (2) implemented in GCC since it’s pretty simple. Until then, I’d keep the status quo and if things haven’t moved in like a year, then we need to reconsider whether we are actually able to support GCC in a meaningful way.

Attempt to mitigate the impact of this issue on our CI: ⚙ D152736 [libc++] Mark slow tests as unsupported on GCC

How does libstdc++ handle the ABI stability questions at play here?

When I spoke to @jwakely about this, what I understood is that they don’t change any function (including implementation details) in a way that an ODR violation would be non-benign. So they don’t change pre/post conditions, for example. I am a bit skeptical that this works as intended, since it’s extremely easy to make a mistake – for example if you remove a function and then reintroduce a function with the same name a few years later in another release, you’ve technically broken ABI in case someone happens to link together code compiled against those two versions.

In practice, I suspect this might not affect a lot of people (especially on Linux where everything is built against the system libstdc++). But it’s still too hand wavy IMO, especially since we have a systematic approach that works (and only requires a simple attribute).

I agree the issue described is a real problem. The resource usage of the current solution is quite high, and several tests can’t be run in the CI because they run out of memory.

I dislike a possible solution that makes libc++ non-conforming. I don’t think we do our users a service by possibly breaking their code. Terrible code gen and long compile times aren’t great, but they are a QoI issue and not a conformance issue.

Looking at the discussion in the GCC bug report, I don’t get the feeling that the proposal to add the exclude_from_explicit_instantiation attribute has been rejected. So I really would like to see whether it’s possible to implement that attribute. If not, we can consider alternative solutions for this issue.

Is there a writeup somewhere on libc++'s ABI guarantees/contract/limitations and the techniques used to implement them on different compilers?

Slapping always_inline everywhere can result in errors when the compiler is unable to inline everything, so this might actually be a conformance issue too. We just didn’t hit any problems in user code yet; we already have problems with that in our own code.

https://github.com/llvm/llvm-project/blob/1c532b5e44fa1fbff84c494c659fe722b7df4b10/libcxx/include/__config#L607 has some information, but it’s not super detailed.

Hmm, so is the issue that you don’t want explicit instantiations to cause functions to be exported across shared library boundaries?

The hidden visibility thing doesn’t work for this case in some way?

That is the ABI part of the problem.

hidden visibility does work, but isn’t enough. If we just make the functions hidden, the library consumer gets linker errors, since it expects an external instantiation which it doesn’t find.

Ah, right (& this is all about dynamic library boundaries, yeah? That’s where the hidden visibility would have an effect - no bearing on static libraries, which I guess libc++ says don’t get these ABI stability guarantees/need to be more tightly version locked?)

So they don’t change pre/post conditions, for example. I am a bit skeptical that this works as intended, since it’s extremely easy to make a mistake

It’s been around a while - is there much data on whether it’s caused significant issues?

(& do they use the inline namespace abi versioning for some things too? Or decided that wasn’t useful to them?)

But it’s still too hand wavy IMO, especially since we have a systematic approach that works (and only requires a simple attribute).

Though that approach comes at some costs, right? Like code size? (not being able to depend on names coming from another shared library - and hurts static libraries too, that could depend on explicit instantiations?)

Edit: This reply expands on the ABI tag part of our ABI strategy, which is relevant to linking libc++ statically or using different versions of the libc++ headers within the same program. I’m not talking about explicit instantiations specifically – just to clear out any possible confusion.

Yes, visibility is only about dynamic library boundaries. Libc++ does provide ABI guarantees when the static library is used, however, and that’s where the ABI tag we use in _LIBCPP_HIDE_FROM_ABI comes into play. Long story short, it ensures that each function has a different mangled name from release to release, so if you happen to link functions from two different releases of libc++ into the same program, there won’t be an ODR violation.
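As a rough illustration of the ABI tag idea, here is a small sketch. The macro `SKETCH_HIDE_FROM_ABI`, the tag string `sketch_v1`, and the function are all hypothetical; the real _LIBCPP_HIDE_FROM_ABI is defined in libc++'s __config and is not reproduced here. The sketch only assumes a compiler that accepts the GNU `visibility` and `abi_tag` attributes (GCC and Clang do).

```cpp
// Hypothetical stand-in for the idea behind _LIBCPP_HIDE_FROM_ABI.
// A real library would derive the tag from its version so that it changes
// from release to release; "sketch_v1" is just a fixed placeholder.
#if defined(__GNUC__)  // also defined by Clang
#  define SKETCH_HIDE_FROM_ABI \
     __attribute__((__visibility__("hidden"), __abi_tag__("sketch_v1")))
#else
#  define SKETCH_HIDE_FROM_ABI
#endif

namespace sketch {

// Header-only implementation detail. The tag becomes part of the mangled
// name, so a copy of this function built against a header that uses a
// different tag is a different symbol, not an ODR-conflicting duplicate.
SKETCH_HIDE_FROM_ABI inline int clamp_to_byte(int x) {
  return x < 0 ? 0 : (x > 255 ? 255 : x);
}

}  // namespace sketch
```

If every object in a program is built against the same headers, all copies carry the same tag and the linker folds them into one, as with any other inline function.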

@jwakely is the one who can chime in on that, I don’t know what their experience with respect to ABI issues in “linkonce_odr” functions has been like. I also don’t know whether libstdc++ supports the same variety of usage scenarios as libc++, e.g. they might not provide strong ABI guarantees when you link the library statically (or they might, I just don’t know). But I know that we do strive to provide the same guarantees for shared and static libraries.

There is no penalty if you use only one version of libc++ throughout your whole program. In that case, you get one version of each HIDE_FROM_ABI function after ODR-deduplication by the linker since they all have the same ABI tag. And for EXPORTED_FROM_ABI functions, those are not ABI-tagged because we do commit to ABI stability, and so you also get only one version after ODR deduping.

If you do have multiple versions of libc++ statically linked (or use multiple versions of the headers) in your program, then you may have multiple copies of some implementation-detail functions with different ABI tags. If those functions are ODR-equivalent, then yes you have a bit of code duplication (assuming the compiler doesn’t optimize this duplication). If the functions are not ODR-equivalent, then having different names for those functions is necessary for correctness.

See the caveat about std::rotate at GCC 9 Release Series — Changes, New Features, and Fixes - GNU Project for the only example I can think of where we introduced a bug and it wasn’t caught before it got into a release.

(& do they use the inline namespace abi versioning for some things too? Or decided that wasn’t useful to them?)

Heck no. We want the ABI to be stable across releases, we don’t want to intentionally keep breaking it with every release and then attempt heroics to prevent those versioned symbols ever leaking into user objects.

I don’t know about Clang, but with GCC it’s absolutely not safe to use always_inline everywhere. You will cause hard errors that users have no way to disable (short of modifying the std::lib headers or overriding the ABI macros that add the attribute). If it’s not breaking anybody yet, you don’t have many people using libc++ with g++. I can see no other possible explanation.

This seems to imply that it happens somewhat more frequently, just that you caught the problem most of the time before a release. Is that inference correct?

AFAICT not being able to inline is mostly a problem with circular dependencies (at least that’s what I’ve seen). Given that most functions in the stdlib don’t call themselves recursively, I’m not sure how often it would actually happen in the wild. I haven’t seen a case yet where the caller has an influence on whether everything can be inlined, so this might not actually be a problem. Either way, libc++ and gcc is definitely not the most common combination, so it might just be a bug waiting to happen.
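A small sketch of the failure mode being discussed, with hypothetical names throughout. GCC's documentation states that failure to inline an always_inline function is diagnosed as an error, and a function that calls itself is one case that cannot be fully inlined. The problematic part is kept behind a macro (`SKETCH_SHOW_FAILURE`, invented for this example) so the block compiles by default; defining it with GCC is expected to produce the hard error, while other compilers may silently skip the inlining.

```cpp
#if defined(__GNUC__)  // also defined by Clang
#  define SKETCH_ALWAYS_INLINE inline __attribute__((__always_inline__))
#else
#  define SKETCH_ALWAYS_INLINE inline
#endif

// Fine: a chain of small non-recursive helpers can always be inlined.
SKETCH_ALWAYS_INLINE int add_one(int x) { return x + 1; }
SKETCH_ALWAYS_INLINE int add_two(int x) { return add_one(add_one(x)); }

#ifdef SKETCH_SHOW_FAILURE
// Problematic: the recursive call cannot be inlined into itself without end,
// so the always_inline request cannot be honored for every call.
SKETCH_ALWAYS_INLINE int count_down(int n) {
  return n <= 0 ? 0 : 1 + count_down(n - 1);
}
int use_count_down(int n) { return count_down(n); }
#endif
```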

Because we try to avoid making ABI-breaking changes in headers, it usually doesn’t matter whether you link statically or dynamically, it will often work either way.

There are times that isn’t true though, specifically when linking to extern symbols defined in libstdc++.a, which cannot use ELF symbol versioning as we do for libstdc++.so. That means if you compile some objects with an old GCC and then try to statically link to a new libstdc++.a you might (in rare cases) get a new symbol that doesn’t match the old declaration in the header. This applies if you e.g. compile code using the “C++0x” era GCC 4 and then link to the libstdc++.a from GCC 5, but that’s not supported anyway (C++11 support wasn’t considered stable until GCC 5.x, so objects using std::chrono compiled with GCC 4.x can’t be combined with objects compiled with GCC 5+).

A more recent case that doesn’t involve an unsupported combination of compilers is the fix for 103382 – condition_variable::wait() is not cancellable because it is marked noexcept, where code compiled with GCC 11 headers will think the member is non-throwing, but the definition in libstdc++.a from GCC 12+ can throw an exception that calling code isn’t prepared for. If you link dynamically, it works (the linker selects the old symbol that will terminate if the thread is cancelled while waiting in a condition_variable, matching the function in the old headers).

So when linking statically, you need to avoid mixing GCC versions (or be responsible for auditing any changes between versions and whether they affect your code). For most Linux distros, static linking is strongly advised against anyway, and often not covered by commercial support, so you’re already on your own when static linking.

I should add that we certainly use the abi_tag attribute when we do need to change something in an ABI-incompatible way, but we will do so at the function level, tagging individual functions that changed semantics. We don’t use a blanket tag on the entire namespace and so bump the name of every symbol on every release.

For example, filesystem::path::u8string() changed return type between C++17 and C++20, so we use an abi_tag there to distinguish them. I don’t think we have any cases where we use abi_tag for ABI changes that are always enabled, rather than conditional on some compiler option or macro definition.
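A sketch of what function-level tagging of that kind can look like. This is not libstdc++'s source: the `Label` type, the `u8text()` member and the tag string `sketch_u8` are hypothetical, and the sketch assumes a GCC-compatible compiler for the `abi_tag` attribute. The point is that only the one function whose return type depends on the language mode gets a tag, and only in the mode where it differs.

```cpp
#include <string>

struct Label {
  std::string text;

#if defined(__cpp_char8_t) && defined(__GNUC__)
  // char8_t mode: the return type differs, so this variant is tagged and
  // therefore mangles differently from the std::string-returning one below.
  __attribute__((__abi_tag__("sketch_u8")))
  std::u8string u8text() const {
    return std::u8string(text.begin(), text.end());
  }
#else
  // Pre-char8_t mode: untagged, returns a plain std::string.
  std::string u8text() const { return text; }
#endif
};
```

Objects compiled in the two modes then refer to differently named functions, so they can be linked into one program without one silently calling the other's definition.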

More frequently than once in the entire history of GCC? Yes :)
But not often.

I found these examples:

  1. 87822 – [6/7/8/9 Regression] Binary incompatibility in std::pair introduced by PR 86751 (present in the gcc 6.5.0 release, as noted in the GCC 6 Release Series — Changes, New Features, and Fixes - GNU Project caveats).
  2. 108331 – [13 Regression] ABI break of std::__c_file and std::fstream for win32 thread model of GCC for windows (fixed before the 13.1.0 release).
  3. 99341 – [11 Regression] new std::call_once is not backwards compatible (reverted before the 11.1 release).
  4. 85222 – [7 Regression] ABI breakage of __throw_ios_failure by r244498 (fallout from the std::string changes in GCC 5 where ios::failure could not be caught across API boundaries that didn’t use the same std::string ABI, eventually fixed using black magic).
  5. 92285 – Layout of istreambuf_iterator subobject depends on -std mode was an incompatibility between C++98 mode and >= C++11 modes, that went unnoticed for years.

So it certainly happens. Would it have been better to use a versioned tag on the whole of std? It wouldn’t have helped the first one, since I’d backported the bug to the gcc-6 branch, and we certainly don’t want gcc-6.4.0 and gcc-6.5.0 to use different symbol names for the entire library (we also don’t want bugs to be backported though …). It wouldn’t have helped the last one either, unless you use a different tag for C++98 mode and C++11 mode, but then that prevents mixing -std modes within the same program, which is something we definitely want to support. The 2nd and 3rd ones would not have needed to be reverted if the new release used different symbol names, and I could have switched to the better call_once impl, so that would have been nice. The 4th one would just never have allowed catching exceptions thrown by different versions at all, or if exceptions are exempt from the tagging, the incompatibility would still have been present even with the rest of std tagged. So I don’t think tagging would have been better really.

Then there are intentional ABI changes between releases like 95609 – span<T> could have better layout where we changed the ABI of std::span for GCC 11.1 making it incompatible with GCC 10. But since it’s not supported to mix C++20 objects built with GCC 10 and GCC 11 (because C++20 support is still “experimental”) that can only fail in programs that are already not supported.

AFAICT not being able to inline is mostly a problem with circular dependencies (at least that’s what I’ve seen).

A large function that would increase stack size too much can also cause an always_inline error.

Thanks for all the information Jonathan, this is extremely insightful.

So I think the goal here is not to say whether one standard library does it better than the other. Both libraries provide slightly different guarantees, have slightly different (but mostly overlapping) use cases and that changes what makes sense for both libraries. I think that’s fine, and if having no annotations at all works well for libstdc++, I think that’s really ideal since that means less clutter in the code.

However, for libc++, I can’t imagine a world where we wouldn’t have some systematic way of protecting us against these issues. For example, I wouldn’t want to have to explain to first-time contributors about being careful when changing existing header-only functions due to ABI compatibility issues, and the intricacies of those. The solution we’ve arrived at after several iterations works really well for us: it gives us the guarantees we want without code size or performance penalty and it is systematic – we even check it with clang-tidy so we never forget to apply the necessary attribute. So instead, I just have to handwave about applying _LIBCPP_HIDE_FROM_ABI to anything that shouldn’t be guaranteed ABI stable and call it a day, folks are pretty happy with that.

All in all, I think the best answer here would just be for GCC to implement the attribute. It’s super easy to implement (at least in Clang) and it provides anyone with control over what gets instantiated as part of an explicit instantiation, which is arguably nice. Until then, the state of support for the GCC / libc++ combination will be less than ideal and it will keep deteriorating as more things like std::format get added to the standard.

:100:

I think it would be a lot more likely to get added to gcc if somebody provided a patch, rather than just asking for it. Most existing g++ contributors have no interest in libc++ and it’s not going to be used by libstdc++.

I would start by looking at how the inline template extension is implemented: Template Instantiation (Using the GNU Compiler Collection (GCC))

I agree. Let’s just say there are some non technical challenges to contributing to GCC for some folks. If someone wants to have a stab at it, I’ll be happy to go through what my implementation of the attribute does in Clang with them, in case that’s useful to figure out what needs to be done in GCC.