Proposal: adding a c++20 early adopter mode for libc++ constexpr annotations in c++17

tl;dr we would like to ease the c++20 transition by providing an early-adopter macro to make _LIBCPP_CONSTEXPR_SINCE_CXX20 partially apply pre-c++20.

C++20 brings in lots of different changes, and with it, many different ways to break existing code. In a large codebase, this can be overwhelming not just to fix all the breakages, but to prevent backsliding of new breakages until you can finally make the transition to C++20.

One widespread issue we’ve seen is with the interaction between incomplete types and various standard library containers that now have many methods marked constexpr:

struct Foo;  // Incomplete type
void Func() {
  std::vector<Foo> foos;
  foos.size();  // c++20: error: arithmetic on a pointer to an incomplete type 'Foo'
}
struct Foo {};

Live link: Compiler Explorer

This isn’t valid even prior to c++20, but in c++17 it nonetheless compiles. c++20 is stricter due to the proliferation of constexpr on many methods, and so it fails to build in c++20.

In order to catch these issues, and further prevent backsliding in a large codebase after fixing it, it would be nice to have an “early adopter constexpr” type of flag that enables constexpr like this in an earlier standard. We have _LIBCPP_CONSTEXPR_SINCE_CXX20, but unfortunately we can’t just change that: that also puts constexpr in places that aren’t allowed until c++20, namely:

  1. Destructors
  2. operator""s on std::string

We believe we need to split _LIBCPP_CONSTEXPR_SINCE_CXX20 into two macros, one that is non-fatal if applied to an earlier standard, and one that cannot ever be applied pre-c++20. The proposal is to change __config to this:

// The normal macro we want to override in most places
#  if _LIBCPP_STD_VER > 17 || \
    (_LIBCPP_STD_VER == 17 && defined(_LIBCPP_EARLY_CXX20_CONSTEXPR))
#    define _LIBCPP_CONSTEXPR_SINCE_CXX20 constexpr
#  else
#    define _LIBCPP_CONSTEXPR_SINCE_CXX20
#  endif

// A new preprocessor macro that can't be overridden, because
// prior languages version don't allow it.
#  if _LIBCPP_STD_VER > 17
// Read as "Since c++20, but not allowed in c++17"
#    define _LIBCPP_CONSTEXPR_SINCE_CXX20_NO_CXX17 constexpr
#  else
#    define _LIBCPP_CONSTEXPR_SINCE_CXX20_NO_CXX17
#  endif

Alternatively, we could keep define these two macros more normally for the supported case, but rely on #undef for the early adopter case, e.g.:

#  if _LIBCPP_STD_VER > 17
#    define _LIBCPP_CONSTEXPR_SINCE_CXX20 constexpr
#    define _LIBCPP_CONSTEXPR_SINCE_CXX20_NO_CXX17 constexpr
#  else
#    define _LIBCPP_CONSTEXPR_SINCE_CXX20
#    define _LIBCPP_CONSTEXPR_SINCE_CXX20_NO_CXX17
#  endif

#  if defined(_LIBCPP_EARLY_CXX20_CONSTEXPR)
#    undef _LIBCPP_CONSTEXPR_SINCE_CXX20
#    define _LIBCPP_CONSTEXPR_SINCE_CXX20 constexpr
//   Don't redefine _LIBCPP_CONSTEXPR_SINCE_CXX20_NO_CXX17
#  endif

Once we split this macro into two, there are 16 annotations in 6 files (e.g. ~vector()) that we would change from _LIBCPP_CONSTEXPR_SINCE_CXX20 to _LIBCPP_CONSTEXPR_SINCE_CXX20_NO_CXX17

Does this sound reasonable? If so, I or someone on our team can post a patch with the preferred direction.

Other discussion points

While this would help with a c++20 migration, some potential downsides:

  • This could create a strange “not c++17 but also not c++20” dialect. Users of this should be aware that it is non-standard to be in between modes, and it should only be used as a stepping stone until full c++20 migration can conclude.
  • This adds another macro to libc++, which increases libc++ maintenance burden. If complexity is a concern, option 2 might be more appealing since it makes the non-early-adopter mode more obviously correct. If it would help, we could also set an expiration date for this flag, e.g. one of the LLVM release cuts in 2025, where we remove this and tell people they need to do a full c++20 switch instead of partial migration.
  • Even if we can’t have the _LIBCPP_EARLY_CXX20_CONSTEXPR macro in tree, it would still help significantly to have the second _LIBCPP_CONSTEXPR_SINCE_CXX20_NO_CXX17 macro in tree for the rest of libc++ to use. For those willing to maintain a downstream patch, it’s significantly easier to maintain a single local patch in __config than it is to maintain local patches across various libc++ headers.
  • Bikeshed color 1: do we like _LIBCPP_CONSTEXPR_SINCE_CXX20_NO_CXX17 as a name? Another choice is a pattern like _LIBCPP_CONSTEXPR_SINCE_CXX20_STRICT (or “sticky”, “absolute”) to indicate this macro should be used when it absolutely must be c++20, it can’t be overridden, even for early adopters.
  • Bikeshed color 2: Instead of a generic macro, we have individual macros for use cases for why it can’t be constexpr, e.g. _LIBCPP_CONSTEXPR_SINCE_CXX20_DESTRUCTORS and _LIBCPP_CONSTEXPR_SINCE_CXX20_LITERALS for the two use cases mentioned, and possibly more if we discover them.

One more point I forgot to mention: this proposal is a specific example of providing migration paths to enable incremental transitions in an existing codebase. I think it would be good to have a larger discussion around policy and best practices in this domain, although I don’t want to wait for that to happen before making this change.

Sorry, but I don’t think it’s an options as proposed. This results in a non-conforming mode, a lot of extra complexity and additional maintenance overhead. We are working on reducing the amount of extensions and non-conforming code, not add new stuff. You could look into using something like __attribute__((diagnose_if(!__is_complete_type(T), "value_type has to be a complete type when instantiating this function", "warning"))) though to warn pre-C++20 that the type has to be complete.

Yes, it’s definitely a non-conforming mode. Is there a path forward if we make it more obvious this is not supported? e.g. name it _LIBCPP_EXPERIMENTAL_EARLY_CXX20_CONSTEXPR, keep it out of documentation/release notes (except possibly to underscore that it’s unsupported)? Or additionally gate it by _LIBCPP_ABI_VERSION >= 2 (although this is not ABI-related, that seems to be a flag that gates lots of tweaks).

I am biased, but I don’t think it’s that much overhead. The total size of the patch is ~50 lines. In supported build modes, _LIBCPP_CONSTEXPR_SINCE_CXX20 and _LIBCPP_CONSTEXPR_SINCE_CXX20_NO_CXX17 have the same value, so there wouldn’t be a worry if a libc++ maintainer accidentally used the wrong one; it’d be up to people that care about this experimental mode to do a post-commit fix.

Certainly, I could look into that. The diff would probably be much larger, but then it should be a regular, supported config. Since it has broad impact, it might still need to be behind a build macro to turn off for targets that don’t conform.

I wasn’t able to get __is_complete_type to work. It has false positives when other templated methods reference it. In this case I only annotated size() and had a one line file that calls #include <vector>:

toolchain/include/c++/v1/vector:625:32: error: value_type has to be a complete type when instantiating this function [-Werror,-Wuser-defined-warnings]
        size_type __old_size = size();
                               ^
toolchain/include/c++/v1/vector:528:20: note: from 'diagnose_if' attribute on 'size':
    __attribute__((diagnose_if(!__is_complete_type(value_type), "value_type has to be a complete type when instantiating this function", "warning")))
                   ^           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

std::unique_ptr has a similar check that uses sizeof. That works as expected, although the error message is not ideal – it uses a generic sizeof error message instead of the one I provided:

toolchain/include/c++/v1/vector:527:32: error: invalid application of 'sizeof' to an incomplete type 'value_type' (aka 'Foo')
    __attribute__((diagnose_if(sizeof(value_type), "value_type has to be a complete type when instantiating this function", "warning")))
                               ^~~~~~~~~~~~~~~~~~
incomplete_vector.cc:6:20: note: in instantiation of template class 'std::vector<Foo>' requested here
  std::vector<Foo> foos;
                   ^
incomplete_vector.cc:3:8: note: forward declaration of 'Foo'
struct Foo;  // Incomplete type
       ^

Does that look right?

I don’t think there is a way to warn about this via diagnose_if or any other existing functionality.
__attribute__ are being evaluated when Clang instantiates a function declaration. The reason we see these errors is because adding constexpr causes the function definition (function body) to be instantiated earlier. In particular, non-constexpr function definitions are instantiated at the end of the translation unit, hence we saw no error in C++17 mode. Constexpr function definitions get instantiated on first use.

There is no way to properly add a check on the “use” of a function that would correspond to the instantiation point of newly-constexpr members of a vector.

E.g. the following code is standard-compliant, but adding __attribute__(diagnose_if(sizeof(T)… to size() causes a compiler error:

#include <vector>
struct X {
  std::vector<X> children;
}

It’s understandable that non-standard modes are undesirable and they should not be widely-supported. But having a migration path towards C++20 for large codebases is also important. It’s obviously the responsibility of libc++ users to keep their code from drifting away from the standard when using such options and dealing with errors induced. But there are ways to achieve this, e.g. one can use this option in an optional build mode and make sure the code also builds without this option.

Edit: corrected the GH link.

I agree with @philnik I expect it to be quite a maintenance burden.

  • As you mentioned this doesn’t work with some C++20 language features so we need to pay attention to the proper macro depending on: the language version introducing this feature, the compiler version used. This means we need to pay extra attention when a new paper is implemented, which takes both developer and reviewer time.
  • To properly support this we need to test it in the CI and fix bugs.
  • We need to add the same option for C++23, C++23, etc. Since it’s quite likely somebody will depend on this feature once added we need to keep supporting it.

I’m all in favour of making it easier for users to migrate to a new language version, but I feel this is quite a high burden for the libc++ developers. I also wonder how many users will use this feature in the end.

Is it not possible for your large code base to have a separate build where you enable C++20 per module/library and fix the issues per library/module and that way slowly fix all C++20 issues?

If we advertise this as experimental/transition build flag, we don’t need to claim that it’s properly supported, and we don’t need CI coverage to ensure it keeps working. We can push the support burden on users of this build mode (us) to make sure it keeps building, and any breakages to this configuration don’t need to block any libc++ work.

Practically speaking, I’m not sure how this would play out.

If we have this separate build on the side, where only a c++20 migration team is looking at it, and not part of the CI configuration that teams use, we’re going to be constantly fighting against regressions. So this kind of build is fine for an initial assessment of how much work there is to migrate to c++20, and gives some low hanging fruit for major things to fix, but doesn’t suffice for being able to lock-in progress on c++20.

Teams all have their own CI projects, so we could duplicate those and have a second CI that they also submit against that builds in c++20 mode. But then this would be a large increase on the build system (potentially doubling), which is large enough that we would rule this out. Also, we could only enable this second c++20 CI build once it actually works (we can’t ask developers to submit against an already broken CI), so we’d have to do it gradually. If library X depends on library Y, but only library X has the c++20 build running, then library Y can easily check in a change that breaks the library X build because it uses c++17-only features.

Maybe I didn’t interpret your suggestion correctly. Do those objections make sense?

Finally though, even if those problems don’t apply, there’s the mental complexity of needing to switch to c++20 all at once. Without needing to make a significant time investment, I can fairly easily understand this one specific type of failure that prevents us migrating to c++20, and also easily explain it to a small handful of randomly average c++ developers looking to help with a cleanup in their spare time. Needing to switch wholesale to c++20 means everyone needing to understand everything about c++20, and now it’s much harder to get people willing to work on a c++20 migration. It’s possible, just harder and will take longer.

In general we give proper support to experimental features. Experimental in libc++ means it’s not complete or not API/ABI stable. But the intention is to finish it future releases. So this would be special. Even when you maintain it we still need to review the changes. What happens when you migrated your projects to C++20, are you still going to maintain it or is it then up to the libc++ maintainers to maintain it.

I see your issues and it sounds quite a challenge. I understand why from your point of view it would be great to have these changes in libc++, however I’m not convinced it has upsides for libc++ itself.

I’m not familiar with your company so I’m not sure what the best approach is. Maybe it’s possible to use a custom libc++ in your CIs where you apply these changes. Then you don’t need to have the changes upstream. If these changes were accepted upstream you still would need to update the CIs so that work needs to be done anyway. Maybe that is a solution you can investigate.

Our company is Google (@rupprecht and I are both there) and we can have a local patch for this. Local patches have their cost too and we feel that it’s something that could also be useful to other folks upstream who want to deal with C++20 breakages one-at-a-time.

The scale of the codebase is what makes separating individual breaking changes worthwhile. This helps us somewhat distribute the work across our C++ developers without requiring them to know about all things that may break with C++20. It also means that we can find all instances of this particular breakage even if they are hidden by other compiler errors in C++20 mode that have not been fixed yet.

I encountered the same problem while updating libc++ at Apple. All in all I did find some issues, but the number of issues was manageable and I took the approach “fix the issue in each location and update once everything has been fixed”. I did not have issues with code regressing because I was able to make those fixes quickly enough and then make the update. This was my experience, I understand that yours may be different due to various factors.

I must say I share Mark and Nikolas’ opinions. This would be a significant maintenance burden for libc++, for a mode that is non standards compliant. It’s also not clear whether there would be an expiration date on that.

Would it be a viable solution for you to add a static_assert(sizeof(T) != 0, "need a complete type"); to vector’s methods in a downstream patch? Or maybe we even want to do that in an upstream patch so we can catch these misuses in pre-C++20 mode?

Another suggestion – it should be possible to implement a Clang warning that can check that users do not instantiate std::vector with an incomplete type. You could then enable the warning, get everyone to fix their code, and then make the switch. Implementing that warning in Clang would most likely be pretty cheap. I think that would be the best alternative.

More generally, I think that while libc++ itself can try to minimize breakage for downstream users (which we very much do, for example see all the machinery around removing transitive includes), I think this is fundamentally a problem that needs to be tackled at the level of whoever’s updating the library. It may not always be technically feasible for libc++ to provide an incremental adoption path.

Would it be a viable solution for you to add a static_assert(sizeof(T) != 0, “need a complete type”); to vector’s methods in a downstream patch? Or maybe we even want to do that in an upstream patch so we can catch these misuses in pre-C++20 mode?

Unfortunately that does not catch this misuse, otherwise those methods would fail pre-C++20 too (they use the pointer in a way that requires complete types). The problem is that Clang instantiates the bodies of constexpr functions more eagerly than those that don’t have constexpr. Pre-C++20 the function bodies are not constexpr and get instantiated after the class is complete. Starting with C++20 Clang instantiates the functions before the class is complete.

Another suggestion – it should be possible to implement a Clang warning that can check that users do not instantiate std::vector with an incomplete type. You could then enable the warning, get everyone to fix their code, and then make the switch. Implementing that warning in Clang would most likely be pretty cheap. I think that would be the best alternative.

I tried to add a warning and didn’t succeed. The actual rule from the standard is pretty subtle: users are allowed to use vector with incomplete types, it’s invalid to reference any member of vector if T is incomplete. Properly checking if type is complete in Clang has side-effect (to check if the type is complete, we may need to pick the specialization first) and this may cause compile errors. So this check cannot be implemented as a warning, it may cause compile errors in addition to producing those warnings. These warnings are also very chatty and more appropriate for clang-tidy. Overall, the alternative are much more expensive than maintaining a non-standard non-supported mode in libc++.

More generally, I think that while libc++ itself can try to minimize breakage for downstream users (which we very much do, for example see all the machinery around removing transitive includes), I think this is fundamentally a problem that needs to be tackled at the level of whoever’s updating the library. It may not always be technically feasible for libc++ to provide an incremental adoption path.

The desire to keep the set of supported configurations small makes a lot of sense. We are proposing a flag that explicitly does not need to have the same level of support as standard modes (properly documented as such). Is there any reason this is problematic?

Please note that we have considered other options and they don’t work. The downstream patch seems the only option to isolate this breakages from others.

Oh, it looks like I even commented on that patch.

This is problematic because it’s going to be a really ad-hoc patch. Presumably the rest of the code base won’t be changed to match this (for example the test suite won’t work under that flag), nor would it make sense for it to. It also creates an expectation that contributors should maybe apply one macro or the other when they constexpr-ify something, without any systematic guideline for when it makes sense to do it and when it doesn’t. Finally, it creates a precedent for doing these kinds of gymnastics in a new ad-hoc way when we already have well-defined channels to ease upgrades, like the deprecation macro infrastructure. I understand no existing channels can help you in this case, but we should still avoid creating a precedent like that.

This is essentially a vendor-specific hack to aid deployment of the library internally, and while I am extremely cognizant of that situation, I don’t think it has a place upstream. I think it would be more productive to instead work on ways to land these changes incrementally without having to do it in lockstep for all your codebase, but that’s really not something I have control over or even enough context to know whether it makes sense.