[RFC] Remove unused transitive includes from the libc++ headers

Historically almost all of libc++’s headers were public, with the exception of a few internal headers like __tree. In recent years, libc++ has been split up into many smaller internal headers. These implementation-detail headers are included when we want to reuse functionality or when composing a public header. This contrasts with what was done previously, where entire public headers were included in order to get internal helpers. These recent changes greatly reduce cyclic header dependencies in the library and result in smaller headers overall.

However, moving from including entire public headers to targeted internal headers changes the transitive includes that users can rely on, which caused a large amount of (trivial) breakage. Due to that, we have kept a list of (now unused by libc++) transitive public headers in each public header. We provide a flag (_LIBCPP_REMOVE_TRANSITIVE_INCLUDES) to allow removing these transitive includes, which significantly decreases compile times. The libc++ maintainers always expected to drop these includes at some point. The main reason to not drop them right from the start has been that there were many more headers to split up, which would have caused significant breakage every single release. The expectation was that people would rather have a single big breakage instead of somewhat smaller breakages over many releases.
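As a minimal sketch of what opting in looks like: the macro is normally passed on the compiler command line (for example `clang++ -stdlib=libc++ -D_LIBCPP_REMOVE_TRANSITIVE_INCLUDES file.cpp`). The helper name `transitive_include_mode` below is hypothetical and exists only for illustration; it reports which configuration a translation unit was built in.

```cpp
#include <cstddef>  // any standard header; libc++ headers define _LIBCPP_VERSION

// Hypothetical helper (not part of libc++): describes how this translation
// unit was configured with respect to the opt-in macro.
inline const char* transitive_include_mode() {
#if !defined(_LIBCPP_VERSION)
    return "not libc++";
#elif defined(_LIBCPP_REMOVE_TRANSITIVE_INCLUDES)
    return "libc++ with _LIBCPP_REMOVE_TRANSITIVE_INCLUDES";
#else
    return "libc++ without _LIBCPP_REMOVE_TRANSITIVE_INCLUDES";
#endif
}
```

With other standard library implementations the macro has no effect, so the function simply reports "not libc++".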

We are now at a point where most of the libc++ headers are split into relatively small headers, which has significantly lowered the frequency at which we remove transitive includes. I therefore propose to bulk remove these transitive includes from the libc++ headers.

Motivation

While there are efforts to get away from headers, the reality is that most projects continue to use them and they are a significant fraction of total compile time. We should not keep reasonable compile times behind a flag for the few people in the world who read their standard library’s documentation. While disruptive, this removal can significantly improve compile times. Fast compile times are an explicit goal according to our own documentation, and we should strive to provide that. This is a list of all the libc++ headers and how their relative size changes when enabling _LIBCPP_REMOVE_TRANSITIVE_INCLUDES:

Header Include Sizes
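| header | c++03 | c++11 | c++14 | c++17 | c++20 | c++23 | c++26 |
|---|---|---|---|---|---|---|---|
| algorithm | -47.85 | -49.33 | -47.74 | -53.28 | -47.48 | 0.00 | 0.00 |
| any | -98.20 | -98.22 | -98.23 | -97.47 | -95.31 | -37.32 | 0.00 |
| array | -76.88 | -76.44 | -75.68 | -77.40 | -75.23 | 0.00 | 0.00 |
| atomic | -46.56 | -45.72 | -45.15 | -45.50 | -41.41 | 0.00 | 0.00 |
| barrier | -67.71 | -68.52 | -67.73 | -70.55 | -71.89 | 0.00 | 0.00 |
| bit | -99.85 | -99.90 | -99.91 | -99.92 | -72.84 | 0.00 | 0.00 |
| bitset | -53.52 | -50.92 | -51.40 | -54.73 | -57.64 | 0.00 | 0.00 |
| charconv | -99.92 | -99.94 | -99.94 | -52.51 | -45.24 | 0.00 | 0.00 |
| chrono | -92.64 | -92.19 | -92.09 | -92.28 | -38.68 | -2.49 | 0.00 |
| cmath | -17.51 | -16.96 | -17.65 | -23.33 | -25.03 | 0.00 | 0.00 |
| codecvt | -55.67 | -52.99 | -53.54 | -56.63 | -58.96 | 0.00 | 0.00 |
| compare | -99.93 | -99.96 | -99.96 | -99.96 | -70.91 | 0.00 | 0.00 |
| complex | -40.67 | -40.98 | -41.03 | -44.70 | -51.07 | -5.07 | 0.00 |
| concepts | -99.67 | -99.78 | -99.79 | -99.86 | -41.41 | 0.00 | 0.00 |
| condition_variable | -60.43 | -61.49 | -61.78 | -63.89 | -62.83 | 0.00 | 0.00 |
| coroutine | -99.94 | -99.96 | -99.96 | -99.96 | -52.16 | 0.00 | 0.00 |
| deque | -75.62 | -73.72 | -72.74 | -73.90 | -72.95 | 0.00 | 0.00 |
| exception | -62.01 | -62.42 | -62.58 | -64.68 | -65.52 | -38.27 | 0.00 |
| execution | -53.63 | -57.56 | -56.21 | -65.94 | -64.35 | 0.00 | 0.00 |
| expected | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| filesystem | -99.99 | -99.99 | -99.99 | -53.58 | -58.00 | -6.86 | 0.00 |
| format | -99.99 | -100.00 | -100.00 | -100.00 | -50.75 | 0.00 | 0.00 |
| forward_list | -86.00 | -83.37 | -82.76 | -82.84 | -80.65 | 0.00 | 0.00 |
| fstream | -45.99 | -46.35 | -46.25 | -50.46 | -55.57 | -11.85 | 0.00 |
| functional | -92.84 | -87.96 | -87.50 | -59.39 | -60.47 | -11.91 | 0.00 |
| future | -46.48 | -46.94 | -46.85 | -49.95 | -54.52 | -5.34 | 0.00 |
| initializer_list | -88.54 | -55.04 | -57.33 | -76.01 | -76.16 | 0.00 | 0.00 |
| iomanip | -58.34 | -57.82 | -57.40 | -59.80 | -63.26 | -8.16 | 0.00 |
| ios | -47.41 | -45.36 | -45.98 | -49.66 | -52.52 | 0.00 | 0.00 |
| iosfwd | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| iostream | -50.15 | -50.22 | -50.03 | -53.11 | -57.92 | -4.07 | 0.00 |
| istream | -50.17 | -50.23 | -50.04 | -53.12 | -57.93 | -5.98 | 0.00 |
| iterator | -45.44 | -48.28 | -48.21 | -53.72 | -32.04 | 0.00 | 0.00 |
| latch | -56.29 | -54.96 | -54.23 | -54.07 | -53.66 | 0.00 | 0.00 |
| limits | -64.84 | -63.39 | -64.31 | -71.43 | -73.54 | 0.00 | 0.00 |
| list | -84.72 | -82.15 | -81.49 | -81.70 | -79.67 | 0.00 | 0.00 |
| locale | -37.55 | -36.26 | -36.40 | -40.87 | -44.98 | 0.00 | 0.00 |
| map | -77.34 | -75.21 | -73.89 | -72.25 | -71.97 | 0.00 | 0.00 |
| mdspan | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| memory | -57.46 | -54.11 | -53.64 | -55.71 | -52.09 | 0.00 | 0.00 |
| memory_resource | -99.87 | -99.92 | -99.92 | -79.62 | -79.64 | 0.00 | 0.00 |
| mutex | -83.99 | -78.72 | -78.86 | -79.84 | -79.89 | -2.16 | 0.00 |
| new | -89.53 | -89.62 | -89.49 | -90.86 | -91.11 | 0.00 | 0.00 |
| numbers | -94.36 | -94.57 | -94.35 | -95.17 | -92.96 | 0.00 | 0.00 |
| numeric | -98.05 | -98.10 | -98.08 | -94.44 | -93.23 | 0.00 | 0.00 |
| optional | -85.08 | -84.19 | -84.21 | -81.22 | -76.05 | 0.00 | 0.00 |
| ostream | -51.04 | -51.04 | -50.82 | -53.87 | -58.58 | -4.15 | 0.00 |
| print | -75.64 | -73.72 | -73.57 | -74.57 | -50.74 | 0.00 | 0.00 |
| queue | -65.74 | -64.18 | -63.44 | -64.72 | -64.85 | -4.99 | 0.00 |
| random | -49.32 | -48.35 | -48.51 | -49.94 | -52.36 | 0.00 | 0.00 |
| ranges | -45.43 | -48.28 | -48.21 | -53.72 | -65.43 | 0.00 | 0.00 |
| ratio | -72.23 | -72.69 | -73.51 | -79.65 | -81.35 | 0.00 | 0.00 |
| regex | -44.07 | -43.82 | -43.57 | -46.25 | -48.87 | -1.97 | 0.00 |
| scoped_allocator | -86.67 | -79.16 | -79.10 | -79.48 | -76.99 | 0.00 | 0.00 |
| semaphore | -48.63 | -47.66 | -47.06 | -47.33 | -47.32 | 0.00 | 0.00 |
| set | -78.49 | -76.35 | -75.09 | -73.45 | -73.00 | 0.00 | 0.00 |
| shared_mutex | -84.91 | -84.46 | -83.98 | -84.92 | -86.59 | 0.00 | 0.00 |
| source_location | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| span | -93.29 | -93.39 | -93.34 | -93.24 | -90.29 | 0.00 | 0.00 |
| sstream | -49.63 | -49.73 | -49.56 | -52.65 | -57.29 | -5.84 | 0.00 |
| stack | -75.25 | -73.31 | -72.34 | -73.53 | -72.57 | 0.00 | 0.00 |
| stdexcept | -89.89 | -90.12 | -90.34 | -91.49 | -91.78 | 0.00 | 0.00 |
| stop_token | -98.35 | -98.93 | -98.97 | -99.29 | -39.49 | 0.00 | 0.00 |
| streambuf | -47.02 | -45.01 | -45.64 | -49.34 | -52.24 | 0.00 | 0.00 |
| string | -61.15 | -57.89 | -58.26 | -60.93 | -62.76 | 0.00 | 0.00 |
| string_view | -71.35 | -71.13 | -71.19 | -73.64 | -72.68 | 0.00 | 0.00 |
| strstream | -50.00 | -50.01 | -49.83 | -52.91 | -57.75 | -5.94 | 0.00 |
| syncstream | -45.03 | -45.15 | -44.37 | -45.45 | -51.44 | -3.70 | 0.00 |
| system_error | -60.35 | -57.21 | -57.60 | -60.32 | -62.30 | 0.00 | 0.00 |
| thread | -48.20 | -48.59 | -48.46 | -51.59 | -53.99 | -4.57 | 0.00 |
| tuple | -81.25 | -72.60 | -71.92 | -70.02 | -54.96 | 0.00 | 0.00 |
| type_traits | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| typeindex | -94.16 | -94.70 | -94.74 | -95.19 | -74.63 | 0.00 | 0.00 |
| typeinfo | -82.28 | -82.70 | -82.97 | -86.58 | -87.36 | 0.00 | 0.00 |
| unordered_map | -64.77 | -61.19 | -61.04 | -59.45 | -61.53 | 0.00 | 0.00 |
| unordered_set | -78.94 | -76.60 | -76.12 | -74.14 | -73.62 | 0.00 | 0.00 |
| utility | -86.76 | -82.48 | -81.44 | -80.73 | -62.28 | 0.00 | 0.00 |
| valarray | -74.18 | -74.30 | -74.41 | -75.51 | -75.60 | 0.00 | 0.00 |
| variant | -68.38 | -68.74 | -66.70 | -57.12 | -53.62 | 0.00 | 0.00 |
| vector | -70.04 | -67.63 | -67.24 | -67.77 | -67.03 | -5.74 | 0.00 |
| version | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |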

The non-existent and relatively small differences in C++26 and C++23 respectively are due to us having dropped transitive includes in these standard modes unconditionally already. We’ve somewhat recently started keeping headers around in C++23 though.
Most numbers close to 100% mean that the header is empty in that C++ version (because it has been introduced in a later standard).

Expected Breakage

It is expected that this change would cause a lot of projects to break, since unknowingly relying on transitive includes is quite common. The breakage will almost always take the form of previously compiling code failing to compile due to missing declarations or (much less frequently) failing to link due to a missing definition with a visible declaration. Any breakage should be quite easy to fix by simply including the appropriate headers. Projects that are also compiled with standard library implementations other than libc++ will likely see significantly less breakage, since different implementations tend to provide a different set of transitive includes.
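To illustrate the kind of fix involved, here is a small sketch (the function name `sorted_names` is made up for this example). Code like this might compile with only `<vector>` included if the implementation happens to pull in the other declarations transitively; whether it does depends on the implementation, its version, and the language mode, so no particular include chain is assumed here. The robust version names every header it uses:

```cpp
#include <algorithm>  // std::sort
#include <string>     // std::string
#include <vector>     // std::vector

// Hypothetical example function: each standard library facility used below
// has its header included directly, so nothing relies on transitive includes.
inline std::vector<std::string> sorted_names(std::vector<std::string> names) {
    std::sort(names.begin(), names.end());
    return names;
}
```

Written this way, the code compiles the same regardless of which transitive includes a given standard library provides.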

Prior Art

libstdc++ is actively trying to reduce header size: it removes transitive includes every release and has done so for many years. While this causes some amount of breakage, the libstdc++ headers are significantly smaller than the current libc++ headers by default.

The MSVC STL also drops transitive includes on a regular basis, but is less aggressive than libstdc++.

Timeline

I’m proposing to first flip the default in the LLVM 23 release, so that transitive includes are removed by default, but can be brought back using an “escape hatch” flag, e.g. _LIBCPP_KEEP_TRANSITIVE_INCLUDES_LLVM23. One release later, the escape hatch would be removed. This gives people time to incrementally fix their missing includes.

Future removals of transitive includes

I expect that we will remove more transitive includes in the LLVM 24 timeframe and later. To avoid shifting the goalpost, we would keep the current policy of keeping transitive includes around after the LLVM 23 branch. We can revise that policy at a later point if we want to.

In general, I am in support of this change.

I wonder if there is any automated tooling to fix the breakage, adding all missing includes? clang-include-fixer sounds promising, but I never used it myself and it requires a symbol index. Do you ship such a symbol index as part of libc++?

I also support this RFC. I think we’ve been very cognizant not to break users at every release, but we’ve accumulated enough unnecessary transitive includes now that we should do this one-time break. There will be a one-time cost for some users to pay, but their code will be more portable and will compile faster as a result.

Overall, I think this is the right path forward, with the right approach, and the right timeline. I’ve also checked internally and we are fine with the suggested timeline.

Libc++ does provide a libcxx.imp file generated by a script. We might want to double-check whether this solves the problem it intends to solve, but in theory I believe running IWYU (or maybe clang-include-fixer) against this libcxx.imp file should automatically fix people’s code. That’s the intent, at least.

Bumping this thread to get it a bit more attention, in case people just missed it. So far it looks like we’d be good to move forward, but Nikolas is going to open a PR to implement these changes and ping the libcxx vendors on GitHub to get their attention – to ensure we do our due diligence in getting this the attention it needs.

Makes sense to me. It should be a no-op for most of Google’s internal builds, because we’ve been enabling the define since it was first added in 2022!

I hope that future include removals continue to be enabled via the existing _LIBCPP_REMOVE_TRANSITIVE_INCLUDES, so that folks like us who opt into it continue to get the immediate benefit of smaller transitive inclusions.

Yes, we plan to keep the macro active after this removal. This RFC is only about removing the current (at the time of the next release branch) set of transitive includes that we keep around for compatibility. Further improvements after that point will be behind _LIBCPP_REMOVE_TRANSITIVE_INCLUDES again.

Yeah, I think it’s good to rip off this bandaid. I’ve long been a proponent of IWYU, especially since users love to complain about compile times. Cleaning up their includes is one of the best things they can do to help themselves. We’ll need to post timelines for consumers of the Android NDK to help minimize the surprise, but I’m supportive of this RFC.

One thing I think would make this less painful for users would be if we improved the diagnostics in clang a bit for missing headers.

Let’s compare clang-22 and gcc-16: Compiler Explorer

GCC literally tells you what you’re missing and how to fix it:

<source>:3:6: error: 'vector' in namespace 'std' does not name a template type
 std::vector<int> foo;
      ^~~~~~
<source>:3:1: note: 'std::vector' is defined in header '<vector>'; did you forget to '#include <vector>'?
+#include <vector>
 
 std::vector<int> foo;
 ^~~

Clang simply states:

<source>:3:6: error: no template named 'vector' in namespace 'std'
    3 | std::vector<int> foo;
      | ~~~~~^

GCC has had this helpful note since gcc-8.

Bad diagnostic for missing <inttypes.h> include for printf format specifiers · Issue #104155 · llvm/llvm-project · GitHub is the closest issue I could find mentioning this.

I think if we improved the diagnostic in clang first, before making this change in libc++, it would significantly reduce the impact on folks upgrading to the version of libc++ that removes these transitive includes, since clang would then spell out the fix rather than give the somewhat vague existing error messages.

cc @AaronBallman

We have logic to do this for the easy cases with standard library functions in C, but not in C++: Compiler Explorer. We also don’t do it for types currently: https://godbolt.org/z/ad7xh85c5

There’s no technical reason why we couldn’t do this in C++, though I think I would limit it to just cases where the user qualifies the name. For example, I think it makes sense to recommend <vector> when the user writes std::vector but less so if the user writes vector (even in the presence of using namespace std;). Clang would need to add the mapping of standard library interfaces similar to what we already do for functions, which is a lot of data to maintain, but this also seems like data that tools such as clang-include-fixer should be making use of, so it seems plausibly worth the effort to do it once in Clang so there’s a definitive mapping people can rely on.
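To make the idea concrete, here is a rough sketch of such a mapping. This is not how Clang implements or would implement it; the function name `suggest_header` and the tiny table are assumptions for illustration only. It suggests a header only when the name is written with the std:: qualifier, as described above.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical lookup (not Clang's implementation): returns the header to
// suggest for a name as the user wrote it, or "" when no suggestion is made.
inline std::string suggest_header(const std::string& written_name) {
    // Illustrative subset only; a real table would cover the whole library.
    static const std::unordered_map<std::string, std::string> header_for = {
        {"vector", "<vector>"},
        {"string", "<string>"},
        {"sort", "<algorithm>"},
        {"unique_ptr", "<memory>"},
    };
    const std::string prefix = "std::";
    // Only qualified names get a suggestion; a bare `vector` is skipped.
    if (written_name.compare(0, prefix.size(), prefix) != 0) return "";
    auto it = header_for.find(written_name.substr(prefix.size()));
    return it == header_for.end() ? std::string() : it->second;
}
```

For instance, `suggest_header("std::vector")` yields `<vector>`, while `suggest_header("vector")` yields an empty string.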

FYI, this is Suggest include file for popular standard functions · Issue #120388 · llvm/llvm-project · GitHub

with standard library functions in C, but not in C++: Compiler Explorer

Why does that work for sin/math.h, but not puts/stdio.h?

FYI, this is Suggest include file for popular standard functions · Issue #120388 · llvm/llvm-project · GitHub

Looks like [clang][Sema] Suggest/Hint Standard Library Include File by Mr-Anyone · Pull Request #146227 · llvm/llvm-project · GitHub was an attempt at this.

For the C standard library functions, the current functionality only works for recognized functions (clang/include/clang/Basic/Builtins.td). Our coverage of the C library is not complete: nobody ever went through to try to add everything, we just add stuff as necessary for other purposes.
