"RFC: Supporting In-Tree Headers for Libc++ Development"

EricWF · May 27, 2024, 1:24am

In recent years there has been disagreement between myself and other contributors about how to handle the in-source version of the libc++ headers (or “in-tree headers”).

The disagreement boils down to this:

Should the in-tree headers be usable for building, testing, or otherwise developing the library?

Or, said another way:

Should we enforce the out-of-tree headers as the only valid way to consume the library?

The disagreement has prevented changes that support using in-tree headers in any capacity.

Personally, this has made it harder for me to work on the library, since my IDE and other tooling are unable to cope with, or even understand, the in-tree headers I need to edit.

If you’re familiar with the details of the build process, skip ahead to “The Debate” section. Otherwise, background is provided below.

Background
=========

Summary

Currently, when we build, we copy all of the headers into a separate directory inside the build directory used to store build artifacts (the “generated include directory”). That copy of the headers is then used to build and test the library.

The headers are copied verbatim from the source tree to the build directory, except for two headers:

__config_site, which is generated by the build from libcxx/include/__config_site.in. The generated header contains macro definitions that are used to customize the behavior of libc++ depending on how the build was configured.
__assertion_handler, which is copied verbatim from libcxx/vendor/llvm/default_assertion_handler.in.
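
To make the "generated from a template" step concrete, here is a minimal sketch of how a `#cmakedefine`-style template is expanded, written in Python rather than CMake. It assumes the template uses CMake's configure_file() placeholders; the macro names (EXAMPLE_ABI_VERSION, EXAMPLE_HAS_NO_THREADS) and the helper render_config_site are made up for illustration and are not the real contents of __config_site.in.

```python
import re

# Hypothetical template in the style of a CMake configure_file() input.
# These macro names are invented; they are not libc++'s real macros.
TEMPLATE = """\
#cmakedefine EXAMPLE_ABI_VERSION @EXAMPLE_ABI_VERSION@
#cmakedefine EXAMPLE_HAS_NO_THREADS
"""

def render_config_site(template, settings):
    """Expand `#cmakedefine` lines roughly the way configure_file() does:
    a set variable becomes `#define`, an unset one a commented-out `#undef`.
    Simplification: this uses Python truthiness, not CMake's full rules
    for false values such as "OFF" or "NOTFOUND"."""
    out = []
    for line in template.splitlines():
        match = re.match(r"#cmakedefine\s+(\w+)(.*)", line)
        if match is None:
            out.append(line)
            continue
        name, rest = match.group(1), match.group(2)
        if settings.get(name):
            # Substitute @VAR@ references in the remainder of the line.
            rest = re.sub(r"@(\w+)@",
                          lambda m: str(settings.get(m.group(1), "")), rest)
            out.append(f"#define {name}{rest}".rstrip())
        else:
            out.append(f"/* #undef {name} */")
    return "\n".join(out) + "\n"

rendered = render_config_site(TEMPLATE, {"EXAMPLE_ABI_VERSION": 1})
print(rendered)
```

The point of the sketch is only that the generated header is a small, mechanical function of the build configuration; everything else in the include directory is a plain copy.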

Libc++ currently supports two layouts of headers in the build directory. In the first, all of the headers, including the generated __config_site, are in the same folder. In the second, __config_site and other target-specific headers are placed in a separate directory (the “target include directory”). Apple does not support this second configuration.

Additionally, the <cxxabi.h> file is copied into either the generated include directory or the target include directory.

In this latter configuration, the files in the “generated include directory” are exactly the same as the in-tree headers, with the exception of the cxxabi.h file, which comes from the ABI library, and the __assertion_handler file, which we copy from elsewhere in the source tree.[1][2]

The default Linux runtimes build uses the “target include directory”.

The default Apple build does not.

The de facto build and test setup uses the copies of the headers inside the build directory. With minimal modification, roughly 5 lines of changes, the in-tree headers can be used in the same manner.

Notes

[1] With the notable addition of <cxxabi.h> from whichever ABI library was chosen at configuration time, though that header likely belongs in the “target include directory” when present.

[2] I’ve created a patch to simplify how the __assertion_handler default implementation is handled. The patch moves the default implementation into the source tree: “Simplify the __assertion_handler build logic. Be friendly to IDEs.” by EricWF · Pull Request #93333 · llvm/llvm-project · GitHub

A separate document provides a detailed exploration of how C++ standard library headers are used when building, testing, and using the LLVM libc++ library. It covers the different types of headers involved, the main header layouts used, and the importance of include paths and how they are constructed.

The Debate
==========

To be super cheesy about this, let’s get all 8th grade debate class:

Be it resolved that: libc++ should maintain support for building, testing, and developing the library using the in-tree C++ headers, in addition to supporting builds using the copied headers in the generated build directory.

Arguments For
=============

Supporting in-tree headers simplifies development by allowing IDEs and other developer tools to properly parse, highlight, and analyze the libc++ source code. Many tools stop working when encountering missing headers, which is currently an issue with the in-tree layout.
The in-tree headers are the actual files that developers read, edit, and work with on a daily basis. Ensuring they are valid and usable fosters a better development workflow and catches potential issues earlier.
Diagnostics from building the source or running tests point to the generated header copies in the build directory, making it easy for developers to accidentally edit the wrong files which then get overwritten on the next build. Referencing the in-tree headers avoids this confusion.
Only a single header is actually “generated” (__config_site), while the rest are copied verbatim from the in-tree headers. On Linux the generated __config_site is output in a separate directory. With minimal modifications, roughly 5 lines of changes to CMake, the in-tree headers can be made to work for building and testing.
Libc++ has a history of supporting in-tree builds, and this setup worked well for years. Continuing to allow this configuration, in addition to the generated build directory layout, provides flexibility without harming the library.
Neither the in-tree layout nor the generated build directory layout exactly matches the final installed header layout that users will consume. If validating the installed layout is the priority, libc++ should develop a dedicated testing configuration for that purpose, rather than relying on the build directory layout as an imperfect proxy.
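
On the point about diagnostics pointing at the copied headers: as long as the build keeps its own copy, one workaround is to rewrite paths in diagnostics back to the in-tree files. The following is a rough sketch of that idea, not something the libc++ build provides. The function map_to_source and both directory names are assumptions chosen for illustration.

```python
from pathlib import PurePosixPath

def map_to_source(diagnostic_path, build_include, source_include):
    """Rewrite a path under the build's header copy to the matching
    in-tree path. Paths outside `build_include` are returned unchanged."""
    path = PurePosixPath(diagnostic_path)
    try:
        relative = path.relative_to(build_include)
    except ValueError:
        # Not inside the copied header directory; leave it alone.
        return str(path)
    return str(PurePosixPath(source_include) / relative)

# Hypothetical directory names, used only for this example.
BUILD_INCLUDE = "build/include/c++/v1"
SOURCE_INCLUDE = "libcxx/include"

mapped = map_to_source("build/include/c++/v1/__algorithm/sort.h",
                       BUILD_INCLUDE, SOURCE_INCLUDE)
untouched = map_to_source("libcxx/src/string.cpp",
                          BUILD_INCLUDE, SOURCE_INCLUDE)
print(mapped)
print(untouched)
```

This only works for headers that are copied verbatim. A generated header such as __config_site has no in-tree file at the same relative path, so a mapping like this would point at a file that does not exist.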

Arguments Against
================

The validity and correctness of libc++'s headers are currently dependent on the build system, due to generated files like __config_site and __assertion_handler. Presenting the in-tree headers as a valid, standalone layout risks causing confusion due to subtle differences from the generated build.
Enforcing a clear separation between the in-tree headers and the generated install directory helps avoid issues stemming from any divergences or invalid assumptions. Developers should be steered towards treating the generated headers as the proper source of truth.
Allowing too much flexibility in the supported header layouts risks adding unnecessary variance to the development and testing process. Libc++ should aim to standardize on a single “blessed” layout that is validated by CI and the test suite.
Making the in-tree headers independently valid, in addition to supporting the generated headers, imposes an added maintenance burden on the project. The more configurations supported, the harder it is to make changes and enforce correctness.
Even if the in-tree and generated headers differ from the final installed layout, keeping them as similar as possible helps catch potential bugs or issues that could arise from the installation process. The generated headers are a useful intermediate step.

Suggested Resolution
==================

After considering the arguments on both sides, my recommendation would be for libc++ to continue supporting both the in-tree and generated build directory header layouts in the near-term, while working towards a longer-term solution.

In the short term, officially sanctioning and documenting the in-tree layout as a valid configuration would ease development friction and assist developers relying on IDE tooling. This would be a temporary measure while the project focuses on developing a more robust testing setup for the installed headers. The historical precedent and relative ease of making the in-tree layout work suggest this would not be an undue maintenance burden.

However, I believe the libc++ project should prioritize developing a dedicated testing configuration that validates the final installed headers, as consumed by end-users. Ensuring the correctness of the shipped product is the most important goal. Once this testing setup is in place, libc++ could consider deprecating the build directory layout in favor of a simpler model of in-tree headers for development and installed headers for testing/validation.

In the meantime, clearly documenting the differences between the in-tree, generated, and installed layouts would help minimize confusion and set appropriate expectations. Automated CI checks could also be added to flag any unintentional divergences between the layouts.
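
As an illustration of what such a divergence check might look like, here is a minimal sketch that compares two directory trees and reports files that are missing, unexpected, or different. The function names and the `expected_extra` parameter are hypothetical, and the demo uses throwaway directories rather than a real checkout.

```python
import os
import tempfile

def list_files(root):
    """Return {relative_path: file_bytes} for every file under `root`."""
    files = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            full = os.path.join(dirpath, filename)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            with open(full, "rb") as f:
                files[rel] = f.read()
    return files

def diff_layouts(source_root, build_root, expected_extra=frozenset()):
    """Compare two header trees. `expected_extra` names files that are
    allowed to exist only in the build tree (e.g. cxxabi.h)."""
    source = list_files(source_root)
    build = list_files(build_root)
    only_in_source = sorted(set(source) - set(build))
    only_in_build = sorted(set(build) - set(source) - set(expected_extra))
    changed = sorted(p for p in set(source) & set(build)
                     if source[p] != build[p])
    return only_in_source, only_in_build, changed

def write(path, text):
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(text)

# Demo on temporary directories standing in for the two layouts.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "src")
    bld = os.path.join(tmp, "build")
    write(os.path.join(src, "vector"), "// vector\n")
    write(os.path.join(src, "__config"), "// config\n")
    write(os.path.join(bld, "vector"), "// vector\n")
    write(os.path.join(bld, "__config"), "// edited by mistake\n")
    write(os.path.join(bld, "cxxabi.h"), "// from the ABI library\n")
    write(os.path.join(bld, "stray.h"), "// unexpected\n")
    result = diff_layouts(src, bld, expected_extra={"cxxabi.h"})
print(result)
```

A real check would also need to skip non-header files in the source directory (build scripts, templates) and account for generated headers, so treat this as a starting point only.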

As a step towards making the in-tree headers easier to support, I have opened a pull request (#93333) to simplify the default implementation of __assertion_handler.

This approach would meet the needs of developers while keeping a focus on the long-term goal of shipping high-quality, well-validated headers to users. It aims to avoid an over-emphasis on intermediate build artifacts at the expense of either developer experience or end-user assurance.

Endill · May 27, 2024, 12:07pm

I find the IDE argument compelling, because productivity of libc++ maintainers is important. It’d be nice to list what the differences between source layout and installed layout are, and why it’s not possible to converge on a single layout. If it’s not possible, then I consider additional CI load to test both layouts a worthy trade-off to increase maintainer productivity.

I’m not a libc++ contributor, so I don’t feel eligible to express my support or disagreement with the RFC. So take this as an external opinion.

EricWF · May 27, 2024, 6:03pm

I’ve edited the original post to contain a link to a separate document describing the layouts, the rationale for them, and how that actually all plays out today.

The oversimplified answer to your question is: In the two-directory target-specific layout, the layout of the in-tree and build-tree headers is exactly the same, minus an addition or two.

The only headers present in the build-tree that aren’t in the source-tree are cxxabi.h and, until PR #93333 lands, __assertion_handler. __config_site is already placed in a separate target-specific directory (and cxxabi.h could/should be placed there too).

Converging on a single layout is certainly possible, though it would require buy-in from vendors and, most importantly, Clang driver changes, which take time to propagate.

As for additional CI load, my preference would be to standardize on a single “canonical” layout for the CI. I would prefer that we test the installed header layout, whatever it is, rather than using either the “in-tree” or “in-build” headers. That way we test the CMake logic that installs the headers, as well as allowing us to “test as we ship” (or as close to that as we can get).

ldionne · May 28, 2024, 1:27am

Thanks for writing this down with so much detail about the current state of things. It’s always useful to get a refresher.

Based on what you wrote in the Suggested Resolution and in your follow-up post, I believe the end state you would prefer is the following (please correct me if I am wrong):

1. We build libc++.so against the in-tree headers (presumably with some funky include paths to make __config_site & friends work).
2. We install the libc++ headers, libc++.so and anything else relevant (module files?) to a “fake installation prefix”, and then we run the test suite against that.

Thus, there would be no intermediate “generated build directory”. Is (1) what you refer to as “supporting development against the in-tree headers”? Please let me know if I understood your ideal end state correctly to make sure we’re not talking past each other.

First, I am 100% supportive of (2). I think this is clearly the way to go and I don’t see any real technical barriers to doing that, it just needs to be done. In fact, this is what the generated directory is trying to approximate, it just does that poorly. However, I will note that errors in the test suite will still be pointing to the “fake installed” copy of the headers, which is a pain point you mentioned. Put simply, I don’t think there’s any real way to solve that problem, except perhaps clever use of symlinks (but I’d be really careful with that).

For (1), I am not certain how it can be done while still retaining the ability to generate/customize header files, but I don’t think this is necessarily a bad idea. However, I don’t see how that would really change the IDE situation, since we would still need to generate some headers and that would cause the in-tree headers themselves not to be sufficient for an IDE to fully understand the codebase.

Side note mostly about terminology:
I’d like to push back against the idea that the in-tree headers can be used “as-is” (for a strict definition of “as-is”) for anything. This isn’t the case, it’s never been the case and it can never be the case because we need to generate files like __config_site. We need to keep it clear that people can’t just copy our headers to a location and use that, since that results in a broken (sometimes subtly broken) install, and that’s dangerous. Even if we built libc++.so against the in-tree headers, we wouldn’t be using the in-tree headers “as-is”, since we’d generate the __config_site and add that search path explicitly to make things work.

EricWF · May 28, 2024, 4:52am

I designed and implemented the __config_site configuration mechanism in 2015. See 29ada6d17889.

Please take a look at that commit. It should hopefully clear up any confusion.

The mechanism I initially implemented worked for many years, without the need to copy any headers.

Like Clang and LLVM do today, we simply put generated headers in their own directory, and add an include path for them while building and testing.

Copying the rest of the headers elsewhere is unneeded. I’m proposing we stop.

On Linux there is already a separate include path used for the “generated headers”. It’s defined as LIBCXX_GENERATED_INCLUDE_TARGET_DIR in CMake.
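
To illustrate the idea of "in-tree headers plus one extra include path", here is a small simulation of ordered include-path lookup. It is only a sketch: resolve_header and the directory names are hypothetical, and it models nothing beyond "first directory that contains the file wins".

```python
import os
import tempfile

def resolve_header(name, search_dirs):
    """Return the first path in `search_dirs` that contains `name`,
    mimicking how a compiler walks its include search path in order."""
    for directory in search_dirs:
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None

def touch(path):
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write("// placeholder\n")

with tempfile.TemporaryDirectory() as root:
    # Stand-ins for the in-tree header directory and a separate
    # directory that holds only generated headers (names are assumed).
    in_tree = os.path.join(root, "src", "include")
    generated = os.path.join(root, "build", "generated")
    touch(os.path.join(in_tree, "vector"))
    touch(os.path.join(generated, "__config_site"))

    search = [in_tree, generated]
    vector_path = resolve_header("vector", search)
    config_path = resolve_header("__config_site", search)

    vector_from_source = vector_path.startswith(in_tree)
    config_from_build = config_path.startswith(generated)
    # Without the generated directory, the in-tree headers alone are
    # not enough: __config_site cannot be found.
    missing_without_generated = resolve_header("__config_site", [in_tree]) is None

print(vector_from_source, config_from_build, missing_without_generated)
```

With both directories on the search path, ordinary headers resolve to the files developers actually edit, and only the generated header comes from the build directory.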

ldionne · May 28, 2024, 11:15am

I don’t think there’s any confusion. Saying “it was never the case” is indeed a slight exaggeration, but there are good reasons we moved away from the original design and started generating + including __config_site all the time. See for example [libc++] Always generate a __config_site header · llvm/llvm-project@53623d4 · GitHub where we went this close to an ABI break because we were not always generating a __config_site file. Going back to a world where we sometimes have a __config_site and sometimes don’t is a no-go, but I don’t think that’s what you’re after anyways (?).

We could do that for building the library, but for testing, the only setup we should support is the fake installation root. There needs to be exactly one way of testing the library, it must be used by everyone and it must be as close as possible to what we ship. Do we agree on this point?

There’s no amount of IDE-friendliness that is worth moving away from this model IMO.

Can you please let me know if this summary (the one I posted above) is indeed what you’re after? Just so we can talk about concrete changes:

1. We build libc++.so against the in-tree headers, with some additional include for the compiler to find the generated __config_site.
2. We install the libc++ headers, libc++.so and anything else relevant (module files?) to a “fake installation prefix”, and then we run the test suite against that.

Or are you instead saying that you’d want both the building and the testing to use the in-tree headers with the additional include for the generated headers?

brunodf · May 28, 2024, 12:08pm

Not sure if it applies to this discussion, but in a context where cmake is not available, we build and use libc++ from its source directory: i.e. we add libcxx/include to the include path, we also add a __config_site file (manually edited) to the include path, we create a library project with selected source files from libcxx/src and appropriate compile settings (all manually curated), we build this library, and finally we compile C++ application code and link it against the library. This is a historic situation, but it has always worked well (at least up to libc++ 16).

Obviously, libc++ cannot support this approach, but I hope it does not complicate it either (without technical reason). For example, sophisticated generated headers would make this approach more complex or impossible (yes, __config_site is generated too, but that is really simple), so I hope libc++ does not switch to generated headers unless there is really a technical reason to do so, e.g. something that cannot be achieved well with macros/includes as in the current headers.

EricWF · May 28, 2024, 12:39pm

Thank you for bringing this up. It’s important for us to consider users who don’t use CMake.

I think libc++ can support this approach, and it has for much of my tenure on the project.

I agree that placing sophisticated logic inside the build system is undesirable, and that much of the configuration logic contained within CMake today (default macro definitions, for various platforms for example), could equally well be represented using macros & includes.

EricWF · May 28, 2024, 1:09pm

Always generating the __config_site from CMake caches simply moves where the configuration occurs out of source and into the build. There’s no technical hurdle to going back.

I disagree

Saying that “there needs to be exactly one way… and it must be used by everyone” denies the diverse needs of the existing community.

There is no “one size fits everyone” approach here. Not everyone “ships” the library, and even vendors who do, all do so in different manners.

Your needs as a representative of Apple are different from the needs of many other contributors, vendors, and users.

ldionne · May 28, 2024, 2:23pm

I don’t think it’s reasonable for libc++ to support arbitrary unknown build systems that are not officially supported by LLVM. LLVM supports CMake, and while we shouldn’t go out of our way to make it difficult to use other build systems without reason, we shouldn’t start trying to support obscure or unofficial ways of setting up the library. @EricWF you often bring up haunted graveyards as something we want to avoid, and a vague requirement like “being friendly to other build systems” is an example of something that creates these grey areas.

We’ve been working really hard for years to deprecate and remove all the different ways of building libc++ (e.g. the LLVM_ENABLE_PROJECTS build, the standalone build, etc) because it was a sorely needed simplification. Let’s not regress on that.

Again, I’m not saying that we should make people’s lives hard without reason, but we should work towards streamlining and reducing the number of supported configurations, not increasing it.

The point I am making is that it’s a brittle and confusing setup, which has caused serious problems in the past. That’s why we ended up moving to a streamlined “always install __config_site” scheme. That way you’re guaranteed by a very unsophisticated mechanism that you’re getting the right configuration.

Libc++ is not just a fun open source project on the side. The value of the project comes from the fact that it gets “shipped”, and I will argue that everyone “ships” the library in one way or another, though our definitions of “ship” might differ. LLVM does it through its release tarballs, Apple does it through an SDK, Android through their NDK, Google deploys it internally (I don’t know how that works), Fuchsia embeds the library into their executables IIUC, etc. The point here is that everyone ships it differently, but they all build it, install it and make it available to developers on their platform/environment/stack. That’s what I call shipping.

The libc++ testing process isn’t (and shouldn’t be) strongly tied to any particular directory layout or way of shipping the library. However, the test suite should test what is being shipped (or as close as can be). This property of the test suite and this project mindset is an absolute necessity in order to have a robust project.

I’m really not speaking for Apple here, I’m speaking as someone who has worked really hard in the past several years to get new users of libc++ supported officially by making our CMake, testing and CI setup general enough for them to use without diverging from upstream too much. I care about this more than you seem to think.

I’m also speaking as someone who has experienced first hand the damage that can be done when a project is not tested how it’s shipped. It’s scary how pretty bad bugs can make it through layers and layers of unit tests when the “test as you ship” property isn’t satisfied.

As a side note, Apple actually used to build libc++ with an Xcode project internally. All of the pain points mentioned here are things I am intimately familiar with since I’ve dealt with them myself.

We’re starting to go in circles here. We both seem to be in favour of getting rid of the generated include directory inside build/, perhaps that’s something we can rally around and see where we can go from there. What do you think?

EricWF · May 28, 2024, 2:54pm

I think a good first step would be to come to consensus on, and move forward with, “Simplify the __assertion_handler build logic. Be friendly to IDEs.” by EricWF · Pull Request #93333 · llvm/llvm-project · GitHub

cjdb · May 29, 2024, 4:57pm

This is indeed something that needs to be tested. Is there a reason we can’t do this in the CI, as a “unit test what is installed” job?

ldionne · May 30, 2024, 1:22am

Our current test setup is actually very close to that – we test against the generated include directory, which is basically a copy of what would get installed. But this is indeed the way we should be testing things all the time, and it’s just a matter of implementing it.
