[RFC] A vision for building the runtimes

Hi folks,

The topic of how to build the runtimes has been brought up several times in the past, sometimes by me, sometimes by others. All in all, building the runtimes is fairly complex: there are several ways of doing it, and they don't always work. This complexity leads to several problems, the first one being that I, as a maintainer, can't currently guarantee a good experience for people trying to build the library. What follows is a concrete proposal to make things better within a reasonable time frame.

The current state of things

Oh, and here it is if you're curious! https://github.com/llvm/llvm-project/tree/master/libcxx/utils/ci/runtimes

Louis

Hi Louis,

big +1 from me. I recently set up the runtimes for our downstream toolchain and to be honest it was quite a pain to get everything working. Due to the complexity we ended up creating our own CMake cache to make the configuration easier. We also tried to use the existing toolchain build. But since it tries to build the runtimes immediately after it built the compiler we cannot use it, as we have to build our C library first. So having a "unified standalone" approach sounds just like something that would make this use-case a lot easier.

Do you already have an idea of what a multilib build would look like with your proposed setup? Also, are you planning on including compiler-rt in this as well, or is this strictly meant for libc++, libc++abi and libunwind?

Thanks for doing this!

Dominik

Hi,

Proposal
--------------

My goal with this proposal is to achieve:
1. Decoupling from the top-level LLVM CMake setup (which doesn't work, see above)
2. A simple build that works everywhere, including embedded platforms
3. Removing the need to manually tie together the various runtimes (as in the Standalone builds)

My proposal is basically to have a "Unified Standalone" build for all the runtimes. It would look similar to a Monorepo build in essence (i.e. you'd have a single CMake invocation where you would specify the flags for all runtime projects at once), but it wouldn't be using the top-level LLVM CMake setup [1]. Specifically:

1. Add a `runtimes/CMakeLists.txt` file that includes the runtimes subprojects that are requested through -DLLVM_ENABLE_PROJECTS (open to bikeshed), and sets up minimal stuff like the `llvm-lit` wrapper and Python, but none of the harmful stuff that's done by the top-level LLVM CMake.
2. Deprecate the old individual Standalone builds in favor of this new "Unified Standalone" build.
3. Users migrate to the new Unified Standalone build. Users include the current "Runtimes" build, some places in compiler-rt, and various organizations.
4. Remove support for the old individual Standalone builds.

Sounds potentially good, but I hope the removal can wait until e.g. after the next stable release sometime (ideally removed from the master branch only after the next stable release has happened, not only forked off), as I try to have a single set of build scripts that work both for the latest stable release and the current top-of-tree master branch.

I presume the main place where you want to avoid other kinds of builds is for the combination of libcxxabi+libcxx, and that's indeed a bit of a mess at the moment.

Would it still be possible to build only compiler-rt/lib/builtins and not all of compiler-rt?

The prime reason why the current "runtimes" (what you call "toolchain") builds aren't usable for me is that, while they do the right thing (first build a compiler, then use that compiler to build the runtimes), I need to do a number of other things in between building the compiler and the various bits of the runtimes, and I need (or just want?) to micromanage the process.

My current procedure for bootstrapping a cross toolchain from scratch amounts to this:
- Build the compiler and tools
- Install the mingw headers and base runtime, set up compiler frontend
   wrappers
- Build compiler-rt/lib/builtins only
(At this point, I have an essentially complete toolchain for C code.)
- Build libunwind+libcxxabi+libcxx (with tight interdependencies)
- Build all of compiler-rt (sanitizers require having a working set of C++ headers and other things)
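
For readers following along, the procedure above could be sketched as a dry-run shell script. This is only an illustration under assumptions: the `SRC` and `OUT` paths, the choice of `clang;lld`, and the use of the proposed `runtimes/` entry point for step 4 are not taken from any real setup, and the mingw-w64 step is left as a placeholder because it happens outside the LLVM tree. Cross-compilation flags and install prefixes are omitted.

```shell
#!/bin/sh
# Dry-run sketch of the staged bootstrap described above. By default each
# command is printed, not executed; set RUN=1 to execute for real.
# SRC and OUT are assumed locations, not anything mandated by LLVM.
SRC="${SRC:-llvm-project}"
OUT="${OUT:-build}"

# Print a command, or execute it when RUN=1.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

bootstrap_plan() {
    echo "== 1. compiler and tools"
    run cmake -G Ninja -S "$SRC/llvm" -B "$OUT/llvm" \
        -DLLVM_ENABLE_PROJECTS="clang;lld"
    run ninja -C "$OUT/llvm" install

    echo "== 2. mingw headers, base runtime, frontend wrappers"
    # Done outside the LLVM tree; the exact commands depend on the setup.
    echo "+ (site-specific steps, not shown)"

    echo "== 3. compiler-rt builtins only"
    run cmake -G Ninja -S "$SRC/compiler-rt/lib/builtins" -B "$OUT/builtins"
    run ninja -C "$OUT/builtins" install

    echo "== 4. libunwind + libcxxabi + libcxx in one invocation"
    # Assumes the runtimes/CMakeLists.txt entry point proposed in this thread.
    run cmake -G Ninja -S "$SRC/runtimes" -B "$OUT/runtimes" \
        -DLLVM_ENABLE_PROJECTS="libunwind;libcxxabi;libcxx"
    run ninja -C "$OUT/runtimes" install-unwind install-cxxabi install-cxx

    echo "== 5. all of compiler-rt (sanitizers need the C++ headers)"
    run cmake -G Ninja -S "$SRC/compiler-rt" -B "$OUT/compiler-rt"
    run ninja -C "$OUT/compiler-rt" install
}

bootstrap_plan
```

Keeping each stage as a separate invocation is what allows the site-specific step 2 to be slotted in between the compiler build and the first runtime build.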

So while it'd be nice to bundle as much as possible up in a Unified Standalone build, I very much like the fact that I can pick the individual build steps in the order I want, needed to assemble things from scratch for my setup. If I later can merge more of them into a single cmake invocation (or at least fewer), that'd be an optional bonus. Building all of libunwind+libcxxabi+libcxx in one cmake build is something I'd like to do in any case.

With Unified Standalone builds, I can probably do most of that by still doing many individual cmake invocations, picking a different set of runtime projects to build each time. But what about e.g. picking only compiler-rt/lib/builtins?

[1] If you're wondering what that would look like:

   $ mkdir <monorepo-root>/build
   $ cd <monorepo-root>/build
   $ cmake ../runtimes -DLLVM_ENABLE_PROJECTS="libcxx;libcxxabi;libunwind" \
       -C <path-to-your-cache-if-desired> \
       -DLIBCXX_ENABLE_XXX \
       -DLIBCXXABI_ENABLE_XXX \
       <other options>
   $ ninja install-cxx install-cxxabi

That looks good, and I tried it out with the cmake file you shared - with a small amount of tweaks I can use that in my current setups - I'll send a patch for that for discussion.

One thing I'm left wanting in this setup, is that for various reasons I've currently been passing different sets of extra flags (compiler and linker flags) to each of libunwind/libcxxabi/libcxx. Some of it is included in the patch I'll post soon, reducing the need for it though. But I've been building e.g. libunwind with -Wno-dll-attribute-on-redeclaration, but not the other libs. (Yes if that's a permanent setup, I should probably upstream setting that flag as well.)

But for general cases, when trying out various build setups, being able to pass individual CMAKE_CXX_FLAGS or CMAKE_SHARED_LINKER_FLAGS to each of libunwind/libcxxabi/libcxx would be very useful, especially if doing something that isn't necessarily supported (yet).

// Martin

Btw, unrelated to the topic at hand, but related to monorepo builds in general, as I've brought up some time before - it'd be great for various reasons to be able to do such a build (building the tools, not the runtimes) by pointing cmake directly at the llvm-project root instead of at the llvm subdir.

Last time this was brought up, Chris [1] mentioned that it'd be doable, but would require a bit of work to make the produced files (<builddir>/bin/clang etc) end up in the root of the build dir, instead of in a subdir, like <builddir>/llvm/bin/clang. My cmake-fu is a bit too weak to easily pinpoint what needs to be fixed to proceed with this though...

[1] http://lists.llvm.org/pipermail/llvm-dev/2019-November/136466.html

// Martin

Would it still be possible to build only compiler-rt/lib/builtins and not all of compiler-rt?

Yes. We wouldn't make any such changes to compiler-rt. For now, I don't even want to try to include compiler-rt in that Unified Standalone build: I want to focus on making libc++, libc++abi and libunwind work in that setup.

So while it'd be nice to bundle as much as possible up in a Unified Standalone build, I very much like the fact that I can pick the individual build steps in the order I want, needed to assemble things from scratch for my setup. If I later can merge more of them into a single cmake invocation (or at least fewer), that'd be an optional bonus. Building all of libunwind+libcxxabi+libcxx in one cmake build is something I'd like to do in any case.

I hear you. As I said above, my goal is basically to have a single invocation for libunwind, libc++abi and libc++. We can then try to also make compiler-rt work the same, but I think it might require significant refactoring of compiler-rt's build before we can do that. I don't know enough to say for sure, but I'm sure I can get assistance from e.g. beanz and phosek if we do this.

With Unified Standalone builds, I can probably do most of that by still doing many individual cmake invocations, picking a different set of runtime projects to build each time. But what about e.g. picking only compiler-rt/lib/builtins?

Yes, no plan to disable that.

One thing I'm left wanting in this setup, is that for various reasons I've currently been passing different sets of extra flags (compiler and linker flags) to each of libunwind/libcxxabi/libcxx. Some of it is included in the patch I'll post soon, reducing the need for it though. But I've been building e.g. libunwind with -Wno-dll-attribute-on-redeclaration, but not the other libs. (Yes if that's a permanent setup, I should probably upstream setting that flag as well.)

Why doesn't this flag apply to all the runtimes?

But for general cases, when trying out various build setups, being able to pass individual CMAKE_CXX_FLAGS or CMAKE_SHARED_LINKER_FLAGS to each of libunwind/libcxxabi/libcxx would be very useful, especially if doing something that isn't necessarily supported (yet).

I believe the correct place for that is in a downstream-only patch being applied to your fork of llvm. That's how we do it at Apple, and it avoids having to maintain all kinds of hooks to customize X and Y upstream. I'm not saying these hooks never make sense -- they make sense most of the time. But if we can avoid adding an option to customize all the flags for each runtime, I think that would be ideal.

Hi Louis,

big +1 from me. I recently set up the runtimes for our downstream toolchain and to be honest it was quite a pain to get everything working. Due to the complexity we ended up creating our own CMake cache to make the configuration easier. We also tried to use the existing toolchain build. But since it tries to build the runtimes immediately after it built the compiler we cannot use it, as we have to build our C library first. So having a “unified standalone” approach sounds just like something that would make this use-case a lot easier.

Glad to see support!

Do you already have an idea of what a multilib build would look like with your proposed setup?

Can you define what you mean by a multilib build? Do you mean building for several architectures at once? Do you produce the libs for each architecture in different directories?

If you’re talking about what I’m thinking of, I believe the simplest and most CMake-friendly way of doing it would be to have multiple CMake invocations. We can hide those behind a “driver” build like what the Runtimes build does, for example, but the libc++/libc++abi/libunwind builds themselves wouldn’t be aware they’re being built for multiple archs.
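
As a rough illustration of the "multiple CMake invocations" approach, a driver could be as small as a shell loop. The sketch below is a dry run; the variant names, the `-m64`/`-m32` flags, the `SRC`/`OUT` paths and the use of the proposed `runtimes/` entry point are all assumptions chosen for the example, not part of the proposal.

```shell
#!/bin/sh
# Dry-run sketch: one CMake invocation per multilib variant. Each build
# directory is independent and unaware of the other variants.
# Set RUN=1 to execute the commands instead of printing them.
SRC="${SRC:-llvm-project}"
OUT="${OUT:-build}"

run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@" </dev/null
    else
        echo "+ $*"
    fi
}

# One line per variant: <name> <compiler flags>. Hypothetical examples.
variants() {
    cat <<'EOF'
x86_64 -m64
i386 -m32
EOF
}

build_variants() {
    variants | while read -r name flags; do
        run cmake -G Ninja -S "$SRC/runtimes" -B "$OUT/$name" \
            -DLLVM_ENABLE_PROJECTS="libunwind;libcxxabi;libcxx" \
            -DCMAKE_CXX_FLAGS="$flags" \
            -DCMAKE_INSTALL_PREFIX="$OUT/install/$name"
        run ninja -C "$OUT/$name" install-unwind install-cxxabi install-cxx
    done
}

build_variants
```

Each variant gets its own build and install directory, so the libraries for different architectures never share a CMake cache.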

Also, are you planning on including compiler-rt in this as well, or is this strictly meant for libc++, libc++abi and libunwind?

For now, libc++, libc++abi and libunwind. Those are different because it makes sense to ship them alongside the compiler, or not. For example, at Apple we ship those as system libraries, not alongside our compiler. The compiler-rt build is also significantly more complicated.

Louis

With the use-case described above, we simply do one CMake invocation for each library configuration.

Forgive me if this is obvious to everybody but myself, but I just want to be clear on the new proposed behavior for monorepo builds:

What happens if I do `cmake ... -DLLVM_ENABLE_PROJECTS="all" ...`? I assume that prior to your change, I would get the runtimes (built using the system compiler), but now I will not?

What happens if I do `cmake ... -DLLVM_ENABLE_PROJECTS="...;[some runtime];..." ...`? I assume that prior to your change, I would get the requested runtime, but now I will get an error?

Basically, "if I just want to hack on LLVM, do I need to change my workflow?"

I also wonder if the runtimes build really needs the tip of master in order to build correctly. Could you just make it an error to build with a compiler that doesn't support a new enough C++ standard? Sure, if I'm trying to compile with whatever version of GCC comes with Debian, it won't work, but if I'm running the latest GCC or a fully patched visual studio 2019?

Thanks,
   Christopher Tetreault

You mentioned elsewhere that you are only focussing on libc++,
libc++abi and libunwind for now. Let's say libc joins this scheme
sometime in future. Can one do:

$> cmake ../runtimes \
     -DLLVM_ENABLE_PROJECTS="llvm;clang;clang-tools-extra;libc" <...>

The libc build currently needs llvm in all modes. For linting, we need
clang and clang-tools-extra.

Yes, I think one CMake invocation per configuration is the way to go. Unlike other build systems, it seems like CMake wasn’t really designed with multi-configuration builds in mind.

Now, one can wrap these multiple CMake invocations in a single one by using a “driving” CMakeLists.txt and calls to ExternalProject_Add if so desired, but it’s really another layer on top.

Forgive me if this is obvious to everybody but myself, but I just want to be clear on the new proposed behavior for monorepo builds:

What happens if I do `cmake ... -DLLVM_ENABLE_PROJECTS="all" ...`? I assume that prior to your change, I would get the runtimes (built using the system compiler), but now I will not?

I believe the behavior here should be equivalent to -DLLVM_ENABLE_PROJECTS=<all-except-the-runtimes> -DLLVM_ENABLE_RUNTIMES=all.
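
To make that equivalence concrete, here is a small sketch of how a project list could be split into the two options. `split_projects` is a hypothetical helper written for this example, not something in LLVM's build, and it only knows about the three runtimes discussed in this thread.

```shell
#!/bin/sh
# Sketch: split a semicolon-separated project list into the part that stays
# in LLVM_ENABLE_PROJECTS and the part that moves to LLVM_ENABLE_RUNTIMES.
# split_projects is a hypothetical helper, not part of LLVM's build.
split_projects() {
    projects=""
    runtimes=""
    old_ifs=$IFS
    IFS=';'
    for p in $1; do
        case "$p" in
            libcxx|libcxxabi|libunwind)
                runtimes="${runtimes:+$runtimes;}$p" ;;
            *)
                projects="${projects:+$projects;}$p" ;;
        esac
    done
    IFS=$old_ifs
    echo "-DLLVM_ENABLE_PROJECTS=$projects -DLLVM_ENABLE_RUNTIMES=$runtimes"
}

split_projects "clang;lld;libcxx;libcxxabi;libunwind"
```

The same split would apply to an explicit list that names a single runtime: the runtime moves to the second option and is then built with the just-built compiler.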

Basically, I would suggest that we make the current runtimes/toolchain build the default way to build libc++, libc++abi and libunwind. I'm not sure how compiler-rt works today so I'm not sure this is necessary -- I think it already builds using the just-built Clang but I wouldn't bet on it.

This way, you'd get a correctly built toolchain with the runtimes by default when you checkout LLVM, without having to care about any of this. Currently, you get Clang, LLVM and friends built against the system compiler, and the runtimes too (which may or may not work as you intend -- I can easily craft a configuration that won't work).

What happens if I do `cmake ... -DLLVM_ENABLE_PROJECTS="...;[some runtime];..." ...`? I assume that prior to your change, I would get the requested runtime, but now I will get an error?

Currently, that would build that runtime with the system compiler, as a subproject of LLVM. This "works", but it's not great because LLVM enables a bunch of flags globally and those can trip up the runtimes, as explained in the OP.
I would suggest that we make it an error to try and build one of libcxx, libcxxabi and libunwind with `llvm/CMakeLists.txt` as the root.

Basically, "if I just want to hack on LLVM, do I need to change my workflow?"

I would expect not; however, you'd go from building the runtimes with the system compiler to building them with the just-built compiler (i.e. the correct way).

I also wonder if the runtimes build really needs the tip of master in order to build correctly. Could you just make it an error to build with a compiler that doesn't support a new enough C++ standard? Sure, if I'm trying to compile with whatever version of GCC comes with Debian, it won't work, but if I'm running the latest GCC or a fully patched visual studio 2019?

They don't technically need trunk, but they do need something recent. And it's usually not just about what Standard a compiler pretends to support, but more often about what intrinsics are supported, etc. Also, we don't gain anything from supporting significantly older compilers if the default CMake setup does the right thing and builds it with the just-built compiler.

If you wanted to build the runtimes with something other than the just-built Clang, that's entirely fine. You could do that using the Unified Standalone build I proposed -- this is actually what we do right now at Apple, since we don't ship libc++/libc++abi as part of the toolchain (we ship it as part of the OS, and we build it with a couple-months-old Clang).

Louis

Hi,

One thing I'm left wanting in this setup, is that for various reasons I've currently been passing different sets of extra flags (compiler and linker flags) to each of libunwind/libcxxabi/libcxx. Some of it is included in the patch I'll post soon, reducing the need for it though. But I've been building e.g. libunwind with -Wno-dll-attribute-on-redeclaration, but not the other libs. (Yes if that's a permanent setup, I should probably upstream setting that flag as well.)

Why doesn't this flag apply to all the runtimes?

Normally you'd want to make sure that declarations and definitions have matching dllimport/export attributes. But in libunwind, there are such attributes (set via _LIBUNWIND_EXPORT) on the definitions, but not on the declarations in unwind.h (which is more or less equivalent to the standard-ish unwind.h shipped as part of clang). Fixing it would be mostly unnecessary churn, as it isn't really an issue in this particular case, so it's easiest to just disable the warning for libunwind.

But for other runtimes (that generally are clean in the aspect that the warning checks) we want the warning enabled, as it would point out deviations.

So TL;DR I should just upstream it.

But for general cases, when trying out various build setups, being able to pass individual CMAKE_CXX_FLAGS or CMAKE_SHARED_LINKER_FLAGS to each of libunwind/libcxxabi/libcxx would be very useful, especially if doing something that isn't necessarily supported (yet).

I believe the correct place for that is in a downstream-only patch being applied to your fork of llvm. That's how we do it at Apple, and it avoids having to maintain all kinds of hooks to customize X and Y upstream. I'm not saying these hooks never make sense -- they make sense most of the time. But if we can avoid adding an option to customize all the flags for each runtime, I think that would be ideal.

I try hard to avoid downstream patches on top of LLVM in my builds, and try to find a fix that is acceptable upstream instead, but for this aspect I've been lazy when it's been easy to work around it by just passing extra cmake flags.

- Monorepo
  This is the "easy" and most common way to build the runtimes. It builds the runtimes as subprojects of LLVM (with LLVM_ENABLE_PROJECTS), with the same compiler that's used to build LLVM.

Btw, unrelated to the topic at hand, but related to monorepo builds in general, as I've brought up some time before - it'd be great for various reasons to be able to do such a build (building the tools, not the runtimes) by pointing cmake directly at the llvm-project root instead of at the llvm subdir.

Last time this was brought up, Chris [1] mentioned that it'd be doable, but would require a bit of work to make the produced files (<builddir>/bin/clang etc) end up in the root of the build dir, instead of in a subdir, like <builddir>/llvm/bin/clang. My cmake-fu is a bit too weak to easily pinpoint what needs to be fixed to proceed with this though...

Interesting -- I also brought this up recently: [llvm-dev] RFC: A top level monorepo CMake file

However, the issue I've been hitting with this is that it's a huge chunk of work to bite. Making the runtimes work only is something I can chew, but it's another story entirely to make it work for arbitrary sub-projects.

Actually, this sounds like a slightly different thing than what I was getting at.

I still wanted it to behave as if llvm was the root project, and only for building tools, not runtimes, so just like LLVM_ENABLE_PROJECTS today. But for cmake/ccache reasons I'd want to have all built code reside in directories below the main cmake root dir. (I rely heavily on such ccache effects to make the compile times more bearable.) If enabling e.g. clang now via LLVM_ENABLE_PROJECTS, clang's source is referred to as ../clang. To avoid this, I still symlink it into llvm/tools.

(The backstory is that as long as the source is under the cmake root directory, it's referred to with relative paths, so any filename that ends up in the build product, e.g. in assert messages, is identical when building in two separate checkouts, and the two builds can share ccache objects. If source files are outside of the main cmake root dir, they are referred to with an absolute path, which breaks the ccache sharing.)

// Martin

Then, I would say libc can't join this. libc++/libc++abi/libunwind are simple and they have few dependencies on the rest of LLVM (off the top of my head only llvm-lit). That's the major benefit of giving them a top-level build. If libc depends on LLVM and Clang, that's fine, however it would gain nothing from being built outside of LLVM.

Louis

Thank you for the proposal! I’m definitely biased given that this is how we’ve been already building runtimes with llvm/runtimes, but I really think that this is the right direction.

I have a few suggestions:

The (toolchain) runtimes build already does much of what you’ve described. Specifically, https://github.com/llvm/llvm-project/blob/48e4b0f/llvm/runtimes/CMakeLists.txt behaves differently based on how it’s used.

  1. When included from the LLVM build, it configures the subbuilds of runtimes for individual targets; this is handled by https://github.com/llvm/llvm-project/blob/48e4b0f/llvm/runtimes/CMakeLists.txt#L222-L651

  2. When invoked from the subbuild, it drives the build for selected runtimes (currently libunwind, libc++abi, libc++ and compiler-rt); this is handled by https://github.com/llvm/llvm-project/blob/48e4b0f/llvm/runtimes/CMakeLists.txt#L54-L220

#2 is similar to what you’ve described in your proposal as step 1 (and is similar to https://github.com/llvm/llvm-project/blob/48e4b0f/libcxx/utils/ci/runtimes/CMakeLists.txt) except that we use -DLLVM_ENABLE_RUNTIMES rather than -DLLVM_ENABLE_PROJECTS.

What I’d propose is splitting up https://github.com/llvm/llvm-project/blob/48e4b0f/llvm/runtimes/CMakeLists.txt into two. We’ll leave https://github.com/llvm/llvm-project/blob/48e4b0f/llvm/runtimes/CMakeLists.txt#L222-L651 in place and move https://github.com/llvm/llvm-project/blob/48e4b0f/llvm/runtimes/CMakeLists.txt#L54-L220 to https://github.com/llvm/llvm-project/blob/master/runtimes/CMakeLists.txt. This provides a direct transition path for the existing users of the (toolchain) runtimes build. We could then start improving the runtimes build as needed, turning it into the unified standalone build as you’ve suggested.

One potential improvement I’d really like to see is deduplicating the CMake modules across runtimes. An open question is whether the shared CMake logic should live in https://github.com/llvm/llvm-project/blob/master/cmake/Modules or https://github.com/llvm/llvm-project/blob/master/runtimes/cmake/Modules.

Currently, we have special handling for compiler-rt builtins (and soon also crtbegin/crtend) because those need to be built before running any CMake checks (this will also likely apply to libc in the future). I think that unnecessarily complicates the build. Instead, I’d prefer to set CMAKE_TRY_COMPILE_TARGET_TYPE to STATIC_LIBRARY for the unified standalone build and then just build all runtimes in this build. That would allow replacing some of the complicated logic that’s duplicated in multiple runtimes (for example https://github.com/llvm/llvm-project/blob/48e4b0f/libcxx/cmake/Modules/HandleCompilerRT.cmake) with direct dependencies.
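
For context, CMake's compiler checks normally link a small executable, which fails when builtins or the CRT files don't exist yet; with CMAKE_TRY_COMPILE_TARGET_TYPE set to STATIC_LIBRARY the checks only compile and archive. A configure line for that could look roughly like the sketch below. It only prints the command; the `SRC`/`OUT` paths, the `runtimes/` source directory and the exact runtime list are assumptions for illustration.

```shell
#!/bin/sh
# Sketch: print a configure command that makes CMake's try_compile checks
# build a static library instead of linking an executable.
# SRC, OUT and the runtimes/ entry point are assumed for this example.
SRC="${SRC:-llvm-project}"
OUT="${OUT:-build}"

configure_runtimes() {
    echo cmake -G Ninja -S "$SRC/runtimes" -B "$OUT/runtimes" \
        -DCMAKE_TRY_COMPILE_TARGET_TYPE=STATIC_LIBRARY \
        -DLLVM_ENABLE_RUNTIMES="compiler-rt;libunwind;libcxxabi;libcxx"
}

configure_runtimes
```

One trade-off worth keeping in mind: checks that depend on linking (for example, probing whether a library exists) can no longer detect link failures in this mode.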

In terms of timing, I think it’d be ideal if we could prepare everything in the next few months and ask users to start migrating to the new unified standalone build. We could mark the current standalone and monorepo builds as deprecated immediately after the LLVM 12 branch point, and remove them some time later so the unified standalone build is the only supported way to build runtimes in LLVM 13. Does that sound like a reasonable timeframe?

If the new way of building things is available in LLVM 12 (as an equally functional option to the current standalone builds), that sounds like a good timeframe to me. Ideally, I'd like to be able to build both the latest stable release and the current master branch in the same way, so I could switch to using the new way once LLVM 12 is finalized.

A shorter timeframe (adding the new build structure after LLVM 12 is branched off, and making it the only option before LLVM 13 is released) is manageable but less convenient to me, and probably even more so for people who don't track master closely but only peek into things at the releases.

// Martin