[RFC] Strategies for Bootstrapping Compiler-RT builtins

In the effort to flesh out the CMake build system, a problematic issue has come up, and I’d like some feedback on how best to handle it.

For reference, this issue has been reported by a few users, one of whom proposed a patch that doesn’t really address the underlying problem:
http://reviews.llvm.org/D13131

The problem comes when bootstrapping a cross-compiler toolchain. In order to have a cross-compiling toolchain that can build a “hello world” application you need four basic components:

(1) clang
(2) ld
(3) libclang_rt (builtins)
(4) runtime libraries

Today, building this toolchain with CMake is impossible because you cannot configure the cross-compiled builtins. The failure is a result of CMake’s try_compile function always performing a full compile-then-link operation. When bootstrapping a cross-compiler this will always fail, because linking even the simplest application fails when you don’t have libclang_rt prebuilt.

So, how do we fix this? I have a couple ideas, and am open to more.

(1) Roll our own CMake checks

We could roll our own replacement to try_compile and the various check macros that we need. In my opinion this is probably the right solution, but it does have downsides.
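As a rough sketch of what a hand-rolled replacement for try_compile might do, the Python below compiles a probe source to an object file with -c and never links, so it cannot fail for lack of libclang_rt. The function names, the probe source, and the injectable run callable are all hypothetical; inside compiler-rt this logic would have to be written in CMake (for example around execute_process), not Python.

```python
import os
import subprocess
import tempfile
from typing import Callable, Sequence

# Hypothetical probe source: needs no libc, no builtins, and no main().
PROBE_SOURCE = "int probe(int x) { return x + 1; }\n"


def compile_only_args(compiler: str, flags: Sequence[str], src: str, obj: str) -> list[str]:
    """Build a compile-only command line; '-c' stops before the link step."""
    return [compiler, *flags, "-c", src, "-o", obj]


def run_quietly(argv: list[str]) -> int:
    """Run a command, discarding its output, and return its exit status."""
    try:
        return subprocess.run(argv, stdout=subprocess.DEVNULL,
                              stderr=subprocess.DEVNULL).returncode
    except FileNotFoundError:
        return 127  # compiler not found on PATH


def check_flag_compiles(compiler: str, flag: str, base_flags: Sequence[str] = (),
                        run: Callable[[list[str]], int] = run_quietly) -> bool:
    """Return True if `compiler` accepts `flag` when compiling (not linking) a probe."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "probe.c")
        obj = os.path.join(tmp, "probe.o")
        with open(src, "w") as f:
            f.write(PROBE_SOURCE)
        return run(compile_only_args(compiler, [*base_flags, flag], src, obj)) == 0
```

A real check would probably also need to look at diagnostics, since some compilers only warn about flags they do not understand.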

The big downside is that, when bootstrapping, compiler-rt will need to build differently. In particular, a bootstrap build of compiler-rt will probably not be able to perform the checks needed to build the runtimes, so when bootstrapping we’ll need to disable building all the runtime libraries. We can probably find clever ways to hide a lot of the complexity here, but it is not going to be clean.

(2) Provide a way to bootstrap the builtins without CMake

Another alternative would be to provide a way to bootstrap the builtin libraries without CMake. The builtin libraries are actually very simple to compile. It is possible to write a custom build script, used only for bootstrapping the builtins, that could run on any platform and just get us to a functional compiler. The biggest downside here is that bootstrapping on all supported platforms with all supported compilers is actually a non-trivial matrix, and supporting and maintaining that could be a real pain. This is my least favorite option.
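To give a feel for why the builtins are simple to build without CMake, here is a minimal sketch that only generates the command lines: one compile-only invocation per source file, then a single archiver call. The target triple, flags, source names, and archive name are assumptions for illustration; the real source lists and per-architecture flags are what make the support matrix hard.

```python
import os
from typing import Sequence


def bootstrap_builtins_commands(cc: str, ar: str, target: str, sources: Sequence[str],
                                out_dir: str, archive_name: str) -> list[list[str]]:
    """Return the commands needed to build a static builtins archive.

    Only a compiler and an archiver are involved; nothing is linked.
    """
    commands = []
    objects = []
    for src in sources:
        stem = os.path.splitext(os.path.basename(src))[0]
        obj = os.path.join(out_dir, stem + ".o")
        objects.append(obj)
        # '-c' produces an object file without invoking the linker.
        commands.append([cc, f"--target={target}", "-O2", "-fPIC", "-c", src, "-o", obj])
    # 'rcs': replace/insert members, create the archive, write a symbol index.
    commands.append([ar, "rcs", os.path.join(out_dir, archive_name), *objects])
    return commands
```

Multiplying this by every host platform, host compiler, and target architecture gives the support matrix mentioned above.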

(3) Split the builtins and the runtime libraries

This is the most complicated approach, but I also think it is the best approach. One of the underlying problems here is that the builtin libraries and the runtime libraries have very different build requirements. The builtins really only require a functional compiler and archiver, while the runtime libraries require a full linker plus runtime libraries (libc & libcxx). These additional build-time requirements make things very complicated, because when bootstrapping a cross toolchain, compiler-rt needs to build at two different points in the build order: once before libcxx, and once after.
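The ordering problem can be made concrete with a toy dependency graph. In the sketch below (node names are illustrative, not actual CMake targets), compiler-rt is first modeled as a single node, which produces a cycle, and then split into builtins and runtimes, which yields a valid order with the builtins before libc/libcxx and the runtimes after. The assumption that libc needs the builtins is taken from the bootstrap orders described later in this thread. It uses the standard-library graphlib module, so it needs Python 3.9 or newer.

```python
from graphlib import CycleError, TopologicalSorter

# Each key depends on the nodes in its set (illustrative names only).
MONOLITHIC = {
    "clang": set(),
    "compiler-rt": {"clang", "libc", "libcxx"},  # runtimes need libc/libcxx...
    "libc": {"clang", "compiler-rt"},            # ...but libc needs the builtins
    "libcxx": {"libc", "compiler-rt"},
}

SPLIT = {
    "clang": set(),
    "builtins": {"clang"},                       # compiler + archiver only
    "libc": {"clang", "builtins"},
    "libcxx": {"libc", "builtins"},
    "runtimes": {"builtins", "libc", "libcxx"},  # sanitizers, profile, ...
}


def build_order(graph):
    """Return a valid build order, or None if the graph contains a cycle."""
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError:
        return None
```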

I believe that the cleanest solution to this problem is going to be to separate the builtins and the sanitizers. Doing this and rolling our own CMake checks would allow us to have a fully CMake solution for building a cross-targeting toolchain. We might even be able to get support for try_compile checks that don’t link from CMake which would allow us to get rid of the hand-rolled checks in the future (I have already started a thread on the cmake-developers list).

Logistically, this solution could take many forms. We could break compiler-rt out into two repositories, which would be a huge undertaking, or we could leave it as a single repository and make the builtins buildable as a sub-project. I think we can make it work such that compiler-rt can be built either from the top-level directory to build it all, or from the builtins subdirectory to support bootstrapping cross-compilers.

Either way, supporting this approach will require significant cleanup and refactoring, because we’ll need to separate the build system functionality into three categories: things that apply to builtins, things that apply to runtimes, and things that apply to both. That separation will need to be somewhat clearly maintained so that we can prevent inadvertent stream crossing, because that is almost always bad.
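One way to picture the three-way split is to tag every configure check with the category it belongs to and run only the ones a given build needs. The check names below are placeholders loosely modeled on config-ix.cmake-style variables, not an inventory of the real file.

```python
from enum import Enum


class Category(Enum):
    COMMON = "common"      # needed by builtins and runtimes alike
    BUILTINS = "builtins"  # needed only to build lib/builtins
    RUNTIMES = "runtimes"  # sanitizers, profile, ...; may need a working link


# Hypothetical check names, for illustration only.
CHECKS = {
    "HAS_FPIC_FLAG": Category.COMMON,
    "HAS_STD_C99_FLAG": Category.BUILTINS,
    "HAS_LIBDL": Category.RUNTIMES,
    "HAS_LIBPTHREAD": Category.RUNTIMES,
}


def checks_to_run(build_builtins: bool, build_runtimes: bool) -> list[str]:
    """Select only the checks required by the enabled parts of the build."""
    wanted = {Category.COMMON}
    if build_builtins:
        wanted.add(Category.BUILTINS)
    if build_runtimes:
        wanted.add(Category.RUNTIMES)
    return sorted(name for name, cat in CHECKS.items() if cat in wanted)
```

In a builtins-only bootstrap, the runtime checks, which are the ones most likely to need a full link, would simply never be evaluated.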

Thoughts? Additional suggestions?

Thanks,
-Chris

Chris Bieneman via llvm-dev <llvm-dev@lists.llvm.org> writes:


I believe that the cleanest solution to this problem is going to be to
separate the builtins and the sanitizers.

To be clear, you mean split the builtins (as in lib/builtins) and all of
the runtime libraries (as in everything else under lib: sanitizers,
profiling runtimes, etc). Correct?


Thoughts? Additional suggestions?

ISTM that (3) essentially includes the work from (1), but ends up in a
more maintainable state. (3) feels like the right approach to me, as (1)
will be error prone and hard to maintain, and (2) is kludgy and complex
in a way that could scare people away from contributing.

Are there side-effects from try_compile that we need? Can we disable it?

Jim Rowan
jmr@codeaurora.org
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by the Linux Foundation

Today building this toolchain with CMake is impossible because you cannot configure the cross-compiled builtins. The failure is a result of CMake’s try_compile function always testing a full compile then link operation. When bootstrapping a cross-compiler this will always fail because linking even the simplest applications fails when you don’t have libclang_rt prebuilt.

The situation is much worse, because it requires not only compiler-rt,
but also all the runtime libraries (libc, libc++, libc++abi) to be
available.

This is the most complicated approach, but I also think it is the best approach.

This won't work due to dependencies on libc / libc++ / libc++abi.

I think either (1) should be implemented, or (4): "shim" the libraries on the
fly, providing empty libs at configure time just to pacify CMake.


To be clear, you mean split the builtins (as in lib/builtins) and all of
the runtime libraries (as in everything else under lib: sanitizers,
profiling runtimes, etc). Correct?

Yes. lib/builtins is the part that causes all the problems.

-Chris


Are there side-effects from try_compile that we need? Can we disable it?

We use try_compile to determine the capabilities of the toolchain you are building with. It is very important that we be able to make these calls in some form to figure out supported compiler flags, architectures, etc…

-Chris


This is the most complicated approach, but I also think it is the best approach.

This won't work due to dependencies on libc / libc++ / libc++abi.

That is not entirely correct. Only some of the builtins have dependencies on libc, and none of them should have dependencies on libc++ or libc++abi. I believe we can construct a working partial builtin library without libc, to make bootstrapping work.

-Chris

That is not entirely correct. Only some of the builtins have dependencies on libc, and none of them should have dependencies on libc++ or libc++abi. I believe we can construct a working partial builtin library without libc, to make bootstrapping work.

It's not that the builtins depend on them, but CMake depends on them.
Requiring the full compile & link cycle effectively requires the .a /
.so / .dylib files to be present.


The checks that cause this issue can be bypassed by setting CMAKE_C_COMPILER_WORKS=On and CMAKE_CXX_COMPILER_WORKS=On. That isn’t ideal, and I’m currently talking with members of the CMake development community to figure out a proper solution.
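For reference, a builtins-only configure using that workaround might be assembled as below. Only the CMAKE_*_COMPILER_WORKS and COMPILER_RT_BUILD_* options come from this thread; the helper function, the Ninja generator, the target triple, and the paths are hypothetical, and the function only builds the argument list rather than running cmake.

```python
def builtins_bootstrap_configure(src_dir: str, cc: str, cxx: str, target: str) -> list[str]:
    """Return a cmake command line for a builtins-only compiler-rt configure."""
    options = {
        "CMAKE_C_COMPILER": cc,
        "CMAKE_CXX_COMPILER": cxx,
        "CMAKE_C_COMPILER_TARGET": target,
        # Skip CMake's compile-and-link sanity check (the workaround above).
        "CMAKE_C_COMPILER_WORKS": "On",
        "CMAKE_CXX_COMPILER_WORKS": "On",
        # Build lib/builtins only; skip the sanitizers.
        "COMPILER_RT_BUILD_BUILTINS": "On",
        "COMPILER_RT_BUILD_SANITIZERS": "Off",
    }
    return ["cmake", "-G", "Ninja", *(f"-D{k}={v}" for k, v in options.items()), src_dir]
```

This only bypasses CMake's initial compiler sanity check; it does nothing about the compiler-rt configure checks themselves.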

-Chris

The problem comes when bootstrapping a cross-compiler toolchain. In order to
have a cross-compiling toolchain that can build a “hello world” application you
need four basic components:

(1) clang
(2) ld
(3) libclang_rt (builtins)
(4) runtime libraries

Today building this toolchain with CMake is impossible because you cannot
configure the cross-compiled builtins. The failure is a result of CMake’s
try_compile function always testing a full compile then link operation. When
bootstrapping a cross-compiler this will always fail because linking even the
simplest applications fails when you don’t have libclang_rt prebuilt.

In my case, I had to use the CMAKE_{C,CXX}_COMPILER_FORCED variables and
guesstimate the appropriate C/C++ flags in order to keep CMake's various checks
from getting in the way. My scripts build clang & lld, then install the Linux
kernel headers and Musl's C library headers, and finally build compiler-rt's
builtins library (by setting COMPILER_RT_BUILD_SANITIZERS=Off). The next steps
build Musl, libunwind, libcxx and libcxxabi. This way I'm able to "use" CMake
for every LLVM project.
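The ordering described above can be written down and checked mechanically. In this sketch the stage names follow the paragraph above, but the prerequisites listed for each stage are assumptions for illustration, not taken from the actual scripts.

```python
# (stage, prerequisites) in the order the scripts run them.
# The prerequisite lists are assumed for illustration.
STAGES = [
    ("clang", []),
    ("lld", []),
    ("linux-headers", []),
    ("musl-headers", []),
    ("builtins", ["clang", "musl-headers"]),  # COMPILER_RT_BUILD_SANITIZERS=Off
    ("musl", ["clang", "lld", "builtins", "linux-headers", "musl-headers"]),
    ("libunwind", ["clang", "lld", "builtins", "musl"]),
    ("libcxx", ["clang", "lld", "builtins", "musl"]),
    ("libcxxabi", ["clang", "lld", "builtins", "musl", "libcxx"]),
]


def first_unmet(stages):
    """Return (stage, missing prerequisites) for the first stage that runs
    before its prerequisites are built, or None if the order is valid."""
    built = set()
    for name, needs in stages:
        missing = [n for n in needs if n not in built]
        if missing:
            return name, missing
        built.add(name)
    return None
```

Under these assumed prerequisites, moving the builtins stage after Musl makes first_unmet report that Musl is missing the builtins, which is the chicken-and-egg problem discussed in this thread.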

What isn't clear to me is whether you intend to extend your work to libunwind,
libcxx & libcxxabi, because they require a working C++ compiler too. Also,
my understanding is that the same is true for the sanitizer libraries.

So, how do we fix this? I have a couple ideas, and am open to more.

...

(3) Split the builtins and the runtime libraries

This is the most complicated approach, but I also think it is the best
approach. One of the underlying problems here is that the builtin libraries
and the runtime libraries have very different requirements for building. The
builtins really only require a functional compiler and archiver, and the
runtime libraries require a full linker + runtime libraries (libc & libcxx).
These additional build-time requirements actually make things very
complicated because when bootstrapping a cross toolchain compiler-rt needs to
build in two different places in the build order; once before libcxx, and
once after.

IMHO, of the options that you've mentioned, (3) is the cleanest one.
Conceptually, as you mentioned, the builtins library does not belong with the
sanitizers, and it would make sense to split them into separate repositories.
Would it be too difficult to create an initial split, given that we already
provide/support the COMPILER_RT_BUILD_BUILTINS and COMPILER_RT_BUILD_SANITIZERS
options?

Either way, supporting this approach will require significant cleanup and
refactoring because we’ll need to separate out the build system functionality
into three categories: things that apply to builtins, things that apply to
runtimes, things that apply to both.

I don't have enough experience with CMake, so I don't know how feasible this is,
but for the last two categories it would be nice to keep in mind that the user
might have a C library other than Glibc.

- Vasileios

Hi Chris - Many thanks for airing all this. I'm now hopeful for an
end to my own hacks and false starts trying to fix these same
problems. My response is coming from the perspective of an
out-of-tree target without binutils or libgcc support.

(3) Split the built-ins and the runtime libraries

+1. Eliminating unnecessary entanglement is good engineering and
helps especially in the overwhelming early days of new target
development.

I believe that the cleanest solution to this problem is going to be to separate the built-ins and the sanitizers.

+1, again for sensible modularity and being merciful on new targets.

Logistically this solution could take many forms. We could break compiler-rt out into two repositories, which would be a huge undertaking, or we could leave it as a single repository and have the built-ins be able to build as a sub-project. I think we can make it work such that compiler-rt can be built either from the top-level directory to build it all, or from the built-ins sub directory to support bootstrapping cross-compilers.

I'm not sure if this jibes with your statement, but IMHO, the
refactored built-ins should move into the LLVM repo. In this spirit,
I believe gcc includes libgcc in-tree. Being both a small library and
intimately tied to LLVM, the complexity of a new separate built-ins
repo is a little dubious. The sanitizers are a separate can of worms.

Thanks,
-steve

Hi Chris,

Thanks for taking the time to raise this issue.
I know I have been a bit pushy about getting it solved.

I think either (1) should be implemented, or (4): “shim” the libraries on the
fly, providing empty libs at configure time just to pacify CMake.

Unfortunately, option 4 here won’t work for some targets.
The specific one I am dealing with is mingw-w64.
The mingw-w64-crt internally depends on some functions in libgcc/compiler-rt even to build an empty executable whose main just returns 0.
So this would still fail at link time.
This could have been a solution if we were dealing with Linux targets only.
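A toy model of the link step shows why an empty shim cannot help here: the link succeeds only if every symbol the objects leave undefined is defined by some input, and an empty placeholder library defines nothing. The symbol names below are hypothetical stand-ins, not a list of what mingw-w64-crt actually references, and archive member selection and link order are ignored.

```python
from typing import Dict, Set, Tuple

Obj = Tuple[Set[str], Set[str]]  # (defined symbols, undefined symbols)


def unresolved(objects: Dict[str, Obj], libraries: Dict[str, Set[str]]) -> Set[str]:
    """Undefined symbols left over after linking `objects` against `libraries`.

    Toy model: libraries only contribute definitions.
    """
    defined = set().union(*(d for d, _ in objects.values()), *libraries.values())
    needed = set().union(*(u for _, u in objects.values()))
    return needed - defined


# Hypothetical symbols: a CRT startup object that calls into the builtins.
CRT = {"crt_startup.o": ({"entry"}, {"main", "builtins_helper"})}
PROBE = {"probe.o": ({"main"}, set())}  # int main(void) { return 0; }
```

With an empty shim the probe still leaves builtins_helper unresolved; only a library that really defines it lets the configure-time link succeed.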

In my case, I had to use the CMAKE_{C,CXX}_COMPILER_FORCED variables and
guesstimate the appropriate C/C++ flags in order to keep CMake’s various checks
from getting in the way. My scripts build clang & lld, then install the Linux
kernel headers and Musl’s C library headers, and finally build compiler-rt’s
builtins library (by setting COMPILER_RT_BUILD_SANITIZERS=Off). The next steps
build Musl, libunwind, libcxx and libcxxabi. This way I’m able to “use” CMake
for every LLVM project.

This is exactly how I am working around this atm for the mingw-w64 target.
Do you have your scripts somewhere? I would like to compare.

IMHO, of the options that you’ve mentioned, (3) is the cleanest one.
Conceptually, as you mentioned, the builtins library does not belong with the
sanitizers, and it would make sense to split them into separate repositories.
Would it be too difficult to create an initial split, given that we already
provide/support the COMPILER_RT_BUILD_BUILTINS and COMPILER_RT_BUILD_SANITIZERS
options?

The issue seems to be that even if we set COMPILER_RT_BUILD_SANITIZERS to Off, it still does all the checks.
Maybe it would be possible, within compiler-rt's CMakeLists, to separate them out more to avoid this when we are just building the builtins?
This seems like a decent first step forward regardless of whether it is agreed to split into two different projects.

I’m not sure if this jibes with your statement, but IMHO, the
refactored built-ins should move into the LLVM repo. In this spirit,
I believe gcc includes libgcc in-tree. Being both a small library and
intimately tied to LLVM, the complexity of a new separate built-ins
repo is a little dubious. The sanitizers are a separate can of worms.

This was something I was also thinking about.
There are complexities here, though, because unlike GCC, LLVM is multi-target.

- Martell


In my case, I had to use the CMAKE_{C,CXX}_COMPILER_FORCED variables and
guesstimate the appropriate C/C++ flags in order to keep CMake's various checks
from getting in the way. My scripts build clang & lld, then install the Linux
kernel headers and Musl's C library headers, and finally build compiler-rt's
builtins library (by setting COMPILER_RT_BUILD_SANITIZERS=Off). The next steps
build Musl, libunwind, libcxx and libcxxabi. This way I'm able to "use" CMake
for every LLVM project.

Setting the *_FORCED variables is roughly equivalent to setting the *_WORKS variables. I’d really like it if we didn’t need to set those.

What isn't clear to me is whether you intend to extend your work to libunwind,
libcxx & libcxxabi, because they require a working C++ compiler too. Also,
my understanding is that the same is true for the sanitizer libraries.

I have a separate set of patches I’m working on now to generalize our use of ExternalProject so we can build libcxx, libcxxabi, libunwind, and anything else that comes along with just-built compilers to fix this.


IMHO, of the options that you've mentioned, (3) is the cleanest one.
Conceptually, as you mentioned, the builtins library does not belong with the
sanitizers, and it would make sense to split them into separate repositories.
Would it be too difficult to create an initial split, given that we already
provide/support the COMPILER_RT_BUILD_BUILTINS and COMPILER_RT_BUILD_SANITIZERS
options?

I think there is a lot of work involved here. We’ll need to generalize and split up some of the configuration content (stuff like config-ix.cmake) into the bits that apply everywhere, the bits that are needed only for builtins, and the bits that are only needed for sanitizers. I’m willing to start tackling that work if this is what the community thinks the right approach is.

Either way, supporting this approach will require significant cleanup and
refactoring because we’ll need to separate out the build system functionality
into three categories: things that apply to builtins, things that apply to
runtimes, things that apply to both.

I don't have enough experience with CMake, so I don't know how feasible this is,
but for the last two categories it would be nice to keep in mind that the user
might have a C library other than Glibc.

I’m certainly not going to expect the user has Glibc. I do all my work on Darwin platforms and we don’t have Glibc.

-Chris


I'm not sure if this jibes with your statement, but IMHO, the
refactored built-ins should move into the LLVM repo. In this spirit,
I believe gcc includes libgcc in-tree. Being both a small library and
intimately tied to LLVM, the complexity of a new separate built-ins
repo is a little dubious. The sanitizers are a separate can of worms.

Sadly, I believe there are licensing reasons why the builtins can’t be in the LLVM repo. Specifically, the LLVM license has an attribution clause that the compiler-rt license deliberately doesn’t have. If the LLVM attribution clause applied to the builtins, anyone who builds an application that links the builtins would need to distribute their software with a notice saying it includes LLVM code.

-Chris

Setting the *_FORCED variables is roughly equivalent to setting the *_WORKS variables. I’d really like it if we didn’t need to set those.

Yes, I believe the *_FORCED variables are a little bit more intrusive than the *_WORKS variables.

IMHO, of the options that you've mentioned, (3) is the cleanest one.
Conceptually, as you mentioned, the builtins library does not belong with the
sanitizers, and it would make sense to split them into separate repositories.
Would it be too difficult to create an initial split, given that we already
provide/support the COMPILER_RT_BUILD_BUILTINS and COMPILER_RT_BUILD_SANITIZERS
options?

I think there is a lot of work involved here. We’ll need to generalize and split up
some of the configuration content (stuff like config-ix.cmake) into the bits that apply
everywhere, the bits that are needed only for builtins, and the bits that are only
needed for sanitizers. I’m willing to start tackling that work if this is what
the community thinks the right approach is.

Uh, you're right. I forgot about the common checks in config-ix.cmake.

I don't have enough experience with CMake, so I don't know how feasible this is,
but for the last two categories it would be nice to keep in mind that the user
might have a C library other than Glibc.

I’m certainly not going to expect the user has Glibc. I do all my work on Darwin platforms and we don’t have Glibc.

What I meant here is that some users might want to use a C library other than the one installed on the system.
This would require us to be able to build the C library in between the LLVM projects, i.e., after the builtins. This
is just a suggestion, as I don't know how much work/effort this would take (or if it requires any work at all).

- Vasileios


Sure. I’m not sure if everyone will agree with me, but my general assertion is that if you’re doing a full bootstrap of a cross-compiler, we’re not going to fully support that from a single CMake build tree. We didn’t support that in autoconf, so I don’t think CMake should be different in that regard. I envision bootstrapping a full cross-platform toolchain to be something like:

(1) build in-tree clang
(2) build out-of-tree builtin library
(3) build out-of-tree runtime libraries
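The three steps above can be sketched as a sequence of CMake configure commands. This is a hedged illustration, not a supported recipe: `bootstrap_plan` and `cmake_cmd` are hypothetical helpers, treating step (3) as a sanitizers-only compiler-rt build is an assumption (the runtime libraries could include other projects), and the extra cache entries a real standalone compiler-rt build needs are omitted. COMPILER_RT_BUILD_BUILTINS and COMPILER_RT_BUILD_SANITIZERS are the options mentioned earlier in this thread; CMAKE_C_COMPILER, CMAKE_C_COMPILER_TARGET and CMAKE_BUILD_TYPE are standard CMake variables.

```python
from typing import Dict, List


def cmake_cmd(source_dir: str, cache: Dict[str, str]) -> List[str]:
    """Render a CMake configure command with one -D entry per cache variable."""
    return ["cmake", source_dir] + [f"-D{key}={value}" for key, value in cache.items()]


def bootstrap_plan(llvm_src: str, compiler_rt_src: str,
                   triple: str, clang: str) -> List[List[str]]:
    """Hypothetical plan: in-tree clang, then out-of-tree builtins, then out-of-tree runtimes."""
    cross = {"CMAKE_C_COMPILER": clang, "CMAKE_C_COMPILER_TARGET": triple}
    return [
        # (1) In-tree clang, configured for the host as usual.
        cmake_cmd(llvm_src, {"CMAKE_BUILD_TYPE": "Release"}),
        # (2) Out-of-tree builtins only, cross-compiled with the new clang.
        cmake_cmd(compiler_rt_src, {**cross,
                                    "COMPILER_RT_BUILD_BUILTINS": "ON",
                                    "COMPILER_RT_BUILD_SANITIZERS": "OFF"}),
        # (3) Out-of-tree runtime libraries, which can now link against the builtins.
        cmake_cmd(compiler_rt_src, {**cross,
                                    "COMPILER_RT_BUILD_BUILTINS": "OFF",
                                    "COMPILER_RT_BUILD_SANITIZERS": "ON"}),
    ]
```

Keeping each step as its own configure run mirrors the point that a full cross bootstrap is not expected to happen inside a single CMake build tree.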

-Chris

Repos and licenses are orthogonal, but I get the concern.

Switching gears to other questions:
Should the bootstrap build automatically produce a built-ins library
for each target in "llvm-config --targets-built" or is the developer
expected to provide an explicit list? Hopefully the former.
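To make the "automatic" option concrete, here is a minimal sketch that turns the output of `llvm-config --targets-built` (a space-separated list of backend names such as `X86 ARM AArch64`) into builtins triples. The `DEFAULT_TRIPLES` table and both function names are hypothetical; the sketch also exposes the catch that a backend name is not a triple, so some mapping (or an explicit list from the developer) is still needed.

```python
import subprocess
from typing import Dict, List

# Hypothetical default triple per LLVM backend. One backend can serve many
# triples (OSes, ABIs, float ABIs), so a real build would need more input.
DEFAULT_TRIPLES: Dict[str, str] = {
    "X86": "x86_64-unknown-linux-gnu",
    "ARM": "armv7-unknown-linux-gnueabihf",
    "AArch64": "aarch64-unknown-linux-gnu",
}


def builtins_triples(targets_built: str) -> List[str]:
    """Map backend names from `llvm-config --targets-built` to builtins triples."""
    triples = []
    for backend in targets_built.split():
        triple = DEFAULT_TRIPLES.get(backend)
        if triple is not None:  # skip backends with no default builtins triple
            triples.append(triple)
    return triples


def query_targets_built(llvm_config: str = "llvm-config") -> str:
    """Ask an installed llvm-config which backends were built."""
    return subprocess.check_output([llvm_config, "--targets-built"], text=True)
```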

Is it reasonable for the bootstrap build not to depend on GNU binutils?
This might be mission creep, but it's a drag that the unspoken first
step in developing an LLVM-based toolchain is "port binutils". I
haven't kept watch on LLD progress, but perhaps it's far enough along
that the bootstrap process can depend on it.

Regards,
-steve

Sadly, I believe there are licensing reasons why the builtins can’t be in the LLVM repo.

Repos and licenses are orthogonal, but I get the concern.

Switching gears to other questions:
Should the bootstrap build automatically produce a built-ins library
for each target in "llvm-config --targets-built" or is the developer
expected to provide an explicit list? Hopefully the former.

Today there isn’t really a simple answer. For Linux I think the bootstrap only builds for the host architecture by default. For Darwin, it builds all supported Darwin architectures, which can be complicated to determine.

Is it reasonable for the bootstrap build not to depend on GNU binutils?

Not necessarily. To my understanding, LLVM doesn’t fully replace binutils on any platform yet, and I think we’re a long way from saying the LLVM toolchain is the default built-in toolchain.

This might be mission creep, but it's a drag that the unspoken first
step in developing an LLVM-based toolchain is "port binutils". I
haven't kept watch on LLD progress, but perhaps it's far enough along
that the bootstrap process can depend on it.

On Darwin, ar and lld are the biggest pieces that aren’t fully featured yet. On other platforms, I think there are still places where lld isn’t fully fleshed out.

-Chris

Sure. I’m not sure if everyone will agree with me, but my general assertion is that if you’re doing a full bootstrap of a cross-compiler, we’re not going to fully support that from a single CMake build tree. We didn’t support that in autoconf, so I don’t think CMake should be different in that regard. I envision bootstrapping a full cross-platform toolchain to be something like:
(1) build in-tree clang
(2) build out-of-tree builtin library
(3) build out-of-tree runtime libraries

This seems reasonable to me.
The default in-tree stuff should be for the host.
I don’t think many will have an issue with doing out-of-tree builds of the builtins or the runtimes if they are cross-compiling.

Today there isn’t really a simple answer. For Linux I think the bootstrap only builds for the host architecture by default. For Darwin, it builds all supported Darwin architectures, which can be complicated to determine.

I’m not quite sure why iOS is built by default on Darwin; it should probably be OS X only, with iOS built out of tree as you said above.
I doubt many will agree with changing this now, though.

This might be mission creep, but it’s a drag that the unspoken first
step in developing an LLVM-based toolchain is “port binutils”. I
haven’t kept watch on LLD progress, but perhaps it’s far enough along
that the bootstrap process can depend on it.
On Darwin, ar and lld are the biggest pieces that aren’t fully featured yet.

Yes, Darwin seems to be the biggest blocker here, with the least amount of work gone into it.
The COFF and ELF linkers have been replaced with section-based ones, as that makes more sense.
I don’t know enough about MachO to say whether it would be better to switch or to stay with an atom-based design.
It would be great if an Apple engineer more in the know would nominate themselves, bring it up to par with the rest, and then maybe become an OWNER.

On other platforms, I think there are still places where lld isn’t fully fleshed out.

Yes, there are some parts still missing and some that are WIP, but the new COFF and ELF section-based linkers should bootstrap just fine.

Just as a point: for building the builtins, shouldn’t we only need llvm-ar?
While that isn’t feasible for the Darwin target, as you said above, it should be perfectly fine for Windows and Linux.
I don’t think we should even need the linker to bootstrap the builtins, or am I incorrect in saying this?
If so, I think it would be reasonable to have that as the mission creep.
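The claim that the builtins need a compiler and an archiver but no linker can be illustrated by listing the commands such a build would run. This is a hedged sketch, not how compiler-rt's CMake actually builds the library: `builtins_commands` is a hypothetical helper, the default tool names (`clang`, `llvm-ar`, `llvm-ranlib`) are assumptions, and per the discussion above this pattern is expected to suit Linux and Windows rather than Darwin.

```python
import os
from typing import List, Sequence


def builtins_commands(sources: Sequence[str], triple: str, out_lib: str,
                      cc: str = "clang", ar: str = "llvm-ar",
                      ranlib: str = "llvm-ranlib") -> List[List[str]]:
    """Hypothetical helper: commands that compile builtins sources and archive them."""
    cmds: List[List[str]] = []
    objects: List[str] = []
    for src in sources:
        obj = os.path.splitext(os.path.basename(src))[0] + ".o"
        objects.append(obj)
        # -c stops after producing an object file, so no linker is involved.
        cmds.append([cc, "--target=" + triple, "-O2", "-c", src, "-o", obj])
    # Pack the objects into a static archive and index it; besides the
    # compiler, these are the only tools the builtins need.
    cmds.append([ar, "rc", out_lib, *objects])
    cmds.append([ranlib, out_lib])
    return cmds
```

For example, `builtins_commands(["lib/builtins/udivsi3.c", "lib/builtins/muldi3.c"], "armv7-none-eabi", "libclang_rt.builtins-arm.a")` yields two compile commands, one `llvm-ar rc` command, and one `llvm-ranlib` command.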

Sure. I’m not sure if everyone will agree with me, but my general assertion is that if you’re doing a full bootstrap of a cross-compiler, we’re not going to fully support that from a single CMake build tree. We didn’t support that in autoconf, so I don’t think CMake should be different in that regard. I envision bootstrapping a full cross-platform toolchain to be something like:
(1) build in-tree clang
(2) build out-of-tree builtin library
(3) build out-of-tree runtime libraries

This seems reasonable to me.
The default in-tree stuff should be for the host.
I don’t think many will have an issue with doing out-of-tree builds of the builtins or the runtimes if they are cross-compiling.

Cool. This then makes your other point about requiring LLVM tools less of an issue because the out-of-tree builds can use whatever tools you choose. We just need to make the builtins work so that you don’t need them already built.

Today there isn’t really a simple answer. For Linux I think the bootstrap only builds for the host architecture by default. For Darwin, it builds all supported Darwin architectures, which can be complicated to determine.

I’m not quite sure why iOS is built by default on Darwin; it should probably be OS X only, with iOS built out of tree as you said above.
I doubt many will agree with changing this now, though.

I’m the one who landed a lot of that code, and I want to rip it all out. We just need to make it work the right way before we can rip it out.

This might be mission creep, but it’s a drag that the unspoken first
step in developing an LLVM-based toolchain is “port binutils”. I
haven’t kept watch on LLD progress, but perhaps it’s far enough along
that the bootstrap process can depend on it.
On Darwin, ar and lld are the biggest pieces that aren’t fully featured yet.

Yes, Darwin seems to be the biggest blocker here, with the least amount of work gone into it.
The COFF and ELF linkers have been replaced with section-based ones, as that makes more sense.
I don’t know enough about MachO to say whether it would be better to switch or to stay with an atom-based design.

Not to rehash an old argument, but Apple has no interest in a section-based MachO linker. We already have a great one of those that doesn’t have the licensing issues that come with binutils. What we do want is a better optimizing linker, which is what is leading us to support the atom-based design.

It would be great if an Apple engineer more in the know would nominate themselves, bring it up to par with the rest, and then maybe become an OWNER.

Lang has been putting a lot of work into that, and he did nominate himself as code owner; this is even reflected in the CODE_OWNERS list. The only parts of LLVM that I think don’t build correctly with LLD on MachO are some of the runtime libraries, but I expect those will get fleshed out soon enough too.

On other platforms, I think there are still places where lld isn’t fully fleshed out.

Yes, there are some parts still missing and some that are WIP, but the new COFF and ELF section-based linkers should bootstrap just fine.

Just as a point: for building the builtins, shouldn’t we only need llvm-ar?
While that isn’t feasible for the Darwin target, as you said above, it should be perfectly fine for Windows and Linux.
I don’t think we should even need the linker to bootstrap the builtins, or am I incorrect in saying this?
If so, I think it would be reasonable to have that as the mission creep.

I think that since bootstrapping will be an out-of-tree build, there is no reason to require a fully LLVM toolchain, as it doesn’t get us anything. Once the builtins build with only a compiler, ar, and ranlib, there is nothing stopping anyone from bootstrapping with a fully LLVM toolchain on Linux.
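Because an out-of-tree builtins build only needs a compiler, ar, and ranlib, the choice between LLVM tools and system tools can be left to whoever runs it. A minimal sketch of that choice, preferring LLVM tools and falling back to the conventional system names, is below; `pick_tools` and the `CANDIDATES` table are hypothetical, and the `which` parameter (defaulting to Python's `shutil.which`) is there so the lookup can be stubbed.

```python
import shutil
from typing import Callable, Dict, Optional, Tuple

# Candidate executables per role, in order of preference: an all-LLVM
# toolchain first, then the usual system tools. Illustrative only.
CANDIDATES: Dict[str, Tuple[str, ...]] = {
    "cc": ("clang",),
    "ar": ("llvm-ar", "ar"),
    "ranlib": ("llvm-ranlib", "ranlib"),
}


def pick_tools(which: Callable[[str], Optional[str]] = shutil.which) -> Dict[str, str]:
    """Resolve each role to the first candidate found on the search path."""
    chosen: Dict[str, str] = {}
    for role, names in CANDIDATES.items():
        for name in names:
            path = which(name)
            if path:
                chosen[role] = path
                break
        else:
            raise RuntimeError(f"no {role} found (tried {', '.join(names)})")
    return chosen
```

Nothing here requires a linker, which matches the point above: the builtins bootstrap can run with an all-LLVM toolchain or a mixed one.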

-Chris