In the effort to flesh out the CMake build system, a problematic issue has come up, and I’d like some feedback on how best to handle it.
For reference, this issue has been reported by a few users, and one of them proposed patches that don’t really address the underlying problem:
http://reviews.llvm.org/D13131
The problem comes when bootstrapping a cross-compiler toolchain. In order to have a cross-compiling toolchain that can build a “hello world” application, you need four basic components:
(1) clang
(2) ld
(3) libclang_rt (builtins)
(4) runtime libraries
Today, building this toolchain with CMake is impossible because you cannot configure the cross-compiled builtins. The failure is a result of CMake’s try_compile command always testing a full compile-then-link operation. When bootstrapping a cross-compiler this will always fail, because linking even the simplest application fails when you don’t have libclang_rt prebuilt.
So, how do we fix this? I have a couple ideas, and am open to more.
(1) Roll our own CMake checks
We could roll our own replacement for try_compile and the various check macros that we need. In my opinion this is probably the right solution, but it does have downsides.
The big downside is that when bootstrapping compiler-rt it will need to build differently. In particular, it is probable that a bootstrap build of compiler-rt will not be able to perform the necessary checks to build the runtimes, so when bootstrapping we’ll need to disable building all the runtime libraries. We can probably find clever ways to hide a bunch of the complexity here, but it is not going to be clean.
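To make the idea of a check that compiles but never links concrete, here is a rough sketch in Python rather than CMake. It is only an illustration of the technique, not a proposed implementation: the function names (compile_only_command, check_compiles) are made up for this example, and the target triple in the usage note is just a sample value.

```python
import os
import subprocess
import tempfile


def compile_only_command(compiler, source, output, extra_flags=()):
    """Build a command line that compiles `source` to an object file, with no link step."""
    # "-c" stops after producing the object file, so neither a linker nor a
    # prebuilt libclang_rt is needed for the check to succeed.
    return [compiler, *extra_flags, "-c", source, "-o", output]


def check_compiles(compiler, source_text, extra_flags=(), run=subprocess.run):
    """Return True if `source_text` compiles to an object file with `compiler`.

    `run` is injectable (hypothetical convenience) so the check can be
    exercised without a real compiler installed.
    """
    with tempfile.TemporaryDirectory() as tmp:
        source = os.path.join(tmp, "check.c")
        with open(source, "w") as f:
            f.write(source_text)
        cmd = compile_only_command(
            compiler, source, os.path.join(tmp, "check.o"), extra_flags
        )
        result = run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        return result.returncode == 0
```

With a real cross compiler on the path, something like check_compiles("clang", "int f(void) { return 0; }", ["--target=armv7-none-eabi"]) would answer "can this compiler produce an object file for the target?" without ever invoking the linker.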
(2) Provide a way to bootstrap the builtins without CMake
Another alternative would be to provide a way to bootstrap the builtin libraries without CMake. The builtin libraries are actually very simple to compile. It is possible to roll a custom build script, for use only when bootstrapping the builtins, that could run on any platform and just get to a functional compiler. The biggest downside here is that bootstrapping on all supported platforms with all supported compilers is actually a non-trivial matrix, and supporting and maintaining that could be a real pain. This is my least favorite option.
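For a sense of how small such a script could be, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the function name builtins_build_plan, the default tool names ("clang", "ar"), and the output library name are placeholders, and the source files in the usage note are only examples. It returns the commands instead of running them, so the plan can be inspected first.

```python
import os


def builtins_build_plan(sources, build_dir, compiler="clang", archiver="ar",
                        flags=(), library="libclang_rt.builtins.a"):
    """Return the commands that would compile `sources` and archive the objects."""
    commands = []
    objects = []
    for source in sources:
        stem = os.path.splitext(os.path.basename(source))[0]
        obj = os.path.join(build_dir, stem + ".o")
        # Compile only: there is no link step, so no prebuilt runtime is required.
        commands.append([compiler, *flags, "-c", source, "-o", obj])
        objects.append(obj)
    # "ar rcs" creates the archive, replaces existing members, and writes a symbol index.
    commands.append([archiver, "rcs", os.path.join(build_dir, library), *objects])
    return commands
```

Calling builtins_build_plan(["lib/builtins/adddf3.c", "lib/builtins/divsi3.c"], "out") yields two compile commands followed by one archive command; running each with subprocess.run(cmd, check=True) would do the actual build. The hard part is not this loop but the platform and compiler matrix mentioned above, which the sketch ignores entirely.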
(3) Split the builtins and the runtime libraries
This is the most complicated approach, but I also think it is the best approach. One of the underlying problems here is that the builtin libraries and the runtime libraries have very different requirements for building. The builtins really only require a functional compiler and archiver, whereas the runtime libraries require a full linker plus the C and C++ libraries (libc & libcxx). These additional build-time requirements actually make things very complicated, because when bootstrapping a cross toolchain compiler-rt needs to build in two different places in the build order: once before libcxx, and once after.
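The ordering problem can be written down as a small dependency graph. The sketch below is only an illustration in Python: the edges are my reading of the requirements described above (in particular, that linking libcxx needs the builtins), not something taken from the build system, and libc is left out for brevity.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Assumed dependency edges: each key is built after everything in its set.
DEPENDENCIES = {
    "builtins": {"clang"},                   # compiler (and archiver) only
    "libcxx": {"clang", "ld", "builtins"},   # linking needs the builtins
    "sanitizers": {"clang", "ld", "builtins", "libcxx"},
}


def build_order(dependencies):
    """Return one valid build order for the given dependency mapping."""
    return list(TopologicalSorter(dependencies).static_order())
```

Any order this produces puts the builtins before libcxx and the sanitizers after it, which is exactly why a single compiler-rt step cannot sit in one place in the build.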
I believe that the cleanest solution to this problem is going to be to separate the builtins and the sanitizers. Doing this and rolling our own CMake checks would allow us to have a fully CMake-based solution for building a cross-targeting toolchain. We might even be able to get CMake to support try_compile checks that don’t link, which would allow us to get rid of the hand-rolled checks in the future (I have already started a thread on the cmake-developers list).
Logistically, this solution could take many forms. We could break compiler-rt out into two repositories, which would be a huge undertaking, or we could leave it as a single repository and have the builtins be able to build as a sub-project. I think we can make it work such that compiler-rt can be built either from the top-level directory to build it all, or from the builtins subdirectory to support bootstrapping cross-compilers.
Either way, supporting this approach will require significant cleanup and refactoring because we’ll need to separate out the build system functionality into three categories: things that apply to builtins, things that apply to runtimes, and things that apply to both. That separation will need to be somewhat clearly maintained so that we can prevent inadvertent stream crossing, because that is almost always bad.
Thoughts? Additional suggestions?
Thanks,
-Chris