How to pass CMake flags to multi-stage Clang builds

I want to build a 3-stage bootstrapped Clang compiler from source with the default linker set to “lld”. For normal builds I can use the CMake flag “CLANG_DEFAULT_LINKER=lld” to achieve this. However, for multi-stage builds I can’t find a recommended way to do this kind of thing.

For a bootstrapped (2-stage) compiler, I came across the CMake flag “CLANG_BOOTSTRAP_PASSTHROUGH”, which lets me pass certain CMake flags on to the 2nd-stage compiler build.

Inspired by this, I came up with two hacks to achieve the same for a 3-stage build.

Hack 1) Use these CMake flags:

-DBOOTSTRAP_CLANG_DEFAULT_LINKER=lld
-DBOOTSTRAP_BOOTSTRAP_CLANG_DEFAULT_LINKER=lld

The BOOTSTRAP_ prefix passes whatever variable follows it on to the next stage's build, so BOOTSTRAP_BOOTSTRAP_ reaches the stage after that.

Hack 2) Pass CLANG_BOOTSTRAP_PASSTHROUGH through to itself:

-DCLANG_DEFAULT_LINKER=lld
-DCLANG_BOOTSTRAP_PASSTHROUGH="CLANG_BOOTSTRAP_PASSTHROUGH;CLANG_DEFAULT_LINKER"

These might not be “hacks” at all and may be the right way to pass CMake flags, but I couldn’t find any documentation, so I’m asking here.

I recommend using CMAKE_PROJECT_INCLUDE, or one of its variants.


We do document BOOTSTRAP_BOOTSTRAP_ in the BOLT build configuration here: Advanced Build Configurations — LLVM 19.0.0git documentation

We also use it for a few things (like the PGO builds) where the various stages in the build do radically different things.

I don’t think CMAKE_PROJECT_INCLUDE_BEFORE helps here. This is specifically an issue of how you communicate build options to recursive clang CMake invocations which are used for bootstrap building (building clang, then using that clang to build clang again, rinse and repeat).

In general I recommend using CMake caches for bootstrap configurations rather than direct command line arguments because the command lines get complicated quickly.


I don’t know if there is a ‘right’ way to do this, but I prefer to use BOOTSTRAP_* and BOOTSTRAP_BOOTSTRAP_* variables rather than CLANG_BOOTSTRAP_PASSTHROUGH, because it’s much easier for someone to look at the cache file and understand which flags are being passed to which stage.
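As an illustration of that readability argument, Hack 1 could be written as a cache file along these lines. This is a sketch: the file name is hypothetical, and turning on bootstrapping in stage 2 through BOOTSTRAP_CLANG_ENABLE_BOOTSTRAP is an assumption about what your 3-stage setup needs (compare with the in-tree 3-stage cache files, which also expose the stage3 targets to the top-level build; that part is omitted here). It also assumes lld is available to the stage 2 and stage 3 compilers, for example because it is in LLVM_ENABLE_PROJECTS.

```cmake
# my_3stage_lld.cmake (hypothetical name); use with: cmake -C my_3stage_lld.cmake ...

# Stage 1 builds stage 2; BOOTSTRAP_ makes stage 2 build stage 3 as well
# (assumption: this is how your 3-stage build is driven).
set(CLANG_ENABLE_BOOTSTRAP ON CACHE BOOL "")
set(BOOTSTRAP_CLANG_ENABLE_BOOTSTRAP ON CACHE BOOL "")

# One line per stage, readable at a glance:
#   no prefix            -> stage 1 (left at the host default here, as in Hack 1)
#   BOOTSTRAP_           -> stage 2
#   BOOTSTRAP_BOOTSTRAP_ -> stage 3
set(BOOTSTRAP_CLANG_DEFAULT_LINKER lld CACHE STRING "")
set(BOOTSTRAP_BOOTSTRAP_CLANG_DEFAULT_LINKER lld CACHE STRING "")
```

Adding a plain `set(CLANG_DEFAULT_LINKER lld CACHE STRING "")` line would make stage 1 default to lld too.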


I don’t think CMAKE_PROJECT_INCLUDE_BEFORE helps here.

It’s unclear to me how someone might come to that conclusion.

I’m advising the following:

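# On the command line (an absolute path is safest):
-DCMAKE_PROJECT_INCLUDE=/path/to/my_llvm_configuration.cmake

# In my_llvm_configuration.cmake:
# Have the next stage load this same file, so every stage picks it up.
set(BOOTSTRAP_CMAKE_PROJECT_INCLUDE "${CMAKE_CURRENT_LIST_FILE}" CACHE PATH "")

# CLANG_STAGE is passed to stage2/stage3 with -D, but it is usually not yet
# defined when stage1 runs this file, so treat an empty value as stage1.
if(NOT CLANG_STAGE OR CLANG_STAGE STREQUAL "stage1")
  # stage1-specific settings here
elseif(CLANG_STAGE STREQUAL "stage2")
  # stage2-specific settings here
elseif(CLANG_STAGE STREQUAL "stage3")
  # stage3-specific settings here
endif()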

Then they can do whatever is appropriate for each stage in the corresponding block, which may well go beyond setting variables. For example, they might want to define debug macros that print the output of feature-test procedures to see why they’re getting different results in stage2 and stage3.
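A minimal sketch of that debugging idea, meant to sit in my_llvm_configuration.cmake. The function name and global property name are hypothetical; it assumes the checks of interest store their results in HAVE_* cache variables (as many of LLVM's feature checks do) and requires CMake 3.19 or newer for `cmake_language(DEFER)`. Because the file runs right after project(), before the checks happen, the dump is deferred until the top-level directory finishes configuring.

```cmake
# Hypothetical helper: print every cached HAVE_* result, tagged with the stage,
# so the stage2 and stage3 configure logs can be diffed.
function(my_dump_feature_checks)
  get_cmake_property(_cache_vars CACHE_VARIABLES)
  foreach(_var IN LISTS _cache_vars)
    if(_var MATCHES "^HAVE_")
      message(STATUS "[${CLANG_STAGE}] ${_var} = ${${_var}}")
    endif()
  endforeach()
endfunction()

if(CLANG_STAGE MATCHES "^stage[23]$")
  # This file runs after every project() call, so register the dump only once.
  get_property(_dump_registered GLOBAL PROPERTY MY_FEATURE_DUMP_REGISTERED)
  if(NOT _dump_registered)
    set_property(GLOBAL PROPERTY MY_FEATURE_DUMP_REGISTERED TRUE)
    # Run when the top-level directory is done, after the checks have cached results.
    cmake_language(DEFER DIRECTORY "${CMAKE_SOURCE_DIR}" CALL my_dump_feature_checks)
  endif()
endif()
```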

Fuchsia takes one of the approaches Singh described, prefixing each variable in a prescribed list with BOOTSTRAP_:

# From clang/cmake/caches/Fuchsia.cmake:

foreach(variable ${_FUCHSIA_BOOTSTRAP_PASSTHROUGH})
  get_property(is_value_set CACHE ${variable} PROPERTY VALUE SET)
  if(${is_value_set})
    get_property(value CACHE ${variable} PROPERTY VALUE)
    get_property(type CACHE ${variable} PROPERTY TYPE)
    set(BOOTSTRAP_${variable} "${value}" CACHE ${type} "")
  endif()
endforeach()
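The same loop could be reused in your own cache file for a 3-stage build by emitting both prefixes at once. This is a sketch rather than Fuchsia's code: the file name and the `_MY_PASSTHROUGH` list are hypothetical.

```cmake
# my_passthrough.cmake (hypothetical); use with: cmake -C my_passthrough.cmake ...
set(CLANG_DEFAULT_LINKER lld CACHE STRING "")

# Variables to forward unchanged to stage 2 and stage 3.
set(_MY_PASSTHROUGH CLANG_DEFAULT_LINKER)

foreach(variable IN LISTS _MY_PASSTHROUGH)
  get_property(is_value_set CACHE ${variable} PROPERTY VALUE SET)
  if(is_value_set)
    get_property(value CACHE ${variable} PROPERTY VALUE)
    get_property(type CACHE ${variable} PROPERTY TYPE)
    # BOOTSTRAP_ reaches stage 2; BOOTSTRAP_BOOTSTRAP_ reaches stage 3.
    set(BOOTSTRAP_${variable} "${value}" CACHE ${type} "")
    set(BOOTSTRAP_BOOTSTRAP_${variable} "${value}" CACHE ${type} "")
  endif()
endforeach()
```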

Then later the BOOTSTRAP_ prefix gets stripped and each corresponding variable is added to PASSTHROUGH_VARIABLES:

# From clang/CMakeLists.txt:

# Find all variables that start with BOOTSTRAP_ and populate a variable with them.
get_cmake_property(variableNames VARIABLES)
foreach(variableName ${variableNames})
  if(variableName MATCHES "^BOOTSTRAP_")
    string(SUBSTRING ${variableName} 10 -1 varName)
    string(REPLACE ";" "|" value "${${variableName}}")
    list(APPEND PASSTHROUGH_VARIABLES -D${varName}=${value})
  endif()
  if(${variableName} AND variableName MATCHES "LLVM_EXTERNAL_.*_SOURCE_DIR")
    list(APPEND PASSTHROUGH_VARIABLES -D${variableName}=${${variableName}})
  endif()
endforeach()
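To see what that loop produces, it can be replayed on sample variables as a standalone script. This is an illustration only: the script name and sample values are hypothetical, and it copies just the BOOTSTRAP_ branch of the loop above.

```cmake
# prefix_demo.cmake (hypothetical); run with: cmake -P prefix_demo.cmake
cmake_minimum_required(VERSION 3.20)

set(BOOTSTRAP_CLANG_DEFAULT_LINKER lld)
set(BOOTSTRAP_BOOTSTRAP_CLANG_DEFAULT_LINKER lld)
set(BOOTSTRAP_LLVM_TARGETS_TO_BUILD "X86;AArch64")
set(CMAKE_BUILD_TYPE Release)  # no BOOTSTRAP_ prefix: ignored by this loop

set(PASSTHROUGH_VARIABLES "")
get_cmake_property(variableNames VARIABLES)
foreach(variableName ${variableNames})
  if(variableName MATCHES "^BOOTSTRAP_")
    # Drop the 10-character "BOOTSTRAP_" prefix.
    string(SUBSTRING ${variableName} 10 -1 varName)
    # Encode list separators so each value stays a single -D argument.
    string(REPLACE ";" "|" value "${${variableName}}")
    list(APPEND PASSTHROUGH_VARIABLES -D${varName}=${value})
  endif()
endforeach()

foreach(arg IN LISTS PASSTHROUGH_VARIABLES)
  message(STATUS "${arg}")
endforeach()
```

It should print three arguments (in whatever order CMake lists the variables): `-DCLANG_DEFAULT_LINKER=lld`, `-DBOOTSTRAP_CLANG_DEFAULT_LINKER=lld` and `-DLLVM_TARGETS_TO_BUILD=X86|AArch64`. In the real build, `LIST_SEPARATOR |` on the ExternalProject_Add call turns the `|` back into `;`, and the next stage runs the same loop, which is how BOOTSTRAP_BOOTSTRAP_ values arrive at stage 3.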

Then the contents of PASSTHROUGH_VARIABLES are handed to ExternalProject_Add:

# From clang/CMakeLists.txt:

ExternalProject_Add(${NEXT_CLANG_STAGE}
  DEPENDS clang-bootstrap-deps
  PREFIX ${NEXT_CLANG_STAGE}
  SOURCE_DIR ${CMAKE_SOURCE_DIR}
  STAMP_DIR ${STAMP_DIR}
  BINARY_DIR ${BINARY_DIR}
  EXCLUDE_FROM_ALL 1
  CMAKE_ARGS
              # We shouldn't need to set this here, but INSTALL_DIR doesn't
              # seem to work, so instead I'm passing this through
              -DCMAKE_INSTALL_PREFIX=${CMAKE_INSTALL_PREFIX}
              ${PASSTHROUGH_VARIABLES}
              ${CLANG_BOOTSTRAP_CMAKE_ARGS}
               -DCLANG_STAGE=${NEXT_CLANG_STAGE}
              ${COMPILER_OPTIONS}
              ${${CLANG_STAGE}_TABLEGEN}
              ${LTO_LIBRARY} ${verbose} ${PGO_OPT}
              ${${CLANG_STAGE}_LINKER}
              ${${CLANG_STAGE}_AR}
              ${${CLANG_STAGE}_RANLIB}
              ${${CLANG_STAGE}_OBJCOPY}
              ${${CLANG_STAGE}_STRIP}
  BUILD_COMMAND ${CMAKE_COMMAND} --build ${BINARY_DIR}
                                 --config ${build_configuration}
                                 ${build_tool_args}
  INSTALL_COMMAND ""
  STEP_TARGETS configure build
  USES_TERMINAL_CONFIGURE 1
  USES_TERMINAL_BUILD 1
  USES_TERMINAL_INSTALL 1
  LIST_SEPARATOR |
  )

That certainly works (unless you exceed the maximum command-line length, which I’ve hit on Windows), though you can run into issues with escape sequences, semicolons, pipe symbols, and spaces when values are translated from CMake to the command line and then back into CMake (which I’ve also encountered). It also does a lot of string manipulation, and it’s possible (I haven’t measured it) that this burns enough cycles to contribute meaningfully to build times.

A similar situation arises when trying to pass configuration settings to the “Builtins” or “Runtimes” external projects, where you need to prepend BUILTINS_<target>_ or RUNTIMES_<target>_ to each variable you want to pass through (see: Fuchsia-stage2.cmake, CrossWinToArmLinux.cmake, etc). Instead, I would suggest adding the following to the my_llvm_configuration.cmake example above:

# in my_llvm_configuration.cmake:

# Configure the builtins and runtimes projects to load this file as well.
# (Replace <target> with a target triple from your build.)
set(BUILTINS_<target>_CMAKE_PROJECT_INCLUDE "${CMAKE_CURRENT_LIST_FILE}" CACHE PATH "")
set(RUNTIMES_<target>_CMAKE_PROJECT_INCLUDE "${CMAKE_CURRENT_LIST_FILE}" CACHE PATH "")

if(CMAKE_PROJECT_NAME STREQUAL "Runtimes")
  # runtimes-specific settings here
elseif(CMAKE_PROJECT_NAME STREQUAL "CompilerRTBuiltins")
  # builtins-specific settings here
endif()

# Note that CMAKE_PROJECT_NAME will not be set when this
# configuration file is invoked if you've used one of the
# *_BEFORE variants (such as CMAKE_PROJECT_INCLUDE_BEFORE).
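If the triples are already listed in LLVM_RUNTIME_TARGETS and LLVM_BUILTIN_TARGETS, the `<target>` placeholders can be filled in with loops. A sketch, assuming both lists are set (for example from a cache file) before the top-level project() call runs this file; builds that leave them unset are not covered.

```cmake
# in my_llvm_configuration.cmake (sketch):
# Point every listed runtimes/builtins target at this same file.
foreach(target IN LISTS LLVM_RUNTIME_TARGETS)
  set(RUNTIMES_${target}_CMAKE_PROJECT_INCLUDE "${CMAKE_CURRENT_LIST_FILE}" CACHE PATH "")
endforeach()

foreach(target IN LISTS LLVM_BUILTIN_TARGETS)
  set(BUILTINS_${target}_CMAKE_PROJECT_INCLUDE "${CMAKE_CURRENT_LIST_FILE}" CACHE PATH "")
endforeach()
```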

This lets you avoid restating the same options under these various prefixes multiple times. For example, LIBCXX_USE_COMPILER_RT is currently set in several places under different prefixes:

# in clang/cmake/caches/Fuchsia.cmake:
foreach(target aarch64-unknown-linux-gnu; ... )
  set(RUNTIMES_${target}_LIBCXX_USE_COMPILER_RT ON CACHE BOOL "")

if(BOOTSTRAP_CMAKE_SYSTEM_NAME)
  if(STAGE2_LINUX_${target}_SYSROOT)
    set(RUNTIMES_${target}_LIBCXX_USE_COMPILER_RT ON CACHE BOOL "")

# in clang/cmake/caches/Fuchsia-stage2.cmake:
if(APPLE)
  set(LIBCXX_USE_COMPILER_RT ON CACHE BOOL "")

foreach(target aarch64-unknown-linux-gnu; ... )
  if(LINUX_${target}_SYSROOT)
    set(RUNTIMES_${target}_LIBCXX_USE_COMPILER_RT ON CACHE BOOL "")

if(FUCHSIA_SDK)
  foreach(target x86_64-unknown-fuchsia; ... )
    set(RUNTIMES_${target}_LIBCXX_USE_COMPILER_RT ON CACHE BOOL "")

(And this isn’t to pick on Fuchsia; I’m using their project configurations in my examples here because they’re by far and away the most thoroughly specified project configurations currently in the codebase.)


There are 1000 ways to skin a cat using CMake…

Many of the advanced build configurations we document use CMake cache scripts (see Advanced Build Configurations — LLVM 19.0.0git documentation), which are similar to CMAKE_PROJECT_INCLUDE_BEFORE but with different variable scoping.

In the 3-stage builds we even set the cache file for subsequent stages: llvm-project/clang/cmake/caches/3-stage-base.cmake at main · llvm/llvm-project · GitHub (we do something similar for the PGO builds).

All of that is overkill to the original user’s question of “how do I pass one option through to a 3rd stage?” For that BOOTSTRAP_BOOTSTRAP_ is the recommended approach.

If the user needs to do something more complicated, I recommend CMake cache scripts, which is what we have documented in the project documentation.
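As a sketch of what chaining cache scripts between stages can look like, here is a hypothetical three-file layout that gives stages 2 and 3 lld as their default linker, like Hack 1. The file names are placeholders, and exposing the stage3 targets from the top-level build is left out. Note that CLANG_BOOTSTRAP_CMAKE_ARGS is written as an unquoted list, so `-C` and the path reach the next cmake invocation as separate arguments.

```cmake
# Hypothetical layout; start the build with: cmake -C stage1.cmake ...

# --- stage1.cmake ---
set(CLANG_ENABLE_BOOTSTRAP ON CACHE BOOL "")
set(CLANG_BOOTSTRAP_CMAKE_ARGS -C ${CMAKE_CURRENT_LIST_DIR}/stage2.cmake CACHE STRING "")

# --- stage2.cmake ---
set(CLANG_ENABLE_BOOTSTRAP ON CACHE BOOL "")
set(CLANG_DEFAULT_LINKER lld CACHE STRING "")
set(CLANG_BOOTSTRAP_CMAKE_ARGS -C ${CMAKE_CURRENT_LIST_DIR}/stage3.cmake CACHE STRING "")

# --- stage3.cmake ---
set(CLANG_DEFAULT_LINKER lld CACHE STRING "")
```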

IMO, CMake cache scripts have an advantage over CMAKE_PROJECT_INCLUDE_BEFORE because of their restricted scope of influence. That, of course, also limits the use cases they suit, but they’re perfectly applicable to setting CMake variables that would otherwise be set on the command line.

All of that is overkill to the original user’s question of “how do I pass one option through to a 3rd stage?” For that BOOTSTRAP_BOOTSTRAP_ is the recommended approach.

Oh, certainly.

I assumed Singh had created a minimal representative example for the sake of clarity. The LLVM codebase uses 831 different CMake configuration settings (by my count), so it’s easy to imagine that they need, or will need, to set more than two in the course of getting their work done.

I recommend CMake cache scripts, which is what we have documented in the project documentation.

Just to clarify, is the suggestion here to do something along the lines of:

# from the command line:
-C my_llvm_configuration.cmake

# from my_llvm_configuration.cmake:
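# Unquoted lists, so "-C" and the path are passed as separate arguments.
set(CLANG_BOOTSTRAP_CMAKE_ARGS -C ${CMAKE_CURRENT_LIST_DIR}/my_bootstrap_configuration.cmake CACHE STRING "")
set(RUNTIMES_CMAKE_ARGS        -C ${CMAKE_CURRENT_LIST_DIR}/my_runtimes_configuration.cmake  CACHE STRING "")
set(BUILTINS_CMAKE_ARGS        -C ${CMAKE_CURRENT_LIST_DIR}/my_builtins_configuration.cmake  CACHE STRING "")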

That is, those three *_CMAKE_ARGS variables are intended to be populated with paths to “initial-cache” files?

The LLVM documentation does describe the intended use of CLANG_BOOTSTRAP_CMAKE_ARGS, but it doesn’t mention the other two, and this discussion of RUNTIMES_CMAKE_ARGS in 2022/2023 doesn’t mention this approach, so I suspect I may be misunderstanding you.

Thanks @DustinGadal for the detailed answer! For my current use, as suggested in other answers, just prefixing the CMake flag with BOOTSTRAP_BOOTSTRAP_ will suffice. However, I’m interested in the first snippet you shared, which might be relevant for customizing our local build scripts.

if(CLANG_STAGE STREQUAL "stage1")
  # stage1-specific settings here

if(CLANG_STAGE STREQUAL "stage2")
  # stage2-specific settings here

if(CLANG_STAGE STREQUAL "stage3")

How do these checks work? Does clang know which stage compiler is being built and set CLANG_STAGE appropriately? Or is this just a placeholder example to get the point across?