Introduction
This RFC outlines a long-term vision and aggregates ideas for improving build times. It mostly concerns mechanical and structural changes that can considerably improve build times in the long term.
The short-term (<1 year) objective is to run a full build graph within the same, long-lived LLVM daemon process, moving beyond the traditional model of spawning a new OS process for each build action. This is motivated by the significant process creation and I/O overhead on Windows, but the benefits extend to achieving a truly efficient, modern, and incremental compilation model across all platforms.
The first practical step toward this (sequential in-process execution) is discussed in [14]. The steps following that RFC will be multithreaded (concurrent) in-process execution and a build daemon. These ideas were presented at past LLVM conferences in [2] and [3].
In the long term (if there is agreement on the above), this vision tends toward a system where overall local build times are directly proportional to the incremental changes made by a local user on the compiled codebase. Changes made elsewhere (by other users of the same codebase) should already have been incorporated asynchronously into the local build cache.
Short-term steps
This consists of four practical steps:
- In-Process Execution: Run build commands sequentially within a single, long-lived process to reduce process overhead.
- Multithreaded Execution: Extend the previous to run build commands concurrently within the single process, requiring the removal of global state in LLVM for thread safety.
- Cached I/O: Introduce a shared, thread-safe Virtual File System (VFS) to bypass disk I/O; pass intermediate object state in shared memory between tools (e.g., Clang and LLD).
- Build Daemon: Create a long-lived LLVM compilation service to manage entire build graphs, perform on-demand compilation, and shorten the build loop for client applications.
Here we only succinctly describe the high-level intention; later RFCs will go into full detail.
In-Process execution
The first step of this work focuses on enabling sequential in-process execution of LLVM tools. It is described in detail in RFC: In-process execution of LLVM tools [14].
A visible outcome of this work is the ability to run a sequence of build commands from the Clang driver:
> clang-cl file1.cpp file2.cpp file3.cpp -fuse-ld=lld
Or with a compilation database:
> clang-cl /compilation-database compile_commands.json
This sequential in-process execution could be used, for example, for Bazel persistent workers [1] or for faster Lit execution [17].
Multithreaded in-process execution
This second step extends the previous one to allow concurrent execution of tools within a single process. Traditional build systems like Ninja or Unreal Build Tool (UBT) could then see faster execution by delegating the complete execution of build commands to an LLVM tool. This work also makes the daemon in step four possible.
In addition to the LLVM changes to enable this, the Clang driver will be able to execute jobs concurrently. When no explicit dependencies are defined, jobs could run concurrently in a thread pool, with a new flag (-j) controlling the thread count:
> clang-cl file1.cpp file2.cpp file3.cpp -fuse-ld=lld -j4
Similarly,
> clang-cl /compilation-database compile_commands.json -jall
Internalizing execution as proposed here would greatly reduce OS friction on Windows (essentially the time spent in the kernel or OS libraries) and most likely on other systems as well. In our past llvm-buildozer prototype [7] we observed a steady 99% CPU usage in user space while building a large game project, whereas regular compilation without llvm-buildozer oscillates today around 70-85% CPU usage in user space, or less when building non-Jumbo targets such as LLVM or Chromium.
Most notably, this step involves removing some global state throughout LLVM and isolating it on the stack or on the heap instead. Only globals on the “golden path” of build execution are to be removed; global state in smaller utility programs or libraries unaffected by the new concurrency model will not be modified. Coding guidelines will be changed, and mechanisms for avoiding global state in the future will be added to the test suite. As an example of this work, global state was already removed in LLD, see [8] and [9].
We can identify at least three classes of global state in LLVM: ManagedStatics, cl::opts, and function-local statics. These represent the vast majority of globals that would need to be adapted to make most LLVM tools thread-safe for in-process invocations. Other global state exists in the CRT (C runtime libraries) and in other OS libraries; a notable example is the CWD (current working directory), which is stored per-process on Windows and indirectly affects many Win32 API calls.
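As a minimal, hypothetical sketch of the kind of mechanical change involved (the names below are illustrative, not taken from an actual patch), a file-scope cl::opt and a function-local static would move into a context object owned by each in-process invocation:

```cpp
#include "llvm/Support/CommandLine.h"

static int doWork(bool Enable) { return Enable ? 1 : 0; } // placeholder work

// Before: process-wide state, shared (and raced on) by concurrent invocations.
static llvm::cl::opt<bool> EnableFoo("enable-foo", llvm::cl::init(false));
int runToolBefore() {
  static unsigned CallCount = 0; // function-local static, also process-wide
  ++CallCount;
  return doWork(EnableFoo);
}

// After: the same state lives in a per-invocation context on the stack/heap.
struct InvocationContext {
  bool EnableFoo = false;
  unsigned CallCount = 0;
};
int runToolAfter(InvocationContext &Ctx) {
  ++Ctx.CallCount; // safe even when invocations overlap on different threads
  return doWork(Ctx.EnableFoo);
}
```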
We will also need to identify ManagedStatics (or global variables in general) that must remain process-global because they do not affect concurrent build execution. As an example, in llvm/Support/Parallel.h we might want to keep the parallel executor global across all tool invocations: sharing a global ThreadPool would make better use of hardware resources, for example when multiple LLD invocations run in parallel. CMake flags like LLVM_PARALLEL_LINK_JOBS would no longer be required in this concurrent in-process mode, as long as all jobs use LLVM tools.
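A rough sketch of such a deliberately process-global pool, shared by all in-process invocations (DefaultThreadPool is named ThreadPool in older LLVM releases):

```cpp
#include "llvm/Support/ThreadPool.h"
#include "llvm/Support/Threading.h"

// Sketch only: one pool, intentionally kept process-global, shared by every
// in-process tool invocation so concurrent jobs do not oversubscribe the CPU.
static llvm::DefaultThreadPool &getSharedThreadPool() {
  static llvm::DefaultThreadPool Pool(llvm::hardware_concurrency());
  return Pool;
}

void runActionsConcurrently(int NumActions) {
  llvm::DefaultThreadPool &Pool = getSharedThreadPool();
  for (int I = 0; I < NumActions; ++I)
    Pool.async([I] { /* one build action, e.g. an in-process LLD link */ });
  Pool.wait(); // note: waits for every queued task, not only the ones above
}
```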
Cached I/O
I/O is a major performance issue on Windows in general, and avoiding any kind of I/O would most likely benefit all platforms. This third step would introduce a shared, in-memory, thread-safe VFS cache across all concurrent tool invocations. In essence, treating the file system view as immutable after a build starts allows system I/O to be bypassed, by caching files during the execution of the LLVM tools. External file system changes would be supported by reading the NTFS journal, or by using inotify / fanotify on Linux. The recent sandboxing work in [10] simplifies this. While prior art exists in clang-scan-deps and in llvm-cas, the goal would be to extend and reuse these systems across all LLVM tooling during a build.
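A rough sketch of the idea, built from existing llvm::vfs pieces (ignoring the locking, invalidation, and cross-invocation sharing a real implementation would need), so that each file is read from disk at most once per build:

```cpp
#include "llvm/ADT/Twine.h"
#include "llvm/Support/MemoryBuffer.h"
#include "llvm/Support/VirtualFileSystem.h"

using namespace llvm;

// Sketch only: files read through this wrapper are served from memory once
// cached. A production version needs thread-safe access, invalidation (e.g.
// driven by the NTFS journal or inotify), and sharing across invocations.
class CachingFS {
  IntrusiveRefCntPtr<vfs::FileSystem> Real = vfs::getRealFileSystem();
  IntrusiveRefCntPtr<vfs::InMemoryFileSystem> Cache{new vfs::InMemoryFileSystem()};

public:
  ErrorOr<std::unique_ptr<MemoryBuffer>> getBuffer(const Twine &Path) {
    if (auto Buf = Cache->getBufferForFile(Path))
      return Buf;                          // already cached in memory
    auto Buf = Real->getBufferForFile(Path);
    if (!Buf)
      return Buf;                          // propagate the error
    Cache->addFile(Path, /*ModificationTime=*/0,
                   MemoryBuffer::getMemBufferCopy((*Buf)->getBuffer(), Path));
    return Cache->getBufferForFile(Path);  // serve later reads from memory
  }
};
```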
On a similar topic, we can also take shortcuts between tool executions. Currently, when the Clang tool writes .OBJ files to disk, the LLD tool re-opens and reads them. For instance, in the example below, the .OBJ files contributing to the binary are needed immediately by the linker after compilation:
> clang-cl file1.cpp file2.cpp file3.cpp -fuse-ld=lld
The flow here is suboptimal because it necessitates .OBJ serialization and creates blocking I/O requests for both writing in Clang and subsequent reading in LLD. A better approach would be to pass the unserialized state directly in shared memory between tools, while a background thread concurrently writes the .OBJ files for later usage. In a way, this mirrors how Clang avoids outputting an intermediate .ASM file when producing .OBJ files. In our proposed concurrent build model, .OBJ files are not strictly required for the immediate link step above, only for subsequent builds.
Build daemon
The previous steps focus on efficiently executing a single build graph by invoking different tools (compiler, librarian, linker, etc.) in-process. This fourth step introduces a long-lived LLVM compilation service (similar to clangd) to tie these steps together. While this new service application could live separately from the main LLVM codebase, it may be beneficial for the LLVM project to drive its development.
Invoking the daemon could be as simple as a Clang driver flag:
> clang -s
This would start a detached background Clang process, which waits for commands provided by the build system through IPC. The service would perform compilation or linking on demand. Clients could be typical build systems such as Ninja or Unreal Build Tool (UBT), IDEs, or hotpatching applications like Live++ [13]. Along the way, we could look into progressively keeping compiler or linker state in memory to improve build iterations.
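To make the intent concrete, here is a minimal, hypothetical sketch of the service loop; the IPC transport and the in-process tool dispatch are placeholders stubbed out so the sketch is self-contained, not existing LLVM APIs:

```cpp
#include <optional>
#include <queue>
#include <string>
#include <vector>

// Hypothetical IPC channel (named pipe, Unix domain socket, ...); stubbed
// here with an in-memory queue purely for illustration.
struct IpcChannel {
  std::queue<std::vector<std::string>> Pending;
  std::optional<std::vector<std::string>> nextCommand() {
    if (Pending.empty())
      return std::nullopt;                  // client asked us to shut down
    auto Cmd = Pending.front();
    Pending.pop();
    return Cmd;
  }
  void reply(int /*ExitCode*/) {}           // would be sent back over IPC
};

// Placeholder for the in-process execution from steps 1-3 (clang, lld, ...).
static int runToolInProcess(const std::vector<std::string> &/*Args*/) { return 0; }

int daemonMain(IpcChannel &Channel) {
  // Each command is one build action provided by the build system, an IDE,
  // or a hotpatching client; the daemon keeps caches warm between commands.
  while (auto Cmd = Channel.nextCommand()) {
    int ExitCode = runToolInProcess(*Cmd);
    Channel.reply(ExitCode);
  }
  return 0; // exit on shutdown request or after an idle timeout
}
```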
The daemon’s execution model could be either a pool of LLVM processes (such as llvm.exe from a llvm-driver build) or a pool of threads internally in the Clang daemon process itself. If favoring process isolation, note that processes are heavier on Windows, and context switching might be slower than on Linux. A pool of threads, conversely, would be more beneficial on Windows and could match Linux performance. Both modes could be implemented, allowing selection between absolute process isolation and build performance.
The daemon process could either remain in the background indefinitely, or it could be started with a default timeout, like sccache does. Furthermore, facilities would be provided for dealing with different toolchain versions across branches. For example, if a project branch A uses LLVM 22 and branch B uses LLVM 23, switching compilation between branches would re-launch the daemon if it was previously started from another branch. We will assume a daemon instance is tied to a specific branch (or branches), codebase, or C++ project.
Long-term ideas
The above foundational short-term steps – focused on internal execution efficiency – pave the way for a more ambitious, long-term strategy aimed at a complete paradigm shift in how LLVM interacts with modern, large-scale C++ codebases.
Our ultimate objective is to deliver a new development model where local build times are directly proportional to the incremental changes made, regardless of the overall codebase size (e.g., Chromium or Unreal Engine). This is a transition to an advanced toolchain featuring out-of-the-box features such as incremental compilation, runtime hotpatching, progressive optimization via live PGO, and on-demand debug information, among other things.
This vision focuses on four major, systemic areas of improvement that require deeper integration and coordination across the toolchain.
Caching and incremental compilation
Probably the most basic form of caching is .OBJ files: they avoid re-running build commands whose inputs have not been modified.
Another form of front-end caching is .PCH files (or header units, or modules). The work in [15] largely improves the situation, at least for the LLVM project itself. However, for large projects, maintaining a good set of precompiled header files is not trivial. At Ubisoft, a custom ClangTooling pipeline was used to intelligently generate precompiled header sets, aggregating metrics such as AST node weight and build times across different platforms and compilers. This kind of smart PCH management is a crucial feature that should be considered in LLVM, possibly under different forms.
Memoization and more granular semantic caching offer significant opportunities for improvement in both the compiler front-end and back-end. Past prototypes like the program repo [4] or the zapcc LLVM fork [5] have shown interesting improvements in terms of build times. Other proposals around these topics might come later this year.
As far as network caching is concerned, we traditionally delegate this to external tooling like ccache, sccache, FASTBuild, or others. Integrating caching as a first-class citizen into LLVM opens possibilities for both more granularity and more efficient asset sharing between users. While llvm-cas [11][12] largely paves the way ahead, several key aspects remain (a minimal cache-key sketch follows the list below):
- Generalizing or simplifying the cache key calculation for internal compiler computations.
- Ensuring computations are deterministic everywhere.
- Maintaining a live local catalog of already-performed remote computations (without necessarily transferring the assets themselves).
- Ensuring the caching cycle overhead is less than the original computation time.
- Developing facilities for testing, such as replacing a computation with its cached version at runtime and vice versa.
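As a toy illustration of the cache key calculation (llvm::hash_combine is only a stand-in here; a real cache would use a strong content hash over canonicalized inputs, as llvm-cas does):

```cpp
#include "llvm/ADT/Hashing.h"
#include "llvm/ADT/StringRef.h"
#include <string>
#include <vector>

// Sketch only: a cache key mixing the tool version, the (canonicalized)
// command line, and the contents of every input the computation read.
// llvm::hash_combine is not collision-resistant; a real implementation
// would use a cryptographic content hash.
llvm::hash_code computeCacheKey(llvm::StringRef ToolVersion,
                                const std::vector<std::string> &Args,
                                const std::vector<std::string> &InputContents) {
  llvm::hash_code Key = llvm::hash_value(ToolVersion);
  for (const std::string &Arg : Args)
    Key = llvm::hash_combine(Key, Arg);   // flags are part of the key
  for (const std::string &Input : InputContents)
    Key = llvm::hash_combine(Key, Input); // any input change changes the key
  return Key;
}
```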
Caching and incrementality can take many different forms. Even with a remote network cache, the sheer size of today’s codebases can sometimes nullify the cache’s effectiveness. Reducing the amount of compiled data (GBs worth of compressed .OBJ files) to be transferred between hosts could be a great improvement. This reduction can be achieved in two primary ways:
- Early on, by reducing the input source code to only reflect incremental changes.
- After the build, by performing artifact diffing with domain knowledge after the .OBJ file is produced.
Prior art exists for artifact diffing, such as ELFShaker [16] or MCCAS [18]. However, these solutions need more natural integration into LLVM tooling so that external network caching/distribution tools can adopt them easily.
The codebase as a whole
There are also opportunities for LLVM (tools, daemon?) to view and acknowledge that a modern C++ codebase is a complete unit with the following characteristics:
- It evolves in (small) steps, over time.
- It has branches.
- It has many users, each syncing the same codebase across multiple machines.
- It is stored in a VCS (version control system).
Acknowledging these realities allows us to move toward a more incremental, sharable, and distributed build approach. Traditionally, compiler toolchains like LLVM delegate this knowledge to external tooling (the build system). This creates missed opportunities: the compiler treats every input file on every invocation as if it had never seen it before, and equally ignores that the same commands are run on the same files across a fleet of machines.
While the original GCC build model has a certain purity in its referentially transparent compiler invocations, the practical complexity of C++ codebases has grown significantly over the past 20 years. A full non-distributed, non-cached Chromium rebuild now exceeds 1 hour 40 minutes on a 32-core/64-thread machine. Many of its developers leverage distributed or network caching systems, but this infrastructure might not be universally available, particularly for those with suboptimal ISP connections. We must ask: what build times are ultimately tolerable before we take action? Similarly, modern game engines such as Unreal Engine are often difficult to build locally without high-end hardware or robust build distribution infrastructure.
We should aim for the ability to take small patches (.DIFFs) as an input, while maintaining prior compiler and linker state on disk or in memory. While current technologies like C++ modules, header units, PCH, and Clang precompiled preambles offer ways to achieve this, they are often complex to implement and maintain, placing an undue burden on toolchain users (who may not have the domain expertise or time to properly implement those solutions).
With that domain knowledge, LLVM could more efficiently distribute build actions across a fleet of machines when building the same codebase. This does not require bringing entire build systems, backend caching solutions, or distribution tools into LLVM, but rather providing helpers, APIs, and services to enable these external systems to work more efficiently.
Debug information on-demand
Generation of debug information is one of the reasons for slow build times and one of the primary reasons for the size of the generated artifacts. Large-scale applications generate gigabytes (GBs) worth of debug information. As a motivating example, Chromium’s browser_tests.pdb is over 6.5 GB in size; most of it will never be consumed by users or any automated process. Having an LLVM build daemon presents an opportunity to generate debug information on-the-fly to serve a debugger (such as LLDB or Visual Studio). A more involved daemon, with an entire view of the codebase and its evolution (across time and branches), could also act like a source server [6] or a debuginfod service, generating debug info only when needed or requested.
Live optimization and hotpatching
There are also opportunities in live optimization of code in running binaries. Because of the long edit-build-run iterations, the game industry typically uses hotpatching tools like Live++ [13], which can build any incremental change on-the-fly and dynamically patch a running process. Most codebase modifications, including changes to function bodies and class methods, can be made in the C++ source code and then hotpatched. This tremendously improves iteration times for developers and inherently allows for increased quality in games.
Despite the speed of hotpatching, real-time applications like games still need runtime performance at all times, even during development; therefore, even during production, we can only afford optimized builds. For cases where more involved debugging is required, Live++ offers the ability to un-Jumbo-ify or deoptimize a .CPP file at runtime, by calling the compiler with appropriate flags, then hotpatching the corresponding TU functions in the target running process. This is typically something that could be improved with the help of an LLVM compilation daemon which would keep internal state in memory between runs, drastically reducing the overhead of such repetitive compilation tasks. This is also a great opportunity for incrementally optimizing runtime code, to avoid long build times upfront. Advanced features like live PGO could also be possible, where profile data is collected from a running process and injected back into the optimizer, in the daemon.
Conclusion
The vision I am articulating here is two-phased: from the immediate, mechanical gains of In-Process Execution, to the systemic transformation of the long-term strategic vision. I think it provides an actionable roadmap for tackling the challenges of build times in modern C++ development.
By implementing this short-term foundation, my aim is to establish the necessary platform for a toolchain that is deeply integrated, highly incremental, and optimized for the developer’s iteration loop. My ultimate goal is for us to move towards an experience where the frustration of minutes- or hour-long rebuilds is a relic of the past.
I’m looking forward to discussions around these topics! With this RFC my goal is also to aggregate existing efforts in these areas into a common document / thread, and possibly come up with a roadmap.
Thank you for reading!
References
[2] 2019 LLVM Developers’ Meeting: A. Ganea “Optimizing builds on Windows”
[3] 2024 LLVM Dev Mtg - Manifesto for faster build times
[5] Zapcc
[7] llvm-buildozer
[8] RFC: Revisiting LLD-as-a-library design
[9] Removing global state from LLD
[10] [RFC] File system sandboxing in Clang/LLVM
[11] RFC: Add an LLVM CAS library and experiment with fine-grained caching for builds
[13] Live++
[14] RFC: In-process Sequential Execution of LLVM Tools
[15] [RFC] Use pre-compiled headers to speed up LLVM build by ~1.5-2x
[16] elfshaker stores binary objects efficiently
[17] RFC: Reducing process creation overhead in LLVM regression tests