Meta-RFC: Long-term vision for improving build times

Hello @cachemeifyoucan,

Very good points, thanks for the feedback!

However, the proposal is very ambitious and lacks details (I know it is still meta, and I am just suggesting things I would like to see in future detailed RFCs). I am, in spirit, in favor of seeing RFCs for build performance.

Yeah, this comes up a lot in the comments. I can prepare RFCs for at least each of the short-term steps. In-process compilation already has its own RFC.

On the other hand, multi-threaded in-process execution might involve completely replacing cl::opt with something better or making it thread_local, which is a very big task.

cl::opts have specifically come up a lot in discussions over the past years. I will prepare an RFC just for the global-state removal. Replacing cl::opts with something else seems like way too much work for my taste. The approach I’ve taken here was to force usage of cl::location at all times and redirect the storage to a single thread-local buffer (for all cl::opts in the process). It could be something along those lines, but using a tool-local buffer instead, with a TLS context pointing to that buffer. When a tool starts, it would set its cl::opt buffer in the TLS; when it ends, it would clear the TLS. This mechanism doesn’t require any change to the existing cl::opts, except the ones that already use cl::location.

Secondly, we need to write down the exact methodology to achieve each task so we know how feasible it is and how disruptive it is.

Agreed.

We also might need some initial experiments to collect data to understand the potential benefits.

Absolutely.

Here is a counterexample for your in-process compilation model: some experiments with in-process compilation were run on macOS (where launch time is not as big a problem compared to Windows) years ago, and we found the saving from avoiding process launch was completely overshadowed by the cost of freeing memory (you cannot use -disable-free for in-process compilation). This is not proof that the in-process model doesn’t work; it just means we need to squeeze more savings from elsewhere, which also needs data to back it up.

Yes, I remember that you folks put back CLANG_SPAWN_CC1=ON for macOS when I introduced -fintegrated-cc1. In that regard, the prototype here uses multithreaded in-process compilation and actually disables -disable-free, meaning that heap cleanup occurs between each tool invocation. However, the heap memory pages remain committed (at least when using rpmalloc). I found that a lot of the churn on shutdown comes from the VAD tree cleanup by the OS, which reclaims the physical memory pages. The more short-lived processes you launch with many page allocations (like a compiler), the worse it gets. The OS has to clear the pages before handing them back to another process, and at some point the zero queue fills up and becomes a blocker. This shows a lot when building LLVM or Chromium. In some extreme cases, like linking Chromium’s browser_tests.exe, shutting down the LLD process (after the CRT finishes its cleanup) can take 5-7 seconds. However, when the pages remain mapped in the process, there’s no hit between tool invocations (though this also greatly depends on what your CRT allocator does upon free() on macOS).

It is great to see you have solutions in mind for many of the problems. Looking forward to reading the detailed follow-up RFC. For the meta RFC, it might be good to come up with a timeline (at least in what order, and whether you are depending on some other changes) so we know what to expect.

Yes, I remember that you folks put back CLANG_SPAWN_CC1=ON for macOS when I introduced -fintegrated-cc1.

That is probably unrelated to daemonizing things. I think it is about crash reporting, if I remember correctly.

However, the heap memory pages remain committed (at least when using rpmalloc).

Like I said, I don’t think it is a blocker for what you propose; it is highly dependent on the OS and malloc library, and there are other things we can do to mitigate the impact (like using BumpPtrAllocator more). That is why I would like to see some experiments with data.


Hi folks,

(I share a corporate overlord with Alexandre; although in a different tentacle of the business),

I think this is a future vision of a world I want to live in; it fits a theme of large-scale integration to achieve efficiency. The topic of Windows is coming up a lot – our (the Sony bunch) customers use Windows, we care about it a lot and use it a lot, hence the in-process-testing work [0]. It’s certainly something we’d put effort into maintaining + monitoring. For build system integration, it isn’t something I have a lot of familiarity with, but experimenting with closer integration and seeing what performance results come of that seems feasible to plan out and try.

The only thing that truly makes me nervous is ensuring the steady-state of a long running LLVM daemon doesn’t subtly change behaviour over time: the cleanliness of single processes that terminate is very attractive from a sanity/safety point of view. I think there’s a direct trade-off between compile-time performance and complexity here; but it’s a design space we can explore and measure.

We’re becoming much more interested in compile times – there are various paths to be taken to reduce work, further process integration is one of them. With past learnings from the program-repo project Alexandre mentions, we feel work-reduction from the frontend is another important topic (most game-projects are frontend-dominated). I’d like to chime in with @aengelke that

In my opinion, if we care about C++ compile times, we should work on (a) getting C++ modules into a widely usable state in which they provide substantial improvements and (b) a faster, performance-focused, and well-engineered C++ front-end (preferably without an expensive AST). IME, the front-end dominates compile times for larger code bases and a faster front-end could get improvements >2x for everyone.

This is our experience too. I feel the AST is incredibly powerful, and that’s a double-edged sword because of the corresponding compile times. We’re considering shortcuts we could take, but another full frontend seems infeasible. Incremental compilation would be ideal (but hard); I understand C++ Modules can lead to serious work reduction, but there are few case studies demonstrating clear benefits.

We’d certainly chip in to help prototype and evaluate the ideas that come out of this meta RFC.

[0] https://discourse.llvm.org/t/rfc-reducing-process-creation-overhead-in-llvm-regression-tests/88612


I think the issue of memory cleanup after a tool call terminates can be solved easily enough. As mentioned previously, this is OS- and malloc-library-dependent. There are existing solutions at both the OS and the malloc-library level. At the OS level, Windows offers heapapi.h for easily freeable heaps. At the library level, which is what I would suggest, heaps are a first-class primitive in mimalloc [0]. If each tool call allocated only on one of these heaps, cleanup would be very cheap, as it happens in one go. This would of course mean replacing rpmalloc, and as such would need benchmarking the performance of the different mimalloc versions for LLVM.

[0] mi-malloc: Heaps

I think the issue of memory cleanup after a tool call terminates can be solved easily enough. As mentioned previously, this is OS- and malloc-library-dependent. There are existing solutions at both the OS and the malloc-library level. At the OS level, Windows offers heapapi.h for easily freeable heaps. At the library level, which is what I would suggest, heaps are a first-class primitive in mimalloc [0]. If each tool call allocated only on one of these heaps, cleanup would be very cheap, as it happens in one go. This would of course mean replacing rpmalloc, and as such would need benchmarking the performance of the different mimalloc versions for LLVM.

Yes having separate heaps is something that I’ve suggested in the other RFC: [RFC] In-process execution of LLVM tools - #10 by R-Goc

I don’t think going back to Windows Heaps is a good idea until Microsoft comes up with a more performant allocator. We can also create custom heaps in rpmalloc. mimalloc is a good solution, on par with rpmalloc. See some metrics from a few years ago below:

Thanks Alexandre! I hope I can help make some of what’s proposed here happen. :slight_smile:

I want to emphasize that process startup overhead isn’t just a Windows problem: if you dig just a little bit into Linux loader implementation details, you’ll realize that Linux process startup is slow too. @boomanaiden154-1 migrated our lit tests to the internal shell to avoid one bash invocation per test and picked up a ~10% runtime saving.

These are good goals, and we should 100% do this. I filed an issue in the tracker for this migration. Many LLVM downstreams, especially graphics drivers, provide LLVM-as-a-library and attempt to offer multi-threaded compilation, but they run into races on global state.

I actually think object file serialization is probably still valuable because the unserialized object file representation (MC) is bloated and not dense. Serializing to a flat object file representation is valuable, since you can free and reuse all that assembler heap memory. I think an interesting direction here would be to add some kind of flat, per-function content-hashing layer, since IMO that’s what the linker needs to be redesigned around, and that’s what’s going on in the CAS workstream.

I feel like having the process pool model is valuable, since LLVM contains many fatal error paths. The process pool model can’t share as much cached filesystem state in memory, but it makes error recovery and cache flushing more reliable, since you can just restart the process after any action failure or when it exceeds some memory usage quota.

We never implemented this, but back when I was working with Chrome, I was advocating strongly for distributed ThinLTO, which is finally being implemented today, mostly because I wanted process isolation. Mostly I wanted to reduce the support cost of LTO for us. I didn’t want to get bug reports of the form “the linker crashed non-deterministically after 20 minutes of compilation”, I wanted to get bug reports like “this backend compilation action fails deterministically on this bitcode”. Having the ability to re-execute failed build actions in a clean process is very helpful.

I agree strongly with this. I’m a little bit leery of developing an entire build system inside of LLVM, but I think the entire C/C++ developer community has been held back by our inability to make changes, even incremental ones, that cut across traditional build system boundaries. If LLVM had a build action subgraph executor, that would open up a lot of possibilities.


I think CodeView is actually pretty well-suited for this purpose. If you dig into the global type hashing implementation, it’s all just content hashing all the way down.


As someone who needs to debug compiler toolchains when they inevitably break, determinism and serialization are absolutely critical to developing LLVM.

If we can avoid overhead by passing files between tools without waiting for them to be written to disk, that’s okay, but we need to be very careful to ensure that this is just a performance shortcut which doesn’t affect the semantics of the tools. I’m very afraid of a system where we have some sort of in-memory cached representation of a program, with no corresponding on-disk representation. When that database is corrupted, or has a race condition, it’ll be impossible to debug.

Along the same lines, incremental/live compilation systems are hard to debug. If you’re careful, you can log the user’s actions and store intermediate outputs, but if the system silently behaves differently based on which intermediate files are present, it’ll be impossible to debug.

You don’t need a daemon for this. You just need to encode instructions to invoke the compiler to generate a dwo or equivalent. Actually, I’m not sure what the daemon is even doing here, except to act as a middleman that invokes the compiler.


Yeah, CodeView itself should fit well into it, but I’ve not looked into MCCAS’ handling of the debug info so it is difficult to tell what dragons lie there.

Hello @efriedma-quic,

As someone who needs to debug compiler toolchains when they inevitably break, determinism and serialization are absolutely critical to developing LLVM.

If we can avoid overhead by passing files between tools without waiting for them to be written to disk, that’s okay, but we need to be very careful to ensure that this is just a performance shortcut which doesn’t affect the semantics of the tools. I’m very afraid of a system where we have some sort of in-memory cached representation of a program, with no corresponding on-disk representation. When that database is corrupted, or has a race condition, it’ll be impossible to debug.

Along the same lines, incremental/live compilation systems are hard to debug. If you’re careful, you can log the user’s actions and store intermediate outputs, but if the system silently behaves differently based on which intermediate files are present, it’ll be impossible to debug.

I agree. @jmorse has the same concerns above. My feeling is that we shouldn’t retain raw compiler state in memory, but rather a light serialized format. This goes hand in hand with a CAS and the calculation of hash keys for a “computation” which generates an outcome (the serialized state). We’re talking granular state here: intermediate artifacts such as token streams, AST fragments, types, IR, machine code, sections, debug info, etc. The goal is also to persist this state in an index + CAS when the LLVM daemon is not running. Additionally, we would store alongside it a history of the build actions that were triggered / sent to the daemon during its lifetime.

On one hand, all that would give us a key→value mapping for computations, which can be triggered again to assert their validity / determinism. For example, building a project from scratch twice shouldn’t generate new assets in the CAS. On the other hand, we could achieve reproducibility by knowing the daemon’s initial state on startup (the root index hash of a Merkle tree, as stored in the CAS) + the history of actions that were executed.

I think the same ideas apply to multi-threading: if we share raw state across threads, we’re at risk of non-deterministic behavior.

I wasn’t thinking of monolithically generating debug info for a TU. More along the lines of only generating a subset of debug information on the fly. This ties back a bit to the previous paragraph, where we need to get back quickly into a state where we can generate the debug info from existing generated code.

An option along those lines is for the daemon to act as a DAP debugger server. Another aspect of this is that ultimately I am envisioning the daemon acting as a VM for C++, where it can dynamically build a process in memory and modify it on the fly, i.e. further democratize the use of LLVM ORC (e.g. Julia or Cling). Right now we’re resorting to external tooling like Live++ because there’s no other way to achieve this, but ideally I’d prefer a more integrated solution.

Hello @rnk,

I feel like having the process pool model is valuable, since LLVM contains many fatal error paths. The process pool model can’t share as much cached filesystem state in memory, but it makes error recovery and cache flushing more reliable, since you can just restart the process after any action failure or when it exceeds some memory usage quota.

Yeah, that sounds like a good compromise. Ideally I’d like to have both, at least for testing purposes. From the OS scheduler’s perspective, managing a pool of processes is more expensive than a single process with a pool of threads. With the current build model, context switching alone was significant enough to show up in profiles on Windows. However, if we keep a pool of llvm.exe processes alive and schedule actions on them without shutting them down, we could pin each to a specific core, so context switching shouldn’t be as bad.

I’m a little bit leery of developing an entire build system inside of LLVM

I think only a minimal amount of work needs to be done here. I am also against bringing all of the target management and build scripts inside LLVM. However, managing the dependency graph of actions makes sense, since LLVM has deeper domain knowledge of the actions and their implications.

(aside: If you want to rerun the compiler to generate debug info on the fly, do realize that LLVM isn’t robust against debuginfo-affects-codegen, so you have to produce debug info LLVM IR the first time and /maybe/ the backend can be made (probably isn’t already) robust against debuginfo/emission/-affects-codegen issues - if you want to be able to skip all the debug info handling/merging/etc during the middle end, then substantial work would need to be done to ensure debug info doesn’t affect LLVM’s optimizations, etc, in any way (it’d be good, valuable work - it’d remove substantial sources of heisenbugs, etc, - but it so far hasn’t been important enough for anyone to prioritize the kind of quality level in this area that would be needed to build incremental debug info on top of))

1 Like

Hello @dblaikie,

(aside: If you want to rerun the compiler to generate debug info on the fly, do realize that LLVM isn’t robust against debuginfo-affects-codegen, so you have to produce debug info LLVM IR the first time and /maybe/ the backend can be made (probably isn’t already) robust against debuginfo/emission/-affects-codegen issues - if you want to be able to skip all the debug info handling/merging/etc during the middle end, then substantial work would need to be done to ensure debug info doesn’t affect LLVM’s optimizations, etc, in any way (it’d be good, valuable work - it’d remove substantial sources of heisenbugs, etc, - but it so far hasn’t been important enough for anyone to prioritize the kind of quality level in this area that would be needed to build incremental debug info on top of))

I guess what I am asking between the lines with this RFC is: is this effort worth investing in, from this community’s standpoint? As in, transitioning to an incremental compiler / toolchain? I am ready to invest my time into the first part, but the second part requires more involved community support, which I suppose should also come with strong corporate buy-in. Or maybe I am too optimistic and we should scratch the long-term vision, and just talk about the first part until we have a functional daemon?

To that broader question, here’s my rather pessimistic answer:

It’s pretty hard (read: assume impossible) to get investment from others on a project this experimental. It’s unlikely to align sufficiently with corporate goals for anyone to invest significant resources into it, unless it’s already the thing they’re working on (which it isn’t, or we’d see that in the community) or there’s already some other major investment (see, for instance, Apple’s build-cache work: even that, with strong corporate backing, wasn’t quite interesting enough for Google to engage with, given how well or poorly things align with the current goals of the company/teams, but merely to cheer on, with some reservations, from the sidelines).

So, essentially - assume you’ll be doing this alone/with whoever you’re already working with.

And that, I think, is where some of the concerns in this thread come from: the concern that there’s insufficient backing for a project of this magnitude, that it’ll place a substantial maintenance burden on the project and the use cases won’t pan out, or the investment won’t be enough to complete it, etc. :confused:

So, generally the best way to approach something like this is, as with most of the LLVM project, with small increments that are self-supporting in their value/return-on-complexity, and hopefully appeal to existing users/contributors/use cases without major rework. (that said, I also really get a bit frustrated with solutions that are overly isolated/drop-in - Apple’s implicit modules, then implicit ThinLTO are examples - they work great to drop into a system you can’t change, but don’t scale well (no way to distribute them, for instance - though, admittedly, Google and Apple still got a lot out of sharing the common parts, even if in both cases Apple went with implicit systems that were easy to roll out to users and Google went with explicit systems that were easier to integrate into a very rigid distributed build system))


This is motivated by the significant process creation and I/O overhead on Windows

Do you have numbers to back up the process creation overhead?

I remember that when we made clang no longer spawn a cc1 subprocess, we also thought that’d help a ton (⚙ D69825 [Clang][Driver] Re-use the calling process instead of creating a new process for the cc1 invocation), but the actual measured wins after that went in were pretty small. (Low one-digit percentage, if I remember correctly?) (It was IMHO still a great change for other reasons too, but it was also much much smaller in scope…)