1. Introduction and Motivation
LLVM and its tools, such as Clang and LLD, have increasingly adopted internal parallelism to speed up complex tasks. For example, Clang can use multiple threads for device offload compilation (--offload-jobs=N), and LLD can parallelize Link-Time Optimization (--thinlto-jobs=N). These features are controlled by LLVM’s thread pool implementation (StdThreadPool) and the llvm::parallel library, which typically scale based on the number of hardware cores available.
This creates a significant performance problem when LLVM tools are run by a parallel build system like make -jN or ninja -jN. The build system, unaware of the tool’s internal parallelism, dispatches N independent LLVM processes. Each of these processes may in turn spawn M of its own threads. This leads to a “thread explosion” of N * M total threads, which can severely overload the system, increase CPU and memory contention, and ultimately make the entire build slower than a more constrained approach.
As a concrete example of this pain point, our own out-of-tree development uses a --parallel-jobs=N flag that suffers from this lack of coordination. In our Continuous Integration (CI) environment, we are constantly forced to make a difficult trade-off: setting N to a high value risks overloading the system and causing build timeouts, while setting it to a low value leads to inefficient resource utilization and slower builds. This highlights the practical need for a robust coordination mechanism.
This RFC proposes a solution: to make LLVM’s parallelism primitives “cooperative” by integrating support for the GNU Make jobserver protocol.
2. Background: What is a Jobserver?
For context, it might be useful to provide a brief background on the jobserver protocol, as it’s a specialized feature of build systems.
When you run make -jN, make needs a way to ensure that no more than N recipes are running at once. It also needs to solve a deeper problem: what if one of its child processes (e.g., a shell script or another tool) wants to run its own parallel sub-tasks? Without coordination, the system would once again become overloaded.
The jobserver is GNU Make’s solution to this. It’s a communication mechanism passed from a parent process (make) to its children. In summary:
- A Pool of “Job Slots”: Before starting, make creates a pool of N job slots, or “tokens”.
- Communication Channel: On Unix, this is typically a pipe. make writes N-1 single-character tokens into the pipe. On Windows, a named semaphore is used. The details are passed to child processes via the MAKEFLAGS environment variable.
- The Implicit Slot: Every child process is automatically granted one job slot just for being invoked. This is its “implicit” slot.
- Acquiring More Slots: If a child process wants to use more than one core for its own tasks, it must read additional tokens from the jobserver pipe. If the pipe is empty, the read will block until another process finishes and returns a token.
- Releasing Slots: When a child process completes a unit of parallel work, it must write its acquired token back to the pipe for others to use.
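To make the acquire and release steps above concrete, here is a minimal POSIX-only sketch. It is not the code from the PR: the PipeJobSlots class and the makeLocalPool helper are hypothetical names invented for this illustration, and the non-blocking check via poll() is just one possible design. The helper stands in for make by creating a pipe and preloading N-1 tokens.

```cpp
#include <cassert>
#include <cerrno>
#include <poll.h>
#include <unistd.h>

// Hypothetical wrapper around the two pipe ends of a jobserver.
// Illustration only; this is not the JobserverClient from the PR.
class PipeJobSlots {
  int ReadFD;
  int WriteFD;

public:
  PipeJobSlots(int R, int W) : ReadFD(R), WriteFD(W) {}

  // Try to take one extra slot without blocking. On success the token byte
  // is stored in Token so that the same byte can be written back later.
  // Caveat: another process may take the token between poll() and read(),
  // in which case read() blocks until some token is returned.
  bool tryAcquire(char &Token) {
    struct pollfd P;
    P.fd = ReadFD;
    P.events = POLLIN;
    P.revents = 0;
    int Ready;
    do {
      Ready = poll(&P, 1, /*timeout in ms=*/0);
    } while (Ready < 0 && errno == EINTR);
    if (Ready <= 0 || !(P.revents & POLLIN))
      return false; // Pool is empty right now, or the pipe is unusable.
    ssize_t N;
    do {
      N = read(ReadFD, &Token, 1);
    } while (N < 0 && errno == EINTR);
    return N == 1;
  }

  // Return a previously acquired token to the pool.
  bool release(char Token) {
    ssize_t N;
    do {
      N = write(WriteFD, &Token, 1);
    } while (N < 0 && errno == EINTR);
    return N == 1;
  }
};

// Demonstration helper standing in for "make -jJobs": creates a pipe and
// preloads Jobs-1 tokens. The token value '+' is an arbitrary choice here.
bool makeLocalPool(int Jobs, int FDs[2]) {
  assert(Jobs >= 1 && "a pool always includes the implicit slot");
  if (pipe(FDs) != 0)
    return false;
  for (int I = 1; I < Jobs; ++I) {
    char Token = '+';
    if (write(FDs[1], &Token, 1) != 1)
      return false;
  }
  return true;
}
```

With a pool created for three jobs, two calls to tryAcquire succeed and a third fails until one of the tokens is released. The implicit slot never appears in the pipe, which is why only N-1 tokens are written.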
This simple but effective protocol ensures that the total number of jobs running concurrently across the parent make process and all of its cooperating children never exceeds the user-specified limit N. This protocol has become a de-facto standard for coordination, supported not only by GNU Make but also recently by Ninja. This broad adoption makes it a modern and widely applicable solution.
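As mentioned in the summary, the channel details reach a child through MAKEFLAGS. The sketch below shows one way a client might extract them on POSIX systems. It assumes the option spellings documented by GNU Make (--jobserver-auth=R,W, --jobserver-auth=fifo:PATH, and the older --jobserver-fds=R,W) and assumes that the last occurrence wins. The JobserverSpec struct and parseMakeflags function are hypothetical names for this example and are not taken from the PR; the Windows semaphore form is deliberately left out.

```cpp
#include <cstdio>
#include <optional>
#include <sstream>
#include <string>

// Hypothetical description of where the jobserver lives (POSIX forms only).
struct JobserverSpec {
  enum class Kind { PipeFDs, Fifo };
  Kind K = Kind::PipeFDs;
  int ReadFD = -1;      // Valid when K == Kind::PipeFDs.
  int WriteFD = -1;     // Valid when K == Kind::PipeFDs.
  std::string FifoPath; // Valid when K == Kind::Fifo.
};

// Scan a MAKEFLAGS-style string and return the jobserver description, or
// std::nullopt if no usable jobserver option is present. A later option
// replaces an earlier one, even if the later one is unusable.
std::optional<JobserverSpec> parseMakeflags(const std::string &Flags) {
  const std::string Auth = "--jobserver-auth=";
  const std::string Legacy = "--jobserver-fds=";
  std::optional<JobserverSpec> Result;

  std::istringstream In(Flags);
  std::string Word;
  while (In >> Word) {
    std::string Value;
    if (Word.compare(0, Auth.size(), Auth) == 0)
      Value = Word.substr(Auth.size());
    else if (Word.compare(0, Legacy.size(), Legacy) == 0)
      Value = Word.substr(Legacy.size());
    else
      continue; // Not a jobserver option.

    Result.reset();
    JobserverSpec Spec;
    if (Value.compare(0, 5, "fifo:") == 0) {
      Spec.K = JobserverSpec::Kind::Fifo;
      Spec.FifoPath = Value.substr(5);
      if (!Spec.FifoPath.empty())
        Result = Spec;
      continue;
    }

    // Expect exactly "R,W" with two non-negative descriptors.
    int R = -1, W = -1;
    char Extra = 0;
    if (std::sscanf(Value.c_str(), "%d,%d%c", &R, &W, &Extra) == 2 &&
        R >= 0 && W >= 0) {
      Spec.K = JobserverSpec::Kind::PipeFDs;
      Spec.ReadFD = R;
      Spec.WriteFD = W;
      Result = Spec;
    }
  }
  return Result;
}
```

A real client would read the string from getenv("MAKEFLAGS") and would also need to verify that the descriptors are actually open, since a build system may advertise a jobserver that the child was not given access to.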
3. The Problem in LLVM
LLVM’s current parallelism, found in our StdThreadPool implementation and the llvm::parallel library (which provides parallelFor, parallelSort, etc.), is completely unaware of the jobserver. It queries the system for hardware concurrency and acts independently. This is the root cause of the thread explosion issue seen in key areas:
- Device Offloading: As my initial patch demonstrates, running make -j16 on a project that uses --offload-jobs=4 is a recipe for system overload.
- Link-Time Optimization (LTO): A parallel LTO link using lld --thinlto-jobs=8 inside a make -j16 build exhibits the exact same problem.
This is a generic limitation of LLVM’s parallel support. It lacks a mechanism to coordinate with the larger build environment.
4. Proposed Solution
I propose to add a native, platform-agnostic jobserver client to LLVM and integrate it into our threading libraries. The implementation is already available for review in PR #145131: https://github.com/llvm/llvm-project/pull/145131
The high-level design is as follows:
- New Library (llvm/Support/Jobserver): A new, lightweight library provides a JobserverClient. It handles parsing MAKEFLAGS and contains platform-specific backends for Unix (pipe/FIFO) and Windows (semaphores).
- New Concurrency Strategy: A new jobserver_concurrency() strategy is added to llvm/Support/Threading.h.
- Integration with LLVM’s Parallelism Libraries: The new strategy is integrated into both of LLVM’s main parallel execution mechanisms:
  - StdThreadPool: The thread pool’s worker loop is modified to acquire a job slot before executing a task.
  - ThreadPoolExecutor: The executor backing the llvm::parallel library is also updated. This ensures that high-level parallel algorithms like parallelFor, parallelForEach, parallelSort, and parallelTransformReduce will all respect the jobserver limit automatically when the strategy is enabled.
- User-Facing Control: Tools like Clang and LLD can then expose an option (e.g., --offload-jobs=jobserver or --thinlto-jobs=jobserver) to enable this cooperative behavior. When specified, the number of threads will be governed by the jobserver instead of hardware_concurrency().
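To illustrate the idea of a worker loop that acquires a job slot before executing a task, here is a simplified, self-contained sketch. It does not reproduce the implementation in the PR or the real StdThreadPool: LocalSlots is a hypothetical in-process stand-in for a jobserver connection, and runCooperatively is an invented helper. The calling thread runs on the implicit slot, while every additional worker must hold a token for as long as it executes a task.

```cpp
#include <atomic>
#include <chrono>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for a jobserver connection. Extra is the number of
// tokens currently in the shared pool (N-1 for a build started with -jN).
class LocalSlots {
  std::mutex M;
  int Extra;

public:
  explicit LocalSlots(int ExtraSlots) : Extra(ExtraSlots) {}

  bool tryAcquire() {
    std::lock_guard<std::mutex> Lock(M);
    if (Extra == 0)
      return false;
    --Extra;
    return true;
  }

  void release() {
    std::lock_guard<std::mutex> Lock(M);
    ++Extra;
  }
};

// Runs every task with at most MaxThreads workers. The calling thread uses
// the implicit slot; other workers need a token per task. Returns the peak
// number of tasks that were observed running at the same time.
int runCooperatively(const std::vector<std::function<void()>> &Tasks,
                     unsigned MaxThreads, LocalSlots &Slots) {
  std::atomic<std::size_t> Next{0};
  std::atomic<int> Running{0};
  std::atomic<int> Peak{0};

  auto Worker = [&](bool HasImplicitSlot) {
    while (Next.load() < Tasks.size()) {
      if (!HasImplicitSlot && !Slots.tryAcquire()) {
        // No slot available: back off instead of oversubscribing.
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
        continue;
      }
      std::size_t I = Next.fetch_add(1);
      if (I < Tasks.size()) {
        int Now = ++Running;
        int Old = Peak.load();
        while (Old < Now && !Peak.compare_exchange_weak(Old, Now)) {
        }
        Tasks[I]();
        --Running;
      }
      if (!HasImplicitSlot)
        Slots.release(); // Hand the token back after each unit of work.
    }
  };

  std::vector<std::thread> Threads;
  for (unsigned T = 1; T < MaxThreads; ++T)
    Threads.emplace_back(Worker, false);
  Worker(true); // The calling thread always makes progress.
  for (std::thread &T : Threads)
    T.join();
  return Peak.load();
}
```

In this sketch, asking for eight workers while only one extra token exists still completes all tasks, but never runs more than two at once. Releasing the token after every task rather than holding it for the lifetime of the worker is a design choice that lets other processes in the build pick up slots sooner, at the cost of more traffic on the jobserver channel.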
5. Addressing Potential Concerns
Comments on the initial PR raised valid questions about feature scope.
- Concern: “Is build job management the compiler’s job?”
  This proposal is not about making LLVM a build system. It’s about making LLVM a “good citizen” that can coordinate with existing build systems. The jobserver protocol is the standard, established way for tools to do this. The alternative of LLVM’s parallel tools remaining ignorant of their environment is the direct cause of the performance issues we’re seeing.
- Concern: “Alternative: Decompose the work for the build system.”
  One could imagine a system where Clang or LLD outputs a dependency graph (e.g., a JSON file) of its sub-tasks and lets the build system schedule them. While feasible for some scenarios, this would be a massive undertaking for tightly integrated tasks like LTO or the new offload driver. It would require significant changes to our tools and deep, complex integration with each build system. The jobserver approach, by contrast, is a widely supported and comparatively simple solution that directly solves the resource contention problem.
6. Feedback and Discussion
I believe this feature would be a valuable addition to LLVM, making our tools perform better and more predictably in standard parallel build environments. I’m opening this RFC to gather feedback on the idea and the proposed direction.
Some topics that might be useful to discuss include:
- The significance of the “thread explosion” problem in use cases like parallel LTO and offloading.
- The suitability of the jobserver protocol as a coordination mechanism for LLVM tools.
- The proposed introduction of a new llvm/Support/Jobserver library.
- General thoughts on the proposed implementation and its integration with LLVM’s parallelism libraries.
Thank you for your time and feedback.