[RFC] Modules Build Daemon: Build System Agnostic Support for Explicitly Built Modules

Abstract

Modules have the potential to significantly improve compile-time performance because they eliminate the repetitive processing of header files that occurs during textual-inclusion builds. By processing a module once and reusing it across all associated translation units, modules offer a more efficient approach to managing dependencies. Clang currently supports two methods for building modules: implicit and explicit. With the explicit method, the build system has complete knowledge of a translation unit’s dependency graph before building begins, while with the implicit method the build system discovers dependencies as modules are built. While the explicit method boosts speed, it demands considerable development effort from build systems. The implicit method, on the other hand, integrates seamlessly with existing workflows but is inefficient. “Modules Build Daemon: Build System Agnostic Support for Explicitly Built Modules” aims to balance these two approaches, enabling developers to reap the benefits of explicit modules irrespective of their build system. This project aims to implement a daemon that serves as a module manager. By incorporating a single command-line flag, each Clang invocation registers its translation unit with the daemon, which then scans the unit’s dependencies. As translation units are registered and analyzed, the daemon constructs a dependency graph for the entire project. Concurrently, it uses the emerging graph to schedule and compile each module. This approach allows a single entity to effectively coordinate the build of modules.

Scope

For the purposes of Google Summer of Code, development will focus on supporting Unix-like systems. I would like to keep Windows and other operating systems in mind so that nothing needs to be re-architected when support for them is implemented down the road.

Clang Driver

Option Parsing

The clang driver will recognize -fmodule-build-daemon as a valid command-line option.

# example
$ clang++ -fmodule-build-daemon foo.cpp -o foo
// llvm-project/clang/include/clang/Driver/Options.td
def fmodule_build_daemon : Flag<["-"], "fmodule-build-daemon">,
	Group<f_Group>, Flags<[NoXarchOption]>,
	HelpText<"Enable module build daemon functionality">;

Tool Specific Argument Translation

When -fmodule-build-daemon is passed to the clang driver, the driver will check to see if a module build daemon is already running. If so, the driver will only launch clang with -cc1.

# module build daemon already running
$ clang++ -### -fmodule-build-daemon foo.cpp -o foo

"clang-17" "-cc1" "-fmodule-build-daemon" "-o" "/tmp/foo-73584c.o" "-x" "c++" "foo.cpp"

If the daemon is not running, the driver will launch clang with -cc1modbuildd to spawn the module build daemon then launch clang with -cc1.

# module build daemon not already running
$ clang++ -### -fmodule-build-daemon foo.cpp -o foo

"clang-17" "-cc1modbuildd"
"clang-17" "-cc1" "-fmodule-build-daemon" "-o" "/tmp/foo-73584c.o" "-x" "c++" "foo.cpp"

Integration

If the clang binary is run with the flag -cc1modbuildd, then cc1modbuildd_main() will be called instead of cc1_main(). By creating a separate entry point for the module build daemon, the daemon-specific behavior can be encapsulated, preventing the compiler from turning into a build system.

// llvm-project/clang/tools/driver/driver.cpp

int clang_main(int Argc, char **Argv, const llvm::ToolContext &ToolContext) {
    SmallVector<const char *, 256> Args(Argv, Argv + Argc);
    if (Args.size() >= 2 && StringRef(Args[1]).startswith("-cc1"))
        return ExecuteCC1Tool(Args, ToolContext);
    // ...
}

static int ExecuteCC1Tool(SmallVectorImpl<const char *> &ArgV,
                          const llvm::ToolContext &ToolContext) {

    StringRef Tool = ArgV[1];

    if (Tool == "-cc1")
        return cc1_main(ArrayRef(ArgV).slice(1),
                        ArgV[0],
                        GetExecutablePathVP);

    if (Tool == "-cc1modbuildd")
        return cc1modbuildd_main(ArrayRef(ArgV).slice(1),
                                 ArgV[0],
                                 GetExecutablePathVP);

    // unknown -cc1 tool
    return 1;
}

Daemon

The daemon will use Unix sockets as its form of IPC. The Windows 10 April 2018 Update added support for Unix sockets, making this IPC mechanism portable.

Overview

Requirements

  • The number of active threads managed by the daemon should be equal to the number of registered clang invocations to comply with the -j limit
  • An activated thread will first complete a dependency scan of the registered clang invocation and then begin building dependencies
  • The dependencies built by an active thread do not necessarily have to be required by the translation unit that activated the thread

// pseudo-code of cc1modbuildd_main.cpp

void scanDependencies(Client client,
                      ThreadSafeGraph<Dependency> &depsGraph) {
    // code for scanning dependencies
    // dependencies are put into depsGraph
}

void buildDependencies(Client client,
                       ThreadSafeGraph<Dependency> &depsGraph) {
    // code for building dependencies
    // dependencies are fetched from depsGraph
    // runs until a client disconnects
    //     - when a thread builds the last dependency for a TU, the clang
    //       invocation will disconnect, and the daemon will tell whichever
    //       thread completed the build to shut down
}

void handleConnection(Client client,
                      ThreadSafeGraph<Dependency> &depsGraph,
                      llvm::ThreadPool &Pool) {
    scanDependencies(client, depsGraph);
    Pool.async(buildDependencies, client, depsGraph);
}

void BuildServer::listen() {

    // shared dependency graph
    ThreadSafeGraph<Dependency> depsGraph;
    llvm::ThreadPool Pool;

    while (true) {
        std::optional<Client> client = listenForClangInvocation();

        if (client) {
            // If a new client has connected, allocate a
            // thread for handling the client.
            Pool.async(handleConnection, *client, depsGraph, Pool);
        }
    }
}

int cc1modbuildd_main() {

    BuildServer Server;
    Server.start();
    Server.listen();
    return 0;
}

Cache Validation

The daemon needs a way to check whether source files have changed since the last time they were built. There are two common approaches: compare timestamps or compare hashes. Timestamps can be unreliable, especially with remote or virtual file systems, so the daemon will use the hash of each file to check for changes. Once a module is built, a new entry will be added to the cache_map file. If the daemon detects that the hash of a source file has changed, it knows to rebuild the module.

# cache_map

hash                                      file

7dd3a27d375652b36ef2a9e2d92a4c6f2e8845ec  DependencyScanningFilesystem.cppm
b1a43877e38980b3b73b1e39e2badf81a8157c72  DependencyScanningService.cppm

Build Session

By managing build sessions, cache validation becomes more reliable and efficient. At the beginning of a build session, as clang invocations request compiled modules, the daemon validates the cache by comparing hashes. Once the daemon validates a source’s cache, it only needs to check that any subsequent clang invocations come from the same build session to reuse the cache. Luckily, clang already provides support for defining a build session.

-fbuild-session-file=<file>
Use the last modification time of <file> as the build session timestamp

-fbuild-session-timestamp=<time since Epoch in seconds>
Time when the current build session started

The build session ID will be stored alongside hash information and file names in the cache_map file.

# cache_map

build_ID   hash                                      file

1685846174 7dd3a27d375652b36ef2a9e2d92a4c6f2e8845ec  DependencyScanningFilesystem.cppm
1685846174 b1a43877e38980b3b73b1e39e2badf81a8157c72  DependencyScanningService.cppm

Cache Management

The cache will consist of compiled dependencies and the dependency graph from previous builds.

The build daemon will cache precompiled modules as Clang AST files, which encode the AST and associated data structures in a compressed bitstream format. Initially, the daemon will store cached dependencies exclusively on disk; most modern systems can hold all of the built dependencies on disk. However, the daemon will incorporate a cache-management mechanism to ensure support for systems with limited resources. The daemon will support the same two flags clang provides to clean the cache.

# Specify the interval (in seconds) after which a module file will be considered unused
-fmodules-prune-after=<seconds>

# Specify the interval (in seconds) between attempts to prune the module cache
-fmodules-prune-interval=<seconds>

Scanning

If no cached dependency graph exists, the daemon will construct one for each translation unit as it registers with the daemon. If a cached dependency graph exists, the daemon must validate it with every new build session. To validate the dependency graph, the daemon will check the consistency of each file in the translation unit’s dependency graph, including the translation unit itself. If no files have changed and the context hash matches the previous build session, the daemon does not have to re-scan the translation unit. If the translation unit, any of its dependencies, or the context hash has changed since the last build session, the daemon will re-scan the translation unit.

The daemon will use the DependencyScanning utilities provided under llvm-project/clang/lib/Tooling, originally developed for clang-scan-deps, to complete the scan.

// llvm-project/clang/tools/driver/cc1modbuildd_main.cpp

void scanDependencies(Client client,
                      ThreadSafeGraph<Dependency> &depsGraph) {

    TranslationUnitDeps TUDeps =
        DependencyScanningTool::getTranslationUnitDependencies();
    handleTranslationUnitResults(TUDeps, depsGraph);
}

handleTranslationUnitResults() will merge TranslationUnitDeps into ThreadSafeGraph<Dependency>.

Scheduling

There are two potential scheduling strategies:

  1. The daemon will schedule dependencies based on a topological sort, prioritizing modules required by more translation units. If two dependencies are of the same priority in the topological sort, but one dependency is required by two translation units while the other is required by five translation units, the daemon will schedule the dependency required by five translation units to be built first.

  2. The daemon will schedule dependencies based on a topological sort, prioritizing one translation unit at a time. The daemon will schedule all the dependencies for translation unit A to be built before it schedules any dependencies for translation unit B to be built.

I will conduct an analysis to determine which strategy results in faster build times.

Termination

The build daemon will automatically terminate after “sitting empty” for a specified amount of time. For example, if a clang invocation de-registers with the daemon, leaving it with zero registered clang invocations, the daemon will wait the specified amount of time for a new clang invocation to register before terminating.

Thank you to everyone who has taken a look at the RFC! I appreciate any feedback.

CC: @iains @Bigcheese @dblaikie @tahonermann @jansvoboda11 @vsapsai


CC @ChuanqiXu

Thanks for CCing me.

@cpsughrue There are multiple kinds of modules in clang now. The clang modules and c++20 modules. What kinds of modules do the RFC wants to support? I feel like you only want to support clang modules from the wording. It would be better to make it explicit.

BTW, besides the goal of the RFC, I feel it may be better to implement this in a separate tool like clang-scan-deps instead of clang itself.

This looks very interesting (while I completely agree with SG15 colleagues that large-scale systems will need a build system, also strongly believe that we should support small and intermediate projects that are more self-contained).

Have you looked at https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p1184r2.pdf in case that is relevant to the communication scheme?

I have some patches in my queue that probably make some of the 1184 things easier (there are drafts of Phabricator, but probably need rebasing and checking for bitrot)

1 Like

I apologize for not CCing you originally.

This RFC encompasses support for clang modules and c++20 modules. I will make sure that is stated explicitly. My understanding is that for Google Summer of Code, clang modules are the priority. So, I plan to implement support for them first and afterward add support for c++20 modules. Is there an aspect of the RFC that is incompatible with c++20 modules? I don’t want the RFC to sound like it was written solely for clang modules.

I think implementing the module build daemon as a separate tool is a good idea. A build system, like make, would still compile source files with -fmodule-build-daemon, but instead of spawning the daemon with clang -cc1modbuildd, the driver would spawn the daemon with a separate tool like clang-module-build-daemon. Is that what you had in mind? I would still need to modify the compiler to communicate with the daemon, but by creating a separate tool, I could minimize this project’s impact on the compiler’s core.

who proposed/is mentoring this project?

(& +1 to @iains’s post - this seems like some overlap with @urnathan’s mapper work - and it’d be nice if the design of this feature was made with that work in mind/factoring in how it might be an improvement to/addition to/compatible with that work, if relevant)

Do you think there is a better solution for small and intermediate projects? Maybe the correct answer is to have different solutions for different sized projects, but it would be nice to have a unified solution that supports all well.

1184 is super relevant. I need to define the message scheme used by the daemon and compiler to communicate with one another, and it would be great to use a specification that’s already gone through a few rounds of revisions. Thanks for bringing it to my attention.

@Bigcheese and @jansvoboda11 are mentoring this project. Here is the original post: [Clang] Modules build daemon: build system agnostic support for explicitly built modules.

I agree that adapting this project to @urnathan’s mapper work would be a good idea. I am working on updating the RFC with some details.

Got it. Makes sense to me.

Is there an aspect of the RFC that is incompatible with c++20 modules? I don’t want the RFC to sound like it was written solely for clang modules.

One point I got now is that the lack of compilation database. For C++20 modules, we can’t assume different module units are compiled by the same set of compilation flags. The introduction of the compilation database looks necessary.

A build system, like make, would still compile source files with -fmodule-build-daemon , but instead of spawning the daemon with clang -cc1modbuildd , the driver would spawn the daemon with a separate tool like clang-module-build-daemon . Is that what you had in mind?

1184 is super relevant. I need to define the message scheme used by the daemon and compiler to communicate with one another

I don’t have a clear mental model yet. I mean, in the RFC, I don’t see any specific requirement to the build systems. This is a big difference with the P1184 proposal mentioned Iain and David. IIUC, P1184 wants a server-client model for build systems and compilers. So it should require the involvement of build systems. Then it looks conflicting with the goal of the proposal, “enabling developers to reap the benefits of explicit modules irrespective of their build system.”.

In the neutral position, a concern is that you want to do too many things in the project. IIUC, you are planning the following things in the page:

  1. The Build Daemon for explicit built clang modules.
  2. The support for C++20 modules. (I am not sure if you know the difference between named modules and header units)
  3. The support for P1184 proposal.

So my concern is that the complexity is growing and it looks not easy to handle.

CCing some guys in SG15, maybe they have insights for this: @ben.boeckel @ruoso

I think that solutions like your proposal are the kind of thing that will make the modules “entry barrier” lower for new adopters, with small(er) initial projects.

A unified solution seems some way off given the complexity of discussions in SG15; one should never rule such a solution out - but, as @ChuanqiXu says, even what you are aiming for now is quite a big bite.

As noted, I have some patches in my queue that implement parts of the compiler-side of this. They are not ready for posting yet (and maybe will need updates to make them work) - but if/when you get to the point of considering that side - please reach out to me and we can see if there’s some way to combine things.

My understanding was that @urnathan had implemented a P1184 mapping implementation that was entirely local/didn’t require coordination with the build system (essentially implementing implicit modules via a custom mapper implementation) - so perhaps the proposal in this thread could be to have a slightly less trivial mapper that uses the scheme here for better multithreading, etc, while still using the P1184 mapping mechanism.

I am not super sure that I understand things correctly. AFAIK, both Build2 and Make are modified to use the protocol. So I feel P1184 may not be build system agnostic.

They did, for efficiency certainly - but I don’t think it’s the only option. I thought @urnathan had implemented a trivial mapper that could mostly/entirely mimic implicit modules.

Yes, there is an in-process version of the module mapper that implements a simple manager. I’d say it’s not quite implicit modules - since the generation of the dependent modules is not part of the interface - there is still an assumed external build system (even if that is only the user generating PCMs to be found).

There’s also g+±mapper-server which implements a similar thing with a (say) socket interface and allowing multiple query-ers … in the limit that server could be viewed as a window to the build system in a discovery-based dependency environment.

I think the PoC I showed you and @zygoloid may have done that – on demand compilation of a module. but the trivial mapper in libcody merely specifis the CMI file given a module name or header-unit name. (It could of course do some on-demand compilation of those, but then one gets into ‘what compilation flags’ and ‘is this a concurrent build’ questions – things I beleive should be in the build system.

Of course, if you want a non-build system implementation one might add query responses along the lines of ‘tell me your compilation options’ and then go build on demand, or something.

Ah, thanks @urnathan - am I reading it right, though - do you think one could implement a module mapper that implements the above proposal? If so, I think that’d be a great direction for this proposal to go in - it’d make the implementation portable between compilers (well, I guess the missing part you mentioned - needing to be able to query from the module mapper back to the client initiating the request to get command line arguments… so maybe that feature would only ever get implemented in clang, so it wouldn’t be especially portable - but might then at least be a good test for the module mapper interface?)?

Yes, the mapper protocol should be able to support that. Right now the reponse to ‘give me the CMI file name for module X’ is expected to be ‘it is file Y’, but there’s no reason the reponse couldn;t be ‘Please tell me Z first’,

I had a conversation with @Bigcheese and @jansvoboda11 about implementing the module build daemon as an external tool or clang plugin. One point brought up was they’ve actually had issues with clang-scan-deps not being integrated into the clang binary. There are some compatability issues with the -cc1 options and driver where if you aren’t using the correct version of clang-scan-deps and clang then clang-scan-deps won’t work correctly making the project more difficult to use for anyone not building from source. By integrating the module build daemon into the clang binary the functionality becomes more accessible to a larger group of people.

Generally, I think this is a good idea. Thanks for working on this

A couple of observation:

  • Protocol
    I’m a bit concerned about encouraging a loose text based IPC protocol, I would encourage something more structured for reliability and maintenance ease, whether that’s binary or json objects or something else entirely, I don’t have a strong preference at all, as long at it’s extensible in the long term, documented and impervious to whitespaces, encodings and line endings on different platforms.

  • Cache
    Arguably orthogonal to this effort, We ought to consider in the cache key the file path, but also compilation flags. But ideally, we should ignore flags that have no effect on the compilation, ie flags can’t affect the preprocessed source because they are not observable through macros, defines, and do not impact semantics in any way. At a first approach, maybe we can just ignore warning flags.
    As such, having an interface in clang that takes a command line and return a hash, such that the hash is identical if the build output would be identical would be very useful, both for this demon, as well as for external build tools.
    I’m not sure how easy that would be, because in effect we would have to rebuild a sorted list of flags, ignoring every warning flags and maybe a few other things.

Another question I have:

Given clang++ -fmodule-build-daemon foo.cpp bar.cpp baz.cpp -o foo, do you intend this to be able to compile all 3 files if they are all modules/module consumers?

This would be neat.