[RFC][Modules] Support simple C++20 modules use from the Clang driver without a build system

Introduction

Currently, there is no easy way to compile a collection of source files using standard C++ named modules into an executable without using a dedicated build system.
This proposal aims to lower the barrier of entry for using standard C++ named modules by providing native support for simple module use directly from the Clang driver.

With the proposed feature implemented, users will be able to compile source files using C++20 modules with a single invocation, similar to:

clang -o program -std=c++20 main.cpp A.cppm B.cppm ...

The proposed feature will support imports of modules defined in other source files on the command line or in the standard library, without adding overhead when modules are not used.

This project is part of this year’s Google Summer of Code. The original project description can be found here.

Impact

This feature will only affect -std=c++20 and newer, and will be enabled by default for any compilation with more than one source file as input.

If any command-line argument indicates that the user or a dedicated build system is managing the modules build (e.g., -fmodule-file, -fprebuilt-module-path), the feature will be disabled by default.

The -f[no-]modules-driver flag will allow explicitly enabling or disabling the feature.
While experimental, this feature will only be enabled when explicitly opted into.

Design

The feature will be implemented using Clang’s existing support for scanning C++ modules to first discover module dependencies between source files and then perform an explicit module build using the reduced BMI format.

For this, I propose adding a new phase to the Clang driver that replaces the current logic for building compilation actions whenever a build should support standard C++ module use from the Clang driver.

The new phase consists of three major steps:

1. Checking for Module Presence

Because of the constraints described in P1857R3, we can detect if a source uses standard C++20 modules by inspecting its first page without any preprocessing.
If its first page contains a module-related text line, the file is known to use C++20 modules.
If the file’s first page contains only whitespace, comments, or other non-relevant lines, additional pages are checked until a decision can be made.

This quick check is done to avoid the overhead of a full dependency scan in cases where modules are not used.
For files with module-specific extensions (e.g., .cppm, .ixx, etc.), module usage is assumed without performing any check.
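
To make the idea of the quick check concrete, here is a minimal sketch of a line-based test for module-related lines. It is not the proposed implementation: the function names (usesNamedModules, isModuleLine, and the helpers) are hypothetical, the sketch reads the whole text instead of page by page, and it only skips block comments that start at the beginning of a line. Macros, string literals, and comments that open mid-line are not handled.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <string_view>

// Returns true if `c` can be part of an identifier.
static bool isIdentChar(char c) {
  return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
}

// Drops leading whitespace from `s`.
static void skipSpace(std::string_view &s) {
  while (!s.empty() && std::isspace(static_cast<unsigned char>(s.front())))
    s.remove_prefix(1);
}

// If `line` starts with `kw` as a whole word, strips it (plus following
// whitespace) and returns true. Otherwise leaves `line` untouched.
static bool consumeKeyword(std::string_view &line, std::string_view kw) {
  if (line.substr(0, kw.size()) != kw)
    return false;
  if (line.size() > kw.size() && isIdentChar(line[kw.size()]))
    return false; // e.g. "importer" or "module_x"
  line.remove_prefix(kw.size());
  skipSpace(line);
  return true;
}

// Hypothetical helper: does this line look like a module or import line?
static bool isModuleLine(std::string_view line) {
  skipSpace(line);
  consumeKeyword(line, "export"); // optional prefix
  if (consumeKeyword(line, "module"))
    return !line.empty() && (isIdentChar(line.front()) ||
                             line.front() == ';' || line.front() == ':');
  if (consumeKeyword(line, "import"))
    return !line.empty() && (isIdentChar(line.front()) ||
                             line.front() == '<' || line.front() == '"' ||
                             line.front() == ':');
  return false;
}

// Simplified stand-in for the presence check: scans the raw text line by
// line, without preprocessing, and reports the first module-related line.
bool usesNamedModules(const std::string &source) {
  std::istringstream in(source);
  std::string line;
  bool inBlockComment = false;
  while (std::getline(in, line)) {
    std::string_view rest = line;
    for (;;) {
      if (inBlockComment) {
        std::size_t end = rest.find("*/");
        if (end == std::string_view::npos) {
          rest = {}; // the whole line is inside the comment
          break;
        }
        rest.remove_prefix(end + 2);
        inBlockComment = false;
      }
      skipSpace(rest);
      if (rest.substr(0, 2) != "/*")
        break;
      rest.remove_prefix(2);
      inBlockComment = true;
    }
    if (isModuleLine(rest))
      return true;
  }
  return false;
}
```

Lines starting with // need no special case here, since they never begin with one of the keywords. A real implementation would additionally stop reading as soon as a decision can be made, which this sketch does not attempt.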

2. Dependency Scanning

This step uses Clang’s existing dependency scanning tooling located in clang/include/clang/Tooling/DependencyScanning.
The scan is performed using the P1689 scanning output format and the fast DependencyDirectivesScan.
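
As a rough illustration of what a per-file scan result contains, the sketch below extracts the module a file provides and the modules it requires, which is the kind of information the P1689 format records. This is not Clang's scanner and does not use its API: ScanResult and scanFile are hypothetical names, and the regular expressions only recognize plain module and import declarations on a single line. Partitions, header units, macros, and comments are deliberately ignored.

```cpp
#include <cassert>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical per-file record, loosely modeled on the provides/requires
// information in P1689. Not Clang's actual data structure.
struct ScanResult {
  std::string source;                       // path of the scanned file
  std::string providedModule;               // empty if none is provided
  std::vector<std::string> requiredModules; // modules this file depends on
};

// Simplified stand-in for a dependency scan of one file's text.
ScanResult scanFile(const std::string &path, const std::string &text) {
  // Group 1: optional "export"; group 2: module name.
  static const std::regex moduleDecl(
      R"(^\s*(export\s+)?module\s+([A-Za-z_][A-Za-z0-9_.]*)\s*;)");
  // Group 1: imported module name.
  static const std::regex importDecl(
      R"(^\s*(?:export\s+)?import\s+([A-Za-z_][A-Za-z0-9_.]*)\s*;)");

  ScanResult result{path, "", {}};
  std::istringstream in(text);
  std::string line;
  std::smatch m;
  while (std::getline(in, line)) {
    if (std::regex_search(line, m, moduleDecl)) {
      if (m[1].matched)
        result.providedModule = m[2].str(); // interface unit
      else
        // An implementation unit depends on its own interface unit.
        result.requiredModules.push_back(m[2].str());
    } else if (std::regex_search(line, m, importDecl)) {
      result.requiredModules.push_back(m[1].str());
    }
  }
  return result;
}
```

For a file A.cpp containing "export module A;" followed by "import B;", this yields a record that provides A and requires B.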

3. Build Graph and Action Generation

The per-file scan results are merged into a unified dependency graph, which is then topologically sorted to determine the build order and generate the compilation actions.
After the compilation actions have been generated, the driver continues as usual with the modified command line.
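
The merge-and-sort step can be sketched as follows, using Kahn's algorithm for the topological sort. This is only an illustration under assumptions: ScanResult and buildOrder are hypothetical names, the RFC does not specify which sorting algorithm the driver will use, and imports without a provider among the inputs (such as std) are simply skipped here.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <optional>
#include <queue>
#include <string>
#include <vector>

// Hypothetical per-file scan record (not Clang's actual data structure).
struct ScanResult {
  std::string source;
  std::string providedModule;               // empty if none is provided
  std::vector<std::string> requiredModules;
};

// Returns the source files in an order where every module provider comes
// before its consumers, or std::nullopt if the imports form a cycle.
std::optional<std::vector<std::string>>
buildOrder(const std::vector<ScanResult> &scans) {
  // Map each module name to the index of the file that provides it.
  std::map<std::string, std::size_t> provider;
  for (std::size_t i = 0; i < scans.size(); ++i)
    if (!scans[i].providedModule.empty())
      provider[scans[i].providedModule] = i;

  // Build edges provider -> consumer and count unmet dependencies.
  std::vector<std::vector<std::size_t>> consumers(scans.size());
  std::vector<std::size_t> pending(scans.size(), 0);
  for (std::size_t i = 0; i < scans.size(); ++i) {
    for (const std::string &name : scans[i].requiredModules) {
      auto it = provider.find(name);
      if (it == provider.end())
        continue; // not provided by any input file; skipped in this sketch
      consumers[it->second].push_back(i);
      ++pending[i];
    }
  }

  // Kahn's algorithm: repeatedly emit files with no unmet dependencies.
  std::queue<std::size_t> ready;
  for (std::size_t i = 0; i < scans.size(); ++i)
    if (pending[i] == 0)
      ready.push(i);

  std::vector<std::string> order;
  while (!ready.empty()) {
    std::size_t i = ready.front();
    ready.pop();
    order.push_back(scans[i].source);
    for (std::size_t c : consumers[i])
      if (--pending[c] == 0)
        ready.push(c);
  }

  if (order.size() != scans.size())
    return std::nullopt; // some files were never ready: dependency cycle
  return order;
}
```

Reporting a cycle as an empty result is a choice made for this sketch; the driver would presumably emit a diagnostic instead.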

[Illustration: a modified version of the illustration found in https://clang.llvm.org/docs/DriverInternals.html]

Implementation Roadmap

The proposed feature will be implemented and landed in three phases, each corresponding to a step in the new driver phase.

Impl. Phase  Component                                      Testing Approach
1            Module Presence Detection                      Compiler remarks
2            Dependency Scanning                            Compiler remarks
3            Build Graph and Compilation Action Generation  Behavioral tests

The compiler remarks will be enabled by -Rmodules-driver.
For testing of the “Module Presence Detection” component, Clang will emit remarks similar to:

remark: module text-line found in 'A.cpp' (pages read: 1)
remark: module text-line found in 'B.cpp' (pages read: 2)

For testing of the “Dependency Scanning” component, Clang will emit remarks similar to:

remark: 'A.cpp' depends on module 'B'
remark: 'A.cpp' provides module 'A'
remark: 'B.cpp' provides module 'B'

The final step in the new driver phase will be tested using -ccc-print-phases to verify that the generated compilation phases are correct, along with behavioral tests to ensure that test programs compile successfully.

Future Work

Implementing this feature will lay the groundwork for supporting the combined use of both Clang header modules and standard C++ named modules natively from the Clang driver.
This would involve first extending Clang’s existing tooling for dependency scanning to allow scanning for both module types in a single unified scan.
If time allows, this will also be pursued as part of this GSoC project.

CC: @aaronballman, @Bigcheese, @ben.boeckel, @ChuanqiXu, @jyknight, @petrhosek

From a user-interface perspective, you may want to take a look at GCC. They did a similar thing.

For this idea to be practical, I think we have to support import std. Whether for teaching or for sharing ideas, the standard library is almost always necessary.

Hi! I wanted to ask for clarification regarding this point.
To my understanding, GCC does not currently have a feature similar to this.

import std is planned to work with this feature and should also be supported with -std=c++20.

I haven’t used GCC much, but as far as I remember, GCC supports building an executable from multiple TUs with modules in a single command line. It creates a module cache directory in the CWD and looks up modules in that directory. You can take a closer look. This may not be a well-known feature.

That is the “here is a directory” implicit module mapper functionality. I would investigate:

  • incremental build behavior
  • behavior of pre-existing module files in the directory (e.g., importing a module with a BMI there but not on the command line)

The latter is fixable by using a temporary directory every time, but then there is no potential for incremental builds (which may be fine).

Initially, we’d like this feature to mirror the behavior of regular builds (without driver support for C++20 modules use).
Since Clang doesn’t currently support incremental builds for other compilation types, the feature would initially not support incremental builds either and would use a temporary directory for each compilation.

I think we should support incremental builds in the future. Even the simplest use-cases would have significant overhead if the std module needs to be recompiled each time.

There needs to be some way to verify any cached results. At that point, Clang basically has an implementation of P2581, which build systems can also use to reuse installed BMIs (when possible).