This comes from [C++20] [Modules] Embed all source files for C++20 Modules by ChuanqiXu9 · Pull Request #102444 · llvm/llvm-project · GitHub and Shouldn't `-fmodules-embed-all-files` be the default? · Issue #72383 · llvm/llvm-project · GitHub. Given that the history is pretty long and a lot of topics were discussed, I’ll try to describe the background here.
Background
Source location is a pretty fundamental concept in clang. During compilation, the frontend sometimes needs access to the contents of a source file.
For modules, the BMI (built module interface) currently records source locations as offsets instead of embedding the file contents. So when we compile with modules and need the contents of a file, we try to open the file on disk and read the contents at the recorded source location.
This behavior has two consequences:
(1) The BMI becomes invalid after we change the source file.
(2) When we compile with modules, the dependent source files must always be present in the compilation environment.
The first point is minor and may only affect the debugging experience. But the second point more or less affects distributed builds and sandboxed builds. If we can get rid of it, we can avoid downloading the source files or putting them into the sandbox when compiling with modules.
And according to the discussion in the issue’s thread, neither GCC nor MSVC has such a constraint.
To achieve this, the issue asks to make `-fmodules-embed-all-files` the default. The source contents would then be embedded in the BMI, and we would no longer need to open the referenced files during the compilation of modules.
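The difference between the two strategies can be illustrated with a toy model. This is only a sketch: `ToyBMI`, `build_bmi` and `demo` are hypothetical names invented for illustration, the "BMI" here is a plain Python object rather than clang's real serialization format, and the model ignores any validation clang performs on its input files.

```python
import os
import tempfile
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyBMI:
    """Hypothetical stand-in for a BMI; not clang's real format."""
    path: str                        # file the recorded location refers to
    offset: int                      # recorded location as a character offset
    length: int                      # length of the referenced token
    embedded: Optional[str] = None   # full file contents, only if embedded

    def text_at_location(self) -> str:
        # With embedded contents the BMI is self-contained; otherwise the
        # original file has to be reopened from disk.
        if self.embedded is not None:
            source = self.embedded
        else:
            with open(self.path, encoding="utf-8") as f:
                source = f.read()
        return source[self.offset:self.offset + self.length]


def build_bmi(path: str, token: str, embed: bool) -> ToyBMI:
    """Record where `token` starts in the file, optionally embedding the file."""
    with open(path, encoding="utf-8") as f:
        source = f.read()
    return ToyBMI(path, source.index(token), len(token), source if embed else None)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "foo.cppm")
        with open(src, "w", encoding="utf-8") as f:
            f.write("export module foo;\nexport int answer();\n")

        by_offset = build_bmi(src, "answer", embed=False)
        embedded = build_bmi(src, "answer", embed=True)

        # Consequence (1): editing the source shifts everything after the edit,
        # so the recorded offset no longer points at the same token.
        with open(src, "w", encoding="utf-8") as f:
            f.write("// new comment\nexport module foo;\nexport int answer();\n")
        stale = by_offset.text_at_location()
        still_ok = embedded.text_at_location()

        # Consequence (2): without the source file, the offset-only BMI
        # cannot be resolved at all.
        os.remove(src)
        try:
            by_offset.text_at_location()
            missing = False
        except FileNotFoundError:
            missing = True

        return stale, still_ok, missing


if __name__ == "__main__":
    print(demo())
```

In this model the embedded variant keeps returning the original token after the source file is edited or deleted, which is the property distributed and sandboxed builds care about. The price is that the whole file content travels inside the BMI.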
Security concerns on IP
@AaronBallman points out that such a strategy raises security concerns, since the source contents in the BMI then become technically public to its consumers.
My thoughts on the security question
- According to the consensus of SG15/WG21, BMIs are not expected to be distributed, since a BMI is pretty sensitive to the compiler (including its version) and the configuration. But as pointed out by @iains, there are end users who consistently ask for distributable BMIs. (*)
- A BMI is a built module interface, and module interfaces only take the place of headers. Headers are never private, so I think the option can never make things worse than in the era of headers.
- Today, using BMIs always requires the source files to be available in the environment. So I think the option won’t make existing usage worse.
(*): It is true that distributable BMIs are technically possible. The primary blocking issue is that they are too hard to (design and) implement.
Summary and options forward
(1) Fix the underlying issue. Readers may have already recognized that the two topics (whether or not to embed source files, and the security concerns) are not technically mutually exclusive. The fundamental technical problem may be that clang requires opening the actual file during compilation. It looks like neither GCC nor MSVC has this problem. But it is too fundamental, and I actually don’t have an idea of how to fix it. If we choose this, in practice it may be the same as not doing anything for a relatively long period.
(2) Make `-fmodules-embed-all-files` a user option. Currently `-fmodules-embed-all-files` is an Xclang option. After we make it a user option, the users who want it can enable it themselves, and then we don’t need to care about the security problems. While this may be what most people prefer, a problem in detail is: should we enable it by default? If we don’t, it is hard for tests to cover. And I think most people may prefer to embed the source files.
(3) Just embed the source files. As I mentioned above, I don’t think it will be a security problem since it won’t make anything worse. And the reason not to add a new option is that there are already too many options, especially for modules. We have discussed many times in the modules developers meeting that we don’t like too many options. (The newly added `-fexperimental-modules-reduced-bmi` option is planned to be enabled by default and then removed.)
Clang: consensus called in this message