[RFC] About the default location for std modules

Project Background

We’re implementing the std modules in D144994, “[Draft][libc++][modules] Adds std module.” I had planned to raise this after that patch lands, but people seem to be interested in the topic already, and it would be useful to gather more opinions.

Language & Implementation Background

To use modules, we first need to compile *.cppm files to *.pcm files (Binary Module Interface, BMI for short). For the std module, we need to compile the std-*.cppm files into std.pcm first; then consumers can import the std module.

The goal of standard C++20 modules is to distribute the BMI files. However, due to current implementation limitations, we can only distribute the *.cppm files for now, not the BMI files. MSVC made the same choice. See “Tutorial: Import the standard library (STL) using modules from the command line (C++)” on Microsoft Learn.

For the compatibility limitations in clang, more information is available in “Standard C++ Modules” in the Clang 17.0.0git documentation. In short, a BMI may only be reusable if the triple {Compiler, Flags, Sources} is the same. Note that “Compiler” here doesn’t only distinguish GCC from Clang: Clang 16.0.0 and Clang 16.0.1 count as different compilers in this case.
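To make the {Compiler, Flags, Sources} rule concrete, here is a minimal C++ sketch of a reuse check. BmiKey, its fields, and is_reusable are hypothetical names used only for illustration; they are not part of Clang. The exact-match comparison is a deliberately conservative assumption (for example, it treats reordered flags as different).

```cpp
#include <string>
#include <vector>

// Hypothetical record of the inputs that produced a BMI.
struct BmiKey {
    std::string compiler;            // exact version, e.g. "clang-16.0.1" (assumed format)
    std::vector<std::string> flags;  // flags used to build the BMI, in order
    std::string source_hash;         // some hash of the module sources (assumed)
};

// Assumption: a BMI is reusable only when all three components match exactly.
inline bool is_reusable(const BmiKey& built, const BmiKey& wanted) {
    return built.compiler == wanted.compiler &&
           built.flags == wanted.flags &&
           built.source_hash == wanted.source_hash;
}
```

Under this sketch, a std.pcm built with "clang-16.0.0" would be rejected by a "clang-16.0.1" consumer even if the flags and sources are identical.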

What do we want to discuss here?

As mentioned above, we need to distribute the *.cppm files and compile them into std.pcm for users to import. The questions are then:

(1) Where should the *.cppm files live by default?
(2) Where should the std.pcm live by default?

The second question matters most, because the compiler needs to know the answer as a default location, so that clang++ -std=c++23 Hello.cpp can compile the following code:

import std;
int main() {
    std::cout << "Hello World.\n";
    return 0;
}

It would be bad to tell users that they need to know more than that to compile a hello world example.

For the second question, my suggestion is ${CLANG_EXECUTABLE}/../lib/clang/{version}/modules, since ${CLANG_EXECUTABLE}/../lib/clang/{version}/lib is the default location for the compiler-rt libraries, such as libclang_rt.asan_cxx-x86_64.a. Both the BMI and the libclang_rt libraries are tied to the compiler version, so I feel it would be good to follow the same logic for std modules.
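As a rough illustration of the proposed default, the sketch below derives the modules directory from the path of the clang binary. The function name default_std_module_dir is hypothetical, and it assumes that ${CLANG_EXECUTABLE} refers to the directory containing the clang binary and that {version} is passed in as a plain string. The ".." is resolved lexically only, so symlinks are not followed.

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper: map the clang binary path to the proposed BMI directory,
// i.e. <bin-dir>/../lib/clang/<version>/modules.
inline fs::path default_std_module_dir(const fs::path& clang_binary,
                                       const std::string& version) {
    const fs::path bin_dir = clang_binary.parent_path();
    // lexically_normal() collapses "bin/.." without touching the file system.
    return (bin_dir / ".." / "lib" / "clang" / version / "modules").lexically_normal();
}
```

For example, on a POSIX layout, default_std_module_dir("/usr/bin/clang++", "17") yields /usr/lib/clang/17/modules, next to the lib directory that holds the libclang_rt libraries.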

For the first question, I feel we should follow the same logic as for headers. For example, given that we install headers to $PREFIX/include, we should install the *.cppm files to $PREFIX/modules. I don’t want to reuse $PREFIX/include, since the *.cppm files are not meant to be #included.

How should we distribute/package std modules?

This is an open question that needs exploring.

Previously I used the following script downstream to build rpm packages:

# std-module.spec

...

Requires: clang_xxx # for example clang17

%prep
%build
%install

mkdir -p <path-to-the-install-destination-of-std-module>
cp *.cppm <path-to-the-install-destination-of-std-module>
cp ../../../build.sh <path-to-the-install-destination-of-std-module>

...

%post
<path-to-the-install-destination-of-std-module>/build.sh

The idea of the script is that the rpm package ships only the *.cppm files. When users install the package, the installer compiles the *.cppm files automatically.

I guess people may feel it is overkill to compile files during installation. My intention is that users (especially newcomers using modules for the first time) will have a better experience if they can compile a hello world example with the following command lines:

apt install cmake clang libcxx
clang++ -std=c++23 Hello.cpp

I feel this is much friendlier than asking users to first learn how to compile the std modules (as in “Tutorial: Import the standard library (STL) using modules from the command line (C++)” on Microsoft Learn).

Summary

I think we need to discuss the default location for std modules first, including the location for the *.cppm files and the location for the std.pcm file. Then we can discuss how to distribute them.

Since named modules are a brand new topic, I guess nobody knows yet how to handle them properly. So every opinion and question is welcome.

CC: @mordante @aaronmondal @ldionne @philnik @h-vetinari

Very roughly, how far away are we from (ever…) being able to distribute .pcm instead of .cppm files? If we ever get there, could they reasonably live in the same folder as the one we’re now coming up for the .cppm files (I guess the answer should trivially be yes since the extensions are different, but just trying to spell this out for completeness)? If so, I think $PREFIX/modules would be an attractive option.

CC (for the OP, not my comment) @mgorny @ben.boeckel @ruoso @Bigcheese @urnathan @iains @STL_MSFT @tahonermann

Very roughly, how far away are we from (ever…) being able to distribute .pcm instead of .cppm files?

That is hard to answer; I really don’t know. I would only say: certainly not this year, and probably not next year. I’m also not saying we’ll have it in three years.

If we ever get there, could they reasonably live in the same folder as the one we’re now coming up for the .cppm files

I feel that would be a little odd. Personally, I think the directory for *.cppm files shouldn’t be the same as the one for *.pcm files.

I do not think that there was (or yet is) a goal to distribute BMIs. (that is consistently what I’ve been told by the Modules feature designers). [AFAIU] The intent was that BMIs were ephemeral build-time artefacts, present to improve some aspects of language isolation and also build performance.

I realise that some folks would like to distribute some form of BMI … but that seems to require significant amendments to the current designs (at least for GCC and clang). Otherwise a distributed BMI is going to be restricted to “works with -O2 -g” or some similar statement.

This is because, in the current situation for GCC and clang, the BMIs are tightly coupled to the AST representation and thus dependent on the options with which they are produced.

[I think] We would have to investigate a design that encapsulates some more general representation of the source (and perhaps allows sub-sections with different build conditions as a form of cache).

I remember that Gabby wants a portable BMI (or a compatible BMI?) and that this is the intention of the IFC format. And I agree it is pretty hard.

[I think] We would have to investigate a design that encapsulates some more general representation of the source (and perhaps allows sub-sections with different build conditions as a form of cache).

If we can find such a representation, it will be equivalent to a BMI from the users’ perspective (the current BMI then becomes transparent to them). So we may get a compatible BMI in the end : )

I would agree that (of the offerings currently on the table) IFC shows the most promise for portability - however, it is currently quite tightly tied to the Windows platform, so that some engineering would definitely be needed even to test the idea(s) on other platforms.

Sure; in fact, ideally the end user does not care about BMIs at all - it’s all “compiler and distribution magic”.

However, right now, I’d be very cautious of packaging current BMIs; since (IMO) that will most likely be a very poor end user experience (and thus potentially will detract from modules adoption). Instead we should present a clear message about building them “on the fly”, with good support from the build systems to ensure that the process is as painless as possible. I believe these are the objectives that SG15 (and the represented build system folks) have.

LLVM provides tooling for integrating it into out-of-tree projects:
https://llvm.org/docs/CMake.html#embedding-llvm-in-your-project

Maybe this could be a solution, libc++ ships the cppm files and a CMake file and I am going to build the BMIs in my project. The flags that I use will wildly differ from the ones that you used to build the std.pcm file.

Yeah, I believe we (you and I) are on the same page. What I said is that we should consider distributing *.cppm files now, and maybe someday we can distribute something processed. But we don’t know when that will be.

I didn’t know about this before. It looks like a technique for using LLVM as a library, though. I feel it isn’t quite the same as distributing libcxx, is it?

Exactly. But a similar mechanism could help me to use libcxx as a library and build my own BMIs.

Is clang expected to support only the libc++ implementation of std modules, or should we expect that at some point in the future we may also need to support, e.g., the implementation from GCC?

Distributing .pcm files instead of .cppm files isn’t, and shouldn’t be, a goal. Since .pcm files are (and always will be) sensitive to build configuration, it isn’t realistic for a single .pcm artifact to suffice for all uses. Additionally, other tools (like static analyzers) will require source files to be present and will not be able to consume distributed .pcm files.

SG15 has been discussing, for quite some time now, ways to support BMI distribution such that a build process that would otherwise produce compatible BMI files can consume the distributed ones instead. There are a lot of challenges that need to be resolved before this can happen (like how a build system can determine if a pre-existing BMI file is suitable for use in a particular project). I think we should wait for SG15 to issue recommendations before we begin distributing BMIs.

Both Clang and GCC use a format that is closely tied to their internal representations, which means that 1) stability and reusability are broken by changes to those internal representations (which are required as the language evolves), and 2) interop would require very highly compatible internal representations.

My personal belief is that there is too much implementation specific detail that ends up in BMI files regardless of the BMI format for such interop to ever be realistic.

I don’t think this is possible. A user specifying any flag that changes the preprocessor state (-march, macOS -arch, -mno-bmi, __has_builtin responses, etc.) is going to want to regenerate the BMI. The alternative would be for Clang (or any compiler) to commit to shipping every possible combination and to know how to look one up implicitly based on the current flag set.

Any portable IFC format will need to encode lookup rules for preprocessor state to change what is exposed (say a template instantiation) and encode the differences in…some way. Knowing how to do something like this for the following sounds very difficult (if you want to up the difficulty, make the #ifdef checks just around the different tokens via void ifdef name else othername endif arglist):

#ifdef __AVX2__
void _with_avx2(void*) { /* impl */ }
#else
void _without_avx2(void*) { /* impl */ }
#endif

export inline void pub_func(void* arg)
{
#ifdef __AVX2__
  _with_avx2(arg);
#else
  _without_avx2(arg);
#endif
}

What should we do about -stdlib=libstdc++ (which a lot of Linux distros use, AFAIK)?

+1. Even for the sources, I would request to make the interface via llvm-config, a variable provided by find_package(…), and/or a special key in .pc files so as to encourage not hard-coding any location and having to provide both once a standard location is agreed upon (between FHS, distros, SG15, and/or implementors).

For prior art, Microsoft STL is shipping a JSON file to describe what’s going on. Agreeing on something like this would help build systems support standard modules properly a great deal.

I agree with all the posts above that BMIs are not portable and should not be the way to distribute libc++'s modules. If they ever become portable we can reconsider that.

The patch already has a way to generate a CMakeLists.txt which builds the BMIs. I’ve used this to test import std; in the libc++ lit tests. This makes it “easier” for users to generate their BMIs. However, I believe it would be better if build tools like CMake knew how to generate the BMIs for the std and std.compat modules. Note that since the patch is still very much a work in progress, I haven’t reached out to the CMake developers to see what they think about this, but I see they are aware of this thread.

@ben.boeckel A similar JSON file for libc++ will need more information. Vendors can decide to omit parts of the library, for example locales. Then the BMIs need to be generated with the proper build flags to disable them. (This is part of the example CMakeLists.txt, but completely untested.) I think it would be good to also discuss this in SG15 and see whether it’s possible to let all Standard libraries use a similar JSON file.

I was asking about .cppm files.

Ah, assuming you were asking if Clang should be able to compile .cppm files from libstdcxx in order to produce a BMI for later -stdlib=libstdc++ compilations with Clang, then sure, subject to limitations in build systems, module dependency scanning, etc… to obtain the information needed to construct the appropriate Clang invocations in the first place.

It looks like the consensus is
(1) We should wait for the consensus from SG15.
(2) This might be a job for the build system.

I’ve long said that I think Clang modules were successfully deployed because they could be used without build system changes (other than having to pass -fmodules). It won’t look good for C++ if the answer to “how do I build hello world” is to first choose a build system. I think clang hello.cpp -o hello (with or without additional options) must be made to work. That implies that some form of implicitly built modules, at least for the standard library, would be required.