[RFC] About the default location for std modules

Implicitly built modules are gone. There is clang-scan-deps for explicitly built modules.

The bigger issue for me is to teach many build systems how to build a std.pcm file from the cppm files provided by libcxx: CMake, Meson, Bazel, Buck, my Makefile and many more.

Modules add a bunch of semantics around exported symbols, but maybe it’s possible to implement the semantics without actually building BMIs?

That is, I’m wondering if it’d be feasible to support a “just works, but slow” fallback mode where we implement the semantics of modules within a single clang process – without prebuilds and without spawning sub-clangs to emit and then import BMIs (to avoid the overhead of spawning sub-compilers and AST serialization/deserialization). Maybe it won’t be too costly for trivial cases if we can do it all in-process.

It’ll definitely be slower than pre-building the modules, and it will waste a lot of compute if you are building many source files, but it seems like it may be valuable to at least allow this to be possible?

According to who? They are still the dominant manner by which Clang modules are used. C++20 modules have no significant deployment at present.

So long as the compiler has the metadata needed to associate a module name with a module interface unit source file, yes.
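As a rough sketch of what such a name-to-source association could look like, the snippet below searches a list of directories for a file named after the module. The `<module-name>.cppm` naming rule, the function name `find_module_source`, and the directory used in the demo are assumptions made for illustration only; nothing here describes what Clang actually does.

```python
import os
import tempfile


def find_module_source(module_name, search_dirs, extension=".cppm"):
    """Return the first <dir>/<module_name><extension> that exists, else None.

    Hypothetical lookup rule: one interface unit per module, named after it.
    """
    for directory in search_dirs:
        candidate = os.path.join(directory, module_name + extension)
        if os.path.isfile(candidate):
            return candidate
    return None


# Demo with a throwaway directory standing in for an installed location.
with tempfile.TemporaryDirectory() as module_dir:
    for name in ("std", "std.compat"):
        open(os.path.join(module_dir, name + ".cppm"), "w").close()
    print(find_module_source("std.compat", [module_dir]))
    print(find_module_source("not_installed", [module_dir]))
```

A real implementation would also need to handle partitions and modules whose file name does not match the module name, which is where shipped metadata would come in.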

I believe so, particularly for existing build systems that will require major updates to handle explicit modules (the same build systems that struggle with generation of header files) and, as mentioned, for small programs that have just one or a few source files.

Strongly agreed that we should provide a simple quickstart hello world for beginners. But clearly we don’t want implicitly built modules; @dblaikie has said this to me at least twice. Traditional Clang modules also want to get rid of them ([Clang] Modules build daemon: build system agnostic support for explicitly built modules - #4 by Bigcheese). So implicitly built modules should be gone.

My thought for the quickstart is that vendors can try to build the std modules (with the default configuration) when users install the packages. This is implementable, and we tried it downstream. Our experience is that it is good at least for trying things out quickly, but we generally need to build the std modules again for the real product. I don’t think that is bad, though.

Yeah! This is what I have been thinking about for a long time. But my true motivation is neither distribution nor the quickstart tutorial.

I feel such a mode would be pretty meaningful for static analysis tools like clangd, since the current BMI design is too heavy for them.

I agree it would be great if Clang knew how to build libc++'s, libstdc++'s, and MSVC STL’s modules without additional assistance, just like it knows which Standard library to link against. But I don’t think that needs to be a primary objective, so I would be happy if, initially, CMake can do it.

I think it would be great if there were a standardized description to teach tools how to build standard modules. I haven’t looked in detail at P1689 to see whether it contains enough information. If not, I think we shouldn’t wait on SG15, but instead we should reach out to SG15. (I haven’t looked at whether there are papers in this specific area.)

In my mind, that qualifies as a solution for implicitly built modules (presumably the build daemon would know how to build at least the standard modules or those that are defined by module.map files). This is a reasonable approach, though it might create challenges for tools built on BEAR or for tools like Coverity that depend on being able to observe build execution. But this is tangential to this topic.

I just had a chat with @mordante and we both agree on the following:

(1) Where should the *.cppm files live by default?

Right now, we install libc++ headers to <PREFIX>/include/c++/v1.

I think it would make the most sense to install .cppm files to a similar path like <PREFIX>/modules/c++/v1. This way, we would have something like:

<PREFIX>/modules/c++/v1/
                        std.cppm        // guaranteed to be there
                        std.compat.cppm // guaranteed to be there
                        [...]
                        std-variant.cppm // Implementation details, name not guaranteed and users don't rely on this.
                        std-foo.cppm     // We might want to move all of those to a subdirectory just for clarity.

I am not 100% attached to having the /v1/ subdirectory. However, I feel pretty strongly that we should avoid putting std.cppm and all our other .cppm files at the top-level <PREFIX>/modules directory if we expect that directory to become a fairly standard location for module files from various projects, both to avoid name collisions and for namespace hygiene generally. This is really similar to how we wouldn’t want to throw our libc++ headers directly under /usr/include (and BTW the fact that the C library uses /usr/include directly is a common source of issues for vendors, at least over here).

(2) Where should the std.pcm live by default?

I am not sure we actually need to answer that question. Since the .pcm files are going to be built by users (or their build systems) for the foreseeable future, it doesn’t make sense to put those .pcm files relative to the compiler or relative to the libc++ installation (which could be relative to the compiler or relative to some SDK, depending). In fact, I would expect that in most cases users might not even have write access to those paths. Instead, I believe build systems should simply build the .pcm files from the .cppm files and put those .pcm files somewhere in their build directory. Build systems would then pass -fprebuilt-module-path=<PATH> to the compiler explicitly.
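To make that flow concrete, here is a minimal sketch of the two compiler invocations a build system might generate: one that precompiles std.cppm into a .pcm inside the build directory, and one that compiles user code against it. `--precompile` and `-fprebuilt-module-path=` are existing Clang options; the helper name `std_module_commands`, the example paths, and the bare `-std=c++23` flag set are assumptions for illustration. A real build of libc++'s std.cppm will likely need more flags than shown.

```python
import posixpath  # POSIX-style paths, matching the ones used in this thread


def std_module_commands(module_source_dir, build_dir, source, compiler="clang++"):
    """Return (precompile_cmd, compile_cmd) as argv lists.

    Hypothetical helper: the BMI goes into the build directory, never next
    to the installed .cppm files, which may not be writable.
    """
    std_cppm = posixpath.join(module_source_dir, "std.cppm")
    std_pcm = posixpath.join(build_dir, "std.pcm")
    stem = posixpath.splitext(posixpath.basename(source))[0]
    obj = posixpath.join(build_dir, stem + ".o")

    # Step 1: turn the module interface unit into a BMI.
    precompile_cmd = [compiler, "-std=c++23", "--precompile", std_cppm, "-o", std_pcm]
    # Step 2: compile user code, telling Clang where to look for <name>.pcm.
    compile_cmd = [compiler, "-std=c++23",
                   "-fprebuilt-module-path=" + build_dir,
                   "-c", source, "-o", obj]
    return precompile_cmd, compile_cmd


# Example paths are made up; only the shape of the commands matters.
for cmd in std_module_commands("/opt/toolchain/modules/c++/v1", "build", "main.cpp"):
    print(" ".join(cmd))
```

The point of the split is that only the first command needs to know where the .cppm files were installed; everything after that refers to the build directory.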

It would also be possible for Clang itself to go and do that work of building the .pcm files for your particular compiler invocation, however I would not try to encode whether and how Clang should do that if they decide to. At that point, Clang would be acting as a build system so as long as it knows where to find our .cppm files, everything should work.

This has been discussed a bunch, but whether BMIs will ever be portable is an open question, so I would recommend going for a simple solution that can unblock build systems sooner rather than later.

I think a lot of people also want to lower the barrier to users trying out C++ modules so they don’t have to build std.pcm & friends themselves. I completely agree that the user experience needs to be good, however I also want to exercise caution in shipping anything build-system specific. If we start shipping a .cmake file that allows building std.pcm more easily, we may open the floodgates for everyone to start contributing support for their favorite build system, and we should avoid that. IMO it would be acceptable to ship an “experimental” .cmake file at first to ease the integration for early adopters, but it should be made clear that that is not something that’s going to be stable and that we won’t add support for other build systems. This would basically be a way to ease the development and testing of modules as we’re bootstrapping things, but nothing else. And once CMake officially supports building std.pcm from our .cppm files, that .cmake file wouldn’t be necessary anymore and I would remove it.

Thoughts?

That all sounds like a good plan to me.

Looks pretty good to me too!

For the system compiler that would require a new directory in the /usr hierarchy, which could be a problem for Linux systems following the Filesystem Hierarchy Standard.

Do you really want to make a land grab on /usr/modules with the same status as /usr/include and /usr/lib just for C++ modules?

Good point. As libc++ claims ownership of <PREFIX>/include/c++/v1, it would make sense to put the modules somewhere in that directory structure, e.g. <PREFIX>/include/c++/modules .

That is indeed what we propose, in order to get feedback on how people feel about it. I know on Debian-based systems libc++ is installed under /usr/lib/llvm-<version>/, so there it does not interfere with other parts of the system. I’m not sure where other distributions install libc++.

Do you already have an idea where libstdc++ will store its module sources?

I really dislike the idea of storing non-header files in an include directory. It’s not intended for users to write code like #include <std.cppm>

For Fedora and RHEL we use /usr/include/c++/v1/ where libstdc++ uses /usr/include/c++/$version

I think /usr/include/c++/modules could be reasonable (with sub-dirs for libc++ and libstdc++), since that isn’t in the default header search path, but keeps the files in a C++-specific location.

Do you already have an idea where libstdc++ will store its module sources?

No, but I think it would make sense to add a new compiler flag that says “find me the compiler-provided module definition files”, similar to how g++ -fmodules-ts -c -x c++-system-header vector compiles <vector> as a header unit. So maybe something like g++ -x c++-system-module std.cc (or some other appropriate file extension, or just omit the extension and have gcc know to look for std.cc when you say -x c++-system-module std).

I’ll comment here the same thing I said in the Phabricator review, for the sake of completeness:

$PREFIX/modules is a new path that doesn’t exist in the Filesystem Hierarchy Standard.

The source files for the module units are indeed an arch-independent resource, therefore the correct directory would be something under $PREFIX/share/

If we ever intend to ship BMI files, they would belong in $libdir.

Here’s what it would look like if we go in that direction:

$PREFIX/
   $libdir/
       libc++.so
       libc++.so.module-info
       c++/
           modules/
                libc++/
                      std.gcm.deadbeef1234
   share/
       c++/
           modules/
                 libc++/
                       std.cppm

The libc++.so.module-info file would have the metadata necessary for someone to understand how to produce their own BMI as well as potentially reuse the shipped BMI if it just so happens that they can.

There is one important bit that worries me about splitting off the source files, tho.

The module metadata shipped alongside the library itself needs to reference those source files, and I am concerned that requiring the use of ../../../../share/ in order to address the source location can lead to fragility in the deployment.

So, even though it’s technically arch-independent, I would also consider the following option:

$PREFIX/
   $libdir/
       libc++.so
       libc++.so.module-info
       c++/
           modules/
                libc++/
                      std.gcm.deadbeef1234
                      std.cppm

Because at that point, the metadata could reference a relative path from the library location without the awkwardness that using ../ can cause when directories are symlinks.
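To illustrate the symlink concern: on POSIX systems the kernel resolves `..` relative to the target of a symlink, not by trimming the path textually. The sketch below uses made-up directory names (`store/lib-x86_64`, `prefix/lib`, `prefix/share`) and assumes a POSIX system where creating symlinks is permitted. It shows a `lib/../share/std.cppm` reference that looks right on paper but does not resolve once `lib` is a symlink.

```python
import os
import tempfile


def dotdot_resolution(root):
    """Return (lexical_exists, physical_exists) for <prefix>/lib/../share/std.cppm.

    Hypothetical layout: <prefix>/lib is a symlink into a separate store,
    while std.cppm really lives in <prefix>/share.
    """
    store_lib = os.path.join(root, "store", "lib-x86_64")
    prefix = os.path.join(root, "prefix")
    share = os.path.join(prefix, "share")
    os.makedirs(store_lib)
    os.makedirs(share)
    os.symlink(store_lib, os.path.join(prefix, "lib"))
    open(os.path.join(share, "std.cppm"), "w").close()

    via_dotdot = os.path.join(prefix, "lib", "..", "share", "std.cppm")
    # Textual cleanup turns lib/../share into share, which does exist.
    lexical = os.path.normpath(via_dotdot)
    # The OS follows the symlink first, so ".." lands in <root>/store instead.
    return os.path.exists(lexical), os.path.exists(via_dotdot)


with tempfile.TemporaryDirectory() as root:
    lexical_ok, physical_ok = dotdot_resolution(root)
    print("lexical path exists: ", lexical_ok)
    print("physical path exists:", physical_ok)
```

Keeping the sources in the same directory tree as the metadata avoids the `..` components entirely, so the two interpretations cannot diverge.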

No, that’s a great point. I was coming from the POV where those headers are in an SDK, so where <PREFIX> is not /usr.

That would mean /usr/include/c++/modules/libc++/v1/std.cppm and /usr/include/c++/modules/libstdc++/std.cppm (or something along those lines)? I’d be fine with that. It’s not super pretty, but it satisfies all the criteria I have. I can already imagine folks being confused by whether it’s /usr/include/c++/v1/modules/libc++/std.cppm or /usr/include/c++/modules/libc++/v1/std.cppm, but that’s probably acceptable.

However I must admit I would have found it pretty neat if there was an established place for even non-standard-libraries to install their .cppms if we think that’s going to be a common need. That established place would ideally not be under /usr/include/c++, since we generally think of it as being reserved for the C++ implementation (or am I the only one making that assumption)?

We’ll never be able to do it “instead of” the sources. But we might be able to ship them “in addition” to sources. This was the topic of P2581R2: Specifying the Interoperability of Built Module Interface Files.

I expect Linux distributions and other “coherent-compiler-usage” environments to ship BMIs for the cases where the user would be able to use them.

I would strongly suggest not to go in that direction. I wouldn’t want to create the impression that those files are meant to be used as #include <c++/modules/std.cppm>.

I think module source files should go into an entirely new directory that is not in the current search path for the preprocessor.

For the system libc++ package, I think it would mean something like:

/usr/include/c++/modules/v1/std.cppm

And for the system libstdc++, something like:

/usr/include/c++/modules/14/std.cppm

i.e. the libc++ headers are in a v1 subdir and the libstdc++ ones in a plain number (without v) based on the GCC release (either just 14 or 14.1.0 depending on the configuration). I’m not sure how much of that v1 path is based on upstream convention and how much is Fedora-specific, @tstellar knows though.

For non-system packages, where the entire compiler is in some other <PREFIX> that isn’t /usr, we’re already going to be disambiguated by the prefix, I think. But <PREFIX>/include/c++/modules/v1 doesn’t seem too bad, even if it’s a bit longer than just <PREFIX>/modules.

However I must admit I would have found it pretty neat if there was an established place for even non-standard-libraries to install their .cppms if we think that’s going to be a common need.

Yes, but I think it’s too early to try and establish that convention, and leading with the std module might not be the best idea (it’s already going to be a bit special compared with other modules). If a convention is established later, we can move std.cppm there … but we’d always want some hierarchy below there so that e.g. qt-5.x/qt.cppm and qt-6.x/qt.cppm can co-exist. Maybe /usr/share/modules or /usr/src/modules would be better. Trying to be FHS compatible will ease adoption on Linux, even if it’s not universally used by non-Linux systems.

:face_vomiting:

Is that really a problem though? Users can already do #include <c++/12.2.0/vector> with GCC today, and nobody’s stupid enough to do that (as far as I’m aware … and I see some pretty stupid things in bug reports). I appreciate that at least <vector> is a header file, even if that way of including it is wrong, and module definition files aren’t headers at all.

But I’m not attached to the idea of putting them under /usr/include, I just really dislike a new /usr/modules dir. Something like $prefix/share/modules or $prefix/share/gcc-14.1.0/modules would work too.