Make command line support for C++20 module uniform with GCC

Hi all,

Recently I have been playing with C++20 modules, and I found that GCC's command-line support
is much friendlier than Clang's. Here is an example:

// say_hello.cpp
module;
#include <iostream>
#include <string_view>
export module Hello;
export void SayHello(std::string_view const &name)
{
    std::cout << "Hello " << name << "!\n";
}

// main.cpp
#include <string_view>
import Hello;
int main() {
    SayHello("world");
    return 0;
}

To compile the example with GCC, we need:

g++ -std=c++20 -fmodules-ts say_hello.cpp main.cpp

And in clang, we need:


clang++ -std=c++20 -fmodules-ts -Xclang -emit-module-interface -c say_hello.cpp -o Hello.pcm

clang++ -std=c++20 -fmodules-ts -fprebuilt-module-path=. main.cpp say_hello.cpp

Yes, with Clang we need an extra command to emit the module interface explicitly, plus another option
to specify the prebuilt module path. With GCC this happens by default: when GCC finds that it is compiling
a C++20 module, it automatically writes the module interface to the path:


gcm.cache/filename.gcm

GCC creates gcm.cache if it doesn’t exist.

GCC also searches gcm.cache for prebuilt module interfaces automatically.

This looks much friendlier to me. The intention of this mail is to ask whether you think it is the right direction
to make Clang’s command-line support for C++20 modules more like GCC’s. The differences I see now are:

  • Generate prebuilt module interface automatically. (And generate it to a specific directory automatically)
  • Have a default value for prebuilt module path.
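To make the two points above concrete, here is a minimal C++ sketch of what such default behavior could look like. It is only an illustration: the directory name `pcm.cache` and the functions `default_output_path` and `find_prebuilt_module` are hypothetical and do not exist in Clang. The only behavior it models from Clang is that a module named `Hello` is looked up as `Hello.pcm` in each prebuilt module directory.

```cpp
#include <cassert>
#include <filesystem>
#include <optional>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical default cache directory, modelled on GCC's gcm.cache.
// This is NOT an existing Clang default.
inline const fs::path kDefaultCacheDir = "pcm.cache";

// Where an automatically generated interface for `module_name` would go.
// Creates the cache directory if it does not exist yet.
fs::path default_output_path(const std::string& module_name,
                             const fs::path& cache_dir = kDefaultCacheDir) {
    fs::create_directories(cache_dir);  // no-op when the directory exists
    return cache_dir / (module_name + ".pcm");
}

// Looks for <module_name>.pcm in the explicitly given directories first
// (the -fprebuilt-module-path style), then in the default cache directory.
std::optional<fs::path> find_prebuilt_module(
        const std::string& module_name,
        const std::vector<fs::path>& search_paths,
        const fs::path& cache_dir = kDefaultCacheDir) {
    std::vector<fs::path> dirs = search_paths;
    dirs.push_back(cache_dir);  // default location is searched last
    for (const auto& dir : dirs) {
        fs::path candidate = dir / (module_name + ".pcm");
        if (fs::exists(candidate)) return candidate;
    }
    return std::nullopt;
}
```

With something like this, compiling say_hello.cpp would write to `default_output_path("Hello")`, and compiling main.cpp would resolve `import Hello;` through `find_prebuilt_module("Hello", {})` without any extra option.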

I am wondering whether anyone more familiar with Clang’s command line and file system handling would like to
implement this (I am not so familiar with it). Although it may take me more time, I would be happy to do it myself if others are busy.

Thanks,
Chuanqi

+Nathan and Richard as folks with some context here.

This sort of interaction is probably not going to be how modules are generally built/supported as far as I understand it - it’s opaque to the build system, so may make it more difficult for the build system to know when things need to be rebuilt, and also wouldn’t support any kind of distributed build system.
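One way to picture the rebuild concern is the following minimal C++ sketch of a timestamp-based build rule. Everything in it is a hypothetical illustration, not how any particular build system works: `Rule`, `Stamps`, and `needs_rebuild` are made-up names, and the timestamps are abstract integers.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical build rule: the build system only sees what is declared here.
struct Rule {
    std::string output;
    std::vector<std::string> inputs;
};

// Last-modified stamps known to the build system (abstract clock values).
using Stamps = std::map<std::string, long>;

// Classic "is any declared input newer than the output?" check.
bool needs_rebuild(const Rule& rule, const Stamps& stamps) {
    auto out = stamps.find(rule.output);
    if (out == stamps.end()) return true;  // output was never built
    for (const auto& in : rule.inputs) {
        auto it = stamps.find(in);
        // A missing or newer input forces a rebuild.
        if (it == stamps.end() || it->second > out->second) return true;
    }
    return false;
}
```

If the compiler writes gcm.cache/Hello.gcm implicitly, the rule for main.o lists only main.cpp, so a newer Hello.gcm does not trigger a rebuild of main.o. Once the interface file is a declared input, the same check does the right thing.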

The specifics of how GCC, Clang, (& maybe other compilers) and various build systems may end up interacting on the command line and filesystem (module discovery outside the current build for system-installed library dependencies) are still being discussed and debated in places like C++'s SG15 tooling working group.

That doesn’t mean we can’t experiment further with things like this, but I’m not sure we will/should be supporting cross-compiler interface compatibility until we are a bit more sure about what the best thing to standardize on is.

Hi Blaikie,

This sort of interaction is probably not going to be how modules are generally built/supported
as far as I understand it - it’s opaque to the build system, so may make it more difficult for the
build system to know when things need to be rebuilt, and also wouldn’t support any kind of distributed
build system.

The specifics of how GCC, Clang, (& maybe other compilers) and various build systems may end up interacting
on the command line and filesystem (module discovery outside the current build for system-installed library
dependencies) are still being discussed and debated in places like C++'s SG15 tooling working group.


Understood. My point is that we could make Clang's C++20 module support friendlier by adding extra default behavior,
and your point is that this may not work well with distributed build systems.

I have a basic question about Clang/LLVM policy. I remember that one of the policies for Clang/LLVM’s command-line interface
is to be compatible with GCC. This has generally held, which is why projects can switch between the two compilers.
Does this policy no longer apply?

I wouldn’t state a general policy - though compatibility’s certainly been a motivating use case when Clang started out, and to some extent still is today - but it’s not always a case of “if GCC did it, Clang must accept patches that match the functionality” - especially new functionality that’s in flux, I think there’s some room for some nuance.

- Dave

Got it. So you think the better choice is to wait for a decision from SG15 or some other authoritative conclusion. Do I understand your point correctly?

From the perspective of a user, I like the GCC style more. After all, it is not easy to remember to add ‘-Xclang -emit-module-interface’
all the time. I would like to see whether there are any comments on the drawbacks of GCC's current command-line support.

Thanks,
Chuanqi

GCC had the advantage of seeing Clang's experiments. The history is different for the two compilers here -- Clang developed 'implicit modules', driven by a large build system. With GCC I was very mindful that 'hello world' should be simple to drive -- as you have found. Clang has the tricky job of not breaking its existing interface.

As David says, the best way to drive module compilations is not yet clear.

nathan

I got it. The key point is that Clang’s modules are already used at scale, so we can’t change the interface arbitrarily before we have a clear solution.

Thanks,
Chuanqi

Let me just throw out there that any interface involving -Xclang is not a proper user-facing interface. The -Xclang options aren’t intended for use by end users.

–paulr

Yeah, it’s certainly still in flux/unclear how this should look long-term. I mean Google’s been using these -Xclang interfaces for building explicit Clang Modules for years at this point - so probably worth giving a legitimate/official interface for that use case, even if it doesn’t answer all the questions about how C++20 modules will be consumed.

From: cfe-dev <cfe-dev-bounces@lists.llvm.org> On Behalf Of chuanqi.xcq via cfe-dev
Sent: Wednesday, October 27, 2021 2:28 AM
To: Richard Smith <richard@metafoo.co.uk>; David Blaikie <dblaikie@gmail.com>; Nathan Sidwell <nathanmsidwell@gmail.com>
Cc: cfe-dev <cfe-dev@lists.llvm.org>
Subject: Re: [cfe-dev] Make command line support for C++20 module uniform with GCC

I got it. The key point is that Clang’s modules are already used at scale, so we can’t change the interface arbitrarily before we have a clear solution.

I think there’s probably space to experiment with more things without breaking the interfaces currently in use - but equally not immediately going for GCC compatibility for its own sake either.

Anyone know if Google has published any open source projects that use modules yet? Or even any open source project that uses modules? I’m not aware of any yet.

Google isn’t using C++20 modules, only Clang Header Modules, where the source is a subset of valid non-modular code - so Google has published lots of open source code that is compiled internally with Clang Header Modules - but it’s unobservable, basically, because it’s also just valid non-modular C++.

- Dave

And in clang, we need:


clang++ -std=c++20 -fmodules-ts -Xclang -emit-module-interface -c say_hello.cpp -o Hello.pcm

clang++ -std=c++20 -fmodules-ts -fprebuilt-module-path=. main.cpp say_hello.cpp

Your point is well-taken. However, some of the extra work required here comes from not doing things in the expected way.

The above is not a correct way to enable C++20 modules in Clang: -fmodules-ts enables the old Modules TS mode, not C++20 modules. -std=c++20 is enough to enable C++20 modules.

For the ‘-Xclang -emit-module-interface’ portion, what Clang expects is that files that define module interfaces are either named .cppm or are specified with -x c++-module. With that file type, you can use --precompile to produce a .pcm file (just like you’d use -E or -c to produce other kinds of output). For example:

clang++ -std=c++20 say_hello.cppm --precompile -o Hello.pcm

The above commands also parse say_hello.cpp twice. You can avoid that by using the precompiled form, Hello.pcm, as a compilation input instead:

clang++ -std=c++20 -fprebuilt-module-path=. Hello.pcm main.cpp

However, this is all based on a model where the PCM file contains a complete description of the input .cppm file, which is not a great model for us to use moving forward due to all the extra stuff ending up in the .pcm file. Currently, Clang lacks two important features here:

  1. Produce a .pcm file and a .o file from a single compilation action.
  2. Produce a .pcm file that contains only the information needed for an importer, not a complete description of the input.

We will of course need some command-line support for those features, and being compatible with GCC (which already provides these features) would likely make sense.

As for building and using modules in a single clang command, I agree that’d be nice to have, both for convenience and for GCC compatibility. But ideally this shouldn’t depend on what order the files are specified in on the command line, which would require some kind of pre-scanning to find which modules are defined in which files so they can be processed in topological order. (Otherwise, specifying the files in the wrong order would presumably result in stale .pcm files getting used, which would seem quite user-hostile. I don’t know if that’s what you get from GCC or if it does better somehow.) That kind of prescan might be more complexity than we’d want in the compiler driver, though we can discuss that and figure out where we want to draw that line.
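The ordering problem described above can be sketched in C++ as a topological sort over the results of a pre-scan. This is only an illustration under stated assumptions: `ScannedFile` and `build_order` are hypothetical names, the scan results are taken as given rather than extracted from sources, and nothing here reflects existing Clang driver code.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <queue>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical result of pre-scanning one source file.
struct ScannedFile {
    std::string path;                  // e.g. "say_hello.cpp"
    std::string provides;              // module it defines, empty if none
    std::vector<std::string> imports;  // modules named in import declarations
};

// Returns the files ordered so that every module interface comes before its
// importers, whatever order the files were given in. Throws on cyclic imports.
std::vector<std::string> build_order(const std::vector<ScannedFile>& files) {
    // Map each module name to the index of the file that defines it.
    std::map<std::string, std::size_t> provider;
    for (std::size_t i = 0; i < files.size(); ++i)
        if (!files[i].provides.empty()) provider[files[i].provides] = i;

    // dependents[p] lists files that import the module defined by file p;
    // pending[i] counts imports of file i that are not yet processed.
    std::vector<std::vector<std::size_t>> dependents(files.size());
    std::vector<std::size_t> pending(files.size(), 0);
    for (std::size_t i = 0; i < files.size(); ++i) {
        for (const auto& name : files[i].imports) {
            auto it = provider.find(name);
            // Modules defined outside this command line are assumed prebuilt.
            if (it == provider.end() || it->second == i) continue;
            dependents[it->second].push_back(i);
            ++pending[i];
        }
    }

    // Kahn's algorithm: repeatedly emit files with no unprocessed imports.
    std::queue<std::size_t> ready;
    for (std::size_t i = 0; i < files.size(); ++i)
        if (pending[i] == 0) ready.push(i);

    std::vector<std::string> order;
    while (!ready.empty()) {
        std::size_t i = ready.front();
        ready.pop();
        order.push_back(files[i].path);
        for (std::size_t dep : dependents[i]) {
            assert(pending[dep] > 0);
            if (--pending[dep] == 0) ready.push(dep);
        }
    }
    if (order.size() != files.size())
        throw std::runtime_error("cyclic module imports");
    return order;
}
```

For the example in this thread, passing main.cpp before say_hello.cpp would still yield say_hello.cpp first, so a stale Hello.pcm would never be picked up. The expensive part in practice is the pre-scan itself, which this sketch leaves out.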

In any case, I’m hoping we get some clear guidance from SG15 that we can follow.

Ah, that’s good to know - didn’t know you were inclined/supportive of this direction (as the only way to build a module - or some mode that’d do it as two-step too?) - one of the previous counterarguments was that producing the .pcm without the .o unblocked consumers sooner/let the .o generation be done in parallel with those consumers. Is that generally known/considered to be too small of a benefit to be worth the build/support complexity compared to the minimal-pcm+.o in-one-go mode & its benefits (smaller .pcms)?

I think it’s likely there’ll be reasonable build strategies that want to build a minimal PCM and a .o file with two separate actions (to maximize throughput in highly parallel builds), and there’ll be reasonable build strategies that want to build them as part of the same action (to minimize total time in a build with less parallelism). I expect people will want both options to be available. The option that we currently provide – producing a PCM file that can be used as an input to both .o generation and for import – is probably not well aligned with what most build strategies will want.

Reckon there just aren’t enough savings in reusing the PCM for .o generation compared to parsing from scratch? Not enough to justify adding an extra intermediate file (a full pcm that gets consumed for .o generation and a slim pcm that gets consumed by uses). That we’ll move away from complete pcms entirely to only minimal pcms? Fair enough. Good to know/think about.

The discussion is really helpful!

I think it may be feasible to support two strategies in Clang:

(1) Produce .pcm and .o in a single compilation.
(2) Produce .pcm and .o separately.
At least the first choice would be friendly to beginners. From my point of view, no matter what SG15 concludes, it would be better to be able to compile a hello world example in one line.

So the needs for C++20 modules that I have summarized from the thread now include:
(a) Offer a legitimate option to users instead of the -Xclang option.
(b) Offer a strategy to produce .pcm and .o in a single compilation.
(c) Make the compilation results independent of the order of the inputs on the command line.
(d) Reduce the size of the .pcm.

(a) should be the easiest, so we don’t need to discuss it further. (b), (c), and (d) need further discussion about whether and how we do them.
If everyone agrees on the needs list, I would like to open 4 issues in Bugzilla. I think that would be a better place to manage the needs.

Thanks,
Chuanqi

(a) Offer a legitimate option to users instead of the -Xclang option.

This is already possible with --precompile.

If everyone agrees on the needs list, I would like to open 4 issues in Bugzilla. I think that would be a better place to manage the needs.

Opening bugs for (b) (c) (d) sounds reasonable at least.

I have opened bugs 52340 (https://bugs.llvm.org/show_bug.cgi?id=52340), 52341 (https://bugs.llvm.org/show_bug.cgi?id=52341) and 52342 (https://bugs.llvm.org/show_bug.cgi?id=52342) for (b), (c), and (d) respectively.

Thanks,
Chuanqi