[Modules TS] feedback

Hi,

I am working on C++ modules support in the build2 build system[1]. As
part of that I have tested latest Clang (r302560 from apt.llvm.org to
be more precise) with a barely-realistic example (module interface
unit, module implementation unit, and an importing test; compilation
command lines are in 'cmd'):

http://codesynthesis.com/~boris/tmp/libhello-clang.tar.gz

Here are some notes/issues:

1. When compiling the module implementation unit, -fprebuilt-module-path
   does not work. Instead one has to use -fmodule-file (see the sketch
   after this list).

   The failure mode is also interesting: there is no diagnostic about
   the module not being found, only errors about undeclared module
   entities.

2. I believe you are aware of this: if a non-inline function is defined
   in a module interface unit, things end up badly (duplicate symbols).

   If the plan is to also generate an object file as part of module
   interface compilation (the way both VC and GCC currently do it),
   then consider supporting a separate compilation mode for these two steps.
   The reason is this: the .pcm file has to be generated before any
   (a) module importing or (b) module implementation units can be
   compiled. So waiting for the .o file to be produced will hinder
   parallelism.

3. When compiling a test that imports a module using -fprebuilt-module-path
   a bogus warning is issued:

   clang: warning: argument unused during compilation: '-fprebuilt-module-path=.' [-Wunused-command-line-argument]
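
To illustrate (1), here is roughly what the two invocations look like
for the implementation unit (a sketch; the exact command lines are in
'cmd'):

# works: point the compiler at the module interface explicitly
clang++ -std=c++1z -fmodules-ts -I.. -fmodule-file=hello.pcm -c hello.cxx

# fails: no "module not found" diagnostic, only errors about
# undeclared module entities
clang++ -std=c++1z -fmodules-ts -I.. -fprebuilt-module-path=. -c hello.cxx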

[1] https://build2.org

Thanks,
Boris

> 2. I believe you are aware of this: if a non-inline function is defined
>    in a module interface unit, things end up badly (duplicate symbols).
>
>    If the plan is to also generate an object file as part of module
>    interface compilation (the way both VC and GCC currently do it),
>    then consider supporting a separate compilation mode for these two
>    steps. The reason is this: the .pcm file has to be generated before
>    any (a) module importing or (b) module implementation units can be
>    compiled. So waiting for the .o file to be produced will hinder
>    parallelism.

Agreed - and in fact Clang already supports generating object files for modules, and the only way it supports this right now is as you've suggested: as a separate build step that takes the .pcm file and generates an object, for the reasons you've mentioned.

I implemented it with the "legacy modules" scenario in mind, but I believe it should also work for TS modules.

Passing -fmodules-codegen and -fmodules-debuginfo (I think I'm remembering the spelling right) to the step that creates the .pcm file will establish the right conditions; then passing that .pcm file to clang again with -c should generate the required object file.

This hasn't been widely deployed, but I have tested it internally on large programs using explicit legacy modules and produced objects that successfully link and are substantially smaller (~10% less .text in input objects, maybe 30% less debug info), at least at -O0. Higher optimization levels make the object code part less useful or outright harmful to object size (perhaps a lesser mode could be implemented for TS modules correctness, where only non-inline functions are tied to the modular object), but the debug info remains useful. Let me know if you want/need further details and I'd be glad to hear about any results/comments/patches you have for this functionality :)

Hi David,

David Blaikie <dblaikie@gmail.com> writes:

> Passing -fmodules-codegen and -fmodules-debuginfo (I think I'm remembering
> the spelling right) to the step that creates the .pcm file will establish
> the right conditions; then passing that .pcm file to clang again with -c
> should generate the required object file.

While I believe the spelling is correct, it still does not work:

clang++-5.0 -std=c++1z -fmodules-ts -fmodules-codegen -fmodules-debuginfo \
  -I.. --precompile -x c++-module -o hello.pcm hello.mxx

clang: error: unknown argument: '-fmodules-codegen'
clang: error: unknown argument: '-fmodules-debuginfo'

Also note that this is not an unresolved symbol but a duplicate:

clang++-5.0 -std=c++1z -fmodules-ts -I.. --precompile -x c++-module -o hello.pcm hello.mxx
clang++-5.0 -std=c++1z -fmodules-ts -I.. -fmodule-file=hello.pcm -c hello.cxx
clang++-5.0 -std=c++1z -fmodules-ts -I.. -fprebuilt-module-path=. -c driver.cxx
clang++-5.0 -std=c++1z -fmodules-ts -I.. -o driver driver.o hello.o
hello.o: In function `hello::non_inline()':
hello.cxx:(.text+0x0): multiple definition of `hello::non_inline()'
driver.o:driver.cxx:(.text+0x0): first defined here

Thanks,
Boris

> While I believe the spelling is correct, it still does not work:
>
> clang++-5.0 -std=c++1z -fmodules-ts -fmodules-codegen -fmodules-debuginfo \
>   -I.. --precompile -x c++-module -o hello.pcm hello.mxx
>
> clang: error: unknown argument: '-fmodules-codegen'
> clang: error: unknown argument: '-fmodules-debuginfo'

Right, sorry - these flags aren't exposed through the driver yet, so for now you can pass them like:

-Xclang -fmodules-codegen
-Xclang -fmodules-debuginfo
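
Applied to the --precompile command from your message, that would be
roughly:

  clang++-5.0 -std=c++1z -fmodules-ts -I.. --precompile -x c++-module \
    -Xclang -fmodules-codegen -Xclang -fmodules-debuginfo \
    -o hello.pcm hello.mxx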

> Also note that this is not an unresolved symbol but a duplicate:

It should work naturally either way, I think.

Here we go:

foo.cppm:

  export module foo;
  export {
    int f() { return 0; }
  }

use1.cpp:

  import foo;
  void use2();
  int main() { f(); use2(); }

use2.cpp:

  import foo;
  void use2() { f(); }

$ clang++-tot -std=c++1z -fmodules-ts foo.cppm --precompile -o foo.pcm -Xclang -fmodules-codegen
$ clang++-tot -std=c++1z -fmodules-ts foo.pcm -c
$ clang++-tot -std=c++1z -fmodules-ts -fmodule-file=foo.pcm use1.cpp use2.cpp -c
$ nm {foo,use1,use2}.o
foo.o:
0000000000000000 T _Z1fv
use1.o:
0000000000000000 T main
                 U _Z1fv
                 U _Z4use2v
use2.o:
                 U _Z1fv
0000000000000000 T _Z4use2v
$ clang++-tot {foo,use1,use2}.o
$ ./a.out
$ clang++-tot -std=c++1z -fmodules-ts foo.cppm --precompile -o foo.pcm
$ clang++-tot -std=c++1z -fmodules-ts -fmodule-file=foo.pcm use1.cpp use2.cpp -c
$ nm {use1,use2}.o
use1.o:
0000000000000010 T main
0000000000000000 T _Z1fv
                 U _Z4use2v
use2.o:
0000000000000000 T _Z1fv
0000000000000010 T _Z4use2v
$ clang++-tot use1.o use2.o
use2.o: In function `f()':
use2.cpp:(.text+0x0): multiple definition of `f()'
use1.o:use1.cpp:(.text+0x0): first defined here

I suppose -fmodules-codegen will be the default for -fmodules-ts sooner or later. (-fmodules-debuginfo is useful too, but you won't see the correctness issue, just debug info size changes.)

It may be that two degrees of -fmodules-codegen would be useful: one that only homes the non-inline functions (external functions become external in the module's object and available_externally in module users' objects), and the one we have here that homes all functions (even inline ones, so linkonce_odr functions become weak in the module's object and available_externally in module users' objects).
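
As a sketch of what the two modes would mean for a module interface (the
comments just paraphrase the above; none of this is committed behavior):

  // foo.cppm
  export module foo;

  // Non-inline: homed under either mode. Emitted as an external
  // definition in foo.o; importers reference it rather than define it.
  export int ext() { return 1; }

  // Inline: only homed by the full mode we have here, where its usual
  // linkonce_odr linkage becomes weak in foo.o and available_externally
  // in importers. The lesser mode would leave it linkonce_odr in every
  // user, as today.
  export inline int inl() { return 2; }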

Hi David,

David Blaikie <dblaikie@gmail.com> writes:

>   -Xclang -fmodules-codegen
>   -Xclang -fmodules-debuginfo
>
> [...]
>
> It should work naturally either way, I think.

Yep, all works now, thanks.

Somewhat related question: what happens if I pass -fPIC? I assume I
will need different .o for PIC/non-PIC. Will I also need different
.pcm files?

I tend to think the answer is yes since at the minimum -fPIC may
define preprocessor macros that may change the translation unit.

I guess what I am trying to understand is if .pcm is just the AST
or if it actually includes some object/intermediate code? For example,
does it make sense to pass -O when generating .pcm? Sorry if these
are stupid/obvious questions ;-).

Thanks,
Boris

> Somewhat related question: what happens if I pass -fPIC? I assume I
> will need different .o for PIC/non-PIC. Will I also need different
> .pcm files?
>
> I tend to think the answer is yes since at the minimum -fPIC may
> define preprocessor macros that may change the translation unit.

If a flag changes things like preprocessor macros then it would need to be consistently passed to all stages (including the first one - that generates the .pcm). I know Clang has some enforcement of matching flags between PCM generation steps and uses (& hopefully that also triggers on the PCM->Object step, though I haven't checked).

I'm not sure how lenient those checks are (whether they have many false positives (flagging mismatched flags that are benign/can be composed without conflicting) or false negatives (allowing mismatched flags that are incompatible)).

Richard might be able to say more about that.

I think -O flags also generate preprocessor defines, so I expect they would be checked/enforced by this too, making it impossible to use a PCM built with a different -O level than its use.
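
In build terms that means passing the same flags to every stage,
something like this (reusing the example above; -fPIC and -O2 are just
illustrative):

  # same flags for the interface, the module's object, and importers
  clang++ -std=c++1z -fmodules-ts -fPIC -O2 foo.cppm --precompile \
    -o foo.pcm -Xclang -fmodules-codegen
  clang++ -std=c++1z -fmodules-ts -fPIC -O2 foo.pcm -c
  clang++ -std=c++1z -fmodules-ts -fPIC -O2 -fmodule-file=foo.pcm -c use1.cpp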

> > Somewhat related question: what happens if I pass -fPIC? I assume I
> > will need different .o for PIC/non-PIC. Will I also need different
> > .pcm files?
> >
> > I tend to think the answer is yes since at the minimum -fPIC may
> > define preprocessor macros that may change the translation unit.

> If a flag changes things like preprocessor macros then it would need to
> be consistently passed to all stages (including the first one - that
> generates the .pcm). I know Clang has some enforcement of matching flags
> between PCM generation steps and uses (& hopefully that also triggers on
> the PCM->Object step, though I haven't checked).
>
> I'm not sure how lenient those checks are (whether they have many false
> positives (flagging mismatched flags that are benign/can be composed
> without conflicting) or false negatives (allowing mismatched flags that
> are incompatible)).
>
> Richard might be able to say more about that.

We have a bunch of flags whitelisted that are permitted to change between
module build and use. We permit flags to change between module build and
use if the flag change does not affect interoperability of the pcm file,
even though in some cases the resulting combination doesn't make a lot of
sense. For example, flags that affect predefined macros (__OPTIMIZE__,
__PIC__, ...) *are* permitted to vary between module build and use, with
each side seeing the value of the macro as it was defined for that
compilation, and flags that affect minor details of the language mode are
also permitted to vary.

The whitelist is likely incomplete (there are probably flags for which it
would be useful and reasonable for them to differ between module build and
use, but where we disallow them differing), and as noted above allows some
combinations that work (we'll do what you asked us to do) but might not
make a lot of sense.

> I think -O flags also generate preprocessor defines, so I expect they
> would be checked/enforced by this too, making it impossible to use a PCM
> built with a different -O level than its use.

> I guess what I am trying to understand is if .pcm is just the AST
> or if it actually includes some object/intermediate code? For example,
> does it make sense to pass -O when generating .pcm? Sorry if these
> are stupid/obvious questions ;-).

In our default configuration, the .pcm file is just the AST (at least for
now). We also have a mode in which the .pcm file is actually a .o file that
contains the AST as well as debug information, and there's been some
discussion of including other things in it too, for example LLVM IR for
inline functions (to speed up optimized compilation of module users).

Even now, it makes sense to pass those flags, and as David says, the
easiest thing to do is to pass the same flags you use for source
compilations to module compilations. As an example of a case where this
makes a difference today, the glibc headers will sometimes provide
different definitions of C standard library functions based on whether the
__OPTIMIZE__ macro is defined, and if you want the module build to take
advantage of that, you need to make sure you pass -O<n> to the module
compilation.
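
The pattern in question looks roughly like this (a simplified sketch
with a made-up name, not actual glibc code):

  /* An extra inline definition is provided only when the header is
     preprocessed with optimization enabled, since -O<n> defines
     __OPTIMIZE__. */
  #ifdef __OPTIMIZE__
  # define my_isdigit(c) ((unsigned)(c) - '0' <= 9) /* fast inline path */
  #else
  int my_isdigit(int c); /* out-of-line library version */
  #endif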

The intent is that the same set of flags would typically be passed to a
compilation building the module interface for a particular module as would
be passed to a compilation building the module interface. That way, the
user can decide on a per-module basis how they want the compiler to act
(including things like which warnings they want enabled for their module,
and perhaps some details of the language it's written in).

Hi Richard,

Richard Smith <richard@metafoo.co.uk> writes:

> The intent is that the same set of flags would typically be passed to a
> compilation building the module interface for a particular module as would
> be passed to a compilation building the module interface.

Should the second "module interface" be "module user (importer)"?

> That way, the user can decide on a per-module basis how they want the
> compiler to act (including things like which warnings they want enabled
> for their module, and perhaps some details of the language it's written in).

Interesting. So this implies that the module binary interface (.pcm) is
a user-side (or per-user) thing, not module-side?

In other words, if I have library libx that implements modx and I have
two programs, exe1 and exe2, that use/link this module/library, then
I may need to generate two modx.pcm versions, one for each exe if they
are built with different-enough flags (actually, three: +1 for libx's
module implementation units).

This makes sense if you start thinking about inline functions, warnings,
etc. But it is definitely not how I (and I bet a lot of other people)
think about it.

I also wonder how this model applies to the standard library? Will each
project that consumes it as a module have to compile its module interface?

Thanks,
Boris

> > The intent is that the same set of flags would typically be passed to a
> > compilation building the module interface for a particular module as
> > would be passed to a compilation building the module interface.
>
> Should the second "module interface" be "module user (importer)"?

No, it should be "module implementation", sorry. That is, you can have
somewhat different flags between module interface and module user, but the
entire module (both interface and implementation) would typically be
expected to be built with a similar set of flags, determined principally by
the module author. That's not required, though, and if a build system wants
to set things up differently, it can.

> > That way, the user can decide on a per-module basis how they want the
> > compiler to act (including things like which warnings they want enabled
> > for their module, and perhaps some details of the language it's written
> > in).
>
> Interesting. So this implies that the module binary interface (.pcm) is
> a user-side (or per-user) thing, not module-side?
>
> In other words, if I have library libx that implements modx and I have
> two programs, exe1 and exe2, that use/link this module/library, then
> I may need to generate two modx.pcm versions, one for each exe if they
> are built with different-enough flags (actually, three: +1 for libx's
> module implementation units).

If the above are all part of the same build system, building a project with
largely the same configuration throughout, I would expect the same libx.pcm
would be used in all three cases. But yes, if the flags are different
enough (perhaps one compilation is C++1z, another is Objective-C++11, and a
third one is targeting a different architecture), then the .pcm file can't
be reused, so you would need to rebuild it for each sufficiently-different
user. (Although in those cases, you'd often want different object files for
the module implementation, too.)
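
In build terms that might look something like this (hypothetical file
names; the architecture difference is just one example of "sufficiently
different"):

  # same interface, two sufficiently-different configurations, two .pcms
  clang++ -std=c++1z -fmodules-ts --precompile -x c++-module modx.mxx \
    -target x86_64-linux-gnu -o modx-x86_64.pcm
  clang++ -std=c++1z -fmodules-ts --precompile -x c++-module modx.mxx \
    -target aarch64-linux-gnu -o modx-aarch64.pcm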

> This makes sense if you start thinking about inline functions, warnings,
> etc. But it is definitely not how I (and I bet a lot of other people)
> think about it.
>
> I also wonder how this model applies to the standard library? Will each
> project that consumes it as a module have to compile its module interface?

Typically yes (but only once).

Richard Smith <richard@metafoo.co.uk> writes:

> > I also wonder how this model applies to the standard library? Will each
> > project that consumes it as a module have to compile its module
> > interface?
>
> Typically yes (but only once).

Is anyone working on defining -fmodules-ts-compatible interface units for
libc++?

Libc++ ships with a module.modulemap, so it should work out of the box with
Clang modules.
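
For reference, a module map describes existing headers to the compiler
rather than using TS interface units; a minimal entry (hypothetical
names) looks like:

  // module.modulemap
  module hello {
    header "hello.hxx"
    export *
  }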

However, I think you're asking a different question; can you clarify?

Hi Eric,

Eric Fiselier <eric@efcs.ca> writes:

> Libc++ ships with a module.modulemap, so it should work out of the box
> with Clang modules.

I may be wrong, but I don't think "old" modules support (-fmodules) is
compatible with "new" (-fmodules-ts).

Even if it is, using a Clang-specific .modulemap file is not ideal in
my case. I would much prefer to have "standard" (as in Modules TS) module
interface units.

A bit more background: I am adding C++ modules support to a build system
(build2) and trying to figure out how everything is going to fit together.
Based on the discussion in this thread it's clear that the build system
will need to pre-compile module interface units on the consumer side
even for the standard library (because of different compilation options
which may affect the result; see earlier emails for details).

Thanks,
Boris

> I may be wrong, but I don't think "old" modules support (-fmodules) is
> compatible with "new" (-fmodules-ts).

Ah, sorry. My mistake.

They are compatible, and designed and intended to be used together.

For this, you'll probably need to wait for the modules TS to specify how the standard library is supposed to be modularized. (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0581r0.pdf contains a suggestion of how this might look, but it has undergone no committee review so far.)

Richard Smith <richard@metafoo.co.uk> writes:

> They are compatible, and designed and intended to be used together.

Good to know, thanks.

> For this, you'll probably need to wait for the modules TS to specify how
> the standard library is supposed to be modularized.
> (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2017/p0581r0.pdf
> contains a suggestion of how this might look, but it has undergone no
> committee review so far.)

Yes, I was thinking along the lines of that proposal. Microsoft has
shipped a preview of the standard library modularization in VC15, so
some have started playing with this.

Their approach, however, differs from what we discussed in this thread
in that they shipped precompiled standard library modules (for a few
runtimes) which the compiler then selects automagically based on the
IFCPATH environment variable. So in their
case the standard library modules are special and handled in an ad
hoc manner.

Boris