yet another question about modules

Hi,

Sorry if this is annoying, yet another email on the topic, but I am working
hard to understand the ins and outs of modules. I promise to pass on the
wisdom once gathered ;-)

I tried to juggle with modules again today, to see whether they can improve
the build times of a given project. Before moving on to that project, I tried
to build simple C++ files with a few template instantiations. My first
conclusion is that with modules, build times increase instead of decreasing.

I took a very trivial use case.

#include <map>
#include <string>
using namespace std;

typedef map<string, string> map0;
typedef map<map0, map0> map1;
typedef map<map1, map1> map2;
// ... and so on, following the pattern:
// typedef map<map"i-1", map"i-1"> map"i";

int main() {
    map0 m0;
    map0 n0;
    m0 = n0;
    map1 m1;
    map1 n1;
    m1 = n1;
    map2 m2;
    map2 n2;
    m2 = n2;
    // ... and so on, up to map"i"
}
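The elided typedef chain can also be generated by a small recursive class template, which makes it easy to vary i without editing the source by hand. This is a sketch assuming C++11; `NestedMap` and `assign_level` are hypothetical names, not part of the original test, and no claim is made that this form compiles faster or slower than the hand-written typedefs.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <type_traits>

// Hypothetical helper: NestedMap<0>::type is map<string, string>, and
// NestedMap<I>::type is map<NestedMap<I-1>::type, NestedMap<I-1>::type>,
// mirroring the "map<map"i-1", map"i-1"> map"i"" pattern.
template <int I>
struct NestedMap {
    typedef typename NestedMap<I - 1>::type inner;
    typedef std::map<inner, inner> type;
};

template <>
struct NestedMap<0> {
    typedef std::map<std::string, std::string> type;
};

// Default-constructs two maps of level I and copy-assigns one to the other,
// as main() does for each level; returns the (empty) size.
template <int I>
std::size_t assign_level() {
    typename NestedMap<I>::type m, n;
    m = n;
    return m.size();
}

// Level 1 is exactly map1 from the test program.
static_assert(std::is_same<NestedMap<1>::type,
                           std::map<std::map<std::string, std::string>,
                                    std::map<std::string, std::string>>>::value,
              "NestedMap<1> should match map1");
```

Calling `assign_level<6>()` from `main()` would correspond to the i=6 case discussed in this thread.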

I'm not sure what you're trying to investigate here. You've forced the
majority of the compile time to be used while processing the .cpp file
rather than its includes, so modules is not going to help you much
here.

For "one level" of instantiation without modules, Clang seems to be a little
faster than GCC, but not for i > 3. With modules, compilation takes much longer.

Including or excluding the one-time cost of building the module for
your standard library? I'm seeing a speedup in this testcase by
enabling modules (with an appropriately-modularized standard library);
the proportional speedup decreases as i increases, as expected, since
more of the time is spent outside the headers.

Can anybody please help me understand whether modules really help speed up
compilation, and give an idea of how?

Though unrelated, an even more discouraging result is that for i > 4 Clang
takes an eternity. For i=9, I stopped the process after 13 minutes.
I have 16GB on my machine (8-core AMD), and the compiler does not seem to
use more than 2GB of memory (from /proc/.../status: VmPeak: 2033548 kB).

This is more interesting. I'd imagine that GCC is avoiding some of the
exponential-time costs here; in particular, its overload resolution
short-circuits template argument deduction if it gets an exact match
from a non-template candidate, and that's probably happening in the
map assignments.
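GCC's internal shortcut is not visible from source code, but the language rule such a shortcut presumably relies on can be shown: when a non-template function and a function template specialization are equally good matches, overload resolution ([over.match.best]) prefers the non-template. A minimal sketch; `pick` is a hypothetical name, and this says nothing about how either compiler actually orders its work.

```cpp
#include <string>

// Non-template candidate: an exact match for an int argument.
inline std::string pick(int) { return "non-template"; }

// Template candidate: for an int argument, deduction yields pick<int>(int),
// which is also an exact match, so the non-template wins the tie.
// For a double argument, pick<double> is an exact match while pick(int)
// needs a floating-integral conversion, so the template wins.
template <typename T>
std::string pick(T) { return "template"; }
```

With `pick(42)` the non-template is selected without the template ever being needed; with `pick(4.2)` the template is selected.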

For i=6 I get:

g++: 0,05s system 99% cpu 0,375 total
clang++ (both from the dev branch and 3.5): 11,81s user 0,06s system 99% cpu 11,908 total

Does this mean roughly a factor of 30 (11,908 / 0,375 ≈ 32)? Does Clang
really scale that badly?
Please do not be offended by this report.

Not at all, thanks for reporting your findings.

Anyway, the reason for this email is to understand how much modules can
help. Say I have this situation (i=6) and -std=c++11: can I expect modules
to bring compilation times down to something reasonable?

No; modules is intended to reduce the costs of including headers. It
does not reduce costs associated with template instantiations that are
performed locally to your .cpp file. (If you instantiated map0, map1,
and so on in a header, then it would help.)
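A minimal sketch of what "instantiating map0, map1, and so on in a header" might look like. The file name `maps.h` and the `use_map*` functions are hypothetical, the header is assumed to be built as part of a module, and whether the resulting instantiations are actually reused depends on the compiler's module implementation.

```cpp
// maps.h: hypothetical header, assumed to be compiled as part of a module.
#include <cstddef>
#include <map>
#include <string>

typedef std::map<std::string, std::string> map0;
typedef std::map<map0, map0> map1;
typedef std::map<map1, map1> map2;

// Inline functions whose bodies use the same operations as main()
// (default construction, copy assignment, destruction). The intent is that
// these instantiations are triggered while the header is processed, rather
// than only in each .cpp file that includes or imports it.
inline std::size_t use_map0() { map0 a, b; a = b; return a.size(); }
inline std::size_t use_map1() { map1 a, b; a = b; return a.size(); }
inline std::size_t use_map2() { map2 a, b; a = b; return a.size(); }
```

The .cpp file would then only name map0, map1, and map2, so that the work it shares with the header can come from the module instead of being redone locally.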

In principle, we could extend the modules system with a template
instantiation repository to cache the results of instantiating
templates from modules, but I don't think anyone is working on, or
planning, such a system for Clang at the moment.

I'm not sure what you're trying to investigate here. You've forced the
majority of the compile time to be used while processing the .cpp file
rather than its includes, so modules is not going to help you much
here.

What I am trying to investigate is whether there is a way to trick the
module cache into caching template instantiations. As I understand it, the
modules system takes the AST of the headers and saves it to a file; when a
translation unit is compiled, the compiler glues that AST onto the AST of the
translation unit. So, intuitively, if a template instantiation is in the
cache, it should simply be "glued" in as well. If I have

map8 a1;
map8 a2;
.....
.....

a hundred times, compilation should be almost as fast as if I had it only
once, and just as fast if the instantiation is found in the module cache.

But however hard I try to cache both libc++ and my header, there is no
improvement. The cache is created, and I can walk through the AST inside it,
but there is no speedup.

For instance, I created a simple header:
#include <string>
#include <map>

using namespace std;

typedef map<string, string> mymap0;
typedef map<mymap0, mymap0> mymap1;
typedef map<mymap1, mymap1> mymap2;
typedef map<mymap2, mymap2> mymap3;
typedef map<mymap3, mymap3> mymap4;
typedef map<mymap4, mymap4> mymap5;
typedef map<mymap5, mymap5> mymap6;
typedef map<mymap6, mymap6> mymap7;
typedef map<mymap7, mymap7> mymap8;

mymap0 m0;
mymap1 m1;
mymap2 m2;
mymap3 m3;
mymap4 m4;
mymap5 m5;
mymap6 m6;

Hi,

BTW: is there any other trick to force instantiation, without generating
any text?

I guess the main C++11 trick for this is to use "extern templates".

FYI, the name for this feature is "explicit instantiation
declarations". "extern templates" is the name of the pre-standard GCC
extension.
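A minimal single-file sketch of both kinds of explicit instantiation, using a hypothetical class template `Box` rather than `std::map` (the standard only permits explicitly instantiating a standard library template when the declaration involves a user-defined type). In a real project the declaration would live in the header and the definition in exactly one .cpp file.

```cpp
// --- box.h (hypothetical header) ---
template <typename T>
struct Box {
    T value;
    T get() const { return value; }  // defined in-class, so implicitly inline
    T twice() const;                 // defined out of class, not inline
};

template <typename T>
T Box<T>::twice() const { return value + value; }

// Explicit instantiation declaration (the "extern template" syntax):
// translation units that see it do not implicitly instantiate the
// non-inline members of Box<int> and rely on a definition elsewhere.
// Inline members such as get() may still be instantiated for inlining.
extern template struct Box<int>;

// --- box.cpp (hypothetical): exactly one translation unit provides the
// explicit instantiation definition, which instantiates every member.
template struct Box<int>;
```

Within one translation unit the definition must follow the declaration, which is why the two lines appear in this order here.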

I've used extern templates quite a bit to cut down on compilation times (and
memory usage) of template-heavy code, but their impact is limited by the fact
that building the AST (within the instantiation) is still a pretty big deal.
For example, I had some code that needed about 6GB and lots of time to
compile (with GCC). By moving some of the main template instantiations into a
separate .cpp file, I ended up with about 4GB to compile each of the two .cpp
files; in other words, 6GB turned into roughly 4GB + 4GB, and about the same
held for compilation time. That was a net loss in overall compilation cost,
but with benefits: actually being able to compile it (without overwhelming
the system) and reducing the need for recompilation (e.g., recompiling only
one of the .cpp files instead of everything). BTW, Clang compiles this
significantly faster than GCC (approximately 1/2 of the time and 1/3 of the
memory).

But the point is that it seems to me (I might be wrong) that even with an
existing instantiation (extern or not), the compilation time/memory needed
to deal with it is still significant.

I have a related question for the Clang devs:
When you have an extern template declaration in a header file that is part
of a module, is the complete AST (or whatever else) of that template
instantiation included in the module?

Whenever a module triggers the instantiation of a template for any
reason, that instantiation is stored in the module and will be reused
if a user of that module needs it.

In principle, we could extend the modules system with a template
instantiation repository to cache the results of instantiating
templates from modules, but I don't think anyone is working on, or
planning, such a system for Clang at the moment.

That sounds to me like automatically making all template instantiations
"extern", right?

No, that wouldn't make any difference -- explicit instantiations in a
user of the module still currently get performed each time that module
user is compiled. The idea here is: if a user of a module triggers a
template instantiation, we'd first check to see if that instantiation
is cached in some external template instantiation cache stored
alongside the module cache, and if not, we'd add it to that cache.
This would likely only be possible if the instantiation only depends
on entities imported from modules (as is the case here), and not if a
template argument refers to a local type.
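The lookup described in this reply can be modelled as a toy table. This is purely a hypothetical sketch: `TemplateArg`, `InstantiationCache`, and `get_or_instantiate` are invented names, strings stand in for instantiated ASTs, and nothing here reflects how Clang is or would be implemented. It captures only the two stated rules: reuse a cached instantiation if present (otherwise perform it and store it), and bypass the cache entirely when any template argument is a local type.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <vector>

// A template argument, described by its spelling and by whether the
// type was imported from a module (as opposed to declared locally).
struct TemplateArg {
    std::string spelling;
    bool from_module;
};

class InstantiationCache {
public:
    // Returns the cached result for tmpl<args...> if present; otherwise runs
    // `instantiate` and stores the result, but only when every argument
    // comes from a module.
    std::string get_or_instantiate(const std::string& tmpl,
                                   const std::vector<TemplateArg>& args,
                                   const std::function<std::string()>& instantiate) {
        std::string key = tmpl + "<";
        bool cacheable = true;
        for (std::size_t i = 0; i < args.size(); ++i) {
            if (i != 0) key += ",";
            key += args[i].spelling;
            cacheable = cacheable && args[i].from_module;
        }
        key += ">";
        if (!cacheable) return instantiate();        // local type: always redo
        auto it = cache_.find(key);
        if (it != cache_.end()) return it->second;   // hit: reuse
        std::string result = instantiate();          // miss: do the work once
        cache_.emplace(key, result);
        return result;
    }

    std::size_t size() const { return cache_.size(); }

private:
    std::map<std::string, std::string> cache_;
};
```

Keying on the spelled arguments is just the simplest choice for a toy; a real cache would presumably need a stable identity for each imported declaration.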

(This is pretty similar to what we'd do to support exported templates,
as it happens...)

This could actually be pretty awesome, but I'm worried about the
implications for the standard's rules (though it looks to me like it could
be allowed). It could become part of the rules for modules that any
instantiation encountered for a template declared/defined inside a module is
implicitly considered "extern" and cached somewhere alongside the module's
cache.

There is a conformance impact here: we would say that the
instantiation of a template would be performed in a context where only
the "associated modules" are visible (that is, those modules with
which the template-name and template-arguments are associated). That
has the potential to break some legitimate (but probably questionable)
code.
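One guess at the kind of "legitimate (but probably questionable)" code meant here; this is an assumption, not an example from the thread. A dependent call inside a template is resolved by argument-dependent lookup at the point of instantiation, so it can find a function that the user's own .cpp file adds to another library's namespace. In the single file below this works today; if only the associated modules were visible during instantiation, the user-declared `helper` would presumably no longer be found. `describe`, `lib::Widget`, `helper`, and `use_widget` are hypothetical names.

```cpp
#include <string>

// Stands in for a template provided by some module.
template <typename T>
std::string describe(const T& t) {
    // Dependent call: found by argument-dependent lookup at the point of
    // instantiation, so it may resolve to a function declared later.
    return helper(t);
}

// Stands in for a type provided by another module.
namespace lib {
struct Widget {};
}

// Declared in the user's own .cpp file, not in either "module"; today ADL
// still finds it when describe<lib::Widget> is instantiated from this file.
namespace lib {
inline std::string helper(const Widget&) { return "helper from user code"; }
}

inline std::string use_widget() { return describe(lib::Widget{}); }
```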

From my experience, it was the template instantiation, rather than the
preprocessing, that was causing long build times.

That's right. Last time I ran my code through GCC with compilation profiling
enabled, it showed that template instantiations accounted for about 98% of
the compilation time. For template-heavy code, pre-processing, parsing, AST
building and all the "normal" compilation stuff is negligible. And for code
that isn't template-heavy, compilation times are rarely a big problem.

OK, that might be true in some situations, but "in main source file"
vs "in a header file", and "in template instantiation" vs "in code
written by user" are orthogonal axes, so this doesn't really prove
anything. If you're doing template-heavy things in your header files,
modules should help. If you're doing template-heavy things in your
.cpp file, modules won't help so much; that's not a problem they aim
to solve.