Pre-compiled Modules and (Failing) Optimization of Build Time Requirements

I have been using Clang modules (defined in a module map) for a C++ project. About 350 modules are involved in building a particular set of tests.
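For context, the modules are declared in a module map roughly of this shape (module and header names here are made up for illustration):

```
// module.modulemap -- illustrative only; real module/header names differ.
module MyProject_Core {
  header "Core.h"
  export *
}
module MyProject_Util {
  header "Util.h"
  export *
}
```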

Building these tests from scratch takes about 800-850 seconds. I have noticed, however, that after a simple change to the test code (which imports the modules but is not itself part of any module), the build time does not decrease. What is particularly strange is that none of the module .pcm files appear to be touched during most of this time (the modules do not end up with new creation/modification timestamps). My expectation, given that the modifications were only to files the modules do not depend on, is that recompiling the tests should take only a very small amount of time.

I had the impression that I was getting reduced build times in other dependent modules, but I have no concrete proof of this, and given the issue above it may be wishful thinking.

I am building with -fprebuilt-implicit-modules, have set a -fmodules-cache-path, and have passed -Xclang -fdisable-module-hash. The module .pcm files appear in the cache directory, without hash subdirectories, as expected.
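For reference, the relevant part of the compile command looks roughly like this (the standard version and all paths are placeholders; -fprebuilt-module-path is the companion flag whose directory -fprebuilt-implicit-modules searches):

```sh
clang++ -std=c++17 -fmodules \
  -fmodules-cache-path=build/module-cache \
  -fprebuilt-module-path=build/module-cache \
  -fprebuilt-implicit-modules \
  -Xclang -fdisable-module-hash \
  -c tests.cpp -o tests.o
```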

I have used ClangBuildAnalyzer to try to analyze the situation, but this example is too large for it, and the tool crashes before completion (leaving behind a 32 GB JSON file).

Does anyone have an idea what might be going on, or suggestions for better ways to trace it?

Appreciate any insight!

Thanks,
Asher

For this kind of slow incremental build, I'd check what the build system is doing, i.e., whether it is rebuilding the test .cpp files or not. It would also be interesting to compare the behaviour with a non-modular build.

The flags you are using aren't the most common, so it is possible that clang generates incorrect .d dependency files in this configuration, which could confuse the incremental build. But that's just a guess at how things could go wrong; I haven't seen a case where this was actually broken.

There is only one .cpp file. The modules are produced from header files with inline class definitions.

Further investigation suggests that the slowdown comes from re-instantiating the same templates in different non-dependent compilation units (so modules aren't resolving the duplication until later in the build). I'm not sure how to verify this, though, except by eliminating the instantiations, which is how I have proceeded; a sketch of the standard mechanism follows.
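For anyone reading later, one standard way to eliminate duplicate instantiations (sketched here with made-up names, not necessarily exactly what my code looks like) is an explicit instantiation declaration, which leaves a single compilation unit owning the definition:

```cpp
// matrix.h -- hypothetical stand-in for a heavy template included everywhere.
#pragma once
#include <vector>

template <typename T>
class Matrix {
 public:
  Matrix(int rows, int cols) : data_(rows * cols), cols_(cols) {}
  T& at(int r, int c) { return data_[r * cols_ + c]; }

 private:
  std::vector<T> data_;
  int cols_;
};

// Declaration: suppresses implicit instantiation of Matrix<double> in
// every translation unit that includes this header.
extern template class Matrix<double>;
```

```cpp
// matrix.cpp -- the one translation unit that owns the instantiation.
#include "matrix.h"

// Definition: Matrix<double> is instantiated exactly once, here.
template class Matrix<double>;
```

Every other translation unit then references the symbols from matrix.o instead of re-instantiating the template body.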

Any suggestions as to how I would go about investigating "what the build system is doing"? Where would I look to examine the .d files, and with what tools? I assume they aren't simply human-readable?

Appreciate the response!

Asher

To clarify my understanding (before I give you bad, misleading advice :grinning:): does building this single .cpp file take 800-850 seconds? And when you mentioned "a simple change to the test code", did you mean a change to this .cpp file?

That depends on your build system. In most cases you look at the build log and try to work out which commands take the longest and whether their execution is actually necessary for the incremental build.

The location of .d files depends on your build setup. For example, clang's own build passes -MF tools/clang/lib/Serialization/CMakeFiles/obj.clangSerialization.dir/ASTReader.cpp.o.d for ASTReader.cpp. In your build log you'd look for the -MF flag (more information at Preprocessor Options (Using the GNU Compiler Collection (GCC))). Dependency files are human-readable and you don't need any special tools to read them. It is useful to look at them if you think your build system rebuilds too much, but I don't know exactly what you should be looking for in your case.
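To illustrate (contents invented, but this is the make-rule format they all use), a .d file is just the object file, a colon, and the list of files it depends on:

```make
# tests.cpp.o.d -- a make rule: target, colon, prerequisites.
tests.cpp.o: tests.cpp \
  include/Core.h \
  include/Util.h
```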

In this case, the .cpp file has some googletest fixtures and main from libgtestmain. It imports the modules that the tests depend on.
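Schematically it looks like this (names are invented; the real file just has this shape):

```cpp
// tests.cpp -- illustrative shape only.
#include <gtest/gtest.h>  // main() is supplied by linking libgtestmain.

// Under -fmodules, these includes are translated into module imports
// via the module map rather than being textually included.
#include "Core.h"
#include "Util.h"

TEST(CoreTest, BasicInvariant) {
  EXPECT_TRUE(myproject::CheckInvariant());
}
```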

To my understanding, this should separate the module .pcm compilation units from the test fixtures in the .cpp file. Making a change to a test fixture (and not to any module code) should require only the test fixtures to be recompiled. Is this logic flawed?

As far as I can tell, you are right and that’s correct logic.

Now the question is what the cost of importing the modules is. If including those headers textually is not a bottleneck, using modules won't help your build times. Modules are most effective at reducing build time when you include the same big headers over and over, in multiple .cpp files; modules then avoid parsing those big headers multiple times. But if you have a single .cpp file, you shouldn't see any benefit.
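In other words (hypothetical file names), the win comes from the second and later inclusions:

```cpp
// big.h -- a large header, covered by a module map entry.

// a.cpp
#include "big.h"  // Textual build: big.h is parsed here.

// b.cpp
#include "big.h"  // Textual build: parsed all over again. Modular build:
                  // both TUs just deserialize the one big.pcm built from
                  // big.h, so the parse happens once.
```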

My understanding was that each module is its own compilation unit, so even though there is one .cpp file, the build involves many compilation units. I would expect only the changed compilation units to be recompiled (so none of the modules). The output even seems to bear this out (no .pcm changes), but it's as if all of the work is still being done while producing no output…

The fact that the .pcm files aren't recompiled doesn't mean the .cpp compilation itself can't be slow. My impression is that this is what's happening, and that's why modules don't get you any noticeable speedup.
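One way to check where that single compile spends its time, without going through ClangBuildAnalyzer (which choked on the full build above), is clang's -ftime-trace, which writes a per-TU JSON trace; file names here are placeholders:

```sh
# Writes tests.json next to the object file.
clang++ -ftime-trace -c tests.cpp -o tests.o
# Load tests.json in chrome://tracing or speedscope. Large
# InstantiateClass / InstantiateFunction blocks would point at template
# instantiation rather than module loading as the bottleneck.
```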