Modules increased build times

Hi,

I ported a big third-party header library to C++20 modules. I expected a decrease in compilation time, since the library only has to be compiled once per project instead of once per translation unit. Instead, the build times increased. -ftime-report shows that half of the time is spent reading modules. I tried to profile it, and most of the time is spent in clang::ASTReader::ReadDeclRecord(unsigned int).
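
For context, here is a sketch of the kind of port described above; the header path and the exported names below are placeholders, not the actual library:

// really_big_module.cppm -- hypothetical shape of the port
module;

// Third-party headers go into the global module fragment.
#include "third_party/big_library.hpp"

export module really_big_module;

// Re-export the pieces the project uses via exported using-declarations.
export namespace biglib {
    using third_party::Widget;
    using third_party::make_widget;
}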

Is there any way to reduce or speed up the declaration lookup?

Thanks!


This seems to be related: https://github.com/llvm/llvm-project/issues/60996

Update: -ftime-trace reports that most of the time is spent in WriteAST, not in reading modules.

This issue does not look related to [Modules] Disappointing O(n^2) scaling of compile times with modules and optimizations · Issue #60996 · llvm/llvm-project · GitHub, which is about modular builds being slow when optimizations are enabled. The problem discussed in this thread is not related to the middle end.

According to [C++20] [Modules] Part of the time trace information may miss in one phase compilation · Issue #60555 · llvm/llvm-project · GitHub, some of the information recorded by -ftime-trace may be missing with one-phase compilation (one-phase compilation is described here: Standard C++ Modules — Clang 17.0.0git documentation), so the reported main time-consuming part may be incorrect.

I think it would be helpful to file this as a GitHub issue together with a reproducer.

Apparently, in my CMake project, both time-trace files are present in my build folder.

Here is my worst file:

First phase (precompile, generate pcm):

Second phase (build object file):

In the coming days, I will make a minimal example to reproduce this.


Further information:

As a test I created a file in my project which looks like this:

module;

#include <iostream>


export module test;
import really_big_module;


export {
    void hello() {
        std::cout << "hello\n";
    }
}

The following flame graph came out:

Note that <iostream> is only included to inflate the AST.

Maybe it would be possible to cache parts of clang::ASTWriter::GenerateNameLookupTable and speed up the AST reading. I don’t know anything about clang internals, so I can’t say how to best fix this issue.

I also created an issue that focuses on the AST reader: https://github.com/llvm/llvm-project/issues/61064

Here the test file looks like this:

export module b;
import really_big_module;

export namespace b {
    int b() {
        return 1;
    }
}

And the resulting flame graph of this file:

As you can see, a lot of time is spent in clang::ASTReader::VisitFunctionDecl, despite not using any function from really_big_module.


Thanks! This should be pretty helpful for us in improving it.

@ChuanqiXu thank you for your time!

I tried the patch at: [C++20] [Modules] Don't load declaration eagerly for named modules · llvm/llvm-project@af86957 · GitHub. It significantly reduced the time spent in Module Load; however, WriteAST still takes the same amount of time. I was unable to create a minimal reproducer that highlights this specific issue.

As a test, I ran this file through callgrind with and without import really_big_module:

module;

#include <iostream>

export module test;
import really_big_module;


export {
    void hello() {
        std::cout << "hello\n";
    }
}

The number of calls to ASTWriter::WriteDecl, ASTDeclWriter::Visit, and DeclContext::lookup stayed nearly the same. The most significant difference is that ASTReader::FindExternalVisibleDeclsByName gets called frequently when using import. I also noticed that the call chains while reading the AST are deep. I tried to print the declaration being visited by modifying the AST reader; however, I only got crashes.

I created a repo that highlights this issue. Note that it is neither minimal nor pretty, and is only a rough port to C++ modules, but it could help to find the root cause. Please let me know if there are problems building it.

I can’t run it.

It tells me that:

./vulkan/vulkan_core.h:8186:10: fatal error: 'vk_video/vulkan_video_codec_h264std.h' file not found

It would be better if you could reduce it further.

I pushed a reduction; now it only depends on <iostream>.
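
For readers following along, here is a hypothetical sketch of what the reduced module could look like; the actual reproducer is in the linked repo and issue, and the function below is made up:

// really_big_module.cppm -- reduced so that it only depends on <iostream>
module;

#include <iostream>

export module really_big_module;

export inline void print(const char *msg) {
    std::cout << msg << '\n';
}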

Let’s track the issue in: [C++20] [Modules] Writing PCM will be unexpectedly slow with additional import. · Issue #61447 · llvm/llvm-project · GitHub


In recent weeks I tried using C++ modules on a project. It was so cool at first to be getting rid of header files. Alas, my compilation times eventually went insane. Some files were taking over a minute. It became too hard to get work done. I just spent the time to go back to headers, and compilation time is back to normal.

I’ve reported some bugs connected to modules, but on this one I don’t think I can get a “minimal” reproducer. I suspect it’s something that happens as the includes and imports start to pile up.

But I’ve glimpsed the promised land, so I eagerly await the bug fixes and further implementation… 🙂

Compilation speed is also a main concern of mine. If we can’t find a “minimal” reproducer, a relatively reduced reproducer like “[C++20] [Modules] Writing PCM will be unexpectedly slow with additional import. · Issue #61447 · llvm/llvm-project · GitHub” is good too. At least we need to know what’s going wrong so that we have a direction.

BTW, is there a lot of constexpr/consteval in your code? If so, could you test again after removing the constexpr/consteval? There is a known problem that clang doesn’t cache the evaluated values, and the problem becomes much more significant with modules.
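
To illustrate the kind of experiment being suggested, here is a hypothetical sketch; the module and function names are made up:

// tables.cppm -- an imported module with a heavily used constexpr function.
// Per the known problem mentioned above, clang does not cache the evaluated
// values, and the cost grows when the function lives in an imported module.
export module tables;

export constexpr long fib(int n) {
    return n < 2 ? n : fib(n - 1) + fib(n - 2);
}

// Suggested test: temporarily drop constexpr and compare build times
// (callers that require a constant expression would need adjusting too):
// export long fib(int n) { return n < 2 ? n : fib(n - 1) + fib(n - 2); }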

Not that I know of. It’s possible one of my 3rd party libs has some. I’ll keep that in mind.

Maybe related: the size of the .pcm files in my build directory made me wonder. I have only about 50K lines of C++, and the pcms were 1.5GB total. Way more space than the object files.

Some input that could be useful; consider the following files:


// foo.cppm
module;

#include <compare>

export module foo;

export struct Foo
{
	int value{};

	auto operator <=>(const Foo &) const = default;
};

// bar.cppm
export module bar;

import foo;

export struct Bar
{
	Foo foo{};
};

This compiles successfully under clang, but to compile it with msvc you have to include <compare> in bar or export it from foo. According to an msvc dev, this is by design. As msvc is not impacted by these high compile times, this difference could be related to why it happens in clang.
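
For reference, here is a sketch of the first workaround mentioned above, including <compare> in bar's global module fragment (exporting <compare> from foo would be the alternative):

// bar.cppm, adjusted so that MSVC accepts it
module;

#include <compare>

export module bar;

import foo;

export struct Bar
{
	Foo foo{};
};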

Relevant issues:
https://developercommunity.visualstudio.com/t/Template-exports-requiring-importing-of-/1425979#T-N1435887

https://developercommunity.visualstudio.com/t/error-importing-a-module-with-detaulted-three-way/1582949

Maybe related: the size of the .pcm files in my build directory made me wonder. I have only about 50K lines of C++, and the pcms were 1.5GB total. Way more space than the object files.

You can look at the discussion here (C++ modules support? (#18355) · Issues · CMake / CMake · GitLab) to get a feeling for the one-phase compilation model, the two-phase compilation model, and fat vs. thin BMIs. Though I don’t think we can get a truly “thin” BMI without refactoring the current design of the BMI in clang…

In my local experiments, the size of the pcms is similar to yours, and I have 18 module units. The module units are completely independent of each other. I wrote CMake scripts manually to support two-phase compilation for my project, and in the end I got a 50% reduction in compilation time.

For your project, if the number of module units is not large, maybe you can try to improve it by using two-phase compilation manually.

I feel this is an issue in MSVC, and it is unrelated to the compilation speed. Since bar doesn’t access anything related to comparison, it doesn’t look right to me to emit an error for this.

Is this patch ⚙ D126694 [C++20][Modules] Implementation of GMF decl elision. related to the issue?

In fact, it should be [Modules] Faster compilation speed or better diagnostic messages? - #24 by vvassilev. The patch you linked tries to implement the semantics of GMF elision, but from the perspective of the serializer it may not help much with performance.