[Modules] Faster compilation speed or better diagnostic messages?

vvassilev · July 9, 2023, 4:46pm

Hi all,

Thanks for bringing this up. Our use of modules is mostly for reflection information, where load time and peak memory use basically determine whether we should adopt modules or not.

Yes, we do most of the work lazily, with several notable exceptions. Currently, the way we model source locations in clang and the ASTReader is problematic. It comes from just a single line in clang where we reserve huge chunks of memory for the source locations of entities that will eventually be read. However, fixing this would be very hard and might require reworking the source location management. We have tried workarounds such as using sparse vectors to delay the pre-allocations, but without much success. Reworking the source location modelling in clang would probably be a good thing, since it is a bit inefficient now. We might be able to use some sort of binary tree to model them. That issue is a major blocker if we build a module for a library that has a lot of token-generating header files, such as boost preprocessor (or in fact most of the large boost libraries).
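To make the binary-tree idea a bit more concrete, here is a minimal sketch, assuming each module only needs to claim a contiguous range of offsets and that lookups go from an offset back to its owning module. `LocationIndex`, `ModuleRange`, `addModule` and `owner` are hypothetical names made up for illustration; none of this is clang's actual SourceManager or ASTReader code. It uses `std::map` (typically a balanced binary tree) so that registering a module costs one node rather than a reservation proportional to the size of its range. Requires C++17.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>
#include <optional>
#include <string>
#include <utility>

// Hypothetical description of the offset range one module claims.
struct ModuleRange {
    std::uint32_t length;  // how many offsets the module may use
    std::string name;      // illustrative label only
};

// Hypothetical index: maps the start of each range to its description.
class LocationIndex {
public:
    // Claim `length` offsets for a module. Only one tree node is allocated,
    // no matter how large the range is. (Overflow of the 32-bit offset
    // space is not handled in this sketch.)
    std::uint32_t addModule(std::string name, std::uint32_t length) {
        const std::uint32_t base = next_;
        ranges_.emplace(base, ModuleRange{length, std::move(name)});
        next_ += length;
        return base;
    }

    // Find which module owns `offset`, in O(log n) of the module count.
    std::optional<std::string> owner(std::uint32_t offset) const {
        auto it = ranges_.upper_bound(offset);  // first range starting after offset
        if (it == ranges_.begin()) return std::nullopt;
        --it;                                   // last range starting at or before offset
        if (offset - it->first >= it->second.length) return std::nullopt;
        return it->second.name;
    }

private:
    std::map<std::uint32_t, ModuleRange> ranges_;
    std::uint32_t next_ = 0;
};
```

In this sketch, registering a module that claims 2^30 offsets costs the same as one that claims 100. The trade-off is a logarithmic lookup instead of a direct array index.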

Another problem is that we eagerly deserialize template specializations upon module load. That is because we need to make the selection and possibly implicitly instantiate. That results in a lot of redundant deserializations, even for standard modules such as the one for libstdc++, since we have tons of specializations. In fact, this issue might be easier to fix, and the fix less intrusive. A very old solution is here: D41416 [modules] [pch] Do not deserialize all lazy template specializations when looking for one. I will update it next week since I think I screwed up rebasing…
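As a rough illustration of what "do not deserialize all lazy specializations when looking for one" could look like, here is a minimal sketch, assuming each lazy record is stored together with a hash of its template arguments so that a lookup only loads the records whose hash matches. `LazySpecializations`, `registerLazy`, `find` and the string-based "arguments" are hypothetical stand-ins for illustration; this is not the code from D41416 and not clang's API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical table of not-yet-deserialized specializations.
class LazySpecializations {
public:
    // Stand-in for the expensive step: turn a record ID into a loaded entity.
    using Loader = std::function<std::string(std::uint32_t)>;

    explicit LazySpecializations(Loader loader) : loader_(std::move(loader)) {}

    // Called at module load: remember only the ID, keyed by argument hash.
    void registerLazy(std::uint32_t id, const std::string& args) {
        pending_[hashArgs(args)].push_back(id);
    }

    // Deserialize only the records whose argument hash matches `args`.
    // Matching records are loaded once and cached for later lookups.
    const std::vector<std::string>& find(const std::string& args) {
        const std::size_t h = hashArgs(args);
        auto it = pending_.find(h);
        if (it != pending_.end()) {
            for (std::uint32_t id : it->second) {
                loaded_[h].push_back(loader_(id));
                ++loadCount_;
            }
            pending_.erase(it);
        }
        return loaded_[h];
    }

    std::size_t loadCount() const { return loadCount_; }

private:
    static std::size_t hashArgs(const std::string& args) {
        return std::hash<std::string>{}(args);
    }

    Loader loader_;
    std::unordered_map<std::size_t, std::vector<std::uint32_t>> pending_;
    std::unordered_map<std::size_t, std::vector<std::string>> loaded_;
    std::size_t loadCount_ = 0;
};
```

With three registered specializations, a lookup for one set of arguments invokes the loader only for the matching record and leaves the other two untouched. A real implementation would still have to compare the actual arguments after loading, since different arguments can share a hash.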

A major unsolved issue is the eager loading of types from the ASTReader. Some types, such as the template specialization types, are huge. The problem is the way we load types eagerly here. Unfortunately, I do not have a plan for how to fix this.

A long-standing optimization is to reduce the update record calls in the many contexts where we do not need them…

These are the major issues we have seen in our production workflows so far.

I would like to second the goal of making module loading a no-op, which, from what I understand, does not seem to be far off.
