(I meant to write RFC. But I feel we treat Request For Comment as proposals, which seems not literally consistent. )
I have an idea and I discussed it with @mpark yesterday. We feel it is interesting. I haven’t implemented it (it is not designed in details neither). But I want to hear more opinions for the direction and design before my implementation this time.
What I want
I hope we can generate a serializable data structure from an arbitrary decl. Then we can use the data structure to get the specific decl or null if the such decl doesn’t exist or not loaded.
Motivation
The motivation of the idea is to avoid merging declarations when serializations. Personally, the `merging` problem is the most concerning problem in serialization related techniques (PCH, clang header modules, header units and named modules …)
Background
The background section describes what is the `merging` problem and why it is a problem. Reader who don’t care the old problem can skip this section.
What is the merging problem
The merging problem is, when there are multiple redeclarations coming from different module files and the being parsed textual source, the compiler needs to merge them.
e.g.,
// header.h
class H {
...
};
// a.cppm
module;
#include "header.h"
export module a;
export using ::H;
// b.cppm
module;
#include "header.h"
export module b;
export using ::H;
// use.cc
import a;
import b;
void use() {
H h; // triggers merging of H from a.pcm and b.pcm
}
// use2.cc
import a;
#include "header.h"
void use() {
H h; // triggers merging of H from a.pcm and the textual source.
}
(The problem exists for clang header modules too. I use C++20 named module for convinient)
The merging itself is a simple cheap link list operation. But the cost of founding and deserialization these redeclarations are expensive. Since when we deserialize a entity, we will deserialize the other entity stored in it. This is a chain reaction. And also, we have a lot of code like
for (auto *RD : D→redecls()) {
...
}
then such loops will take more time when the number of redeclaration goes up.
(BTW, ::redecls() will trigger the deserialization all redeclarations.)
In my experience, the merging problem is the major source of complaining for the compilation performance regression after switching to modules except recompilations and decreased parallelism. And also it is a major source of compiler bugs and crashes.
To avoid this, we always suggest users to use a bottom up pattern, which avoids contains the duplicated textual source in different module files. But this is not completely avoidable. Image the case:
// a.cppm
export module a;
export template <class T>
class A { ... };
// b.cppm
export module b;
import a;
export A<int> AB;
// c.cppm
export module c;
import a;
export A<int> AC;
// user.cc
import b;
import c;
void use() {
func(AB, AC); // triggers merging of A<int> in module b and module c.
}
For this example, even if we’ve followed the suggested best practice, we can avoid the merging of generated implicit instantiation in different modules.
Why can’t we solve the merging problem
The question can be split into two parts:
- Why do we have to load duplicated decls?
- Why do we have to merge the duplicated decls?
The first question relates to the current serialization/deserialization mechanism. Currently, when we want to store a reference to a decl, we store it as a `<ModuleFileIndex, LocalDeclIndex>`, (We know the module of different index), so a declaration reference may read as “the third declaration in module a“. But what is the declaration? We don’t know until we load it. So it is always possible to load the duplicated decls when there are duplicated decls. We are not able to make the choice before loading. During the compilation, there are many requests like “I need the third declaration in module a“ but we don’t know if the third declaration in module a already exist before we load it.
The second question relates to implementation limitations. We have a lot of code need to decide if two declarations are the same. And in my experiment, if we force to not merge them, there will be other problems in codegen part.
Motivation summary
To solve the problem, we have to have a mechanism to decide if we should load a declaration before loading it. This is what I called as query system. We should be able to produce a serializable data structure (the simpler, the better) to allow us to describe all Decl we want and also we need to have a mechanism to search all existing decls with the query information.
The design
(Again: the design is just an idea. I didn’t implement it nor I am proposing it. I just want to have some discussion)
I think the design can split into 3 question:
- What should the query info structure be?
- How to produce the query info from a declaration?
- How to search existing declaration without loading?
AST Matchers
The AST Matching system works fine for searching libraries. But the problem of AST Matchers is: how to produce the AST matcher data structure from any declaration? Is this doable?
I spent sometime looking into it but I didn’t find an answer.
A general searching process
Assuming every declaration can be found by traversing the children of TranslationUnitDecl recursively, we can have a fallback searching process and design the query info structure like:
struct DeclQueryInfo {
DeclQueryInfo *parent;
...
};
then we can have:
Decl *searchExistingDecl(const DeclQueryInfo &Info) {
Decl *Parent = searchExistingDecl(Info);
if (!Parent) return nullptr; // Assuming we won't search for TranslationUnitDecl
for (auto *Children : Parent->noload_children()) // We have children() method, but it may triggering deserialization. So maybe we need to add a noload_children() interface.
if (Children->match(Info)) // We need to add match(const DeclQueryInfo &) interface for every declaration kind.
return Children;
return nullptr;
}
So if the previous assumption is good, we can make it by implementing noload_children() and ::match(const DeclQueryInfo &) for every declaration kind.
This can be further optimized. As most declarations are named declaration, we can use a lookup mechanism for named declaration:
struct DeclQueryInfo {
DeclQueryInfo *parent;
std::optional<DeclarationName> Name; // Maybe we can't reuse DeclarationName simply due to serialization/deserialization limit. e.g., DeclarationName will record Type for constructor name. But we should be able to use other format to describe DeclarationName easily.
...
};
Decl *searchExistingDecl(const DeclQueryInfo &Info) {
Decl *Parent = searchExistingDecl(Info);
if (!Parent) return nullptr; // Assuming we won't search for TranslationUnitDecl
// Optimization for named declaration case, to avoid the O(N) fallback mechanism.
if (DeclarationName Name = getNameFromDeclQueryInfo(Info)) {
for (auto *Found : Parent->noload_lookup(Name))
if (Found->match(Info))
return Found;
return nullptr;
}
for (auto *Children : Parent->noload_children())
if (Children->match(Info))
return Children;
return nullptr;
}
then finally, what DeclQueryInfo should be? I think we can complete it by:
struct DeclQueryInfo {
DeclQueryInfo *parent;
std::optional<SerializableName> Name;
DeclKind Kind;
unsigned Hash;
};
Note that maybe we need to add other bits like (isAnonymouseNamespace) to describe declarations like anonymouse namespace but have no name.
Given we already have a mechanism to store ODRHash for many declarations. We can reuse the mechanism to judge if a declaration is the same with the one we’re looking for.
How to add the system to current serialization/deserialization framework?
I think, we can add it by serializing DeclQueryInfo for every serialized declaration before their current serialized bits. Accurately, we will store the ID for the DeclQueryInfo for every declaration.
So something in the writer side may look like:
class ASTWriter {
DenseMap<DeclQueryInfo *, DeclQueryInfoID> DeclQueryInfos; // DeclQueryInfoID is an unsigned like type
std::queue<DeclQueryInfo *> DeclQueryInfoToEmit;
DeclQueryInfoID NextDeclQueryInfoID = 0;
...
};
void ASTWriter::WriteDecl(ASTContext &Context, Decl *D) {
...
// Before any other records
DeclQueryInfo &DQI = D->getOrCreateDeclQueryInfo();
WriteDeclQueryInfoID(getDeclQueryInfoID(DQI));
}
DeclQueryInfoID getDeclQueryInfoID(DeclQueryInfo &Info) {
if &Info in DeclQueryInfos:
return DeclQueryInfos[&Info];
DeclQueryInfoToEmit.push_back(&Info);
DeclQueryInfos[Info] = NextDeclQueryInfoID++;
return DeclQueryInfos[&Info];
}
void ASTWriter::WriteDeclAndTypes(ASTContext &Context) {
...
// Existing code to write decl and type
do {
WriteDeclUpdatesBlocks(Context, DeclUpdatesOffsetsRecord);
while (!DeclTypesToEmit.empty()) {
DeclOrType DOT = DeclTypesToEmit.front();
DeclTypesToEmit.pop();
if (DOT.isType())
WriteType(Context, DOT.getType());
else
WriteDecl(Context, DOT.getDecl());
}
} while (!DeclUpdates.empty());
while(!DeclQueryInfoToEmit.empty()) {
DeclQueryInfo *DQI = DeclQueryInfoToEmit.front();
DeclQueryInfoToEmit.pop();
WriteDeclQueryInfo(DQI);
}
...
}
something in the reader side may look like:
Decl *ASTReader::ReadDeclRecord(GlobalDeclID ID) {
...
// Just before actually deserializing any existing bit
DeclQueryInfo &DQI = readDeclQueryInfo();
if (Decl* D = searchExistingDecl(DQI))
return D;
// Other existing code
...
}
Design Summary
We need to introduce a DeclQueryInfo structure to describe any declaration:
struct DeclQueryInfo {
DeclQueryInfo *parent;
std::optional<SerializableName> Name;
DeclKind Kind;
unsigned Hash;
};
Such structure should be controlled by ASTContext.
Every declaration kind should add the following two methods:
DeclQueryInfo &getOrCreateDeclQueryInfo();
bool match(DeclQueryInfo &); // Returning true when they are semantically the same.
In ASTWriter, we need to write the ID of DeclQueryInfo for every declaration wrote and all mentioned DeclQueryInfo.
In ASTReader, the most important part, for every getDeclID(GlobalDeclID) before actually loading the corresponding read decl, we will load the DeclQueryInfo first and look if we have any existing declaration matches the DeclQueryInfo. If yes, we will return it directly.
Key question
In my mind, the key question to the above design is, is it allowed to have the same declaration with different DeclID?
I tried to find the result but I didn’t find the limitation. If we don’t have the limitation really, I think the idea is doable.
Usages
Avoid merging problem and related bugs
As mentioned in the beginning, the “merging” problem is the key concernning point for performance regression with adopting modules. And also, a lot crash reports are triggered by the “merging” problem.
If we have the idea landed, in the ideal case, we can avoid the merging problem at all.
Avoid recompilations
The above described design won’t improve the recompilation problem directly. But the idea of DeclQueryInfo introduce a chance to improve the problem.
With Standard C++ Modules — Clang 22.0.0git documentation, now we are able to avoid some recompilations at file level. With DeclQueryInfo, it looks potential to make it in the declaration level.
e.g.,
export module a;
export int a() {
...
}
export module b;
export int b() {
return a();
}
After a compilation, now we change module a to
export module a;
export int a() {
...
}
export int a2() {
...
}
then the question comes: Can we avoid recompiling module ‘b’ in theory?
For the most simple question, I think the answer is yes. But why can’t we make it now? It relates to the structure of DeclID now. As mentioned in the begging, now the DeclID of a() in module b literally reads as the first declaration in module 'a'. And after module a changes, the module b has no way to know what is the first declaration except recompilation.
But if, we can replace DeclID completely with DeclQueryInfo, we are able to have much more information. So in module ‘b’ now we can record it as: we want 'int ::a()' from module a. Then we (or the build system) have a chance to perform a query before deciding if we should make a recompilation.
Note that the recompilation problem are harder when we consider debug info, e.g, what should happen if the definition of int a() doesn’t change but its source location changed? Will that relate to the emitted debug info in module ‘b’?
But after all I think it is potential.
A real compilation database
During the use of modules, I found an interesting case which most of time doesn’t spent on headers or module interfaces, but spent on the cpp files itself. More precisely, the template instantiation process.
Then I was wondering if we can have a chance to record the compilation result of the cpp files this time and reuse the result when compiling the cpp file next time. I feel the idea of DeclQueryInfo is helpful.
During the compilation, we can store all the instantiated decls into a special file. Then next time, when compile the source file we can reuse the recorded file. And also maybe we can merge the recorded file from different source files, which makes it look like a real compilation database.
Of course this is complex and far. But it looks enchanting.
Summary
I have an idea for a decl query system. With this system, we’re possible to avoid the annoying “merging” problem. And also it maybe helpful to improve the recompilation problem and cache the compilation results in more finer grained status.
Opinions are welcomed. Especially if there is a flaw in my imagination. e.g, if we can have the same declaration with different ID? Or if there are other problems I didn’t realize.
And also if any one would love to offer help to implement it, I am highly appreciated. I’d like to implement it myself but I am not sure when I can start. I feel the design may not be complex but we need some code since we need to add 2 interfaces for every declaration kind. I feel the amount of code may not be less. So collaborators are welcomed.