How should we support a dependency scanner for C++20 Modules?

Recently I have been looking into how to implement a dependency scanner for C++20 Modules, and I want to follow the style of the existing scanner for Clang modules. I ran into several questions that I would like to discuss here.

1 ModuleDependencyCollector and ModuleDepCollector?

When reading the code, I found two module dependency collectors with very similar names, which is a little confusing. ModuleDependencyCollector lives in Frontend, and ModuleDepCollector lives in Tooling; ModuleDepCollector is the one used by clang-scan-deps.

Also, judging from the git log, I think ModuleDependencyCollector is not actively maintained. Here is the git log for ModuleDependencyCollector:

d79ad2f1dbc2 [clang][lex] NFCI: Use FileEntryRef in PPCallbacks::InclusionDirective()
82b3e28e836d [SystemZ][z/OS][Windows] Add new OF_TextWithCRLF flag and use this flag instead of OF_Text
080952a9447a Support: Remove duplicated code in {File,clang::ModulesDependency}Collector, NFC
ba5628f2c2a9 ADT: Use 'using' to inherit assign and append in SmallString
adcd02683856 Make llvm::StringRef to std::string conversions explicit.
2b3d49b610bd [Clang] Migrate llvm::make_unique to std::make_unique
d9b948b6eb73 Rename F_{None,Text,Append} to OF_{None,Text,Append}. NFC
77bc7355163b [ModuleDependencyCollector] Use llvm::sys::fs::real_path (NFC)
2946cd701067 Update the file headers across all of the LLVM projects in the monorepo to reflect the new license.
ee89b2e01dfb [VFS] Remove 'ignore-non-existent-contents' attribute for YAML-based VFS.
96fbe58b0f52 Reland '[clang] Adding CharacteristicKind to PPCallbacks::InclusionDirective'
b524d5e55375 Revert "[clang] Adding CharacteristicKind to PPCallbacks::InclusionDirective"
36d94ab8f012 [clang] Adding CharacteristicKind to PPCallbacks::InclusionDirective
d637c05986ae IWYU for llvm-config.h in clang. See r331124 for details.
6bc635ef56b9 Revert r329698 (and r329702).
148c8cb4bf0c Use llvm::sys::fs::real_path() in clang.

The first non-NFC patch in this list is 6bc635ef56b9, which dates from 2018, so it looks to me like ModuleDependencyCollector is not maintained. But being unmaintained is not the same as being unused.
So I want to ask whether ModuleDependencyCollector is still used. If it is not, we should remove it. If it is, should we follow its pattern?

2 Do we expect to use Clang modules and C++20 Modules together?

Assume ModuleDependencyCollector is deprecated. Then it looks like we need to implement the dependency scanner in clang-scan-deps, following the existing approach there.

That leads to a problem: should I extend the implementation in ModuleDepCollector, or should I create a new derived class such as CXXModuleDepCollector? ModuleDepCollector already contains a lot of logic and data structures, but they don’t fit C++20 Modules. If we implement the scanner in ModuleDepCollector, we may need to add fields with similar names. For example, the code might look like this:

  /// Direct and transitive modular dependencies of the main source file.
  llvm::MapVector<const Module *, std::unique_ptr<ModuleDeps>> ModularDeps;
  /// Secondary mapping for \c ModularDeps allowing lookup by ModuleID without
  /// a preprocessor. Storage owned by \c ModularDeps.
  llvm::DenseMap<ModuleID, ModuleDeps *> ModuleDepsByID;
  /// Direct modular dependencies that have already been built.
  llvm::MapVector<const Module *, PrebuiltModuleDep> DirectPrebuiltModularDeps;
  
  // Newly added fields follow.
  llvm::SmallVector<std::string> StandardCXXModuleDeps;
  // Other fields starting with `StandardCXXModule`

So the code would not look very clean. From a clean-code perspective, it would be better to create a new class such as CXXModuleDepCollector. However, I think that would be problematic if users want to use Clang Modules and C++20 Modules together. So this is the decision point.
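
A minimal sketch of the trade-off, using hypothetical stand-in types rather than real Clang classes: instead of scattering `StandardCXXModule`-prefixed fields through one collector, the state for each kind of module could be grouped into its own struct, with a single collector owning both so that a translation unit mixing the two kinds stays representable. Every name below is an assumption made for illustration only.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for the Clang-modules state (not the real fields).
struct ClangModuleState {
  std::vector<std::string> ModularDeps;        // Clang modules found during the scan
  std::vector<std::string> DirectPrebuiltDeps; // Clang modules that are already built
};

// Hypothetical stand-in for the C++20 named-modules state.
struct StandardCXXModuleState {
  std::string ProvidedModuleName;               // empty if the TU is not a module unit
  std::vector<std::string> RequiredModuleNames; // named modules the TU imports
};

// One collector that owns both groups of state side by side.
struct MixedModuleDepCollector {
  ClangModuleState Clang;
  StandardCXXModuleState Standard;

  void addClangModuleDep(const std::string &Name) {
    assert(!Name.empty() && "module name must not be empty");
    Clang.ModularDeps.push_back(Name);
  }

  void addStandardModuleDep(const std::string &Name) {
    assert(!Name.empty() && "module name must not be empty");
    Standard.RequiredModuleNames.push_back(Name);
  }

  // True when the scanned TU depends on both kinds of modules.
  bool usesBothKinds() const {
    return !Clang.ModularDeps.empty() && !Standard.RequiredModuleNames.empty();
  }
};
```

Whether this is preferable to a separate derived class depends on the answer to the mixed-use question; the sketch only shows that the two sets of fields can be kept apart without forcing two collectors.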

Simply put, if we disallow using C++20 Modules and Clang Modules together, the implementation (here and in many other places) would be simpler and some potential bugs would be avoided. However, that is the implementors’ perspective. Things may look different from the users’ point of view, and I think how users feel may matter more.


So I have two questions in this post:
(1) Is ModuleDependencyCollector still maintained and used? If not, can we remove it?
(2) Do we expect our users to use C++20 Modules together with Clang C++ Modules?

CC’ing people who I think may be interested: @iains @dblaikie @Bigcheese @jansvoboda11 @vsapsai @tahonermann @bcardosolopes @tschuett

apropos 2)

My understanding of the long-term objectives was that clang modules should gradually morph into C++ standard modules (but that might take quite some time and involves complex issues like not breaking existing codebases).

“clang” header modules are [currently] quite different semantically from C++ header units (which is why they are distinct module types in the current implementation).

That would seem like the first thing to resolve if one wanted interoperability.

For 1, ModuleDependencyCollector is for generating reproducers from an implicit modules build. It copies all the files needed into a directory. This is still used, and is unrelated to building modules.

For 2, I would very much like to support Clang modules and C++20 modules at the same time. The scanner will need to change a bit to support this, but my plan was to add a list of named imported modules to the output (and the relevant build settings needed to build them), along with the name of the module itself if it has one. A big difference in this case is that the scanner won’t recurse into imports as it doesn’t necessarily know where they are, and it doesn’t need to as there’s no preprocessor state to worry about that can impact other scanning.

An issue with this is that the current way we scan for modules is to actually build them. This is both slow and unnecessary, and would need hacks to ignore import decls. Eventually I would like to move to a solution that emulates actually building and importing modules without the overhead. We only care about preserving the preprocessor semantics, and doing header search and modulemap processing correctly.

Yeah, it would be big progress if we could reach consensus on the mixed use of Clang modules and standard C++ modules. I sent a documentation patch here: D136221 [docs] Add the description about mixing use of clang modules and c++ modules.

The scanner will need to change a bit to support this, but my plan was to add a list of named imported modules to the output (and the relevant build settings needed to build them), along with the name of the module itself if it has one.

My current plan is to support P1689, so I’ll probably need to introduce another ScanningOutputFormat (currently it only has Make and Full). I think you’re talking about modifying the Full mode, which is indeed a little hard. I feel it would be easier to implement a new format.
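
For readers unfamiliar with P1689, a minimal sketch of what a separate output format might emit for one translation unit is shown below. The JSON keys (`version`, `revision`, `rules`, `primary-output`, `provides`, `requires`, `logical-name`, `is-interface`) follow my reading of the P1689 paper and should be checked against it; the struct `P1689Rule` and the function `toP1689Json` are hypothetical and are not part of clang-scan-deps.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical description of one rule (one translation unit).
struct P1689Rule {
  std::string PrimaryOutput;                // e.g. the object file path
  std::string ProvidedModule;               // empty if nothing is provided
  bool IsInterface = false;                 // only meaningful if provided
  std::vector<std::string> RequiredModules; // named modules that are imported
};

// Wraps S in double quotes, escaping '"' and '\' (enough for this sketch).
static std::string quoted(const std::string &S) {
  std::string Out = "\"";
  for (char C : S) {
    if (C == '"' || C == '\\')
      Out += '\\';
    Out += C;
  }
  Out += '"';
  return Out;
}

// Serializes a single rule as a compact P1689-style JSON document.
std::string toP1689Json(const P1689Rule &Rule) {
  assert(!Rule.PrimaryOutput.empty() && "a rule needs a primary output");
  std::ostringstream OS;
  OS << "{\"version\":1,\"revision\":0,\"rules\":[{"
     << "\"primary-output\":" << quoted(Rule.PrimaryOutput);
  if (!Rule.ProvidedModule.empty())
    OS << ",\"provides\":[{\"logical-name\":" << quoted(Rule.ProvidedModule)
       << ",\"is-interface\":" << (Rule.IsInterface ? "true" : "false")
       << "}]";
  if (!Rule.RequiredModules.empty()) {
    OS << ",\"requires\":[";
    for (std::size_t I = 0; I < Rule.RequiredModules.size(); ++I)
      OS << (I ? "," : "") << "{\"logical-name\":"
         << quoted(Rule.RequiredModules[I]) << "}";
    OS << "]";
  }
  OS << "}]}";
  return OS.str();
}
```

Because this format only names what a file provides and requires, it does not need the per-module build details that the Full format carries, which is one reason a new format may be easier than reshaping Full.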

“clang” header modules are [currently] quite different semantically from C++ header units (which is why they are distinct module types in the current implementation).

Yeah, they have different semantics at the higher level. But at the lower level, I feel header units may be a simpler form of Clang header modules. At least they share a large part of the serialization and deserialization, and both of them mimic header semantics. So I feel the differences are much smaller than those between header modules and named modules.