This is likely going to be a bit weird since I just subscribed and don't
have the original email(s) to reply to, so apologies if my
reconstruction is incorrect.
> For explicit modules we only need to know the direct dependencies, as the
> build system will handle the transitive set.
Correct. Though `import` statements in `#include` files still need to be
mentioned.
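As a rough illustration of "the build system will handle the transitive set", here is a minimal Python sketch. The `direct` table and the module names in it are made up for the example; a real build system would fill it in from scanner output.

```python
def transitive_imports(direct, module):
    """Return every module reachable from `module` through direct imports."""
    seen = set()
    stack = list(direct.get(module, ()))
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            # Follow the dependency's own direct imports.
            stack.extend(direct.get(dep, ()))
    return seen


# Hypothetical direct-import table, one entry per module.
direct = {
    "app": ["net", "log"],
    "net": ["core"],
    "log": ["core"],
    "core": [],
}

closure = transitive_imports(direct, "app")
```

The scanner only ever has to report the `direct` entries; the closure is cheap for the build tool to compute once it has all of them.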
> For preprocessing we still need to import header units (but only their
> preprocessor state), but not normal modules. For this case it’s ok if `-E
> -MD` fails to find a module. But it does still need to be able to find
> header units and module maps. Additionally the normal Make output syntax
> is not sufficient to represent the needed information unless the driver
> decides how modules and header units should be built and where intermediate
> files should go. There’s currently a json format working its way through
> the tooling subgroup of the standards committee that I think we should
> adopt for this.
> I think we need separate modes in clang for these along with support for
> scanning through header units without actually building a clang module for
> them. clang-scan-deps will make use of the explicit mode. The question I
> have is how should we select this mode, and what clang options do we need
> to add?
> Proposal
> As a rough idea I propose the following:
> * `-M?` means output the json format which can correctly represent
> dependencies on a module for which we don’t know what the final file path
> will be.
[ I'm the author of the paper specifying the mentioned format. ]
For my GCC patch, I've spelled the flags for the output in the following
way:
- `-fdep-format=trtbd`: Necessary to support creating old format
versions (the "trtbd" part is in search of a much better name).
- `-fdep-output=<PATH>`: The path that will be passed to the `-o` flag
when compiling the TU being scanned. This is needed to hook up which
scan result goes with which compilation rule (it can't be associated
with the source because a single source path may be compiled
multiple times within a build; the output object file does need to
be unique however).
- `-fdep-file=<PATH>`: Where to write the output for the format.
I avoided the `-M` flag family because that means "make". This is not
make syntax, so it doesn't belong there. In addition, the existing `-M`
flags are still useful because the "should I rerun this rule" logic for
the scan step itself can be satisfied with the `-M` flags here.
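To show why the scan result is keyed on the output path rather than the source path, here is a small Python sketch. The paths and the tuple layout are hypothetical and are not the actual format; the point is only that one source can appear under several outputs.

```python
from collections import defaultdict

# Hypothetical scan results: (source, primary output, modules required).
scan_results = [
    ("src/util.cpp", "build/debug/util.o", ["fmt"]),
    ("src/util.cpp", "build/release/util.o", ["fmt"]),
    ("src/main.cpp", "build/debug/main.o", ["util"]),
]

by_source = defaultdict(list)
by_output = {}
for source, output, requires in scan_results:
    by_source[source].append(output)
    # Outputs must be unique within a build, so they make a safe key.
    if output in by_output:
        raise ValueError(f"duplicate output: {output}")
    by_output[output] = requires

# Sources compiled more than once cannot identify a single compile rule.
ambiguous = sorted(s for s, outs in by_source.items() if len(outs) > 1)
```

Here `src/util.cpp` is ambiguous as a key, while every entry in `by_output` maps to exactly one compilation.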
> * `clang++ -std=c++20 -E -MD -fimplicit-header-units` should implicitly
> find header unit sources, but not modules (as we've not given it any way to
> look up how to build modules).
>   * This means that the dep file will contain a bunch of `.h`s,
>   `.modulemap`s, and any `.pcm`s explicitly listed on the command line.
>   * This also means erroring on unknown imported modules as we don't know
>   what to put in the dep file for them.
Sounds reasonable. Matching GCC's output for them might be a viable
option, but that is going to make not-make parsers of the `.d` files
choke (since that output involves appending to make variables).
> * `clang++ -std=c++20 -E -MD -fimplicit-header-units
> -fimplicit-module-lookup=?` should do the same as the above, except that
> it does know how to find modules, and should list all of the transitive
> dependencies of any modules it finds.
> * `clang++ -std=c++20 -E -MD` should fail if it hits a module or header
> unit, and should never do implicit lookup.
> * `clang++ -std=c++20 -E -M?` should scan through header units without
> actually building clang modules for them (to get the macros it needs), and
> should note all module imports.
>   * This means that the dep file will contain only `.h`s that it
>   includes, and use the json representation of header units and modules.
>   * It will also be shallow, with only direct dependencies.
Sounds good.
> Additionally, we should (eventually) make:
> `$ clang++ -std=c++20 a.cpp b.cpp c.cpp a.cppm -o program`
> Work without a build system, even in the presence of modules. To do this
> we will need to prescan the files to determine the module dependencies
> between them and then build them in dependency order. This does mean
> adding a (simple) build system to the driver (maybe
> [llbuild](https://github.com/apple/swift-llbuild)?), but I think it’s
> worth it to make simple cases simple. It may also make sense to actually
> push this work out to a real build system. For example have clang write a
> temporary ninja file and invoke ninja to perform the build.
This sounds like what a Meson developer is expecting in this blog post:
https://nibblestew.blogspot.com/2019/08/building-c-modules-take-n1.html
I don't know how "simple" they're able to force their compilation model
into what would be provided here. I'm also not sure how much a nested
ninja would be appreciated (there's no notion of a jobserver for
ninja-under-ninja to propagate things like `-l` or `-j` flags down).
Pool information may also be useful there. There is a patchset for
ninja-under-make to obey jobserver information, but that doesn't
help Meson at all.
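For reference, the "prescan, then build in dependency order" step quoted above is small on its own. Here is a Python sketch using the standard `graphlib` module; the `provides` and `imports` tables are invented for the example command line and say nothing about how the driver would actually do this.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical prescan result: which source exports which module,
# and which modules each source imports.
provides = {"a.cppm": "a"}
imports = {
    "a.cpp": ["a"],
    "b.cpp": ["a"],
    "c.cpp": [],
    "a.cppm": [],
}


def build_order(provides, imports):
    """Order sources so every module is built before its importers."""
    provider = {mod: src for src, mod in provides.items()}
    graph = {}
    for src, mods in imports.items():
        missing = [m for m in mods if m not in provider]
        if missing:
            raise LookupError(f"{src}: no source provides {missing}")
        # Map each source to the set of sources it must wait for.
        graph[src] = {provider[m] for m in mods}
    # static_order() raises graphlib.CycleError on circular imports.
    return list(TopologicalSorter(graph).static_order())


order = build_order(provides, imports)
```

The ordering is the easy part; the job scheduling and `-j`/`-l` propagation issues above are where the real cost of an in-driver build system would be.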
> I don't object to supporting the json format, but are there defaults
> that would make sense? Maybe using the preprocessor state implied by
> the current command-line options and putting intermediate files /
> interface files in the current directory, or in
> TMPDIR/.clang/<hash of path>, or something else? We'd need defaults
> for your `-M?` below anyway?
I think that defaults for the `-M?` (or `-fdep-*`) flags are unnecessary.
The flags are only really meaningful to a build system sophisticated
enough to understand module dependencies anyways, so just requiring at
least `-fdep-format=` and `-fdep-file=` to be set sounds OK to me at
least (`-fdep-output=` being unset means the build tool knows what it's
doing I guess). I suppose `-fdep-file=` could have a default too, but
that sounds like a build system being too trusting of cross-version
compatibility to me.
> The json format doesn't include pcm paths.
It doesn't require them, but there is a slot for the scan tool to say
something. In CMake's implementation, I take the filename of the pcm
path placed there, but relocate it to a target-specific directory. If it
is missing, I create my own filepath based on the logical name of the
module. This is communicated to the actual build by creating a file for
GCC's module mapper to locate it (which is used for import and export
locations). If clang wants a response file, that can be done too (with
the flag just being spelled as `@` instead of `-fmodule-mapper=`).
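The path selection described here boils down to something like the following Python sketch. This is not CMake's actual code; the function name, the `.pcm` suffix choice, and the handling of `:` in partition names are assumptions made for illustration.

```python
from pathlib import PurePosixPath


def bmi_path(target_dir, logical_name, suggested=None):
    """Pick where a module's BMI goes inside a target-specific directory."""
    if suggested:
        # Keep only the filename the scanner suggested and relocate it.
        filename = PurePosixPath(suggested).name
    else:
        # No suggestion: derive a filename from the logical module name.
        # Replacing ':' (partition separator) is an assumption of this sketch.
        filename = logical_name.replace(":", "-") + ".pcm"
    return str(PurePosixPath(target_dir) / filename)


relocated = bmi_path("build/tgt", "foo", suggested="/tmp/scan/foo.pcm")
derived = bmi_path("build/tgt", "foo:part")
```

Whatever path comes out is then what gets written into the mapper file (or response file) for the compile step.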
> It just says which source
> files provide which modules, and what modules and header units each
> source file imports. It's up to the build system to construct an actual
> build.
Yep.
> The other issue with -MD is that I believe tools that use `.d`
> files wouldn't even be able to handle a `.d` that included actual
> commands.
Correct. Ninja tries to handle the barest of syntax for these files
(basically what is seen in the wild).
> Also, does finding a module involve matching a cppm file with
> compatible preprocessor state, or is it just by name?
>
> It's just by name. The assumption here is that you have a compilation
> database or similar and thus know the command line options passed to
> every source file.
In CMake, mismatched preprocessor state is expected to be detected by
the compiler (something like "-D flags change the interpretation of the
BMI") or linker (as `_ITERATOR_DEBUG_LEVEL` is handled on Windows).
--Ben