Intros, C++ modules, and Facebook

Hi cfe-dev,

My name is Louis Brandy and I work at Facebook. I’ve begun working on
getting clang modules set up in our C++ codebase. Mostly the purpose of
this email is just to introduce myself and let people know what we’re
doing and our motivation, but I’ve also brought a handful of newbie
questions. I’ve gotten the basic integrations into the build system and
have some core projects building modularly. To get this far, I hacked
together a highly unprincipled set of module maps for glibc and libstdc++.
I’m at the point, now, where I need “real” module maps for our std/system
headers.

First, is there any prior art re: glibc and libstdc++ module maps? I don’t
want to repeat any work that’s already been done, and my google-fu failed.

Second, I’m interested in the workflow of actually incrementally adding
module maps to a large codebase. I do understand the need to start at the
bottom but I’m worried about proper coverage, and then prioritizing what
to do next. In particular, I find myself really wanting a “summary” of
what #includes did and did not magically become imports so I can use that
to make sure 1) I’ve not missed anything “below” and 2) to prioritize what
to do next (by e.g. aggregating over a build the most textually included
headers). I don’t think such a diagnostic/remark exists? I’ve not looked
too deeply, yet, at clang-modularize, so perhaps my answers lie over
there?

On a final note, it’s been remarkably easy to get modules up and running
so kudos to everyone who’s gotten it this far.

-Louis

Hello,

I couldn't find existing module maps for the libraries you mentioned, but you might be able to reuse parts of the modulemap for libc++:

    libcxx/include/module.modulemap

I'm interested in your second question as well. There's a line in "CompilerInstance::loadModule" that updates LastModuleImportLoc. Do you think it'd be worthwhile to dump the module name there, to get an idea of what's been loaded?

I've played around with the "modularize" utility but its results aren't always usable. If you have the time, I'd love to read a writeup about modularizing large codebases.

vedant

First, is there any prior art re: glibc and libstdc++ module maps? I don’t
want to repeat any work that’s already been done, and my google-fu failed.

Richard, I remember in the past we talked and you sent me your glibc module
map and small patches. Any chance you could attach your latest ones here?
Looking back in my email, you said that you didn't have a libstdc++ module
map. Is that still the case?

Louis, I'm going to be setting up a LLVM/Clang buildbot (running linux)
that uses modules for building LLVM itself next week or so, so I'll
definitely keep you up to date.

Second, I’m interested in the workflow of actually incrementally adding
module maps to a large codebase. I do understand the need to start at the
bottom but I’m worried about proper coverage, and then prioritizing what
to do next. In particular, I find myself really wanting a “summary” of
what #includes did and did not magically become imports so I can use that
to make sure 1) I’ve not missed anything “below” and 2) to prioritize what
to do next (by e.g. aggregating over a build the most textually included
headers).

When I first went to investigate how much time is spent in which header, I
placed some DTrace probes inside of clang and aggregated the time spent
textually within a file across compiler invocations. See the cfe-dev thread
"Some DTrace probes for measuring per-file time."
The raw data that comes out of that DTrace script is a list of pairs
{"/path/to/file", total time spent in this file across all compiler
invocations}.
I then looked at the data in this Mathematica notebook:
https://drive.google.com/file/d/0B8v10qJ6EXRxTWpMTTBnaERQaVU/view?usp=sharing

Note that in that notebook (one of many) I removed the time spent after
parsing (basically, codegen time), so the pie chart at the end is a bit
deceptive.
I've attached two pie charts that include the time spent after parsing.
The first is a debug build (low optimization). The latter is a release
build (for a release build, a much larger fraction of time is spent in
codegen).

If you don't have DTrace available so that you can directly measure the
time, you can probably get a decent idea based on the inclusion counts. One
easy way to do this is to tally up the files listed by the -H option, whose
output you can massage. There are also '.d' files, but I forget exactly what
we emit into them (we may emit header file names even if we didn't
textually touch the header, but only loaded its module).
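
As a rough illustration of the tallying idea above, here is a minimal Python sketch. It assumes the build's -H output has been captured to a log, and that each -H line has the usual shape of one or more dots (the nesting depth), a space, and the header path. The function name tally_includes is made up for this example.

```python
import re
from collections import Counter

# Assumed shape of a -H line: ". /usr/include/stdio.h"
# (one dot per nesting level, a single space, then the path).
_H_LINE = re.compile(r"^(\.+) (.+)$")


def tally_includes(lines):
    """Count how often each header path appears in captured -H output."""
    counts = Counter()
    for line in lines:
        match = _H_LINE.match(line.rstrip("\n"))
        if match:  # skip diagnostics and other non -H lines
            counts[match.group(2)] += 1
    return counts
```

Feeding it the concatenated stderr of a whole build (for example, tally_includes(open("build.log"))) and then calling most_common(20) on the result should give a first cut at the most frequently included headers. Inclusion counts are only a proxy for time, as noted above.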

Note that for measuring the time, you need to use a timestamp that is
virtualized CPU time. If you use real time then you will spuriously count
IO latency and other stuff, which will give wrong results (e.g. the total
sum of time will appear much larger than is possible).
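
To make the CPU-time versus real-time distinction concrete, here is a small Python sketch (not the DTrace setup itself). It uses time.sleep as a stand-in for blocking on IO; the helper names measure and wait_like_io are hypothetical.

```python
import time


def measure(fn):
    """Return (wall_seconds, cpu_seconds) consumed by one call to fn."""
    wall_start = time.perf_counter()   # real (wall-clock) time
    cpu_start = time.process_time()    # CPU time charged to this process
    fn()
    cpu = time.process_time() - cpu_start
    wall = time.perf_counter() - wall_start
    return wall, cpu


def wait_like_io():
    """Stand-in for blocking on disk or network: burns almost no CPU."""
    time.sleep(0.2)
```

Calling measure(wait_like_io) should report a wall time of roughly 0.2 seconds and a CPU time close to zero, which is the kind of gap that inflates per-file totals when real time is used.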

I don’t think such a diagnostic/remark exists? I’ve not looked
too deeply, yet, at clang-modularize, so perhaps my answers lie over
there?

We have -Wauto-import which is sort of the opposite of this. Adding the
reverse warning "warn me when you included a header but didn't know about
it from a module map" could probably be done.

-- Sean Silva

AFAIK "clang ... -fmodules ... -E" will tell you which #include was turned implicitly into an import. From there on, grep could do what you need ;)

Vassil

FYI: you don’t need to do a full bottom-up rollout; standard libraries are of course the first priority (libc, standard C++ libs, your own base libs), but after that, we do support a middle-out approach.

Spent some time playing with the different options today and -H actually does approximately what I want (telling me which headers are being textually included). It doesn’t seem to emit headers that are pulled from a module, though it will emit them during the module build itself (at least with -Rmodule-build). So if I do a clean build leaving the module cache intact, I’ll only get the textually included headers.

The dep files appear to include all headers, even those pulled in modularly, but they also include the module maps, so in theory with some parsing I could work it out from there as well. I think I’ll try to get by with -H for now and see where it leads.
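
For the dep-file route, a minimal parsing sketch in Python might look like the following. It assumes Makefile-style .d files with backslash line continuations and POSIX paths without escaped spaces, and it recognizes module maps purely by file name (a ".modulemap" suffix or the older "module.map" name). The helper names are hypothetical, and this only separates the two kinds of entries; deciding which headers were actually pulled in modularly would still need more work.

```python
import os
import re


def parse_depfile(text):
    """Return the dependency paths listed in Makefile-style dep file text."""
    joined = text.replace("\\\n", " ")  # undo backslash line continuations
    deps = []
    for rule in joined.splitlines():
        parts = re.split(r":\s", rule.strip(), maxsplit=1)
        if len(parts) == 2:  # "target: dep dep ..."
            deps.extend(parts[1].split())
    return deps


def split_deps(deps):
    """Split dependencies into (module_maps, everything_else) by file name."""
    def is_module_map(path):
        name = os.path.basename(path)
        return name.endswith(".modulemap") or name == "module.map"

    module_maps = [d for d in deps if is_module_map(d)]
    others = [d for d in deps if not is_module_map(d)]
    return module_maps, others
```

The directories containing the module maps give a hint about which headers are covered by a module, which could then be cross-checked against the -H output.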

-Louis