A question about modules implementation

Hi All,

I am seeing something unexpected with -fmodules. Is this intentional?

If I include a header from one submodule, the generated code contains static initialization for global variables defined in headers of other submodules of the same parent module.

To take a standalone example, make a directory “HDR” with three files “A.h”, “B.h” and

“module.modulemap” with the following contents:

A.h:

struct AA { AA(); };

B.h:

struct counter {
  int v;
  counter() { v = 0; }
};

const counter junk;

module.modulemap:

module TOP {
  module A {
    header "A.h"
    export *
  }
  module B {
    header "B.h"
    export *
  }
}

Now take a main file, test.cpp, with just one line

#include <A.h>

and compile it with 'clang -S -fmodules -IHDR test.cpp'

Since there are no object definitions in A.h, one would expect an empty file. But

we see code for static initialization of ‘junk’.

So the question: Why is ‘junk’ being initialized in test.s?

If modules A and B are not nested in another module “TOP”, then this initialization

does not occur. However, even with this nesting, why should module TOP.B be initialized

when only A.h is being included?

Without -fmodules, we get an effectively empty file, as expected.

I realize that putting a definition (of junk) in a header is not a good idea. Still, I am not using that header. Why should I be penalized for files that I am not using?

Note that with the test case above, even if I include <A.h> (or even <B.h>) in multiple files, no multiple-definition error occurs, because the variable is const and therefore has internal linkage. The only cost of -fmodules is some extra initialization code, though with many instances of this pattern it can become significant.
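To make the "extra initialization code" concrete, here is a minimal single-file sketch. It reuses the counter/junk definitions from B.h and adds a hypothetical ctor_runs variable (not in the original header) so the effect can be observed. It only illustrates that a namespace-scope object with a non-trivial constructor gets a static initializer whether or not anything reads it; it does not reproduce the -fmodules behaviour itself.

```cpp
#include <cassert>

// Hypothetical instrumentation, not part of the original B.h:
// counts how many times counter's constructor has run.
static int ctor_runs = 0;

struct counter {
    int v;
    counter() { v = 0; ++ctor_runs; }
};

// Same shape as the definition in B.h. No code has to read junk for the
// constructor call to be emitted as a static initializer of this TU.
// Being const at namespace scope, junk has internal linkage, so each TU
// that sees this definition gets its own copy and its own initializer.
const counter junk;

// Returns the number of constructor runs observed so far.
int constructor_runs() { return ctor_runs; }

// Hypothetical self-check: once main() has started, junk has been
// constructed exactly once in this translation unit.
void check_junk_initialized() { assert(constructor_runs() == 1); }
```

Calling check_junk_initialized() from main() should pass, since the initializer for junk runs before main() starts.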

Now, if the variable definition is made non-const, -fmodules prevents me from including <A.h> in multiple translation units. The poor programming practice in <B.h> prevents inclusion of <A.h> in multiple TUs. That is much more serious than just some extra code.

I have tested this with current TOT on Linux x86. The module cache was empty before doing

the compilation in these tests.

Thanks

Sunil Srivastava

Sony Computer Entertainment

This is related to how clang’s modules are implemented. Every top-level module is basically turned into a PCH containing all of the headers within it (including those of submodules, except ones excluded by requires). The “submodules” are just name-hiding trickery on top of the PCH. So inherently all the files within a top-level module come in at the same time.

Clang internally generates a text file containing a #include of every file listed in the top-level module, then effectively compiles that into a PCH (plus some extra modules-specific information so that it can do the name-hiding trickery and other stuff). See addHeaderInclude and its call sites in lib/Frontend/FrontendActions.cpp.
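As a rough illustration of that first step, the sketch below builds the text of such a synthesized file for the TOP module from the example. ToyModule, collectIncludes, synthesizeTopLevelSource and makeTop are hypothetical names for a toy model. This is not clang's Module class and not the real addHeaderInclude code, which handles much more.

```cpp
#include <string>
#include <vector>

// Toy stand-in for one module-map entry (hypothetical, not clang's Module).
struct ToyModule {
    std::string name;
    std::vector<std::string> headers;
    std::vector<ToyModule> submodules;
};

// Appends one #include line per header of m, then recurses into submodules,
// so every header under a top-level module ends up in the same text.
void collectIncludes(const ToyModule &m, std::string &out) {
    for (const std::string &h : m.headers)
        out += "#include \"" + h + "\"\n";
    for (const ToyModule &sub : m.submodules)
        collectIncludes(sub, out);
}

// Builds the whole synthesized source for a top-level module.
std::string synthesizeTopLevelSource(const ToyModule &top) {
    std::string out;
    collectIncludes(top, out);
    return out;
}

// The TOP module from the example: submodules A and B, one header each.
ToyModule makeTop() {
    ToyModule a;
    a.name = "A";
    a.headers.push_back("A.h");
    ToyModule b;
    b.name = "B";
    b.headers.push_back("B.h");
    ToyModule top;
    top.name = "TOP";
    top.submodules.push_back(a);
    top.submodules.push_back(b);
    return top;
}
```

For makeTop() this yields an include line for A.h followed by one for B.h, which is why B.h is compiled even when the TU only asks for A.h.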

When a submodule is imported, clang basically imports the entire PCH, then hides names that are not supposed to be visible based on the submodule structure.

Inside clang, we have an ever-expanding set of workarounds to this fundamental reality, so it seems in line with that to add another workaround to remedy this undesirable behavior. Richard, what do you think? It sounds like IRGen just needs to gain some awareness of where the globals are coming from; something like “only emit a global if it is in the transitive closure of headers reachable from the submodules that were imported by the TU” (probably needs some finessing surrounding when a submodule doesn’t export *).
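The rule suggested above could be sketched as a reachability check. This is only a toy model under stated assumptions: ExportGraph, ToyGlobal, reachableSubmodules and globalsToEmit are hypothetical names, ownership is tracked per submodule rather than per header, and nothing here corresponds to actual IRGen code.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical model: submodule name -> submodules it re-exports.
using ExportGraph = std::map<std::string, std::vector<std::string>>;

// Hypothetical model of a global loaded from the PCH, tagged with the
// submodule whose header defines it.
struct ToyGlobal {
    std::string name;
    std::string owner;
};

// Worklist traversal: everything reachable from the imported submodules
// by following re-export edges.
std::set<std::string> reachableSubmodules(const ExportGraph &exports,
                                          const std::vector<std::string> &imported) {
    std::set<std::string> seen;
    std::vector<std::string> worklist(imported);
    while (!worklist.empty()) {
        std::string current = worklist.back();
        worklist.pop_back();
        if (!seen.insert(current).second)
            continue;  // already visited
        ExportGraph::const_iterator it = exports.find(current);
        if (it == exports.end())
            continue;  // no outgoing edges recorded
        for (const std::string &next : it->second)
            worklist.push_back(next);
    }
    return seen;
}

// Keeps only the globals whose owning submodule is reachable.
std::vector<std::string> globalsToEmit(const std::vector<ToyGlobal> &loaded,
                                       const std::set<std::string> &reachable) {
    std::vector<std::string> result;
    for (const ToyGlobal &g : loaded)
        if (reachable.count(g.owner))
            result.push_back(g.name);
    return result;
}
```

With the example from this thread, importing only TOP.A would leave junk (owned by TOP.B) out of the emitted set, while importing TOP.B would keep it. How a submodule without export * should contribute edges is left open here, as in the proposal.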

-- Sean Silva