RFC: allow modules found via -fmodule-map-file to shadow implicitly discovered modules

Hey all,

Background

Suppose you’re developing a system library FancyLib that installs a module map file into a default search path, and you’re using implicit module maps. When you’re developing locally, you will likely end up with multiple versions of your module that are reachable in your search paths:

  • The system version /usr/include/FancyLib/module.modulemap
  • The local development version ~/…/Build/Products/Debug/FancyLib/module.modulemap

You generally want to use the local version, and ignore the one from the system. Unfortunately, this currently causes module-redefinition errors if both module map files are ever parsed. For framework modules, we have been getting away with this because the compiler almost never looks into a framework directory if it has previously looked into a framework directory of the same name. Sadly the same is not true of non-frameworks.

Proposed change for users

Users developing non-framework system modules would add:
-fmodule-map-file=path/to/local/copy/of/module.modulemap

Typically, the “local copy” of the module map file would be to somewhere inside their build products.

Proposed clang changes

The flag -fmodule-map-file already exists in clang, and it causes clang to explicitly parse the specified module map, making any modules defined in that file available for @import (or via auto-import from a #import).

I propose we extend -fmodule-map-file so that modules found by this mechanism are allowed to shadow modules that we found implicitly in header search paths. When searching for a module by name, we will not consider the shadowed module at all. It will be an error for header search to find a header that is considered part of a shadowed module.

If two modules with the same name are both found in the contents of -fmodule-map-file arguments, then it is a redefinition error. Since these flags are ordered, we could theoretically allow shadowing between them, but I cannot think of a good use-case for this and suspect it would usually be unexpected to users.

If there are two modules with the same name and neither of them comes from -fmodule-map-file, then it is a redefinition error just like it is today.

Why not use header search path order to do shadowing?

I originally thought we should just allow modules that come from earlier search paths to shadow modules from later ones. This would avoid having to use an extra flag -fmodule-map-file, and let us infer the module configuration “for free”, since users are already accustomed to setting up their header search paths. However, there are a number of problems with this approach:

  • Module map files from “later” search paths are often parsed ahead of module map files from “earlier” paths. We would either need to eagerly parse module map files from all search paths in order (slow), or we would need to be able to retroactively make a module shadowed (brittle).
  • Module map lookup does not know about search paths, so when searching for a module map file by umbrella directory we don’t know when we cross search path boundaries
  • We would need to lock down the HeaderSearch interface to prevent modifying search path order in the middle of a build (although that might be a generally good thing…)

My patch implementing this is attached for reference.

fmodule-map-file-shadow.patch (16.1 KB)

Hey all,

*Background*

Suppose you’re developing a system library FancyLib that installs a module
map file into a default search path, and you’re using implicit module
maps. When you’re developing locally, you will likely end up with multiple
versions of your module that are reachable in your search paths:
* The system version /usr/include/FancyLib/module.modulemap
* The local development version
~/…/Build/Products/Debug/FancyLib/module.modulemap

You generally want to use the local version, and ignore the one from the
system. Unfortunately, this currently causes module-redefinition errors if
both module map files are ever parsed. For framework modules, we have been
getting away with this because the compiler almost never looks into a
framework directory if it has previously looked into a framework directory
of the same name. Sadly the same is not true of non-frameworks.

*Proposed change for users*

Users developing non-framework system modules would add:
    -fmodule-map-file=path/to/local/copy/of/module.modulemap

Typically, the “local copy” of the module map file would be to somewhere
inside their build products.

*Proposed clang changes*

The flag -fmodule-map-file already exists in clang, and it causes clang to
explicitly parse the specified module map, making any modules defined in
that file available for @import (or via auto-import from a #import).

I propose we extend -fmodule-map-file so that modules found by this
mechanism are allowed to shadow modules that we found implicitly in header
search paths. When searching for a module by name, we will not consider
the shadowed module at all. It will be an error for header search to find a
header that is considered part of a shadowed module.

If two modules with the same name are *both* found in the contents of
-fmodule-map-file arguments, then it is a redefinition error. Since these
flags are ordered, we could theoretically allow shadowing between them, but
I cannot think of a good use-case for this and suspect it would usually be
unexpected to users.

If there are two modules with the same name and *neither* of them comes
from -fmodule-map-file, then it is a redefinition error just like it is
today.

*Why not use header search path order to do shadowing?*

I originally thought we should just allow modules that come from earlier
search paths to shadow modules from later ones. This would avoid having to
use an extra flag -fmodule-map-file, and let us infer the module
configuration “for free”, since users are already accustomed to setting up
their header search paths. However, there are a number of problems with
this approach:

* Module map files from “later” search paths are often parsed ahead of
module map files from “earlier” paths. We would either need to eagerly
parse module map files from all search paths in order (slow), or we would
need to be able to retroactively make a module shadowed (brittle).
* Module map lookup does not know about search paths, so when searching
for a module map file by umbrella directory we don't know when we cross
search path boundaries
* We would need to lock down the HeaderSearch interface to prevent
modifying search path order in the middle of a build (although that might
be a generally good thing…)

The direction here makes complete sense to me.

My patch implementing this is attached for reference.

This seems to be somewhat more general than we need for the above proposal
-- it supports a fully-general set of scopes, whereas it would seem
sufficient to track a flag to indicate whether the module came from a
-fmodule-map-file= file or from an implicit module map file. Do you have
some future extension in mind that would need more than 2 different scopes?

Nothing concrete, although it feels like if we might someday be able to get at least one other scope for the default system search paths. The reason I chose to not just use a flag was - perhaps embarrassingly - that I haven’t been able to come up with a nice way to express what that “flag” is to the ModuleMap class without defining it in terms of our command-line interface, which feels like a conceptual layering violation. I’m open to suggestions thought :slight_smile:

The flag seems to be whether the module map is explicit or implicit, but
that's more at the level of HeaderSearch's world view than ModuleMap's.

I also think it's probably unnecessary to build shadow modules as your
patch does (unless you expect those files to be reachable via
#include_next?). Instead, you could do the same thing we do when the
definition of a module was provided by a loaded module file: skip to the
close brace of the module definition and return. We particularly don't want
to stat all the files named in the shadow module, since the extra stat
syscalls can be moderately expensive. It seems like a cleaner semantic
model to say that the shadowed module is completely ignored than that it
exists and has the same name but is a somehow different module.

In that case it should apply to -fmodule-file as well, right?

I built the unavailable module because I want to diagnose #include of headers from the shadowed module. Including headers from the shadowed module likely indicates a bug in how the module or header search paths are set up.

Hey all,

*Background*

Suppose you’re developing a system library FancyLib that installs a
module map file into a default search path, and you’re using implicit
module maps. When you’re developing locally, you will likely end up with
multiple versions of your module that are reachable in your search paths:
* The system version /usr/include/FancyLib/module.modulemap
* The local development version
~/…/Build/Products/Debug/FancyLib/module.modulemap

You generally want to use the local version, and ignore the one from the
system. Unfortunately, this currently causes module-redefinition errors if
both module map files are ever parsed. For framework modules, we have been
getting away with this because the compiler almost never looks into a
framework directory if it has previously looked into a framework directory
of the same name. Sadly the same is not true of non-frameworks.

*Proposed change for users*

Users developing non-framework system modules would add:
    -fmodule-map-file=path/to/local/copy/of/module.modulemap

Typically, the “local copy” of the module map file would be to somewhere
inside their build products.

*Proposed clang changes*

The flag -fmodule-map-file already exists in clang, and it causes clang
to explicitly parse the specified module map, making any modules defined in
that file available for @import (or via auto-import from a #import).

I propose we extend -fmodule-map-file so that modules found by this
mechanism are allowed to shadow modules that we found implicitly in header
search paths. When searching for a module by name, we will not consider
the shadowed module at all. It will be an error for header search to find a
header that is considered part of a shadowed module.

If two modules with the same name are *both* found in the contents of
-fmodule-map-file arguments, then it is a redefinition error. Since these
flags are ordered, we could theoretically allow shadowing between them, but
I cannot think of a good use-case for this and suspect it would usually be
unexpected to users.

If there are two modules with the same name and *neither* of them comes
from -fmodule-map-file, then it is a redefinition error just like it is
today.

*Why not use header search path order to do shadowing?*

I originally thought we should just allow modules that come from earlier
search paths to shadow modules from later ones. This would avoid having to
use an extra flag -fmodule-map-file, and let us infer the module
configuration “for free”, since users are already accustomed to setting up
their header search paths. However, there are a number of problems with
this approach:

* Module map files from “later” search paths are often parsed ahead of
module map files from “earlier” paths. We would either need to eagerly
parse module map files from all search paths in order (slow), or we would
need to be able to retroactively make a module shadowed (brittle).
* Module map lookup does not know about search paths, so when searching
for a module map file by umbrella directory we don't know when we cross
search path boundaries
* We would need to lock down the HeaderSearch interface to prevent
modifying search path order in the middle of a build (although that might
be a generally good thing…)

The direction here makes complete sense to me.

My patch implementing this is attached for reference.

This seems to be somewhat more general than we need for the above
proposal -- it supports a fully-general set of scopes, whereas it would
seem sufficient to track a flag to indicate whether the module came from a
-fmodule-map-file= file or from an implicit module map file. Do you have
some future extension in mind that would need more than 2 different scopes?

Nothing concrete, although it feels like if we might someday be able to
get at least one other scope for the default system search paths. The
reason I chose to not just use a flag was - perhaps embarrassingly - that I
haven’t been able to come up with a nice way to express what that “flag” is
to the ModuleMap class without defining it in terms of our command-line
interface, which feels like a conceptual layering violation. I’m open to
suggestions thought :slight_smile:

The flag seems to be whether the module map is explicit or implicit, but
that's more at the level of HeaderSearch's world view than ModuleMap’s.

In that case it should apply to -fmodule-file as well, right?

I would expect this flag to be round-tripped through module files, and
should depend on whether the file was found implicitly or explicitly when
building the module. That would mean you can no longer assume you see all
the explicit modules before any of the implicit ones, because an explicit
module file can reference an implicit module map (if the user specifies
both a -fmodule-file= and a -fmodule-map-file= where the former uses an
implicit module and the latter provides a different module with the same
name).

I also think it's probably unnecessary to build shadow modules as your
patch does (unless you expect those files to be reachable via
#include_next?). Instead, you could do the same thing we do when the
definition of a module was provided by a loaded module file: skip to the
close brace of the module definition and return.

I built the unavailable module because I want to diagnose #include of
headers from the shadowed module. Including headers from the shadowed
module likely indicates a bug in how the module or header search paths are
set up.

OK, that seems reasonable.

My expectation would be:

If an -fmodule-file=Foo.pcm matches with an -fmodule-map-file=Foo.modulemap then everything is fine, and it will shadow any other module named Foo. But if it conflicts with an -fmodule-map-file=OtherFoo.modulemap then I would expect it to be an error regardless of whether Foo.pcm was built with an implicit or explicit module map file. We could detect this immediately after reading the control block of the module file, which we do eagerly.

I suppose that still leaves the case where -fmodule-file=Foo.pcm was built with an implicit module map, and we now find a different implicit module map. I don’t have a strong feeling either way. If we went with round-tripping the explicitness like you suggested I think that would be okay. In this limited case, the fact that we might encounter implicit modules before explicit ones should be fine since any modules with the same name in the union of modules specified by -fmodule-file and -fmodule-map-file would be an error.

On the other hand, part of me feels that specifying -fmodule-file is in some sense “explicit” about how that module should be resolved. That would argue for letting modules specified with -fmodule-file shadow implicit modules. I admit I haven’t been following the explicit side of things very closely, so maybe this violates the model.

Ben

Hey all,

*Background*

Suppose you’re developing a system library FancyLib that installs a
module map file into a default search path, and you’re using implicit
module maps. When you’re developing locally, you will likely end up with
multiple versions of your module that are reachable in your search paths:
* The system version /usr/include/FancyLib/module.modulemap
* The local development version
~/…/Build/Products/Debug/FancyLib/module.modulemap

You generally want to use the local version, and ignore the one from
the system. Unfortunately, this currently causes module-redefinition
errors if both module map files are ever parsed. For framework modules, we
have been getting away with this because the compiler almost never looks
into a framework directory if it has previously looked into a framework
directory of the same name. Sadly the same is not true of non-frameworks.

*Proposed change for users*

Users developing non-framework system modules would add:
    -fmodule-map-file=path/to/local/copy/of/module.modulemap

Typically, the “local copy” of the module map file would be to
somewhere inside their build products.

*Proposed clang changes*

The flag -fmodule-map-file already exists in clang, and it causes clang
to explicitly parse the specified module map, making any modules defined in
that file available for @import (or via auto-import from a #import).

I propose we extend -fmodule-map-file so that modules found by this
mechanism are allowed to shadow modules that we found implicitly in header
search paths. When searching for a module by name, we will not consider
the shadowed module at all. It will be an error for header search to find a
header that is considered part of a shadowed module.

If two modules with the same name are *both* found in the contents of
-fmodule-map-file arguments, then it is a redefinition error. Since these
flags are ordered, we could theoretically allow shadowing between them, but
I cannot think of a good use-case for this and suspect it would usually be
unexpected to users.

If there are two modules with the same name and *neither* of them comes
from -fmodule-map-file, then it is a redefinition error just like it is
today.

*Why not use header search path order to do shadowing?*

I originally thought we should just allow modules that come from
earlier search paths to shadow modules from later ones. This would avoid
having to use an extra flag -fmodule-map-file, and let us infer the module
configuration “for free”, since users are already accustomed to setting up
their header search paths. However, there are a number of problems with
this approach:

* Module map files from “later” search paths are often parsed ahead of
module map files from “earlier” paths. We would either need to eagerly
parse module map files from all search paths in order (slow), or we would
need to be able to retroactively make a module shadowed (brittle).
* Module map lookup does not know about search paths, so when searching
for a module map file by umbrella directory we don't know when we cross
search path boundaries
* We would need to lock down the HeaderSearch interface to prevent
modifying search path order in the middle of a build (although that might
be a generally good thing…)

The direction here makes complete sense to me.

My patch implementing this is attached for reference.

This seems to be somewhat more general than we need for the above
proposal -- it supports a fully-general set of scopes, whereas it would
seem sufficient to track a flag to indicate whether the module came from a
-fmodule-map-file= file or from an implicit module map file. Do you have
some future extension in mind that would need more than 2 different scopes?

Nothing concrete, although it feels like if we might someday be able to
get at least one other scope for the default system search paths. The
reason I chose to not just use a flag was - perhaps embarrassingly - that I
haven’t been able to come up with a nice way to express what that “flag” is
to the ModuleMap class without defining it in terms of our command-line
interface, which feels like a conceptual layering violation. I’m open to
suggestions thought :slight_smile:

The flag seems to be whether the module map is explicit or implicit, but
that's more at the level of HeaderSearch's world view than ModuleMap’s.

In that case it should apply to -fmodule-file as well, right?

I would expect this flag to be round-tripped through module files, and
should depend on whether the file was found implicitly or explicitly when
building the module. That would mean you can no longer assume you see all
the explicit modules before any of the implicit ones, because an explicit
module file can reference an implicit module map (if the user specifies
both a -fmodule-file= and a -fmodule-map-file= where the former uses an
implicit module and the latter provides a different module with the same
name).

My expectation would be:

If an -fmodule-file=Foo.pcm matches with an
-fmodule-map-file=Foo.modulemap then everything is fine, and it will shadow
any other module named Foo. But if it conflicts with an
-fmodule-map-file=OtherFoo.modulemap then I would expect it to be an error
regardless of whether Foo.pcm was built with an implicit or explicit module
map file. We could detect this immediately after reading the control block
of the module file, which we do eagerly.

I suppose that still leaves the case where -fmodule-file=Foo.pcm was built
with an implicit module map, and we now find a different implicit module
map. I don’t have a strong feeling either way. If we went with
round-tripping the explicitness like you suggested I think that would be
okay. In this limited case, the fact that we might encounter implicit
modules before explicit ones should be fine since any modules with the same
name in the union of modules specified by -fmodule-file and
-fmodule-map-file would be an error.

On the other hand, part of me feels that specifying -fmodule-file is in
some sense “explicit” about how that module should be resolved. That would
argue for letting modules specified with -fmodule-file shadow implicit
modules. I admit I haven’t been following the explicit side of things very
closely, so maybe this violates the model.

OK, you've convinced me. Until / unless we have some reason to do
otherwise, we should not support multiple modules with the same name being
(transitively) used by the same translation unit, so modules that we know
about from a loaded module file should be treated the same as modules from
an explicit module map.

Having heard of no disagreement with the overall idea, I’m moving ahead. An updated patch is on cfe-commits. Thanks!

Hi Ben,
   Thanks for the work! We will definitely need such a concept, too.
Vassil

Hi Ben,
   Thanks for the work! We will definitely need such a concept, too.
Vassil