RFC: Supporting private module maps for non-framework headers

Hi all,

For frameworks, Clang currently supports adding a separate module map file for the private headers of the framework. It looks specifically for ‘module.private.modulemap’ inside the .framework and parses both the public and the private module maps when it processes the framework's module. We would like to extend private module map support to non-framework headers as well.

On the Darwin platform, the public SDK headers are located in ‘/usr/include’, while the associated private SDK headers are located in ‘/usr/local/include’. ‘/usr/local/include’ comes before ‘/usr/include’ in the header search paths.

We propose to make the following changes to Clang’s module mechanism:

  • When looking up a module through the search paths, in addition to ‘module.modulemap’ also look for a standalone ‘module.private.modulemap’ file. I will refer to this as the “private extension” module map.
  • When parsing a private extension map, allow extending a module that has not been defined yet, without providing its full definition. To clarify, I refer to a module definition as this:

module MyModule {
<…>
}

while an extension is this:

module MyModule.SomethingPrivate {
<…>
}

An extension is a nested module of any depth.
We can reuse the “extern module” syntax to indicate that we are extending a module whose definition is in a different module map:

extern module MyModule
module MyModule.SomethingPrivate {
<…>
}

  • After parsing the private extension map, we are still missing the module definition, so module lookup will continue through the following header search paths. If the module we are looking for is not found, Clang will emit a “module not found” error.

  • It may seem backwards that module search will find and parse the private extension ahead of the public one, but it is actually advantageous because this allows us to continue searching only until we find the module definition, at which point we will stop looking. If module search worked the other way then, after we had the module definition, we would need to always keep looking through the rest of the search paths in case there is a private extension map that we need to take into account, or treat certain paths specially and only look for private extensions in those.
    By finding the extension map early on, we keep the current semantics of doing the minimal search necessary to find and complete the module definition, without treating any particular search path specially.

  • After Clang finds and parses the public module map for ‘MyModule’, the module definition will be complete. Clang will keep track that there is a private extension map associated with the module and it will pass the paths of both the public module map and the private extension one to the module building invocation. This will result in one module file containing both the public and private APIs, similar to what we do with frameworks.

  • A module definition inside a private extension will be disallowed. The rationale is that otherwise it will be a very common mistake for users to write

module.modulemap:
module Foo {

}

module.private.modulemap:
module Foo {

}

and then be left scratching their heads wondering why things are broken (things missing, headers included textually, etc.). Being more strict in private extension maps will be beneficial.
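The lookup order described in the bullets above can be modelled with a small sketch. This is only a toy model under stated assumptions, not Clang's implementation: `SearchDir`, `find_module` and `ModuleNotFound` are hypothetical names, and each directory is reduced to the set of modules its module.modulemap defines and the submodules its module.private.modulemap adds.

```python
from dataclasses import dataclass, field


class ModuleNotFound(Exception):
    """Raised when no search path provides the module definition."""


@dataclass
class SearchDir:
    # Hypothetical stand-in for one header search directory.
    path: str
    # Modules defined by this directory's module.modulemap.
    defines: set = field(default_factory=set)
    # module name -> submodules added by this directory's module.private.modulemap.
    extends: dict = field(default_factory=dict)


def find_module(name, search_dirs):
    """Walk the search paths in order and stop at the first definition of `name`."""
    submodules = []
    extension_maps = []
    for d in search_dirs:
        if name in d.extends:
            # Private extension map: the module is still incomplete, keep searching.
            submodules.extend(d.extends[name])
            extension_maps.append(d.path + "/module.private.modulemap")
        if name in d.defines:
            # Definition found: the module is complete, later paths are not visited.
            return {
                "name": name,
                "definition_map": d.path + "/module.modulemap",
                "extension_maps": extension_maps,
                "submodules": submodules,
            }
    raise ModuleNotFound(name)


dirs = [
    SearchDir("/usr/local/include", extends={"MyModule": ["SomethingPrivate"]}),
    SearchDir("/usr/include", defines={"MyModule"}),
]
found = find_module("MyModule", dirs)
print(found["definition_map"], found["extension_maps"])
```

Because the extension is seen before the definition, the loop can return as soon as the definition is found, which is the "minimal search" property argued for above.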

Let me know what you think!

I worry that this will be fragile. If for any reason we look in
/usr/include but not in /usr/local/include, we'll not load the private
extension map and things will probably go quite badly from that point
onwards. If the presence of the /usr/local/include headers is a fundamental
part of a /usr/include module, then it seems better to me to specify that
within the /usr/include module map.

So here's one possibility: allow 'extern module' declarations to be nested
within other modules, then write your /usr/include module map as:

module MyModule {
  <...>
  extern module SomethingPrivate "/usr/local/include/module.private.map"
}

(in addition to the other changes you suggest here). Then only allow a
module to be extended if the extension is listed via an 'extern module' in
the definition of the module.

This has drawbacks:

  • Details of the private SDK “leak out” to the public one. It should work like frameworks: the public SDK remains the same whether or not there is a private API, and the private API is a straight addition on top of the public one, without needing to modify anything in the public SDK.
  • It is a rather weak guarantee anyway, because the public module map must still function when the extension map is missing, which means that pointing at the wrong path, or missing the private map when you really need it, will not be detected.
  • Flexibility to extend a module from any path may be valuable for testing.

OK, I'm not sure I understand what problem you're solving. If the
/usr/local/include stuff works as a layer on top of /usr/include, why do
you need them to be built as part of the same module? (Do your
/usr/local/include headers override / #include_next some of the
/usr/include headers, perhaps? If so, do you need the #includes in
/usr/include to find the /usr/local/include headers rather than the
/usr/include headers?)

There are some cases of cycles between public/private headers which would be accommodated by a single module build, but the primary motivation is that we would like the module's public and private interfaces to be under the same namespace, so you’d write

@import Dispatch;
@import Dispatch.Private;

@import Darwin.POSIX.Foo.Bar;
@import Darwin.POSIX.Foo.Bar.Private;

and generally any kind of private extension:

@import Dispatch.SuperCoolButPrivate;

Do you want / need them to be built as a single module file, or not?

As I said, cycles may make things difficult for separate module files, but how are we going to get new submodules under the same module name with separate module files?

Well, the restriction that module files correspond to top-level module
names is arbitrary and artificial. (It's also a bad idea for a few reasons.
It makes incremental refactoring very hard, for instance, because you're
required to have no cycles at any point between things in different
top-level modules.)

Splitting up the description of how to build a module file across various
module maps seems like a very error-prone strategy, especially if you're
intending to be able to stop looking before you've read all of the relevant
module maps.

I think that the high-level parts of my proposal do not depend on whether we build one .pcm file or multiple ones; this is an implementation detail.
To be more specific, if we have

module.private.modulemap (extension):
extern module Dispatch
module Dispatch.Private {
  <headers>
}

module.modulemap:
module Dispatch {
  <headers>
}

It is an implementation detail whether we build one Dispatch.pcm file or a Dispatch.Private.pcm file that depends on another Dispatch.pcm; it should make no difference to user code.
Is this incorrect?

Whether we build one .pcm or multiple is observable in some circumstances.
1) We concatenate together all the header files built as part of one .pcm
file, and parse them all at once, and that is not always semantically
equivalent to building them in two separate passes. 2) If you have one big
Dispatch.pcm file which also contains the private bits, and by any sequence
of events you end up also pulling in another Dispatch.pcm that contains the
public headers but not the private ones, you may get ambiguity errors. 3)
We do not allow circular references across .pcm files but do allow them
within a .pcm file.
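Point 3 can be illustrated with a short sketch: the same header-level include cycle is harmless when both headers live in one .pcm, but becomes a cycle between .pcm files once they are split. This is only an illustration under stated assumptions: the header names and the `pcm_graph` / `has_cycle` helpers are hypothetical, and nothing here reflects how Clang actually tracks module dependencies.

```python
def pcm_graph(includes, pcm_of):
    """Collapse header-level includes into edges between distinct .pcm files."""
    graph = {pcm: set() for pcm in pcm_of.values()}
    for src, targets in includes.items():
        for dst in targets:
            if pcm_of[src] != pcm_of[dst]:
                graph[pcm_of[src]].add(pcm_of[dst])
    return graph


def has_cycle(graph):
    """Depth-first search for a cycle in a directed graph."""
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True  # back edge: we returned to a node still on the stack
        visiting.add(node)
        if any(visit(n) for n in graph[node]):
            return True
        visiting.discard(node)
        done.add(node)
        return False

    return any(visit(n) for n in graph)


# Hypothetical headers that include each other.
includes = {"dispatch.h": ["private.h"], "private.h": ["dispatch.h"]}

one_pcm = {"dispatch.h": "Dispatch.pcm", "private.h": "Dispatch.pcm"}
two_pcms = {"dispatch.h": "Dispatch.pcm", "private.h": "Dispatch.Private.pcm"}

print(has_cycle(pcm_graph(includes, one_pcm)))
print(has_cycle(pcm_graph(includes, two_pcms)))
```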

If your Dispatch.Private is simply a layer on top of Dispatch, then
building them as two separate .pcm files seems like the right choice; it
keeps your Dispatch module (for want of a better word) modular. If on the
other hand, you need includes/imports in Dispatch to pull in headers /
submodules from Dispatch.Private, then one big Dispatch .pcm is probably
the right answer, and we'd need something like your proposal so we could
say "here is a Dispatch module that's like the one in /usr/include but
different in the following ways".

(Whichever of these options we pick, we can make the "@import
Dispatch.Private" syntax do the right thing.)

Maybe off topic (sorry if I misunderstood): would that 'somehow' allow
placing a modulemap outside the /usr folder? (For cases like *gcc's
libstdc++*).

There are a few related problems with this. One is that we need to be able
to map from a #included file's name to the module map file, if we're
loading that module map lazily. Another is that files named in a module map
file are found relative to that file.

We can solve the first problem with -fmodule-map-file=<libstdc++ module
[...]>. For the second half, I've been discussing with a few people the idea
of allowing a module map file to specify a "module root" directory relative
to which its files are found, which need not be the directory in which the
map is placed. (This also helps with another problem: diagnostics when
building or using a module point to files relative to the module map file,
which can result in some rather contorted and unnatural paths.)
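The "module root" idea can be sketched as a change to how header names in a module map are resolved. This is a hypothetical model only: no such directive exists in the thread beyond the description above, `resolve_header` and its `module_root` parameter are made-up names, the example paths are invented, and allowing the root to be either absolute or relative to the map's directory is an assumption.

```python
import posixpath


def resolve_header(module_map_path, header, module_root=None):
    """Return the path a header named in a module map would resolve to.

    Without a module root, headers are found relative to the directory
    containing the module map. With one, they are found relative to the root,
    which may itself be given relative to the module map's directory.
    """
    base = posixpath.dirname(module_map_path)
    if module_root is not None:
        # posixpath.join discards `base` when module_root is absolute.
        base = posixpath.normpath(posixpath.join(base, module_root))
    return posixpath.normpath(posixpath.join(base, header))


# Hypothetical locations, for illustration only.
map_path = "/opt/maps/libstdcxx/module.modulemap"
print(resolve_header(map_path, "vector"))
print(resolve_header(map_path, "vector", module_root="/usr/include/c++"))
```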

Vassil

(2) is a bug.

In both cases we would need to extend module map parsing to allow submodule extensions. I think the original proposal accommodates that, with the caveat that once Clang supports splitting a top-level module across different .pcm files, we may need a way to control whether there is one combined module file or several.
Given that splitting a top-level module into multiple .pcm files is a rather intrusive change, and the combined .pcm file is sufficient for our needs, I’d like to proceed for now with the module map extension functionality, which will result in one top-level .pcm file. Is this reasonable?

(2) is a bug.

Perhaps. Suppose I put this in a header:

extern struct {
  int a, b;
} s;

If that ends up in two different translation units, do I have an error? In
C++, the answer is "yes", because the types are different; in C, it's "no",
because the types are compatible. Extending this to the modules world, if I
have the above in two different modules, should they be mergeable?

With the "bottom-up modularization" approach that Apple has been taking so
far, I expect you'll find that merging a Dispatch.pcm and a
DispatchWithExtensions.pcm together won't work very well. (Ironically, I'd
have more confidence this would work if you were using C++ rather than C,
because the merging story is more developed and better tested there.) And
even if it does work, the possibility of having two different modules
providing the same interface as part of the same translation unit seems
like a bad idea.

I think you should choose: either (a) if there's a private extension, then
you somehow guarantee that you only ever build / use the .pcm with that
extension and never mix that with a .pcm built without the extension, or
(b) treat the private extension as a layer on top of the public module.

A few questions:

What are you going to use as the defining module map file (part of the
'key' used for determining .pcm identity)? Is it the defining module map
for the top-level module or the module map with the extension?
What happens if multiple module maps try to extend the same module?
Why do you need a separate module map file name? Why not just put your
extension into the normal module.modulemap file?

Also note that 'extern module' takes a string literal pointing to the
module map file defining the module, and it triggers us to recursively load
that module map file; your approach doesn't seem to take this into account.
I think when we see

  extern module Foo "blah"
  module Foo.Bar { ... }

... we should parse "blah" before parsing the extension (this is the
natural result of the way the code is currently laid out) rather than
waiting until we hit "blah" on the search path and then adding an extension
to it.

The key to determine identity will include both module map files.

I’d prefer to allow it if they extend with different submodules, and error if they try to define the same submodule.
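A minimal sketch of that rule, assuming each extension map is reduced to the list of submodules it adds. `merge_extensions` and `DuplicateSubmodule` are hypothetical names, and the second map path is invented for the example.

```python
class DuplicateSubmodule(Exception):
    """Two extension maps tried to define the same submodule."""


def merge_extensions(extension_maps):
    """Merge extensions given as (map_path, module_name, [submodule names]).

    Different submodules from different maps are accepted; a repeated
    submodule is reported as an error naming both maps.
    """
    owner = {}  # (module, submodule) -> path of the map that introduced it
    for path, module, submodules in extension_maps:
        for sub in submodules:
            key = (module, sub)
            if key in owner:
                raise DuplicateSubmodule(
                    f"{module}.{sub} defined in both {owner[key]} and {path}"
                )
            owner[key] = path
    return owner


merged = merge_extensions([
    ("/usr/local/include/module.private.modulemap", "Dispatch", ["Private"]),
    ("/test/include/module.private.modulemap", "Dispatch", ["SuperCoolButPrivate"]),
])
print(sorted(sub for _, sub in merged))
```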

The reason was to disallow module definitions in the private module map file:

  • A module definition inside a private extension will be disallowed. The rationale is that otherwise it will be a very common mistake for users to write

module.modulemap:
module Foo {

}

module.private.modulemap:
module Foo {

}

and then be left scratching their heads wondering why things are broken (things missing, headers included textually, etc.). Being more strict in private extension maps will be beneficial.

I was aware of this functionality, but I think it is unnecessary boilerplate and restricted flexibility if we force all authors to write module maps like:

extern module Dispatch "…/…/…/usr/include/dispatch/module.modulemap"

The compiler is going to either find the module definition or error about it, so there’s not much benefit in hardcoding paths; such hardcoding will restrict flexibility and make testing more cumbersome.

I’m not suggesting, of course, that we remove the functionality, only that we also allow “extern module Dispatch”.

Technically we don’t need to wait until we hit “blah”. “extern module Dispatch” can create an ‘incomplete’ Dispatch module; the submodules will be added, and once we parse the module definition, the module will be ‘complete’ (and module lookup will succeed).

Dispatch.pcm and Dispatch[WithExtensions].pcm will never show up in the
same translation unit: they will have different identities, similar to how
module maps from different paths result in different pcm files for the same
module, which are not mixed together in the same translation unit.

Consider the following:

"d.h" is in Dispatch, and "dx.h" is in DispatchWithExtensions.
Module M1 says '#include "d.h"'. We implicitly load only the /usr/include
module map, find the Dispatch module, and build it, without ever loading
the extension.
Module M2 says '#include "dx.h"'. That results in us building the
DispatchWithExtensions module.
Now suppose a translation unit imports M1 and M2. We end up with both
.pcm's in the same TU.

How does your approach prevent the above from happening?

Why do you need a separate module map file name? Why not just put your
extension into the normal module.modulemap file?

The reason was to disallow module definitions in the private module map
file:

I don't think you need that if your 'extern module' specifies a path. You
just imported a definition of the module; it'd be an error to provide
another one.

- A module definition inside a private extension will be disallowed. The

Do you mean that M1 and M2 are built from different translation units with different sets of search paths? Or are they built with the same search paths?
If they are built with the same search paths then they will both build and depend on the Dispatch[WithExtensions] module file. During header lookup for '#include "d.h"' we will be parsing the module maps in the search paths, even in the paths that do not contain "d.h" (clang already does this). When we complete the Dispatch module definition it will be a definition that includes the extensions.

If M1 and M2 are built with different search paths then the translation unit that imports both will rebuild M1 because it doesn’t depend on the expected Dispatch-<hashWithExtensions>.pcm.

Per my example, it protects against writing a module definition directly and missing any “extern” declaration (we could have “extern module” without a path also trigger an error if you define the extern’ed module in the same map file).

In general I like that a ‘module.private.modulemap’ would immediately indicate that it contains only private extensions to existing modules, while seeing a ‘module.modulemap’ you’d expect to see new module definitions. These can then be present in the same directory (’/usr/local/include/module.private.modulemap’ would contain the extensions while '/usr/local/include/module.modulemap’ would contain new modules that only exist in /usr/local/include).

But I don’t think these are very important benefits; do you feel that it is unnecessary complexity ?

Thanks for the pointers. This makes sense. Would I be able to specify, in a framework’s directory, the module maps for an external dependency? In my particular case, I’d like to be able to express that these are the module maps for the external dependency.

I was thinking: what if we could accommodate more than one modulemap per file? Say:
cat module.modulemap:
modulemap Map1 {
  module M1 {}
  …
}
modulemap Map2 {
  modulemap_root /usr/include // Will use the virtual file system, pretending the modulemap was found at the modulemap_root
  module N1 {}
  …
}

IMO this would allow the ‘external dependencies’ to be organized in different configurations. For example, a module per header or a bunch of headers per module, whichever the framework decides fits best. For our use-cases that would be great. Maybe this could also simplify cross-referencing modules and visibility…

Hi all,

For frameworks Clang currently supports adding a separate module
map file for the private headers of the framework. It looks specifically
for the presence of ‘module.private.modulemap’ inside the .framework and
parses both the public and the private module maps when it processes its
module. We would like to extend support for private module maps for
non-framework headers as well.

In the Darwin platform, the public SDK headers are located in
'/usr/include', while the associated private SDK headers are located in
'/usr/local/include'. '/usr/local/include' comes before '/usr/include' in
the header search paths.

I worry that this will be fragile. If for any reason we look in
/usr/include but not in /usr/local/include, we'll not load the private
extension map and things will probably go quite badly from that point
onwards. If the presence of the /usr/local/include headers is a fundamental
part of a /usr/include module, then it seems better to me to specify that
within the /usr/include module map.

So here's one possibility: allow 'extern module' declarations to be
nested within other modules, then write your /usr/include module map as:

module MyModule {
  <...>
  extern module SomethingPrivate "/usr/local/include/module.private.map"
}

This has drawbacks:

- Details of the private SDK "leak out" to the public one. It
should work similarly to frameworks, in that the public SDK remains the same
irrespective of whether there is a private API, and the private API is a
straight addition on top of the public one without needing to modify
anything in the public SDK.
- It is a rather weak guarantee anyway, because the public module map
must necessarily function even when the extension map is missing, which
means pointing at the wrong path or missing the private map when you really
need it will not be detected.
- Flexibility to extend a module from any path may be valuable for
testing.

OK, I'm not sure I understand what problem you're solving. If the
/usr/local/include stuff works as a layer on top of /usr/include, why do
you need them to be built as part of the same module? (Do your
/usr/local/include headers override / #include_next some of the
/usr/include headers, perhaps? If so, do you need the #includes in
/usr/include to find the /usr/local/include headers rather than the
/usr/include headers?)

There are some cases of cycles between public/private headers which
would be accommodated by a single module build but the primary motivation
is that we would like the module public/private interface to be under the
same namespace, so you’d do

@import Dispatch;
@import Dispatch.Private;

@import Darwin.POSIX.Foo.Bar;
@import Darwin.POSIX.Foo.Bar.Private;

and generally any kind of private extension:

@import Dispatch.SuperCoolButPrivate;

Do you want / need them to be built as a single module file, or not?

As I said, cycles may make things difficult for separate module files,
but how are we going to get new submodules under the same module name with
separate module files ?

Well, the restriction that module files correspond to top-level module
names is arbitrary and artificial. (It's also a bad idea for a few reasons.
It makes incremental refactoring very hard, for instance, because you're
required to have no cycles at any point between things in different
top-level modules.)

Splitting up the description of how to build a module file across
various module maps seems like a very error-prone strategy, especially if
you're intending to be able to stop looking before you've read all of the
relevant module maps.

I think that the high level parts of my proposal are not dependent on
whether we build one .pcm file or multiple ones, this is an implementation
detail.
To be more specific, if we have

module.private.modulemap (extension):
extern module Dispatch
module Dispatch.Private {
  <headers>
}

module.modulemap:
module Dispatch {
  <headers>
}

It is an implementation detail whether we build one Dispatch.pcm file
or a Dispatch.Private.pcm file that depends on another Dispatch.pcm; it
should make no difference to user code.
Is this incorrect?

Whether we build one .pcm or multiple is observable in some
circumstances. 1) We concatenate together all the header files built as
part of one .pcm file, and parse them all at once, and that is not always
semantically equivalent to building them in two separate passes. 2) If you
have one big Dispatch.pcm file which also contains the private bits, and by
any sequence of events you end up also pulling in another Dispatch.pcm that
contains the public headers but not the private ones, you may get ambiguity
errors. 3) We do not allow circular references across .pcm files but do
allow them within a .pcm file.

(2) is a bug.

Perhaps. Suppose I put this in a header:

extern struct {
  int a, b;
} s;

If that ends up in two different translation units, do I have an error?
In C++, the answer is "yes", because the types are different; in C, it's
"no", because the types are compatible. Extending this to the modules
world, if I have the above in two different modules, should they be
mergeable?

With the "bottom-up modularization" approach that Apple has been taking
so far, I expect you'll find that merging a Dispatch.pcm and a
DispatchWithExtensions.pcm together won't work very well. (Ironically, I'd
have more confidence this would work if you were using C++ rather than C,
because the merging story is more developed and better tested there.) And
even if it does work, the possibility of having two different modules
providing the same interface as part of the same translation unit seems
like a bad idea.

Dispatch.pcm and Dispatch[WithExtensions].pcm will never show up in the
same translation unit, they will have different identities; similar to how
module maps from different paths result in different pcm files for the same
module, and they are not mixed together in the same translation unit.

Consider the following:

"d.h" is in Dispatch, and "dx.h" is in DispatchWithExtensions.
Module M1 says '#include "d.h"'. We implicitly load only the /usr/include
module map, find the Dispatch module, and build it, without ever loading
the extension.
Module M2 says '#include "dx.h"'. That results in us building the
DispatchWithExtensions module.
Now suppose a translation unit imports M1 and M2. We end up with both
.pcm's in the same TU.

How does your approach prevent the above from happening?

Do you mean that M1 and M2 are built from different translation units with
different sets of search paths? Or are they built with the same search
paths?
If they are built with the same search paths then they will both build and
depend on the Dispatch[WithExtensions] module file. During header lookup
for '#include "d.h"' we will be parsing the module maps in the search
paths, even in the paths that do not contain "d.h" (clang already does
this). When we complete the Dispatch module definition it will be a
definition that includes the extensions.

Hmm, I had thought that the #include -> module mapping would only look for
and load module map files from the path containing the #include'd file. It
seems we do not always perform this optimization today; with your proposed
change, such an optimization would no longer be correct, and that seems
unfortunate.

Also, we do perform this optimization in some cases, which are relevant to
your /usr/include versus /usr/local/include example. Suppose you build with
-Ifoo -Ibar, and you #include "baz/quux.h". If "foo/baz" does not exist but
"bar/baz" does, we will look for "bar/baz/module.modulemap" then for
"bar/module.modulemap". We will not look for "foo/module.modulemap" (nor
for "foo/baz/module.modulemap").
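
To make the -Ifoo -Ibar example concrete, here is a minimal sketch of the probing order as described in the paragraph above. It is a model of that description only, not Clang's implementation: the function name probed_module_maps and the existing_dirs set (which stands in for the file system) are hypothetical.

```python
import posixpath


def probed_module_maps(search_dirs, header, existing_dirs):
    """Return the module map paths probed for `header`.

    Models only the behaviour described in the thread: the first search
    directory that actually contains the header's directory is chosen, and
    module.modulemap is looked for from that directory up to the search root.
    `existing_dirs` is a hypothetical stand-in for the file system.
    """
    rel_dir = posixpath.dirname(header)
    for root in search_dirs:
        root = posixpath.normpath(root)
        candidate = posixpath.join(root, rel_dir) if rel_dir else root
        if candidate not in existing_dirs:
            continue  # e.g. "foo/baz" is absent, so nothing under foo is probed
        probes = []
        current = candidate
        while True:
            probes.append(posixpath.join(current, "module.modulemap"))
            if current == root:
                break
            current = posixpath.dirname(current)
        return probes
    return []


probes = probed_module_maps(
    ["foo", "bar"], "baz/quux.h", existing_dirs={"foo", "bar", "bar/baz"}
)
print(probes)
```

Under these assumptions no module map under foo is ever probed, which is why an extension map that lives only in foo would be missed.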

There are probably restrictions you could add that would help here. For
instance: if you have include paths X and Y, and your primary module map is
in X/path/, then your extension module map would need to be in Y/path/.
Even that isn't really sufficient (in particular if X or Y has module maps
in directories underneath /path/, or if the module maps can be found under
multiple include paths, you might still have problems), but it's a good
start.
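
The placement restriction suggested above could be checked roughly as follows. This is only a sketch of that one rule (primary map in X/path/, extension map in Y/path/); the helper names are hypothetical, and it deliberately ignores the harder cases mentioned in the same paragraph (nested module maps, maps reachable through several include paths).

```python
import posixpath


def relative_map_dir(map_path, include_paths):
    """Directory of `map_path` relative to the include path that contains it."""
    map_dir = posixpath.dirname(posixpath.normpath(map_path))
    for root in include_paths:
        root = posixpath.normpath(root)
        if map_dir == root:
            return "."
        if map_dir.startswith(root + "/"):
            return map_dir[len(root) + 1:]
    return None  # not under any include path


def extension_placement_ok(primary_map, extension_map, include_paths):
    """True if both maps sit at the same relative directory under an include path."""
    primary = relative_map_dir(primary_map, include_paths)
    extension = relative_map_dir(extension_map, include_paths)
    return primary is not None and primary == extension


include_paths = ["/usr/local/include", "/usr/include"]
same_place = extension_placement_ok(
    "/usr/include/dispatch/module.modulemap",
    "/usr/local/include/dispatch/module.private.modulemap",
    include_paths,
)
different_place = extension_placement_ok(
    "/usr/include/dispatch/module.modulemap",
    "/usr/local/include/module.private.modulemap",
    include_paths,
)
print(same_place, different_place)
```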

If M1 and M2 are built with different search paths then the translation
unit that imports both will rebuild M1 because it doesn’t depend on the
expected Dispatch-<hashWithExtensions>.pcm.

I think you should choose: either (a) if there's a private extension, then
you somehow guarantee that you only ever build / use the .pcm with that
extension and never mix that with a .pcm built without the extension, or
(b) treat the private extension as a layer on top of the public module.

If your Dispatch.Private is simply a layer on top of Dispatch, then
building them as two separate .pcm files seems like the right choice; it
keeps your Dispatch module (for want of a better word) modular. If on the
other hand, you need includes/imports in Dispatch to pull in headers /
submodules from Dispatch.Private, then one big Dispatch .pcm is probably
the right answer, and we'd need something like your proposal so we could
say "here is a Dispatch module that's like the one in /usr/include but
different in the following ways".

(Whichever of these options we pick, we can make the "@import
Dispatch.Private" syntax do the right thing.)

In both cases we would need to extend module map parsing to allow
submodule extensions. I think the original proposal accommodates that, with
the caveat that once clang supports splitting a top-level module across
different pcm files we may need a way to control whether there is a combined
module file or multiple ones.
Given that splitting a top-level module across multiple pcm files is a rather
intrusive change and the combined pcm file is sufficient for our needs, I’d
like to proceed for now with the module map extension functionality, which
will result in one top-level .pcm file. Is this reasonable?

A few questions:

What are you going to use as the defining module map file (part of the
'key' used for determining .pcm identity)? Is it the defining module map
for the top-level module or the module map with the extension?

The key to determine identity will include both module map files.
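
As an illustration of what "include both module map files" in the key could mean, here is a small sketch. It does not reproduce Clang's actual module hash; the function pcm_cache_key, the use of SHA-256, and the file-name layout are all assumptions made for the example.

```python
import hashlib


def pcm_cache_key(module_name, module_map_paths):
    """Derive a .pcm file name from the module name and every contributing module map.

    Hypothetical scheme: paths are sorted so the key does not depend on the
    order in which the maps were discovered.
    """
    digest = hashlib.sha256()
    digest.update(module_name.encode("utf-8"))
    for path in sorted(module_map_paths):
        digest.update(b"\0")
        digest.update(path.encode("utf-8"))
    return f"{module_name}-{digest.hexdigest()[:12]}.pcm"


public_only = pcm_cache_key("Dispatch", ["/usr/include/module.modulemap"])
with_extension = pcm_cache_key(
    "Dispatch",
    [
        "/usr/include/module.modulemap",
        "/usr/local/include/module.private.modulemap",
    ],
)
print(public_only)
print(with_extension)
```

With a key like this, a Dispatch built with the extension and one built without it get different identities, which is the property the Dispatch-<hashWithExtensions>.pcm naming relies on.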

What happens if multiple module maps try to extend the same module?

I’d prefer to allow it if they extend with different submodules, and
error if they try to define the same submodule.
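
The policy stated here (distinct submodules may come from different extension maps, the same submodule may not) can be sketched as below. The data shapes, the function merge_extensions and the exception type are hypothetical; a real implementation would live in the module map parser and emit a diagnostic instead of raising.

```python
class DuplicateSubmoduleError(Exception):
    """Raised when two extension maps define the same submodule."""


def merge_extensions(module_name, extension_maps):
    """Merge submodules contributed by several extension maps.

    `extension_maps` is a list of (map_path, [submodule names]) pairs.
    Returns a dict mapping each fully qualified submodule to the map defining it.
    """
    owner = {}
    for map_path, submodules in extension_maps:
        for submodule in submodules:
            full_name = f"{module_name}.{submodule}"
            if full_name in owner:
                raise DuplicateSubmoduleError(
                    f"{full_name} is defined in both {owner[full_name]} and {map_path}"
                )
            owner[full_name] = map_path
    return owner


merged = merge_extensions(
    "Dispatch",
    [
        ("/usr/local/include/module.private.modulemap", ["Private"]),
        ("/opt/test/module.private.modulemap", ["SuperCoolButPrivate"]),
    ],
)
print(sorted(merged))
```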

Why do you need a separate module map file name? Why not just put your
extension into the normal module.modulemap file?

The reason was to disallow module definitions in the private module map
file:

I don't think you need that if your 'extern module' specifies a path. You
just imported a definition of the module; it'd be an error to provide
another one.

Per my example, it protects against writing a module definition directly
and missing any “extern” declaration (we could have “extern module” without
a path also trigger an error if you define the extern'ed module in the same
map file).

In general I like that a ‘module.private.modulemap’ would immediately
indicate that it contains only private extensions to existing modules,
while seeing a ‘module.modulemap’ you’d expect to see new module
definitions. These can then be present in the same directory
(‘/usr/local/include/module.private.modulemap’ would contain the extensions
while ‘/usr/local/include/module.modulemap’ would contain new modules that
only exist in /usr/local/include).

But I don’t think these are very important benefits; do you feel that it
is unnecessary complexity ?

If you're going to load both module map files anyway, you should get an
error on the module redefinition, so in a lot of cases you'd notice you got
it wrong pretty quickly. If this were a really common use case, the idea of
having another kind of module map file might be worth considering, but if
it's only really to allow /usr/local/include to partially override/extend
/usr/include modules, I think we should go for the simpler approach.

I guess the minimum viable solution would be something like -fmodule-map-file-with-root=,/path/to/module/root. This has the advantage that your module map file doesn’t tie itself to a particular directory.

Thanks for the pointers. This makes sense. Would I be able to specify, in a framework’s directory, the module maps for an external dependency? In my particular case, I’d like to be able to express that these are the module maps for the external dependency.

I was thinking: what if we could accommodate more than one modulemap per file? Say:
cat module.modulemap:
modulemap Map1 {
  module M1 {}
  …
}
modulemap Map2 {
  modulemap_root /usr/include // Will use the virtual file system, pretending the modulemap was found at the modulemap_root
  module N1 {}
  …
}

In this scheme, I would make your hypothetical root declaration part of the module, not the module map file, i.e.:

module M1 { }
module N1 {
  module_root "/usr/include"
  …
}

That might result in some duplication if you wanted to describe a lot of modules in a single directory from an external location, but it seems cleaner to me than a modulemap { } syntax.

IMO this would allow the ‘external dependencies’ to be organized in different configurations. For example, a module per header or a bunch of headers per module, whichever the framework decides fits best. For our use-cases that would be great. Maybe this could also simplify cross-referencing modules and visibility…

Not sure what you mean here. Would you mind expanding on this a bit?