[DISCUSSION] - modularize - module map generation (repost from cfe-commits)

(I originally posted this on cfe-commits, but I learned cfe-dev would be more appropriate.)

Hi,

The “Future Directions” section of the modules page (http://clang.llvm.org/docs/Modules.html#future-directions) mentions enhancing modularize with an assistant mode for generating a module.map file. I’m starting to think about that, so before I plow ahead and possibly do something wrong, I wanted to open a discussion for ideas and feedback, particularly since modules are new and I might not understand them correctly yet.

First off, do you think having modularize optionally generate a module.map file is even warranted? If it is, how close can we get to producing a usable module.map file? Perhaps the idea is just to create a starting point, to save some typing, since we would already have a list of header files. I’ll proceed along those lines.

Basically, as a starting point, I assume the header file name list is the basis for the headers included in the modules. These headers can still pull in other headers via #include; those other headers are not listed, either because they are internal, or because they are part of a group but not independent, and so should not result in a separate submodule.

That raises the question of how to figure out the module/submodule hierarchy.

The simplest scheme is that all the files in the header file list pertain to just one outer module, named by a command-line option. For example, “-root=std” would define one outer module called “std” and each header in the header file list would be a submodule inside “std”.
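To make the single-root scheme concrete, here is a rough sketch in Python (just for illustration, not how modularize itself would be written). The function names and the rule for turning a header name into a module name are my own assumptions; only the “-root” idea comes from the proposal above.

```python
import re
from pathlib import PurePosixPath


def module_name(header):
    """Derive an identifier-like module name from a header path.

    Assumed convention: drop directories and the extension, then replace
    anything that is not a letter, digit, or underscore with '_'.
    """
    stem = PurePosixPath(header).stem
    name = re.sub(r"\W", "_", stem)
    return "_" + name if name[:1].isdigit() else name


def flat_module_map(root, headers):
    """One outer module named `root`, with one submodule per listed header."""
    lines = [f"module {root} {{"]
    for header in headers:
        lines += [
            f"  module {module_name(header)} {{",
            f'    header "{header}"',
            "  }",
        ]
    lines.append("}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical header list, as if modularize had been run with -root=std.
    print(flat_module_map("std", ["vector", "string"]))
```

Two headers with the same base name would collide under this naming rule, which is one more reason to look at the directory structure.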

The header file list might have headers that are in subdirectories. Perhaps a modification of the above scheme could be that a submodule is created for the subdirectory name, and the headers in the subdirectory go inside that module, becoming sub-sub-modules and so on.
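A sketch of the directory-based variant, again in Python and again only an illustration under my own assumptions: the helper names are hypothetical, module names are derived from path components, and two headers with the same base name in one directory would simply overwrite each other.

```python
import re
from pathlib import PurePosixPath


class Module:
    """A module that may own one header and any number of submodules."""

    def __init__(self):
        self.header = None
        self.children = {}


def ident(text):
    """Make a path component usable as a module name (assumed convention)."""
    name = re.sub(r"\W", "_", text)
    return "_" + name if name[:1].isdigit() else name


def emit(name, module, indent=0):
    """Return the module map lines for `module` and everything below it."""
    pad = "  " * indent
    lines = [f"{pad}module {name} {{"]
    if module.header is not None:
        lines.append(f'{pad}  header "{module.header}"')
    for child_name in sorted(module.children):
        lines += emit(child_name, module.children[child_name], indent + 1)
    lines.append(f"{pad}}}")
    return lines


def nested_module_map(root, headers):
    """Each subdirectory becomes a submodule; each header becomes a leaf."""
    top = Module()
    for header in headers:
        path = PurePosixPath(header)
        node = top
        for directory in path.parts[:-1]:
            node = node.children.setdefault(ident(directory), Module())
        leaf = node.children.setdefault(ident(path.stem), Module())
        leaf.header = header
    return "\n".join(emit(root, top)) + "\n"


if __name__ == "__main__":
    # Hypothetical header list with one subdirectory.
    print(nested_module_map("mylib", ["a.h", "sub/b.h", "sub/c.h"]))
```

If a header and a subdirectory share a name (say sys.h and sys/), this sketch merges them into one module that has both a header and submodules. Whether that is desirable is an open question.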

As an alternative, perhaps namespace blocks could be used to determine modules. The outermost namespace determines the root module, and nested namespaces become the submodule hierarchy.

This raises the question of what to do about global definitions. Perhaps we could still use the -root=name option, but if the outer namespace uses that name, it’s effectively ignored, so the hierarchy is still as expected.

This also raises the question of file boundaries themselves. If multiple files share a single leaf namespace, are the file boundaries ignored, so that the files do not become separate submodules, or do the files still become submodules within the module for the leaf namespace? (Can a “module” statement have multiple “header” statements in the module.map language?) Perhaps if the namespace occurs in only one file, it results in just one module, but if the leaf namespace is used in multiple files, the files become submodules of the module created for the leaf namespace. This might be pretty difficult to figure out.
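The rule in that last case could be sketched as follows. This is a simplification under stated assumptions: it takes a precomputed mapping from each header to a single leaf namespace (which modularize would have to work out from the parsed headers, and a real header can open several namespaces), and the function name and result layout are hypothetical.

```python
from collections import defaultdict


def plan_namespace_modules(header_namespaces):
    """Decide, per leaf namespace, between one module and per-file submodules.

    `header_namespaces` maps a header path to its leaf namespace, e.g.
    {"a.h": "lib::io"}. This input is assumed to be given.
    """
    by_namespace = defaultdict(list)
    for header, namespace in sorted(header_namespaces.items()):
        by_namespace[namespace].append(header)

    plan = {}
    for namespace, headers in by_namespace.items():
        if len(headers) == 1:
            # Namespace occurs in one file only: a single module for it.
            plan[namespace] = {"header": headers[0], "submodules": []}
        else:
            # Namespace spans several files: each file becomes a submodule.
            plan[namespace] = {"header": None, "submodules": headers}
    return plan


if __name__ == "__main__":
    # Hypothetical input: two files share lib::io, one file has lib::net.
    print(plan_namespace_modules(
        {"a.h": "lib::io", "b.h": "lib::io", "c.h": "lib::net"}
    ))
```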

If we needed options for module map output per header file, the header file list format might be extended with ‘-’ options, but perhaps that’s taking it too far, as we don’t want the complexity of the header list to start approaching that of the module.map.

Anyway, these are the ideas and questions that came to my mind. What do you think? Is there more that could be done to bring the output module.map file closer to what’s needed without making it too complicated?

Thanks.

-John

Hi John,

> First off, do you think having modularize optionally generate a module.map file is even warranted? If it is, how close can we get to producing a usable module.map file? But perhaps the idea is just to create a starting point, to save some typing, since we would already have a list of header files. I’ll proceed along those lines.

I imagine it will only be a starting point, but simply enumerating all of the header files that get pulled in would be helpful.

> The header file list might have headers that are in subdirectories. Perhaps a modification of the above scheme could be that a submodule is created for the subdirectory name, and the headers in the subdirectory go inside that module, becoming sub-sub-modules and so on.

This would be my suggestion. Assume that the directory structure implies a module structure, and go from there. I know I've seen a number of libraries where this heuristic would be a good start.

> As an alternative, perhaps namespace blocks could be used to determine modules. The outermost namespace determines the root module, and nested namespaces become the submodule hierarchy.

This is potentially interesting, but I suspect it won't help as much. At least, not in the namespacing structures I've seen.

  - Doug

Thanks for the feedback, Doug.

-John