[C++ modules] Is there any way to add API information to object files?

Hi all,
I just learned that modules have been voted into C++20, and I read the documentation on how Clang implements this feature. Correct me if I’m wrong: third-party libraries stay the same (headers, .so/.a files), and to make modules work, someone needs to add a module map file that the compiler can find and use to locate the headers. Then, without doing the #include copy thing, the compiler directly finds the API the code uses.

My question is: if I’m writing a new library (no legacy code problem), can I somehow abandon headers and ship just API metadata and shared libraries?

For example, just like Java (which adds API information to the .class file), is there any attempt at adding API metadata to an object file?

I just learned that modules have been voted into C++20, and I read the documentation on how Clang implements this feature. Correct me if I’m wrong: third-party libraries stay the same (headers, .so/.a files), and to make modules work, someone needs to add a module map file that the compiler can find and use to locate the headers.

There’s lots still to be decided/worked out/implemented in terms of what this looks like for end users.

It’s expected that eventually third-party libraries would ship modular code, and that eventually the standard library would have a modular interface too, though not in C++20, as I understand it. The classic #include interface would remain at the very least for backwards compatibility, and maybe some pieces of the standard library would stay non-modular.

Module maps are a feature of Clang’s “header modules” (which roughly map to the “header units” language in the C++ proposal/working draft). The standard wording, as I understand it, only says that if you treat something as a header unit (import “quoted_identifier”), then it /can/ be treated as an isolated unit in other places where it’s included using a normal #include. Implementations aren’t expected to scan for these sorts of headers, so they would more likely use something close to, or the same as, these module maps, which declare ahead of time which headers to treat as header units.

And without doing the #include copy thing, the compiler directly finds the API the code uses.

Not sure what you mean by the “#include copy thing” - oh, you mean classic preprocessing?

It’s a bit more complicated than “the compiler directly finds the API the code uses” - what does “directly” mean here? That’s sort of still open to discussion.

The general thinking is that compilers will build some kind of “Binary Module Interface” (yes, /very/ roughly analogous to a .class file in Java) from the source of a module.

But these BMI files (Clang calls them PCM files - PreCompiled Modules) are not portable between machines, compilers, or even between different builds (with different -D flags, machine targets, etc.) because C++ ASTs are /very/ tied to the specific target, unlike something like Java. That means they won’t be shipped along with your library or the like.
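To make the “tied to the specific target” point concrete, here is a minimal sketch in plain C++ (no modules needed). The struct, the MYLIB_ENABLE_TRACING flag, and the function name are all hypothetical, made up for illustration. The same source text yields a different declaration, and so a different AST, depending on a -D flag and on the target’s type sizes, which is roughly why a BMI built for one configuration can’t safely be reused for another.

```cpp
#include <cstddef>
#include <string>

// Hypothetical library declaration. What the compiler ends up with depends
// on a -D flag and on the sizes of built-in types for the target.
struct Packet {
    long id;                  // 4 bytes on 64-bit Windows, 8 on typical 64-bit Linux
#ifdef MYLIB_ENABLE_TRACING   // hypothetical flag, e.g. -DMYLIB_ENABLE_TRACING
    const char* trace_tag;    // this member exists only in tracing builds
#endif
    char payload[16];
};

// Summarizes facts that are already fixed by the time the compiler has
// parsed Packet: the preprocessor branch taken and the resulting sizes.
std::string describe_packet() {
    std::string out = "sizeof(long)=" + std::to_string(sizeof(long));
    out += " sizeof(Packet)=" + std::to_string(sizeof(Packet));
#ifdef MYLIB_ENABLE_TRACING
    out += " tracing=on";
#else
    out += " tracing=off";
#endif
    return out;
}
```

Building this with and without the flag, or for a different target, and printing describe_packet() should give different strings from identical source. A Java .class file doesn’t have this problem because none of these choices are made at that stage.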

You’ll still be shipping C++ module interface source files like you ship header files today.

Any project that wants to use those will need a build system capable of finding these module interface source files (perhaps in locations similar to, or the same as, header files), building the appropriate BMIs from them, and passing them to the appropriate compilations that depend on those modules. Building the BMIs is a new step, and it’s causing some discussion about just how a compiler will determine how and when to build these BMIs, where to store them (it’ll have to be somewhere within the build tree), and how to consume them.
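One consequence of that extra step is an ordering constraint: a module’s BMI has to exist before anything that imports it can be compiled. As a rough sketch of what a build system would have to compute (this is not how any particular build tool does it, and the module names and the bmi_build_order function are hypothetical), a topological sort over the import graph is enough to get a valid order:

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Maps each module name to the modules it imports (names are hypothetical).
using ImportGraph = std::map<std::string, std::vector<std::string>>;

// Returns an order in which BMIs could be built so that every module comes
// after all of its imports (Kahn's algorithm). Returns an empty vector if
// the imports form a cycle.
std::vector<std::string> bmi_build_order(const ImportGraph& graph) {
    std::map<std::string, int> pending;  // number of not-yet-built imports
    std::map<std::string, std::vector<std::string>> importers;
    for (const auto& entry : graph) {
        pending[entry.first];            // make sure every module has an entry
        for (const auto& dep : entry.second) {
            pending[dep];                // modules that appear only as imports
            ++pending[entry.first];
            importers[dep].push_back(entry.first);
        }
    }

    // Start with modules that import nothing, then release their importers.
    std::vector<std::string> order;
    for (const auto& entry : pending)
        if (entry.second == 0) order.push_back(entry.first);
    for (std::size_t i = 0; i < order.size(); ++i) {
        const std::string current = order[i];  // copy: push_back may reallocate
        for (const auto& user : importers[current])
            if (--pending[user] == 0) order.push_back(user);
    }

    if (order.size() != pending.size()) order.clear();  // cycle detected
    return order;
}
```

For a graph where app imports net and util, and net imports util, this gives util, net, app. A real build system also has to discover the imports in the first place, which is part of what is still being discussed.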

My question is: if I’m writing a new library (no legacy code problem), can I somehow abandon headers and ship just API metadata and shared libraries?

For example, just like Java (which adds API information to the .class file), is there any attempt at adding API metadata to an object file?

No, I don’t believe so - it’d be at the wrong layer of abstraction. Object files have much stronger compatibility guarantees (see, for example, the Itanium ABI), can have relatively long lifetimes, and can be used across different compilers (a library may be built with GCC while its users are built with Clang). BMIs do not have any such guarantees and need to be generated on a per-project basis, roughly speaking.
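As an aside on how little API information an object file carries today: on Itanium-ABI platforms, a symbol name encodes the function’s name and parameter types, but for an ordinary non-template function it does not record the return type, and it says nothing about class layouts, inline functions, or templates. A minimal sketch, assuming GCC or Clang with <cxxabi.h> available (the demangle helper and the symbol names are made up for illustration):

```cpp
#include <cxxabi.h>  // abi::__cxa_demangle, available with GCC and Clang
#include <cstdlib>
#include <string>

// Demangles an Itanium-ABI symbol name; returns an empty string on failure.
std::string demangle(const char* mangled) {
    int status = 0;
    char* text = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || text == nullptr) return std::string();
    std::string result(text);
    std::free(text);  // the buffer was allocated with malloc by the demangler
    return result;
}
```

Here demangle("_Z3addii") gives "add(int, int)": the return type is simply not in the symbol, so a consumer still needs a declaration from somewhere else, which today means a header or a module interface source file.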

  • Dave

Hi David,
I have a question:

because C++ ASTs are /very/ tied to the specific target

Why are ASTs involved here? I assumed that the module feature happens at link time. Does this mean the BMI files do not contain machine code, but some kind of source code instead? (Like C#? The compiler takes all source code at once, so there’s normally no link time?)

Also, do you happen to know how VC++/Golang do this?
I can’t find any details about how VC++ implements modules, and Golang is the only language I know of that builds directly to machine code without using headers (correct me if I’m wrong).

because C++ ASTs are /very/ tied to the specific target

Why are ASTs involved here? I assumed that the module feature happens at link time.

Ah, no - it’s a compile-time feature. The underlying infrastructure, which has already been used in Clang for many years (“clang header modules”), is designed to reduce compilation time (especially incremental) by reusing as much as possible of the work of parsing header files, rather than having each compilation that uses a header reparse it from scratch. The language feature then layers on top of that, providing language-level isolation for these reusable chunks.

The BMI is basically a full-fidelity binary representation of the header/module that’s more efficient to use than the raw header - lazily loadable, fast lookup, etc.

Does this mean the BMI files do not contain machine code, but some kind of source code instead?

Correct - a binary representation of the headers or modules.

(Like C#? The compiler takes all source code at once, so there’s normally no link time?)

I’m not especially familiar with C#'s compilation model.

Also, do you happen to know how VC++/Golang do this?

Not sure, no - I know VC++’s pre-standard modules are a bit different from Clang’s. It sounds like VC++’s are a bit more stable, and VC++ ships standard library BMIs with the compiler (so they must be reusable amongst at least a few build modes, I guess).

But don’t take any of that as guaranteed - just a very rough guess.

  • Dave