RFC: APIs for bitcode files containing multiple modules

Hi all,

As mentioned in my recent RFC entitled “RFC: a more detailed design for ThinLTO + vcall CFI” I would like to introduce the ability for bitcode files to contain multiple modules. In https://reviews.llvm.org/D24786 I took a step towards that by proposing a change to the module format so that the block info block is stored at the top level. The next step is to think about what the API would look like for reading and writing multiple modules.

Here’s what I have in mind. To create a multi-module bitcode file, you would create a BitcodeWriter object and add modules to it:

BitcodeWriter W(OS);
W.addModule(M1);
W.addModule(M2);
W.write();

Reading a multi-module bitcode file would be supported with a BitcodeReader class. Each of the functional reader APIs in ReaderWriter.h would have a member function on BitcodeReader. We would also have a next() member function which would move to the next module in the file. For example:

BitcodeReader R(MBRef);
Expected<bool> B = R.hasGlobalValueSummary();

std::unique_ptr<Module> M1 = R.getLazyModule(Ctx); // lazily load the first module

R.next();

std::unique_ptr<Module> M2 = R.parseBitcodeFile(Ctx); // eagerly load the second module

We’d continue to support the existing functional APIs in ReaderWriter.h for convenience in the common case where the bitcode file has a single module.
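For example, the existing WriteBitcodeToFile entry point could become a thin wrapper over the new class. A rough sketch only (the real function takes additional parameters, which I have omitted here, and the exact addModule signature is still open):

void WriteBitcodeToFile(const Module *M, raw_ostream &OS) {
  BitcodeWriter W(OS);
  W.addModule(*M); // single module; pointer vs. reference parameter is TBD
  W.write();
}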

Thanks,

Hi all,

As mentioned in my recent RFC entitled “RFC: a more detailed design for ThinLTO + vcall CFI” I would like to introduce the ability for bitcode files to contain multiple modules. In https://reviews.llvm.org/D24786 I took a step towards that by proposing a change to the module format so that the block info block is stored at the top level. The next step is to think about what the API would look like for reading and writing multiple modules.

Here’s what I have in mind. To create a multi-module bitcode file, you would create a BitcodeWriter object and add modules to it:

BitcodeWriter W(OS);
W.addModule(M1);
W.addModule(M2);
W.write();

That requires the two modules to stay alive until the call to write(); the API could instead be:

BitcodeWriter W(OS);
W.writeModule(M1);
// delete M1
// …
// create M2
W.writeModule(M2);

(Maybe you had this in mind, but the API naming didn’t reflect it so I’m not sure).

Reading a multi-module bitcode file would be supported with a BitcodeReader class. Each of the functional reader APIs in ReaderWriter.h would have a member function on BitcodeReader. We would also have a next() member function which would move to the next module in the file. For example:

BitcodeReader R(MBRef);
Expected<bool> B = R.hasGlobalValueSummary();

std::unique_ptr<Module> M1 = R.getLazyModule(Ctx); // lazily load the first module

R.next();

std::unique_ptr<Module> M2 = R.parseBitcodeFile(Ctx); // eagerly load the second module

That makes the API quite stateful; you may have good implementation reasons for this, but they’re not clear to me.
I’d rather see the bitcode reader as a random-access container of modules that clients can iterate over.
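Roughly something like the following, just to illustrate the shape; the per-module handle type and the modules()/indexing accessors below are made-up names, not a concrete proposal:

BitcodeReader R(MBRef);

for (BitcodeModule BM : R.modules()) { // hypothetical per-module handle
  Expected<bool> HasSummary = BM.hasGlobalValueSummary();
  // ... decide per module whether to load it lazily or eagerly ...
}

std::unique_ptr<Module> M2 = R[1].parseBitcodeFile(Ctx); // random access to the second module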

Hi all,

As mentioned in my recent RFC entitled "RFC: a more detailed design for
ThinLTO + vcall CFI" I would like to introduce the ability for bitcode
files to contain multiple modules. In https://reviews.llvm.org/D24786 I
took a step towards that by proposing a change to the module format so that
the block info block is stored at the top level. The next step is to think
about what the API would look like for reading and writing multiple modules.

Here's what I have in mind. To create a multi-module bitcode file, you
would create a BitcodeWriter object and add modules to it:

BitcodeWriter W(OS);
W.addModule(M1);
W.addModule(M2);
W.write();

That requires the two modules to stay alive until the call to write(); the
API could instead be:

BitcodeWriter W(OS);
W.writeModule(M1);
// delete M1
// ...
// create M2
W.writeModule(M2);

(Maybe you had this in mind, but the API naming didn’t reflect it so I’m
not sure).

In the API I prototyped, I took the maximum BitsRequiredForTypeIndices
value from all the modules, and used it to produce the abbreviations for
the top level block info block (without this I was seeing "Unexpected
abbrev ordering!" errors in the bitcode writer as a result of emitting the
"same" abbreviation multiple times). That would have required us to keep
the modules around until the call to write(). However, let me revisit this,
because it does not seem necessary (i.e. we can just continue to emit block
info blocks within the module block except with different abbreviation
numbers for each module).

Reading a multi-module bitcode file would be supported with a
BitcodeReader class. Each of the functional reader APIs in ReaderWriter.h
would have a member function on BitcodeReader. We would also have a next()
member function which would move to the next module in the file. For
example:

BitcodeReader R(MBRef);
Expected<bool> B = R.hasGlobalValueSummary();
std::unique_ptr<Module> M1 = R.getLazyModule(Ctx); // lazily load the first module
R.next();
std::unique_ptr<Module> M2 = R.parseBitcodeFile(Ctx); // eagerly load the second module

That makes the API quite stateful; you may have good implementation
reasons for this, but they’re not clear to me.
I’d rather see the bitcode reader as a random-access container of
modules that clients can iterate over.

Random access seems reasonable to me as well. I will see how feasible that
is.

Thanks,

Hi all,

As mentioned in my recent RFC entitled "RFC: a more detailed design for
ThinLTO + vcall CFI" I would like to introduce the ability for bitcode files
to contain multiple modules. In https://reviews.llvm.org/D24786 I took a
step towards that by proposing a change to the module format so that the
block info block is stored at the top level. The next step is to think about
what the API would look like for reading and writing multiple modules.

Here's what I have in mind. To create a multi-module bitcode file, you
would create a BitcodeWriter object and add modules to it:

BitcodeWriter W(OS);
W.addModule(M1);
W.addModule(M2);
W.write();

That requires the two modules to stay alive until the call to write(); the
API could instead be:

BitcodeWriter W(OS);
W.writeModule(M1);
// delete M1
// ...
// create M2
W.writeModule(M2);

(Maybe you had this in mind, but the API naming didn’t reflect it so I’m
not sure).

In the API I prototyped, I took the maximum BitsRequiredForTypeIndices value
from all the modules, and used it to produce the abbreviations for the
top-level block info block (without this I was seeing "Unexpected abbrev
ordering!" errors in the bitcode writer as a result of emitting the "same"
abbreviation multiple times). That would have required us to keep the
modules around until the call to write(). However, let me revisit this,
because it does not seem necessary (i.e. we can just continue to emit block
info blocks within the module block except with different abbreviation
numbers for each module).

Reading a multi-module bitcode file would be supported with a
BitcodeReader class. Each of the functional reader APIs in ReaderWriter.h
would have a member function on BitcodeReader. We would also have a next()
member function which would move to the next module in the file. For
example:

BitcodeReader R(MBRef);
Expected<bool> B = R.hasGlobalValueSummary();

What's this used for? Would there be a "readGlobalValueSummary()"
similar to function summaries?

std::unique_ptr<Module> M1 = R.getLazyModule(Ctx); // lazily load the first module
R.next();
std::unique_ptr<Module> M2 = R.parseBitcodeFile(Ctx); // eagerly load the second module

I'm very excited about the idea of storing multiple modules in a
bitcode file, and the (thin)LTO and CFI goodness you're building using
it.

I have a few questions about where you're going if you don't mind--and
it's related to the API in that it's awfully hard to judge an API
without knowing what it's expected to be used for or what the
underlying data represents.

On that-- I'm sorry if I've missed this information, but reading
through your RFC's and posts I'm not finding the answer.
Is there a definition/explanation of what it means to have a bitcode
file containing multiple modules?

Is this a storage optimization where each module is what today is an
"llvm::Module" but we're encoding them into a single file for
efficiency/convenience reasons?

If so, can these modules have different triples? Different
("conflicting") definitions for a global?

There are also multiple tools that take bitcode as input, and
currently expect a single module.
Will these be made to reject multiple-module bitcode, and if not is
the plan to extend tools to handle multiple-module files?

Beyond the random access suggestion (+1) and lifetime comments, it
seems like there should be a way to reason about the contents of these
modules--names, identifiers, flags, *something* so that "load the
first module lazily and the second eagerly" can become "load the
module containing my CFI information eagerly but the rest lazily" or
something, or at least to check that this file was created using
-fsanitize=cfi and not something else.

Anyway, sorry for all the questions and thanks for your efforts,
looking forward to using this in the near future! :)

>>
>>
>>
>> Hi all,
>>
>> As mentioned in my recent RFC entitled "RFC: a more detailed design for
>> ThinLTO + vcall CFI" I would like to introduce the ability for bitcode
>> files to contain multiple modules. In https://reviews.llvm.org/D24786 I
>> took a step towards that by proposing a change to the module format so
>> that the block info block is stored at the top level. The next step is to
>> think about what the API would look like for reading and writing multiple
>> modules.
>>
>> Here's what I have in mind. To create a multi-module bitcode file, you
>> would create a BitcodeWriter object and add modules to it:
>>
>> BitcodeWriter W(OS);
>> W.addModule(M1);
>> W.addModule(M2);
>> W.write();
>>
>>
>> That requires the two modules to stay alive until the call to write();
>> the API could instead be:
>>
>> BitcodeWriter W(OS);
>> W.writeModule(M1);
>> // delete M1
>> // ...
>> // create M2
>> W.writeModule(M2);
>>
>> (Maybe you had this in mind, but the API naming didn’t reflect it so I’m
>> not sure).
>
>
> In the API I prototyped, I took the maximum BitsRequiredForTypeIndices
> value from all the modules, and used it to produce the abbreviations for
> the top-level block info block (without this I was seeing "Unexpected
> abbrev ordering!" errors in the bitcode writer as a result of emitting the
> "same" abbreviation multiple times). That would have required us to keep
> the modules around until the call to write(). However, let me revisit
> this, because it does not seem necessary (i.e. we can just continue to
> emit block info blocks within the module block except with different
> abbreviation numbers for each module).
>>
>> Reading a multi-module bitcode file would be supported with a
>> BitcodeReader class. Each of the functional reader APIs in ReaderWriter.h
>> would have a member function on BitcodeReader. We would also have a
>> next() member function which would move to the next module in the file.
>> For example:
>>
>> BitcodeReader R(MBRef);
>> Expected<bool> B = R.hasGlobalValueSummary();

What's this used for?

This would be equivalent to the existing llvm::hasGlobalValueSummary()
function, which currently controls whether we compile a module with regular
LTO or with ThinLTO.

Would there be a "readGlobalValueSummary()"
similar to function summaries?

There would be a getModuleSummaryIndex() which again would be similar
to llvm::getModuleSummaryIndex(). Note that the module summary already
covers all global values, not just functions.

>> std::unique_ptr<Module> M1 = R.getLazyModule(Ctx); // lazily load the first module
>> R.next();
>> std::unique_ptr<Module> M2 = R.parseBitcodeFile(Ctx); // eagerly load the second module

I'm very excited about the idea of storing multiple modules in a
bitcode file, and the (thin)LTO and CFI goodness you're building using
it.

I have a few questions about where you're going if you don't mind--and
it's related to the API in that it's awfully hard to judge an API
without knowing what it's expected to be used for or what the
underlying data represents.

On that-- I'm sorry if I've missed this information, but reading
through your RFC's and posts I'm not finding the answer.
Is there a definition/explanation of what it means to have a bitcode
file containing multiple modules?

Is this a storage optimization where each module is what today is an
"llvm::Module" but we're encoding them into a single file for
efficiency/convenience reasons?

Yes, each module would be an llvm::Module. This is more for convenience
reasons -- it's the simplest way to split modules that use CFI into a
regular LTO part and a ThinLTO part (as described in the RFC entitled "RFC:
a more detailed design for ThinLTO + vcall CFI") while storing the entire
compiled translation unit in a single file.

If so, can these modules have different triples?

That would certainly be possible in principle, but it's not part of my use
case. I'd imagine that another potential use case for this could be to
allow for LTO when targeting heterogeneous architectures (e.g.
CUDA/OpenMP), but I'm not sure about the specifics of how that could work.

Different ("conflicting") definitions for a global?

In principle such inputs would be rejected by the linker with a duplicate
symbol error. That might not be the appropriate thing to do in the
heterogeneous case though.

There are also multiple tools that take bitcode as input, and
currently expect a single module.
Will these be made to reject multiple-module bitcode, and if not is
the plan to extend tools to handle multiple-module files?

For testing purposes I was planning to extend llvm-dis (and possibly opt)
to take a flag specifying a module index, and introduce an llvm-join tool
which could be used to create a bitcode from multiple inputs.

The other tools probably don't need to know about this and could just read
the first module.

Beyond the random access suggestion (+1) and lifetime comments, it
seems like there should be a way to reason about the contents of these
modules--names, identifiers, flags, *something* so that "load the
first module lazily and the second eagerly" can become "load the
module containing my CFI information eagerly but the rest lazily" or
something, or at least to check that this file was created using
-fsanitize=cfi and not something else.

Right, this is the sort of functionality that would be provided by
functions such as hasGlobalValueSummary().
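To make that more concrete, here is a rough sketch of how a client might drive the proposed reader. It assumes the member functions described above, and it also assumes next() returns true while more modules remain, which is not something I have pinned down yet:

Error addInputFile(MemoryBufferRef MBRef, LLVMContext &Ctx) {
  BitcodeReader R(MBRef);
  do {
    Expected<bool> HasSummary = R.hasGlobalValueSummary();
    if (!HasSummary)
      return HasSummary.takeError();
    if (*HasSummary) {
      // Summary present: treat this module as a ThinLTO input.
      auto Index = R.getModuleSummaryIndex(); // cf. llvm::getModuleSummaryIndex()
      std::unique_ptr<Module> M = R.getLazyModule(Ctx);
      // ... hand M and Index to the ThinLTO pipeline ...
    } else {
      // No summary: treat this module as a regular LTO input and load it eagerly.
      std::unique_ptr<Module> M = R.parseBitcodeFile(Ctx);
      // ... hand M to the regular LTO pipeline ...
    }
  } while (R.next());
  return Error::success();
}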

Thanks,

Is this a storage optimization where each module is what today is an
"llvm::Module" but we're encoding them into a single file for
efficiency/convenience reasons?

Yes, each module would be an llvm::Module. This is more for convenience
reasons -- it's the simplest way to split modules that use CFI into a
regular LTO part and a ThinLTO part (as described in the RFC entitled "RFC:
a more detailed design for ThinLTO + vcall CFI") while storing the entire
compiled translation unit in a single file.

Hmm, interesting. Thank you for the explanation.

This seems to be closer to partitioning a single Module than
supporting multiple modules (at least not yet).
Does that seem accurate?
If so maybe the API should be geared towards that--allow
"partition-aware" clients to read the pieces individually while
transparently treating the overall file as a single Module for
existing clients.
Just a thought, perhaps this wouldn't work for your use case?

Anyway I actually am very interested in support for multiple modules,
my use case being for use in shipping software in IR form as part of
the ALLVM project. Hence questions about things like linker semantics
and such.

Don't mean to burden you with accommodating the use-cases of everyone
else (like myself),
I guess I was just surprised to see the bitcode format extended in
this way without an explicit discussion of the bigger picture--
what this was intended to be used for or why it was necessary, where
it was going... :). Mostly because as you say it seems rather useful
for other parties (heterogeneous, for example) but I suppose we/they
can chime in and help refine the details later on once these bits are
committed :).

Thank you for your explanation, very much appreciated :).

If so, can these modules have different triples?

That would certainly be possible in principle, but it's not part of my use
case. I'd imagine that another potential use case for this could be to allow
for LTO when targeting heterogeneous architectures (e.g. CUDA/OpenMP), but
I'm not sure about the specifics of how that could work.

Different ("conflicting") definitions for a global?

In principle such inputs would be rejected by the linker with a duplicate
symbol error. That might not be the appropriate thing to do in the
heterogeneous case though.

Yeah, it seemed unclear what this would "mean", and I suppose for now
it's simply something folks can interpret/handle however makes sense for
their use case :).

There are also multiple tools that take bitcode as input, and
currently expect a single module.
Will these be made to reject multiple-module bitcode, and if not is
the plan to extend tools to handle multiple-module files?

For testing purposes I was planning to extend llvm-dis (and possibly opt) to
take a flag specifying a module index, and introduce an llvm-join tool which
could be used to create a bitcode from multiple inputs.

Awesome! I'm not sure how important it is but it seems that it should
be made an error to ignore part of a bitcode file?
(Shouldn't llvm-nm print vtable bits?)

The other tools probably don't need to know about this and could just read
the first module.

Beyond the random access suggestion (+1) and lifetime comments, it
seems like there should be a way to reason about the contents of these
modules--names, identifiers, flags, *something* so that "load the
first module lazily and the second eagerly" can become "load the
module containing my CFI information eagerly but the rest lazily" or
something, or at least to check that this file was created using
-fsanitize=cfi and not something else.

Right, this is the sort of functionality that would be provided by functions
such as hasGlobalValueSummary().

Ah, neat. I'll look into that, since apparently it answers many of my
questions :D. Sorry for the trouble :).

Thanks again, happy LLVM'ing...

~Will


Hmm, interesting. Thank you for the explanation.

This seems to be closer to partitioning a single Module than
supporting multiple modules (at least not yet).
Does that seem accurate?

The use case is partitioning a single module. We shouldn’t have any other assumption at this level (bitcode).
If you want to stick multiple versions of the same module for different architectures in there, that’s fine. You can have your own tooling to load the right module for a given architecture.

If so maybe the API should be geared towards that–allow
“partition-aware” clients to read the pieces individually while
transparently treating the overall file as a single Module for
existing clients.
Just a thought, perhaps this wouldn’t work for your use case?

While this could work for this use case, it would make things either very complex in the bitcode itself, or very inefficient when loading everything as a single module.

Anyway I actually am very interested in support for multiple modules,
my use case being for use in shipping software in IR form as part of
the ALLVM project. Hence questions about things like linker semantics
and such.

Right, I’m interested in this as well, and my vision in general is to try to build the building blocks to be as neutral as possible, so that it is easier to reuse them for cases such as ALLVM.

Hope this helps.


The use case is partitioning a single module. We shouldn’t have any other assumption at this level (bitcode).

I think my sentence is not well written, let me retry: “The CFI use case here is partitioning a single module in two. But at this level (bitcode), we should not bake such assumptions."

Ah, awesome. Thanks, this makes great sense, sounds great to me :D. Thanks!