mapping types from a bitcode module

Hi all,

We've run into a tricky situation in our work on the Crack compiler and I'm
hoping that someone on this list can help us find the best solution.

We're currently trying to implement "module caching" for Crack, similar to the
feature in Python where compiled module bytecode is persisted. In our case,
module bitcode is persisted at compile time. When we import a module, before
compiling it we check for a bitcode file matching the source -- if one exists,
we load it and extract compile-time metadata from it, saving us a compile step.

The problem here is that the StructType objects in the cached module are
different from those referenced by the compiler, so when we try to reference
entities in the cached module we get an assertion failure due to type
incompatibility. Specifically, this is currently happening when we reference
a global variable in the cached module from an array initializer in the
importing module which is still being compiled.

We can't easily just use the linker to manage all of this because we still
want to be able to persist that new module independently for later use. There
may be other reasons, too: separate modules are a fundamental assumption of our
design. We currently use the linker only at the end of an AOT build.

So from what I can see, our possible solutions are:

1) duplicate the LinkModule internal code and copy the module we load from
bitcode to a new module with the correct types mapped.
2) duplicate BitcodeReader and create a version that reuses existing
StructTypes.
3) destructively convert all of the types in the imported module to our
existing types.

Needless to say, none of these are especially attractive. Is there a better
way to do this? Are any of these options clearly better or worse than the
others?

Also, when loading named StructTypes, would it be possible for LLVM to reuse
an existing type with the same name assuming the existing type is isomorphic?
This seems like it would be a win all around.

Hi Michael,

since none of the experts answered, let me share our experiences. We recently had exactly the same problem, and I posted about it on this list on January 31st.
I didn't follow Duncan's advice to "just use the linker", since for several reasons we wanted to have unique struct types even in the separate modules.

> 1) duplicate the LinkModule internal code and copy the module we load from
> bitcode to a new module with the correct types mapped.

Sounds quite inefficient. If you have to duplicate the internal code anyway, you can also iterate over the original module and mutateType().

> 2) duplicate BitcodeReader and create a version that reuses existing
> StructTypes.

See below.

> 3) destructively convert all of the types in the imported module to our
> existing types.

That's what we actually implemented, following the idea I described in the mentioned post. We don't identify identical struct types by their name, since even in the new type system, names don't actually mean anything. You could just strip them off.
Instead, we use the pointer value of the types to identify them, since originally, all our modules reside in the same LLVMContext. Since that doesn't seem to be the case in your situation, you probably would have to use the name, or attach other metadata to uniquely identify your structs.

> Also, when loading named StructTypes, would it be possible for LLVM to reuse
> an existing type with the same name assuming the existing type is isomorphic?
> This seems like it would be a win all around.

Just out of interest, I also implemented that, because I thought it could improve the overall performance. But I couldn't measure any performance impact on the simple tests in the test-suite.
The main problem is that a named struct can reference other types defined later in the module's type table, so you can only check whether named structs are identical after the whole type table has been parsed and the types have already been created. So during parsing I remember which StructTypes had to be renamed, and after that, but before parsing the instructions, I check which of them are isomorphic to the corresponding existing struct and directly manipulate the type list used when parsing the rest of the module.
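
To illustrate why the comparison has to wait until every type exists, here is a minimal sketch of a structural isomorphism check. It uses a toy `Type` struct invented for this example, not the LLVM classes, and the names `Type`, `Pair` and `isomorphic` are all hypothetical. Names are ignored, and a pair of types that is already being compared is assumed equal so that self-referential structs terminate.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a type node; this is NOT the LLVM API.
struct Type {
    enum Kind { Int, Pointer, Struct } kind;
    unsigned bits;                   // Int: bit width, otherwise unused
    std::string name;                // Struct: name (ignored when comparing)
    std::vector<const Type*> elems;  // Pointer: one pointee; Struct: fields
};

using Pair = std::pair<const Type*, const Type*>;

// Structural comparison. A pair that is already under comparison is assumed
// equal, so recursive types (a struct pointing to itself) do not loop forever.
// If any sub-comparison fails, the failure propagates to the top-level call.
bool isomorphic(const Type* a, const Type* b, std::set<Pair>& assumed) {
    assert(a && b);
    if (a == b) return true;
    if (a->kind != b->kind) return false;
    if (a->kind == Type::Int) return a->bits == b->bits;
    if (a->elems.size() != b->elems.size()) return false;
    if (!assumed.insert({a, b}).second) return true;  // comparison in progress
    for (std::size_t i = 0; i < a->elems.size(); ++i)
        if (!isomorphic(a->elems[i], b->elems[i], assumed)) return false;
    return true;
}

// Convenience entry point that starts with an empty assumption set.
bool isomorphic(const Type* a, const Type* b) {
    std::set<Pair> assumed;
    return isomorphic(a, b, assumed);
}
```

Because the check follows element pointers, it can only run once the bodies of all referenced types have been filled in, which matches the "after the whole type table has been parsed" ordering described above.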

A funny insight when implementing that is that I also had to change the behaviour of the linker, since it again created copies of all types used in the source module. So after linking, there again were different instances of the same struct type, but only one of them had a meaningful name, since the new copy that the linker creates steals the name of the original type ;)

For both implementations I can provide source code if you wish.

Cheers,
Clemens

Hi Clemens - thanks for your response.

Clemens Hammacher wrote:

> Hi Michael,
>
> since none of the experts answered, let me share our experiences. We
> recently had exactly the same problem, and I posted about it on this list
> on January 31st.
> I didn't follow Duncan's advice to "just use the linker", since for
> several reasons we wanted to have unique struct types even in the
> separate modules.
>
>> 1) duplicate the LinkModule internal code and copy the module we load from
>> bitcode to a new module with the correct types mapped.
>
> Sounds quite inefficient. If you have to duplicate the internal code
> anyway, you can also iterate over the original module and mutateType().
>
>> 2) duplicate BitcodeReader and create a version that reuses existing
>> StructTypes.
>
> See below.
>
>> 3) destructively convert all of the types in the imported module to our
>> existing types.
>
> That's what we actually implemented, following the idea I described in
> the mentioned post. We don't identify identical struct types by their
> name, since even in the new type system, names don't actually mean
> anything. You could just strip them off.
> Instead, we use the pointer value of the types to identify them, since
> originally, all our modules reside in the same LLVMContext. Since that
> doesn't seem to be the case in your situation, you probably would have
> to use the name, or attach other metadata to uniquely identify your structs.

I was actually fearful of this approach; it looked to me like the linker was
at least partially copying data structures to the destination module. I see
that there is a mutateType() method in Value, though it comes with a very
stern warning :)

But given your success with it, and given that it seems to involve the least
amount of copy-pasting the existing code, I think I'll give it a try.

>> Also, when loading named StructTypes, would it be possible for LLVM to reuse
>> an existing type with the same name assuming the existing type is isomorphic?
>> This seems like it would be a win all around.
>
> Just out of interest, I also implemented that, because I thought it
> could improve the overall performance. But I couldn't measure any
> performance impact on the simple tests in the test-suite.
> The main problem is that the named struct could reference other types
> defined later in the type table of the module, so you can only check
> whether named structs are identical after the whole type table has been
> parsed and the types are already created. So during parsing I am
> remembering which StructTypes had to be renamed, and after that - but
> before parsing the instructions - I check which of them are isomorphic
> to the corresponding existing struct, and directly manipulate the type
> list used when parsing the rest of the module.

Ah, I see. That would definitely complicate things.

> A funny insight when implementing that is that I also had to change the
> behaviour of the linker, since it again created copies of all types used
> in the source module. So after linking, there again were different
> instances of the same struct type, but only one of them had a meaningful
> name, since the new copy that the linker creates steals the name of the
> original type ;)
>
> For both implementations I can provide source code if you wish.

Thanks, I think we should be ok given your explanation. If I get stuck,
I might take you up on it. Although if we're both doing this, it may be
worthwhile for us to try to come up with something general enough to include
in LLVM.

Yeah, but as long as you mutate all types consistently, it works quite smoothly. The Verifier will tell you which values you missed ;)

You shouldn't forget
- function arguments
- initializers of global variables
- constants

And of course you also have to consider composite types which contain a remapped struct.
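
As a rough illustration of the composite-type point, the sketch below rebuilds pointer and array types whose element type was remapped. It is a toy model with invented names (`Type`, `TypeRemapper`, `addMapping`, `remap`), not the LLVM API; in a real pass, the result of `remap` would be what gets handed to mutateType() for each value. As a simplification, struct types without an explicit mapping are left untouched and their fields are not inspected.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <vector>

// Toy stand-in for a type node; this is NOT the LLVM API.
struct Type {
    enum Kind { Int, Struct, Pointer, Array } kind;
    std::vector<const Type*> elems;  // Pointer/Array: exactly one element type
};

class TypeRemapper {
public:
    // Declare that the imported struct must be replaced by the existing one.
    void addMapping(const Type* imported, const Type* existing) {
        assert(imported && existing);
        cache_[imported] = existing;
    }

    // Return the type to use in place of `t`. Pointers and arrays are rebuilt
    // only when their element type changed; everything else is returned as-is.
    const Type* remap(const Type* t) {
        assert(t);
        auto it = cache_.find(t);
        if (it != cache_.end()) return it->second;

        const Type* result = t;
        if (t->kind == Type::Pointer || t->kind == Type::Array) {
            const Type* elem = remap(t->elems[0]);
            if (elem != t->elems[0]) {
                owned_.push_back(
                    std::unique_ptr<Type>(new Type{t->kind, {elem}}));
                result = owned_.back().get();
            }
        }
        cache_[t] = result;  // memoize so repeated queries return the same type
        return result;
    }

private:
    std::map<const Type*, const Type*> cache_;
    std::vector<std::unique_ptr<Type>> owned_;  // keeps rebuilt types alive
};
```

The memoization matters for consistency: every value that used the same imported composite type ends up with the same replacement type, which is the "mutate all types consistently" requirement in miniature.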

Good luck! ;)

Clemens

Clemens Hammacher wrote:

>>> 3) destructively convert all of the types in the imported module to our
>>> existing types.
>>
>> That's what we actually implemented, following the idea I described in
>> the mentioned post. We don't identify identical struct types by their
>> name, since even in the new type system, names don't actually mean
>> anything. You could just strip them off.
>> Instead, we use the pointer value of the types to identify them, since
>> originally, all our modules reside in the same LLVMContext. Since that
>> doesn't seem to be the case in your situation, you probably would have
>> to use the name, or attach other metadata to uniquely identify your structs.
>
> I was actually fearful of this approach; it looked to me like the linker was
> at least partially copying data structures to the destination module. I see
> that there is a mutateType() method in Value, though it comes with a very
> stern warning :)

> Yeah, but as long as you mutate all types consistently, it works quite
> smoothly. The Verifier will tell you which values you missed ;)
>
> You shouldn't forget
> - function arguments
> - initializers of global variables
> - constants
>
> And of course you also have to consider composite types which contain a
> remapped struct.

Very cool - thanks, Clemens!