[LLVM] What has happened to LLVM bitcode archive support?

Hi Rafael and other LLVM devs,

I'm currently upgrading a project that uses LLVM that links a bitcode
archive (a C library) with a module. Originally we used
Linker::LinkInFile() but that was removed by r172749. So I started
looking for an alternative and I found
Archive::findModulesDefiningSymbols() which looked very promising as
it would allow me to very easily implement linking a bitcode archive
with a module efficiently (i.e. only call Linker::LinkModules() on the
relevant bitcode modules inside the bitcode archive).

Before I started I thought I'd better check that this API was still
going to be available in the upcoming 3.4 release and to my dismay
``llvm/Bitcode/Archive.h`` was moved by you in r184083 and finally
removed by you in r186197.

So...

* Is LLVM completely moving away from the idea of having bitcode archive
runtime libraries? The LLVM Make build system seems to support
building these but LLVM doesn't seem to have an API for linking
runtime bitcode archives into a bitcode module any more (or have I
missed something?). I have seen suggestions of using the LLVM gold
plug-in but that's not an API I can use, is it?

* I'm guessing I'm supposed to use the Object/Archive.h interface now?
This doesn't seem to provide a nice interface for getting the set of
Modules I want to link with, so what am I supposed to do instead?

Any help would be appreciated,

Thanks,
Dan Liew.

We are going this way, yes. The overview is

* In the old days llvm had its own archive index format for IL files
and special code to use it.
* llvm-ld was very incomplete, and not designed to grow into a real
linker, so an alternative for LTO was needed.
* Apple implemented libLTO and changed their linker to use it. A
wrapper (LLVMgold.so) was created for gold and bfd.
* Using this, the GNU nm, ar and ld (and gold) were able to do LTO. In
particular, IL files can be put in archives and show up in the
regular symbol table.
* llvm-ld was deprecated and then removed.
* With nothing able to read the old LLVM-only symbol table, that was
also removed along with other archive-related, now-dead code.

The more recent story is that llvm has grown support for object files,
so that llvm-ar is now a "real" (if immature) ar and lld is growing to
be a real linker. Currently llvm-ar is missing support for putting IL
files in the symbol table. The main reason is the desire to get
this right this time, which required quite a bit of refactoring. I
posted what I think is the last big patch for review (making the
Mangler not depend on target). With that in, things should move fairly
quickly to have llvm-ar just work with IL files, and with that
lib/Object should have the features you need.

Cheers,
Rafael

Hi Rafael,

Could I just note (for the record) there is/are [at least one ;)] bare-metal targets that have neither static nor dynamic linkers.

We have one such (out-of-tree) target and make use of llvm-link to link bitcode - the linked bitcode is translated directly into the executable. We would, however, welcome a solution that allowed the 'link' to be extended to include bitcode archives.

However, it's probably a non-starter (project constraints) to implement a static linker solely for this purpose.

You can probably use the gold plugin with all inputs being IR files
and passing the plugin the emit-llvm option. That way you get a real
linker doing the resolution of what gets fetched from the archives and
still get an IR file in the end.

It will be cleaner with llvm-ar+lld once the support is finished, but
the gold plugin should work today.

Cheers,
Rafael

We are going this way, yes.

You’ve confused me a bit here ( maybe I was being too vague )

Okay, so I understand that the old LLVM-specific archive format is now gone.

However it seems you are allowing LLVM bitcode files ( I assume that’s what you mean by “IL” - does that stand for “intermediate language”? It’s not in [1] ) to be placed inside the more standard archive format understood by GNU ar and nm.

Therefore LLVM is still supporting runtime libraries that consist of llvm bitcode files (even if the format is now different)

For example, I can build a runtime library (e.g. a simple C library) as a collection of bitcode modules and place them in an archive using the latest llvm-ar ( I realise that currently the archive’s symbol table will be missing symbols from the bitcode files).

I
posted what I think is the last big patch for review (making the
Mangler not depend on target). With that in things should move fairly
quickly to have llvm-ar just work with IL files and with that
lib/Object should have the features you need.

Okay. Could you please clarify? Do you mean?

  • Future changes to lib/Object will let me read the symbols in an archive so I can implement my own ( primitive bitcode only ) linking?

Or

  • Future changes to lib/Object will implement llvm module linking for me?

I suspect you meant the first. If so, is the intention that if someone needs to link an in-memory LLVM bitcode module to an archive of bitcode modules ( produced by the new llvm-ar or with the llvmLTO wrapper to ar) then they should use the API of lld? Does that API exist now? I took a look at the lld source code and I couldn’t find any methods that returned llvm::Module so I assumed it wasn’t possible.

Thanks,
Dan Liew

Okay, so I understand that the old LLVM-specific archive format is now gone.

correct.

However it seems you are allowing LLVM bitcode files ( I assume that's what
you mean by "IL" - does that stand for "intermediate language"? It's not in
[1] ) to be placed inside the more standard archive format understood by GNU
ar and nm.

Correct, by having gnu ar use the plugin.

Therefore LLVM is still supporting runtime libraries that consist of llvm
bitcode files (even if the format is now different)

Runtime? It is still possible to build .a files if that is what you mean.

For example, I can build a runtime library (e.g. a simple C library) as a
collection of bitcode modules and place them in an archive using the latest
llvm-ar ( I realise that currently the archive's symbol table will be
missing symbols from the bitcode files).

Yes, it will produce an archive, but without the symbol table.

I
posted what I think is the last big patch for review (making the
Mangler not depend on target). With that in things should move fairly
quickly to have llvm-ar just work with IL files and with that
lib/Object should have the features you need.

Okay. Could you please clarify? Do you mean?

Once llvm-ar is producing .a files with symbol table for IR files, you
should be able to read those symbol tables to find which member
defines a symbol you are looking for.

- Future changes to lib/Object will let me read the symbols in an archive so
I can implement my own ( primitive bitcode only ) linking?

Or

- Future changes to lib/Object will implement llvm module linking for me?

The first. Since lld is in its own repository, the logic for fetching
member and iterating will be there, but all the supporting logic will
be in llvm proper.

I suspect you meant the first. If so, is the intention that if someone needs
to link an in-memory LLVM bitcode module to an archive of bitcode modules (
produced by the new llvm-ar or with the llvmLTO wrapper to ar) then they
should use the API of lld? Does that API exist now? I took a look at the lld
source code and I couldn't find any methods that returned llvm::Module so I
assumed it wasn't possible.

No, we still have to implement support for bitcode files in lld.

Cheers,
Rafael

Therefore LLVM is still supporting runtime libraries that consist of llvm
bitcode files (even if the format is now different)

Runtime? It is still possible to build .a files if that is what you mean.

Sorry I haven't explained very clearly. The tool that I work on is an
interpreter of LLVM IR and when the tool runs, it links in an archive
of bitcode modules into the bitcode module that we are interpreting.
The modules we link in provide the implementation of some functions
(for example in our case one of the things we link in is a small C
library). Because the modules we link in provide functions for the
program we are running (i.e. interpreting) we refer to them as runtime
libraries. I'm not sure this is standard terminology so sorry if I
confused you.

Yes I did mean it's still possible to build (and read) .a files
containing LLVM bitcode files.

No, we still have to implement support for bitcode files in lld.

Okay thanks for clarifying.

So I guess if I want to mimic linking in archives built by llvm-ar that
contain bitcode files, using code in LLVM trunk, my only choice at the
moment is to...

1. Collect a set of undefined symbols from the destination module.
2. Load **all** the `llvm::Module`s in the archive into memory
3. Iterate over each module's GlobalValues (does the list starting
with llvm::Module::global_begin() include the module's functions
too??) and if a GlobalValue in a module is not a declaration and is in
the set of undefined symbols then link that module into the
destination module using Linker::LinkModules()
4. Update the set of undefined symbols
5. repeat 1 and 2 until a fixed point (the set of undefined symbols
does not change) is reached.

??
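A minimal sketch of the fixed-point loop described in the steps above, written against a toy model rather than the real LLVM API. `ToyModule` and `linkArchive` are hypothetical names invented for illustration; in real code the marked line is where a call such as Linker::LinkModules() would go. On the question in step 3: as far as I know, the range starting at llvm::Module::global_begin() covers global variables only, and functions are reached through llvm::Module::begin(), so both ranges would need to be scanned.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a bitcode module: only the symbols it defines
// and the symbols it references without defining.
struct ToyModule {
    std::string name;
    std::set<std::string> defined;
    std::set<std::string> undefined;
};

// Links archive members into `dest` until a full pass over the archive
// resolves nothing new. Returns the names of linked members in link order.
std::vector<std::string> linkArchive(ToyModule &dest,
                                     const std::vector<ToyModule> &archive) {
    std::vector<std::string> linked;
    std::vector<bool> used(archive.size(), false);
    bool changed = true;
    while (changed) {  // repeat until the undefined set stops changing
        changed = false;
        for (std::size_t i = 0; i < archive.size(); ++i) {
            if (used[i]) continue;
            const ToyModule &m = archive[i];
            // Does this member define anything the destination still needs?
            bool needed = false;
            for (const std::string &sym : m.defined) {
                if (dest.undefined.count(sym)) { needed = true; break; }
            }
            if (!needed) continue;
            // "Link" the member (the real linker call would go here).
            dest.defined.insert(m.defined.begin(), m.defined.end());
            dest.undefined.insert(m.undefined.begin(), m.undefined.end());
            // Update the undefined set: drop everything now defined.
            for (const std::string &sym : dest.defined) dest.undefined.erase(sym);
            used[i] = true;
            linked.push_back(m.name);
            changed = true;
        }
    }
    return linked;
}
```

Every member is inspected on every pass, which is the cost of having no symbol index; a member that nothing references is never linked.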

Thanks,
Dan Liew.

Sorry I haven't explained very clearly. The tool that I work on is an
interpreter of LLVM IR and when the tool runs, it links in an archive
of bitcode modules into the bitcode module that we are interpreting.
The modules we link in provide the implementation of some functions
(for example in our case one of the things we link in is a small C
library). Because the modules we link in provide functions for the
program we are running (i.e. interpreting) we refer to them as runtime
libraries. I'm not sure this is standard terminology so sorry if I
confused you.

Excuse me for the diversion but, is that interpreter of LLVM IR available
somewhere? Details on how it works, performance, etc? I'd be interested

So I guess if I want to mimic linking in archives built by llvm-ar that
contain bitcode files, using code in LLVM trunk, my only choice at the
moment is to...

1. Collect a set of undefined symbols from the destination module.
2. Load **all** the `llvm::Module`s in the archive into memory
3. Iterate over each module's GlobalValues (does the list starting
with llvm::Module::global_begin() include the module's functions
too??) and if a GlobalValue in a module is not a declaration and is in
the set of undefined symbols then link that module into the
destination module using Linker::LinkModules()
4. Update the set of undefined symbols
5. repeat 1 and 2 until a fixed point (the set of undefined symbols
does not change) is reached.

I would suggest for now building the archive with gnu ar and the
plugin, that way you have an index and don't need to read all members.

It looks like the algorithm you want is something like

while (there are undefined symbols)
  if we can find a member defining one of the undefined symbols
    load it
  else
    print error about undefined symbol.
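
As a rough illustration of that loop, here is a self-contained sketch that uses a toy archive and a toy symbol index instead of any real LLVM or ar interface. `Member`, `Result`, `buildIndex` and `resolve` are all hypothetical names made up for this example; the index simply maps each symbol to the archive member that defines it, which is the information a real archive symbol table would provide.

```cpp
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical archive member: the symbols it defines and the ones it needs.
struct Member {
    std::string name;
    std::set<std::string> defined;
    std::set<std::string> undefined;
};

struct Result {
    std::vector<std::string> loaded;    // members pulled in, in load order
    std::set<std::string> unresolved;   // symbols no member defines
};

// Toy symbol table: symbol name -> index of the defining member.
std::map<std::string, std::size_t> buildIndex(const std::vector<Member> &archive) {
    std::map<std::string, std::size_t> index;
    for (std::size_t i = 0; i < archive.size(); ++i)
        for (const std::string &sym : archive[i].defined)
            index.emplace(sym, i);  // first definition wins
    return index;
}

// While there are undefined symbols, load the member that defines one,
// or record the symbol as an error if the index has no entry for it.
Result resolve(std::set<std::string> defined, std::set<std::string> pending,
               const std::vector<Member> &archive,
               const std::map<std::string, std::size_t> &index) {
    Result r;
    while (!pending.empty()) {
        std::string sym = *pending.begin();
        pending.erase(pending.begin());
        if (defined.count(sym)) continue;  // already satisfied
        auto it = index.find(sym);
        if (it == index.end()) {           // nothing defines it
            r.unresolved.insert(sym);
            continue;
        }
        const Member &m = archive[it->second];
        r.loaded.push_back(m.name);
        defined.insert(m.defined.begin(), m.defined.end());
        for (const std::string &u : m.undefined)
            if (!defined.count(u)) pending.insert(u);
    }
    return r;
}
```

Only members that the index names are ever touched, which is the point of having the symbol table in the archive.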

Cheers,
Rafael

I would suggest for now building the archive with gnu ar and the
plugin, that way you have an index and don't need to read all members.

Thanks for the suggestion. I'll check out [1]. It would definitely
be faster using LLVMgold.so with ar, but to be honest I'd rather take a
small performance hit and make life much easier for our users at the
moment, because the build process for our tool is already very
complicated. I'll definitely have a play with LLVMgold though, as I've
always wanted to give LTO a go. Within a few releases of LLVM
hopefully lld will have the features I need :)

Thanks for all your help.

[1] The LLVM gold plugin — LLVM 18.0.0git documentation

Cheers,
Dan Liew.

Excuse me for the diversion but, is that interpreter of LLVM IR available
somewhere? Details on how it works, performance, etc? I'd be interested

Yes it is available. The tool (KLEE [1]) is actually a lot more than
an interpreter (although it can be used as one). KLEE allows you to
mark certain variables (e.g. program inputs) as "symbolic". What this
means is that the variable marked as symbolic is allowed to have any
initial value and, as program execution progresses, constraints are
gathered on the symbolic variable(s). Every time a conditional branch
is reached (e.g. if (x > 5) ) KLEE will try to follow all
feasible paths and record the constraints on each path (e.g. on
one path (x > 5) and on the other (x <= 5)).
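
To make the forking idea concrete, here is a toy sketch. It is not KLEE code and does not use KLEE's API: `PathState`, `forkOnBranch` and `explore` are hypothetical names, constraints are kept as plain strings, and there is no solver, so every path is assumed feasible (KLEE itself only follows the feasible ones).

```cpp
#include <string>
#include <utility>
#include <vector>

// One execution path: the branch conditions assumed to hold along it.
struct PathState {
    std::vector<std::string> constraints;
};

// At a conditional branch, split one path into two: one where the
// condition holds and one where its negation holds.
std::pair<PathState, PathState> forkOnBranch(const PathState &state,
                                             const std::string &cond) {
    PathState taken = state;
    taken.constraints.push_back(cond);
    PathState notTaken = state;
    notTaken.constraints.push_back("!(" + cond + ")");
    return std::make_pair(taken, notTaken);
}

// Walk a straight-line sequence of branches, forking every live path
// at each one. No feasibility check is done in this toy version.
std::vector<PathState> explore(const std::vector<std::string> &branches) {
    std::vector<PathState> paths(1);  // start with a single empty path
    for (const std::string &cond : branches) {
        std::vector<PathState> next;
        for (const PathState &p : paths) {
            std::pair<PathState, PathState> forked = forkOnBranch(p, cond);
            next.push_back(forked.first);
            next.push_back(forked.second);
        }
        paths = std::move(next);
    }
    return paths;
}
```

With two branches this yields four paths, each carrying its own list of constraints; with no branches on symbolic data there is just the one path.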

The purpose of exploring these multiple paths is to

- Automatically generate test cases based
- Find bugs whilst exploring paths

If you don't mark any variables as symbolic then KLEE will just follow
a single path and so it acts just like an interpreter.

Unfortunately KLEE has not been that well maintained over the last few
years so the LLVM version that it builds against has slipped behind.
Several developers (including myself) are trying to upgrade it to work
correctly against LLVM 3.3. We have quite some way to go
unfortunately. It builds against LLVM 3.3 but linking in our modified C
library (klee-uclibc [2]) doesn't work yet (which is why I'm asking
about bitcode archives) and some LLVM intrinsics and new instructions
added since LLVM 2.7 aren't supported yet :(

We're getting there though! If you have any questions feel free to
drop a question on the KLEE mailing list.

If you'd rather read a paper than look at code, then you could take a
look at the original 2008 OSDI paper [3].

[1] http://ccadar.github.io/klee/GetStarted.html (note these
instructions are for llvm 2.9 :( . Don't worry, these instructions will
be updated eventually)
[2] GitHub - ccadar/klee-uclibc: klee-uclibc
[3] KLEE: Unassisted and Automatic Generation of High-Coverage Tests for Complex Systems Programs

Thanks,
Dan Liew.