Reading LLVM bitcode into existing module

I am writing a compiler using LLVM 3.2 to generate native code (currently x86-64) from IR. The native code will be linked by the system linker (not a JIT).

The compiler generates calls to a run-time library to perform many operations. Therefore, each Module that I generate needs to have declarations for all of these run-time functions added to it.

Question: is this true? I am assuming that LLVM works like a C++ compiler: before you can call a function from anywhere in a compilation unit, you need its prototype in scope.

Initially, I did this by calling Function::Create for each declaration I wanted to make. However, this is starting to “not scale”.

I also want to experiment with defining some of these library functions using LLVM IR directly. I can then have LLVM inline and optimize calls to these functions. Given that many of the arguments to the functions are constants, there is plenty of opportunity for loop unrolling and optimization.

To this end, I would like to read LLVM bitcode into an existing module. The bitcode would contain declarations for all of my library functions, plus definitions for anything I want to try to inline and optimize.

ReaderWriter provides an API for loading bitcode and returning a Module as a result. One possibility is for me to read the bitcode into a skeleton module and then have the compiler emit more code into that module. I won’t have control over the name of the module if I do this - I’m not sure if that will cause a problem down the road.

There also seems to be a mechanism for adding “library dependencies” to a Module. This suggested that perhaps I could read my bitcode into a master library module held off to the side, and have the compiler reference the master module as a library dependency in everything it generated. However, I couldn’t easily see how the library mechanism works.

What’s the most reasonable way for me to declare large numbers of functions in a module?

Hi David,

Question: is this true? I am assuming that LLVM works like a C++ compiler:
before you can call a function from anywhere in a compilation unit, you need
its prototype in scope.

Pretty much. There is only one scope for functions in LLVM IR
(module-global) but they do have to be declared.

Initially, I did this by calling Function::Create for each declaration I
wanted to make. However, this is starting to "not scale".

You don't say quite why it's not scaling, but the function
Module::getOrInsertFunction makes some of the details easier. It's how
I'd always create a function declaration.

To this end, I would like to read LLVM bitcode into an existing module. The
bitcode would contain declarations for all of my library functions, plus
definitions for anything I want to try to inline and optimize.

The easiest way to do this is probably to use the llvm::Linker class.
That utility class basically just merges one module's definitions into
another, so you just load your library bitcode and link it into the
module you actually care about.

Alternatively you could just call setModuleIdentifier to rename the
loaded library module. I suppose that would be simpler unless you
could see yourself splitting the library functions into multiple files
to help organisation.

Once your library is in the same module as your functions, you can
probably simplify the management issue too: declarations will already
exist so you can just look them up by name with
Module::getOrInsertFunction.

You could actually import *just* the declarations like that if you
wanted to experiment with just how much benefit came from the inlining
at some later date. That is, load a module which looked like just:

   declare float @sinf(float)
   declare double @sin(double)
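
For comparison, a library module that also carries a body for the
inliner to work on might look roughly like the sketch below. The names
@rt_print_int and @rt_square are made up for illustration, and the
linkage and attribute choices are assumptions rather than anything
prescribed in this thread:

```llvm
; Hypothetical run-time library module; all names are illustrative only.

; Declaration only: the body lives in the native run-time library and is
; resolved later by the system linker.
declare void @rt_print_int(i32)

; Definition made available to the optimizer. linkonce_odr allows the
; body to be discarded once it is no longer referenced; alwaysinline is
; optional and only one possible choice.
define linkonce_odr i32 @rt_square(i32 %x) alwaysinline {
entry:
  %result = mul i32 %x, %x
  ret i32 %result
}
```

Once such a module has been merged into the generated one, a call like
"call i32 @rt_square(i32 7)" becomes a candidate for inlining and
constant folding by the usual optimization passes.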

There also seems to be a mechanism for adding "library dependencies" to a Module

Hmm. Not heard of that one. It's the kind of thing multiple languages
would find useful so it wouldn't surprise me if it did exist, but I've
not encountered it anywhere.

Cheers.

Tim.

I am writing a compiler using LLVM 3.2 to generate native code (currently
x86-64) from IR. The native code will be linked by the system linker (not a
JIT).

The compiler generates calls to a run-time library to perform many
operations. Therefore, each Module that I generate needs to have
declarations for all of these run-time functions added to it.

Question: is this true? I am assuming that LLVM works like a C++ compiler:
before you can call a function from anywhere in a compilation unit, you
need its prototype in scope.

Yes.

Initially, I did this by calling Function::Create for each declaration I
wanted to make. However, this is starting to "not scale".

I also started out on this path, but once there are a lot of runtime
functions it takes a lot of code to generate them, which is what I presume
you mean by "not scale".

I also want to experiment with defining some of these library functions
using LLVM IR directly. I can then have LLVM inline and optimize calls to
these functions. Given that many of the arguments to the functions are
constants, there is plenty of opportunity for loop unrolling and
optimization.

To this end, I would like to read LLVM bitcode into an existing module.
The bitcode would contain declarations for all of my library functions,
plus definitions for anything I want to try to inline and optimize.

Being unable to find any articles that suggested one method or another, I
do this, which works well for what I am doing:

I have a runtime library written in C++ and compiled with clang.

I have a project module with a single .cpp file that includes all the
relevant headers to pick up code inlined in the .h files, and which uses
enough runtime methods for the inline code to be generated etc. In practice
this is not that much code.

I compile that .cpp file to a .bc file, runtimeinterface.bc.

When it comes to code gen time in my compiler, I first load
runtimeinterface.bc and parse it into a Module:

this->codeModule = llvm::ParseIRFile(libFile, ed, this->Context);

I then use setModuleIdentifier() to change the module ID appropriately.

I then locate the 'junk' added by clang, such as the GLOBAL__I constructor
stuff and my dummy runtime interface function, and call
removeFromParent() on each of them.
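
For orientation, the static-constructor 'junk' in question tends to look
roughly like the sketch below in IR produced by a clang of this era. The
exact names, attributes and sections depend on the clang version and the
target, so the symbol @_GLOBAL__I_a and the empty body here are
assumptions for illustration, not output copied from a real build:

```llvm
; Approximate shape only; real clang output carries more attributes.

; Table of static constructors to run at start-up.
@llvm.global_ctors = appending global [1 x { i32, void ()* }]
    [{ i32, void ()* } { i32 65535, void ()* @_GLOBAL__I_a }]

; Per-translation-unit initializer that the table points at.
define internal void @_GLOBAL__I_a() {
entry:
  ret void
}
```

Note that @llvm.global_ctors refers to the initializer function, so if
the function is taken out of the module, the table entry that points at
it needs to be dealt with as well.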

I now have declarations for all the runtime methods, which I need not hard
code into the compiler, as well as the IR for the internal or linkonce
inlinable functions in my runtime .h files. Use module->getFunction() to
find the functions, and the other appropriate getXXX calls in the Module
class for anything else. Use assert so that debug builds catch the case
where someone has changed the runtime underneath you and a function or
definition can no longer be found.

I then start code generation into this module as if it were an empty one,
as in the usual case.

I have to admit that I just dreamed this method up after casting around
with Google quite a bit, and I can see a few possible issues with it, such
as a possible dependence on the version of clang that is used. This works
for me, however, because I am in control of the end-use environment and can
ensure that the correct versions of the components are used.

If anyone with greater experience in LLVM than I have sees other issues,
then please pipe up!

Cheers,

Jim