How to translate library functions into LLVM IR bitcode?

Hello,

I've run into a situation where I need to provide library functions, such
as sqrt() from <math.h> and rand() from <stdlib.h>, as LLVM IR bitcode
files. I can then link the bitcode of my program against these library
bitcode files to form a single, self-contained bitcode file.

However, these library functions are only available in object format, and
the source files I found for them contain many references to other library
files.

So, is there any way to translate the library source files into bitcode
files, so that all the information that the library functions need is
included and can provide the implementation of these functions to the
calling program after linking?

Thanks for any suggestions:)

Hi Liwei,

You could first check whether there is an LLVM intrinsic for the function you want to call. If there is not, I would suggest wrapping the original function in your own project and calling the wrapper instead. As a last resort, you could dynamically load the original library and link to it at runtime (using dlopen).

Regards,
Yabin

Thanks for your reply, Yabin.

Actually, I’m trying to come up with a way to translate any source code into bitcode format, so the functions involved are not necessarily LLVM intrinsics. Also, it seems that dynamically loading a library with dlopen() is meant for object files, not .bc files, so it might not be useful for a library in .bc form.
Anyway, I’ve found a way to solve this by manually compiling each source file into a .bc file.

If there’s any automated way to infer all the subroutines that one function needs, compile them with clang into .bc files and link them into a stand-alone .bc library, that would be more than appreciated:-)

If manually compiling the files is working for you, you could try
building the entire library with "-flto" for Link-time optimisation.
The output of that will be LLVM IR (if you can convince the build
system to do it for you).
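As a rough sketch of the "-flto" suggestion: with clang, files compiled with -flto contain LLVM bitcode rather than native object code, so one way to try this is to inject the flag through the usual build variables. The helper name `lto_environment` is hypothetical, and whether a given build system honours CC, CFLAGS and LDFLAGS is an assumption that has to be checked per project.

```python
import os


def lto_environment(base_env=None, cc="clang", cxx="clang++"):
    """Return a copy of the environment with clang selected and -flto added.

    Assumption: the build system reads CC/CXX and CFLAGS/CXXFLAGS/LDFLAGS.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CC"] = cc
    env["CXX"] = cxx
    for var in ("CFLAGS", "CXXFLAGS", "LDFLAGS"):
        flags = env.get(var, "").split()
        if "-flto" not in flags:
            flags.append("-flto")  # keep existing flags, add -flto once
        env[var] = " ".join(flags)
    return env


env = lto_environment({"CFLAGS": "-O2"})
print(env["CC"], env["CFLAGS"])
```

The resulting dictionary could be passed as `env=` to `subprocess.run(["./configure"], ...)` and `subprocess.run(["make"], ...)`.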

The issue is that parts of the standard library are
performance-critical and often written in assembly. If the entire file
is assembly you won't be able to produce IR very easily at all.
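To see up front which files fall into the assembly category, a small filter over the source list may help. This is only a sketch: the suffix set is an assumption, the file names in the example are hypothetical, and C files that contain inline assembly are not detected.

```python
from pathlib import PurePath

# Assumed suffixes for pure assembly sources; adjust for the library at hand.
ASSEMBLY_SUFFIXES = {".s", ".S", ".asm"}


def split_sources(paths):
    """Split paths into (compilable to IR by clang, pure assembly)."""
    compilable, assembly = [], []
    for path in paths:
        if PurePath(path).suffix in ASSEMBLY_SUFFIXES:
            assembly.append(path)
        else:
            compilable.append(path)
    return compilable, assembly


# Hypothetical file names, for illustration only.
c_files, asm_files = split_sources(["s_sin.c", "memcpy.S", "setjmp.s"])
print(c_files, asm_files)
```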

Cheers.

Tim.

Good tips. Although I have used llvm-link to merge .bc files together, I guess -flto could optimize the resultant .bc file further.

As for the assembly, yes it is an issue. Anyway, I’ll try to address those sources which are available for being translated into .bc first.

Thanks for your advice, Tim.

Hey Liwei,

I attached a script I used some time back to compile multiple source
files (of a benchmark) into one bitcode file (instead of an executable).
The script is very simple but worked surprisingly well. It checks the
command line options for indicators of what kind of output is expected and
produces bitcode files with the appropriate names. In order to use it
you just need to put/link the script on your path and use it as your
standard compiler (e.g., export CC=<script_name>) instead of clang.
However, clang and llvm-link need to be on the path. If the name of the
executed script is <script_name>++ (note the ++ in the end) then clang++
will be used to compile the source files, otherwise clang. Also note that
the script will remove some command line options as they are not supported
or desired when creating bitcode instead of object code files.

It should also be able to pass a usual autoconf configure run by
detecting it and simply redirecting the complete command line to clang(++).
But I never used it to link libraries though.
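The attached script itself is not reproduced here. The following is only a minimal sketch of the command rewriting described above, not the script's actual implementation. It picks clang or clang++ from the name the wrapper was invoked under, drops some options, and appends -emit-llvm -c. The function name `build_clang_command` and the contents of `DROPPED_OPTIONS` are assumptions, since the thread does not say which options the real script removes.

```python
import os

# Hypothetical: the options the real script strips are not listed in the thread.
DROPPED_OPTIONS = {"-s", "-static"}


def build_clang_command(script_name, args):
    """Rewrite a compiler invocation so that it emits bitcode for one source file."""
    # A wrapper name ending in "++" selects the C++ driver.
    compiler = "clang++" if os.path.basename(script_name).endswith("++") else "clang"
    kept = [arg for arg in args if arg not in DROPPED_OPTIONS]
    for flag in ("-emit-llvm", "-c"):
        if flag not in kept:
            kept.append(flag)
    return [compiler] + kept


print(build_clang_command("/home/user/llvm_link", ["s_sin.c", "-o", "s_sin.bc"]))
```

A full wrapper would additionally have to recognise link steps and hand the intermediate .bc files to llvm-link. That part is left out of this sketch.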

I'm not sure if you can use this script as is or as a starting point to
create your own but maybe you can.

Best regards,
  Johannes

llvm_link_ir (7.8 KB)

Hi Johannes,

By following your directions, I can use your script as is to produce the .bc file now. Here’s my command line for compiling s_sin.c into s_sin.bc file and the output:
command line:
~/Downloads/newlib-2.1.0/newlib/libm/mathfp » python ~/llvm_link.py s_sin.c -I…/common/ -I…/…/libc/include/ -o s_sin.bc

output:

Initiate CLANG (/path-to-clang):
Options: 's_sin.c -I…/common/ -I…/…/libc/include/ -o s_sin.bc -emit-llvm -c'
In file included from s_sin.c:18:
./zmath.h:7:9: warning: 'NAN' macro redefined
#define NAN 2
        ^
…/…/libc/include/math.h:57:11: note: previous definition is here
#  define NAN (__builtin_nanf(""))
          ^
1 warning generated.
Retcode: 0

I don’t know why this warning gets generated since in …/…/libc/include/math.h:57 the macro NAN is wrapped by ifndef as shown below, but that’s not important.

# ifndef NAN
#  define NAN (__builtin_nanf(""))
# endif

I llvm-dis this s_sin.bc file and get a s_sin.ll file as follows.

; Function Attrs: nounwind uwtable
define double @sin(double %x) #0 {
entry:
  %x.addr = alloca double, align 8
  store double %x, double* %x.addr, align 8
  %0 = load double* %x.addr, align 8
  %call = call double @sine(double %0, i32 0)
  ret double %call
}

declare double @sine(double, i32) #1

Now here comes the point. In this bitcode file, the callee of sin() function, i.e. sine(), does not have a function body. I know that the definition of sine() is in another .c file, which is s_sine.c in the same folder. So essentially, I’m wondering if the script can help to automatically compile and link all the callees recursively required by sin() without manually specifying every source file of the callees.
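One way to at least list what is still missing from a module is to scan the llvm-dis output for functions that are declared but never defined. The sketch below is a plain-text approximation using regular expressions, not an LLVM API. The function name `unresolved_functions` is hypothetical, quoted symbol names are not handled, and names starting with "llvm." are skipped on the assumption that they are intrinsics, which never have a body.

```python
import re

# Match "define ... @name(" and "declare ... @name(" at the start of a line.
_DEFINE = re.compile(r"^\s*define\b[^@\n]*@([\w.$-]+)\s*\(", re.MULTILINE)
_DECLARE = re.compile(r"^\s*declare\b[^@\n]*@([\w.$-]+)\s*\(", re.MULTILINE)


def unresolved_functions(ll_text):
    """Return the sorted names that are declared but not defined in ll_text."""
    defined = set(_DEFINE.findall(ll_text))
    declared = set(_DECLARE.findall(ll_text))
    missing = declared - defined
    # Assumption: "llvm."-prefixed names are intrinsics and need no body.
    return sorted(name for name in missing if not name.startswith("llvm."))


S_SIN_LL = """
define double @sin(double %x) #0 {
entry:
  %call = call double @sine(double %x, i32 0)
  ret double %call
}

declare double @sine(double, i32) #1
"""

print(unresolved_functions(S_SIN_LL))
```

For the s_sin.ll excerpt this reports sine, which still has to be located in the sources (here s_sine.c) by hand.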

Sorry for coming back late on this topic and thank you so much Johannes, for sharing your helper script.

Hey Liwei,

the script was "designed" to act as a compiler during "the usual"
configure + make of a benchmark/program/library. If you have such a setup
(a configure script which will produce the Makefile), you can do
something similar to:

ln -s ~/llvm_link.py ~/llvm_link
ln -s ~/llvm_link.py ~/llvm_link++

export CC=~/llvm_link
export CXX=~/llvm_link++

cd <SRC>
./configure
make

and it should result in a .bc file where otherwise, e.g., an
executable would have been produced.

If you just have a bunch of .c files which can be compiled into an
executable with:

clang -I <HEADER_DIR> a.c b.c c.c d.c -o ex

you can replace clang by the script and it should give you a ex.bc in
the end.
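Without the script, the same result can be approximated in two explicit steps: compile every source file to bitcode with clang -emit-llvm -c, then merge the results with llvm-link. The sketch below only builds the command lines. The helper name `bitcode_build_plan` is hypothetical, and clang and llvm-link are assumed to be on the path when the commands are actually run.

```python
from pathlib import PurePath


def bitcode_build_plan(sources, output, include_dirs=()):
    """Return the clang and llvm-link commands that turn sources into one .bc file."""
    includes = [f"-I{directory}" for directory in include_dirs]
    commands, bitcode_files = [], []
    for source in sources:
        bitcode = str(PurePath(source).with_suffix(".bc"))
        commands.append(["clang", "-emit-llvm", "-c", *includes, source, "-o", bitcode])
        bitcode_files.append(bitcode)
    # llvm-link merges the per-file modules into a single module.
    commands.append(["llvm-link", *bitcode_files, "-o", output])
    return commands


for command in bitcode_build_plan(["a.c", "b.c", "c.c", "d.c"], "ex.bc", ["include"]):
    print(" ".join(command))
```

Each command could then be executed with `subprocess.run(command, check=True)`.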

Does that help?

Best regards,
  Johannes

Hi Johannes,

Actually, I’m working in the same scenario, i.e. configure + make of a benchmark/program/library like you said. I’ve got your point about using this script as a replacement to generate .bc files instead of an executable. That’s truly helpful and has already answered my original question.

Now I’m actually moving a step further. Take the same example in your reply, say, if I have a bunch of .c files, where a.c calls functions defined in b.c, b.c calls functions defined in c.c and so on. I know that I can compile them into a .bc file by using this command:
~/llvm_link -I <HEADER_DIR> a.c b.c c.c d.c -emit-llvm -c -o linked.bc

But the problem is that all of the files which define the callees need to be specified manually. Is it possible for the source files in which the callees are defined to be automatically inferred, compiled and linked? If so, we could simply use the command below to include all the callees required by a.c:
~/llvm_link -I <HEADER_DIR> a.c -emit-llvm -c -o linked.bc

Though we can define the dependency for the source files defining callees in makefile, it’s still a manual specification there. So that’s why I’m looking into this further step.

Hey Liwei,

I inlined two comments but in short: I don't have a solution for your
automatic dependence gathering problem.

> Hi Johannes,
>
> Actually, I'm working in the same scenario, i.e. configure + make of a
> benchmark/program/library like you said. I've got your point about using this
> script as a replacement to generate .bc files instead of an executable.
> That's truly helpful and has already answered my original question.
>
> Now I'm actually moving a step further. Take the same example in your
> reply, say, if I have a bunch of .c files, where a.c calls functions
> defined in b.c, b.c calls functions defined in c.c and so on. I know that I can
> compile them into a .bc file by using this command:
> ~/llvm_link -I <HEADER_DIR> a.c b.c c.c d.c -emit-llvm -c -o linked.bc
>
> But the problem is that all of the files which define the callees need to be
> specified manually. Is it possible for the source files in which the
> callees are defined to be automatically inferred, compiled and linked? If
> so, we could simply use the command below to include all the
> callees required by a.c:
> ~/llvm_link -I <HEADER_DIR> a.c -emit-llvm -c -o linked.bc

I don't think this is even decidable in general, and I'm not aware of any
implemented approximation to that problem. If such a thing existed we
wouldn't need to write build dependences in Makefiles(.in) by hand any more ;)

> Though we can define the dependency for the source files defining callees
> in makefile, it's still a manual specification there. So that's why I'm
> looking into this further step.

My guess is you have to go the extra mile and define all dependences in
a Makefile yourself.
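Once the dependences have been written down by hand, collecting everything a given top-level file needs is a simple transitive closure. The sketch below assumes a hand-maintained mapping from each source file to the files that define its callees. The function name `required_sources` and the mapping are hypothetical, and nothing here infers dependences automatically.

```python
def required_sources(root, depends_on):
    """Return root plus every file reachable through depends_on, each listed once."""
    ordered, seen, stack = [], set(), [root]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already collected; this also guards against cycles
        seen.add(current)
        ordered.append(current)
        # Reverse so that dependences are visited in their listed order.
        stack.extend(reversed(depends_on.get(current, ())))
    return ordered


# Hand-written dependences for the a.c .. d.c example from the thread.
DEPENDS_ON = {"a.c": ["b.c"], "b.c": ["c.c"], "c.c": ["d.c"]}
print(required_sources("a.c", DEPENDS_ON))
```

The returned list can be passed to the compile-and-link step in place of a manually typed file list.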

I hope you'll find a way that works for you,
  Johannes

Hi Johannes,

After reconsideration, I think your guess that the dependencies should be handled manually is reasonable. Otherwise, if the dependencies were resolved automatically as I previously hoped, there might be an issue with functions being redefined when multiple top-level functions share the same callees from the library.
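The redefinition concern can be checked before linking by looking for symbols that more than one module defines. This is only a sketch: the input is assumed to be a mapping from module name to the function names it defines (however that list is obtained), and the module names in the example are hypothetical.

```python
from collections import defaultdict


def duplicate_definitions(definitions_by_module):
    """Return {symbol: [modules]} for every symbol defined in more than one module."""
    owners = defaultdict(list)
    for module, symbols in definitions_by_module.items():
        for symbol in symbols:
            owners[symbol].append(module)
    return {symbol: modules for symbol, modules in owners.items() if len(modules) > 1}


# Hypothetical stand-alone libraries that both bundle the shared callee sine().
MODULES = {
    "sin_lib.bc": ["sin", "sine"],
    "cos_lib.bc": ["cos", "sine"],
}
print(duplicate_definitions(MODULES))
```

Any symbol reported here would presumably need to live in a single shared module rather than being bundled into each stand-alone library.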

So, forget about the automatic dependence resolving stuff. I think this question is done and can be closed now.
Again, thanks for the helpful script, Johannes.
Cheers;)