LLVM analysis of Linux programs

Hey all,

I’m trying to run computations on the LLVM bitcode of various Linux programs, and I’m running into some trouble. I have the source code for several programs (tcpdump, gzip, curl, etc.) and I want to compile each one down into a single LLVM bitcode file. I’ve tried two approaches so far. First, I used each project’s Makefile with llvm-gcc as the compiler (make CC="llvm-gcc -emit-llvm"), then manually linked all the resulting bitcode files into a single one. This works on a few examples, but not all of them. Next, I was pointed to Ben Hardekopf’s work on pointer analysis; it turns out he has a nifty Perl script for doing exactly the sort of thing I’m trying to do. Unfortunately, that script doesn’t work on all programs either: it worked quite well on some of my examples, but not at all on others.
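
For reference, the manual link step of the first approach looks roughly like the sketch below. The function name link_bitcode and the LLVM_LINK override are placeholders of mine, and the sketch assumes every .o file under the build tree really contains bitcode (that is, it was compiled with -emit-llvm), which may not hold for every package.

```shell
#!/bin/sh
# LLVM_LINK can be overridden, e.g. to point at a specific LLVM install.
LLVM_LINK=${LLVM_LINK:-llvm-link}

# link_bitcode OUT DIR
# Link every *.o found under DIR into the single bitcode file OUT.
# Assumption: each .o is LLVM bitcode, not a native object file.
link_bitcode() {
    out=$1
    dir=$2
    find "$dir" -name '*.o' -exec "$LLVM_LINK" -o "$out" {} +
}
```

After a build with make CC="llvm-gcc -emit-llvm", something like link_bitcode gzip.bc . would be the manual step. A tree that builds several executables or helper libraries will feed all of their objects to the linker at once, which may be one reason this fails on some packages.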

So basically, I’m wondering if you all have any advice on how to go about doing this. Is there some universal method of compiling entire source packages into a single LLVM bitcode file, or possibly some mix of tools/techniques that will work for the majority of programs? Or is my best bet just to hack together the sort of things I’ve already been trying?

Thanks,
Ben

What I would suggest is first setting up a regular LTO build with gold:
http://llvm.org/docs/GoldPlugin.html

Once you have that working, you can manually get the bitcode for the program you want by passing -plugin-opt=also-emit-llvm to the linker. Something like

$ rm bzip2
$ make bzip2
<will print the line used for linking>
$ <link line> -Wl,-plugin-opt=also-emit-llvm
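
If you need to repeat the last step for several programs, a small wrapper can help. This is only a sketch: relink_with_bitcode and the RUN variable are hypothetical names, and it assumes the option is spelled -plugin-opt, which is how gold passes options through to a plugin.

```shell
#!/bin/sh
# relink_with_bitcode LINK_COMMAND...
# Re-run a captured link command with the plugin option appended.
# RUN is a hypothetical hook: set RUN=echo to print the command
# instead of executing it.
relink_with_bitcode() {
    ${RUN:-} "$@" -Wl,-plugin-opt=also-emit-llvm
}
```

Paste the link line that make printed as the arguments, for example relink_with_bitcode gcc -o bzip2 bzip2.o, and use RUN=echo first to check the resulting command.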

Cheers,
Rafael