building whole-program bitcode with LLVM

Hi,

Professor Adve suggested that we post this question to llvm-dev.
Thanks in advance for your advice.

My colleagues and I want to create whole-program bitcode for large
real programs like Apache, BIND, OpenLDAP, etc. We want the
whole-program bitcode to include every part of the program for which
we have source code. For example, in the case of Apache's "httpd"
server, we want to create a whole-program bitcode file "httpd.bc"
containing functions that the default build system stashes in various
application-specific auxiliary libraries (e.g., Apache's libapr and
libaprutil).

Our motive is *not* link-time optimization; we're interested in
analyzing and modifying the whole-program bitcode in other ways.
Once we have created a whole-program bitcode, we want to compile it
to native assembly, then pass it through the native assembler & linker
to obtain a native executable whose behavior (except for performance)
is identical to that of an executable obtained from the default build
system. We do *not* want standard libraries like libc and libpthread
to be incorporated as bitcode in the whole-program bitcode; they can
be linked in at the final step, after we have converted the
whole-program bitcode to native assembly and assembled & linked it.
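
The steps above can be sketched as a sequence of command lines. This is only an illustration, assuming clang, llvm-link and llc are on the PATH (an llvm-gcc front end would play the same role as clang); the helper name pipeline_commands and all file names are hypothetical. The script only builds the argument lists and does not execute anything.

```python
from pathlib import PurePosixPath


def pipeline_commands(sources, output, system_libs=()):
    """Return command lines for: sources -> bitcode -> one module -> asm -> executable.

    Assumptions (not from the thread): clang is the front end, 'cc' is the
    native compiler driver, and the file names are purely illustrative.
    """
    commands = []
    bitcode_files = []
    for src in sources:
        bc = str(PurePosixPath(src).with_suffix(".bc"))
        # Compile each translation unit to LLVM bitcode instead of a native object.
        commands.append(["clang", "-c", "-emit-llvm", src, "-o", bc])
        bitcode_files.append(bc)

    whole_bc = output + ".bc"
    whole_asm = output + ".s"
    # Merge all per-file bitcode into a single whole-program module.
    commands.append(["llvm-link", *bitcode_files, "-o", whole_bc])
    # Analysis and transformation of whole_bc would happen at this point.
    # Lower the whole-program bitcode to native assembly.
    commands.append(["llc", whole_bc, "-o", whole_asm])
    # Assemble and link natively; libc, libpthread etc. enter only at this step.
    commands.append(
        ["cc", whole_asm, "-o", output, *[f"-l{lib}" for lib in system_libs]]
    )
    return commands


if __name__ == "__main__":
    for cmd in pipeline_commands(["server.c", "apr/pool.c"], "httpd", ["pthread"]):
        print(" ".join(cmd))
```

For a handful of translation units this is all that is needed; the hard part discussed in this thread is getting a large build system to produce the per-file bitcode in the first place.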

We have been able to achieve our goal for small programs consisting
of a handful of translation units, so we know that our goal is
attainable in principle. Problems start when we tackle big programs
with complex build systems. We want to find a generic strategy that
works with most real world open source C/C++ programs without too
much fuss, because we want to use it on at least a dozen different
programs. Ideally we want a strategy that works with unmodified
default build systems, because eventually we hope to produce a tool
that is easy for other developers to use.

Initially we had hoped simply to replace gcc, as, ld, etc. with their
LLVM counterparts in the standard build systems, but we haven't been
able to make that strategy work. Several different approaches along
these lines fail in various ways. Some have recommended the Gold
plugin, but it's not clear from the documentation that it does what
we want, and we haven't been successful in installing it yet.

Does anyone have experience in constructing whole-program bitcodes
that include app-specific libraries for large open-source programs?
If you could share the right tricks, that would be very helpful.

Thanks!

-- Terence Kelly, HP Labs

Hi,

For my PhD work, I have used LLVM to transform whole-program bitcode modules of systems like Quake 3 and Parrot VM. As build system integration is a very complex problem in general, integrating LLVM in medium to large build systems was not straightforward, although I guess things should be easier now with the help of the gold plugin and libLTO.

In short, I was not able to find a fully automated, generic approach to integrate LLVM, as every build system is unique, and often contains subtle mistakes (invoking gcc directly instead of via $CC, ...). Instead, I used a tool-supported, manual approach consisting of the following 3 steps:
  1. Visualize and understand the existing build system
  2. Plan how my tool fits in
  3. Change the makefiles

In step 1, I used my MAKAO tool (http://users.ugent.be/~badams/makao/) to visualize the build dependency graph of a run of the existing build system. This gives an idea about all libraries and executables that are built, how they fit together and which makefile rules are responsible for them.

Based on the information of step 1, I then determined in step 2 which libraries and executables I wanted to transform.

Finally, step 3 involved making system-dependent physical changes to the build system in order to deploy my tools the way I planned to in step 2. Sometimes, this could be done without touching the original makefiles, e.g. by overriding build variables. Often, more invasive changes were needed, such as splitting existing build rules or adding new ones.
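
As an illustration of the least invasive variant (overriding build variables), here is a minimal sketch of the logic a compiler wrapper could use. It is not the tooling described above: the helper name side_bitcode_command, the clang front end and the ".bc" naming scheme are all assumptions made for the example. The idea is that the wrapper runs the real compiler unchanged and, for compile steps only, also runs a second command that emits bitcode next to the native object.

```python
import os


def side_bitcode_command(argv, frontend="clang"):
    """Given the arguments of a compiler call, return a command that emits
    bitcode for the same translation unit, or None if it is not a compile step.

    Hypothetical helper: 'frontend' and the '.bc' naming scheme are assumptions.
    """
    if "-c" not in argv:
        return None  # link and preprocess-only calls are left alone

    args = list(argv)
    if "-o" in args and args.index("-o") + 1 < len(args):
        # Keep the bitcode next to the native object the build system expects.
        out_index = args.index("-o") + 1
        args[out_index] = args[out_index] + ".bc"
    else:
        # No explicit output: the compiler would write <source basename>.o
        # into the current directory, so mirror that name.
        sources = [a for a in args if a.endswith((".c", ".cc", ".cpp", ".cxx"))]
        if not sources:
            return None
        stem = os.path.splitext(os.path.basename(sources[0]))[0]
        args += ["-o", stem + ".o.bc"]
    return [frontend, "-emit-llvm", *args]
```

A wrapper script built around this would be passed to the build as the value of CC. As noted earlier, that only works where the makefiles actually use $CC rather than invoking gcc directly.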

Having a clear view of the build system at hand (see step 1) is indispensable when making this kind of build change in large systems. More information can be found in sections 7.3.1, 9.3.1 and 10.3.1 of my PhD thesis (http://users.ugent.be/~badams/publications/2008/PhD.pdf).

Kind regards,

Bram Adams
SAIL, Queen's University (Canada)

On Saturday 01 August 2009 03:11:57 Kelly, Terence P (HP Labs Researcher) wrote:

> Initially we had hoped simply to replace gcc, as, ld, etc. with their
> LLVM counterparts in the standard build systems, but we haven't been
> able to make that strategy work. Several different approaches along
> these lines fail in various ways. Some have recommended the Gold
> plugin, but it's not clear from the documentation that it does what
> we want, and we haven't been successful in installing it yet.

Could you summarize the failures/issues that you found with this kind of
approach?
We'll have to do something similar eventually, and up to now I expected that
customizing the compiler/linker would work most of the time, perhaps with the
help of a custom-written linker wrapper (using llvmc).

It would be great if you could share howto's / Makefile patches / ... once you
can build some of these large applications with the LLVM tools.

Thanks,
Torvald

> Initially we had hoped simply to replace gcc, as, ld, etc. with their
> LLVM counterparts in the standard build systems, but we haven't been
> able to make that strategy work. Several different approaches along
> these lines fail in various ways. Some have recommended the Gold
> plugin, but it's not clear from the documentation that it does what
> we want, and we haven't been successful in installing it yet.

Right now the gold plugin will produce a native object file, but it might
be possible to hack it to dump an IL file on the side. It is in a good
position to do so, since the linker passes it all the IL files, including
the ones that are inside archives.
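
To illustrate why the archive case matters, here is a small stand-alone sketch that lists which members of a static library are bitcode rather than native objects. It does not use the gold plugin interface at all; it only relies on the Unix "ar" member layout and on the four-byte magic at the start of a raw LLVM bitcode file. The function name bitcode_members is hypothetical, and GNU long-name and symbol-table entries are deliberately not interpreted.

```python
import io

AR_MAGIC = b"!<arch>\n"
BITCODE_MAGIC = b"BC\xc0\xde"  # first four bytes of a raw LLVM bitcode file


def bitcode_members(archive_bytes):
    """Return the names of members in a Unix 'ar' archive that start with
    the LLVM bitcode magic.

    Simplified sketch: GNU long-name tables and symbol tables are read as
    ordinary members and simply never match the magic.
    """
    stream = io.BytesIO(archive_bytes)
    if stream.read(len(AR_MAGIC)) != AR_MAGIC:
        raise ValueError("not an ar archive")

    names = []
    while True:
        header = stream.read(60)  # fixed-size member header
        if len(header) < 60:
            break
        # Bytes 0-15 hold the name (GNU ar terminates it with '/').
        name = header[0:16].decode("ascii").rstrip(" ").rstrip("/")
        # Bytes 48-57 hold the member size as a decimal string.
        size = int(header[48:58].decode("ascii").strip())
        data = stream.read(size)
        if size % 2:
            stream.read(1)  # member data is padded to an even offset
        if data.startswith(BITCODE_MAGIC):
            names.append(name)
    return names
```

Called on the bytes of something like a libapr archive built by a bitcode-emitting compiler, this would report the members that a whole-program bitcode file needs to absorb.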

Cheers,