strace for whole-program bitcodes (was: RE: building whole-program bitcode with LLVM)

Hi,

It would be nice if it were easier for relative
novices to build whole-program bitcodes for
large, complex applications with hairy build
systems. Several readers of this list have
been trying various approaches for a few months
but as far as I know we haven't yet found a
good general solution. Approaches that have
been tried include 1) placing wrappers for the
usual tools (gcc, ar, as, ld, etc.) first on
the $PATH, and having the wrappers pass the
buck to the LLVM equivalent tools after cleaning
up the arguments; and 2) using the Gold plugin.

Recently another possibility occurred to me,
and I'm wondering if anyone has tried it.
The basic idea goes like this: A) use the
"strace" utility to trace the default build
system and log all invocations of all tools;
B) extract from the log a build recipe in the
form of tool invocations, with the default
tools replaced by LLVM equivalents.
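
A minimal sketch of steps A and B, assuming the build was traced with something like `strace -f -e trace=execve -s 65536 -o build.log make` (the large `-s` is there because strace truncates long strings by default). The `TOOL_MAP` table and the function name `recipe_from_strace` are hypothetical, and the log format can vary between strace versions, so treat this as an illustration rather than a working tool. It keeps only successful invocations of the mapped tools and ignores everything else, including working-directory changes.

```python
import os
import re

# Hypothetical mapping from default tools to LLVM stand-ins (an assumption
# made for this sketch; the thread does not specify one).
TOOL_MAP = {
    "gcc": ["llvm-gcc", "-emit-llvm"],
    "g++": ["llvm-g++", "-emit-llvm"],
    "ar": ["llvm-ar"],
    "ld": ["llvm-ld"],
}

# Matches successful execve lines as written by `strace -f -o FILE`, e.g.
#   1234 execve("/usr/bin/gcc", ["gcc", "-c", "foo.c"], [/* 40 vars */]) = 0
EXECVE_RE = re.compile(
    r'^(?:\d+\s+)?execve\("([^"]*)", \[(.*?)\], .*\)\s+=\s+0\s*$'
)
# One double-quoted argv element, allowing backslash escapes inside it.
ARG_RE = re.compile(r'"((?:[^"\\]|\\.)*)"')


def unescape(arg):
    """Undo only the \\" and \\\\ escapes; other escapes are left untouched."""
    return re.sub(r'\\(["\\])', r'\1', arg)


def recipe_from_strace(lines):
    """Return a list of argv lists with mapped tools replaced."""
    recipe = []
    for line in lines:
        match = EXECVE_RE.match(line)
        if not match:
            continue  # not an execve line, or the execve failed
        argv = [unescape(a) for a in ARG_RE.findall(match.group(2))]
        tool = os.path.basename(match.group(1))
        if argv and tool in TOOL_MAP:
            # Keep the original arguments, swap only the program name.
            recipe.append(TOOL_MAP[tool] + argv[1:])
    return recipe
```

Each returned command could be rendered as a shell line with `shlex.join`. Note that with `-f` the trace also contains the sub-processes the gcc driver itself spawns (including its own call to ld), so a real tool would have to decide which level to replay, and it would still have to filter out flags the LLVM tools do not accept.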

I started thinking along these lines after
finding some genuine madness in a build system
(it used AWK to munge together existing .c files
into new ones midway through the build). I want
a method that's guaranteed to mimic faithfully
an arbitrarily nutty default build system, and
an strace-based approach seemed like a "Gordian
knot" solution. However I haven't tried it yet
and I'm wondering if anyone else has, or if
anyone can think of situations where it will
fail.

Thanks!

-- Terence

"Kelly, Terence P (HP Labs Researcher)" <terence.p.kelly@hp.com> writes:

> and I'm wondering if anyone else has, or if
> anyone can think of situations where it will
> fail.

This will fail when the input files are temporary files that are
removed after the build process. You will also be building the program
twice.

The folks at http://saturn.stanford.edu/pages/relatedindex.html do something
similar, but they only look at the build log.

Hi, Kelly,
Have you found a solution to this problem? I ran into a similar problem when I was trying to test MySQL 5.0 with LLVM. These are the steps I took; the build still fails because llvm-ld does not recognize some gcc link flags.

  1. During configuration, use a wrapper script such as llvm-gcc.sh that, at this stage, only invokes gcc. This is necessary because GNU configure tests the compiler before configuring.
  2. For configuration, specify CC and CXX as llvm-gcc.sh and llvm-g++.sh, and also pass a special LDFLAG.
  3. After configuration, rewrite llvm-gcc.sh and llvm-g++.sh to parse the LDFLAGS and decide whether to run
    "llvm-gcc --emit-llvm" or "llvm-ld".
  4. In the end, I still got the following errors:
    libtool: link: mycc.sh -g -DDBUG_ON -DSAFE_MUTEX -O0 -g3 -shit-shit -rdynamic -o comp_sql comp_sql.o -lpthread -lcrypt -lnsl -lm -lpthread
    llvm-ld: Unknown command line argument '-g'. Try: 'llvm-ld --help'
    llvm-ld: Unknown command line argument '-DDBUG_ON'. Try: 'llvm-ld --help'
    llvm-ld: Unknown command line argument '-DSAFE_MUTEX'. Try: 'llvm-ld --help'
    llvm-ld: Unknown command line argument '-O0'. Try: 'llvm-ld --help'
    llvm-ld: Unknown command line argument '-g3'. Try: 'llvm-ld --help'
    llvm-ld: Unknown command line argument '-rdynamic'. Try: 'llvm-ld --help'

Someone suggested that I use the gold plugin. I know nothing about it yet, but I will give it a try later. Does anyone have a good solution for this problem?

Thanks.

Tianwei
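
The errors above come from handing compiler-driver flags straight to llvm-ld. One way a wrapper could avoid them is to filter the link command line down to an allow-list before calling the linker. The sketch below illustrates that idea only; the function name `filter_link_args` and the choice of what to keep (`-o FILE`, `-l...`, `-L...`, and plain file names) are assumptions and have not been checked against the options llvm-ld actually accepts.

```python
def filter_link_args(argv):
    """Split driver-style link arguments into (kept, dropped).

    Keeps "-o FILE", "-lNAME", "-LDIR" and anything that does not look
    like an option (object files, archives). Everything else is dropped.
    The allow-list is an assumption made for this sketch.
    """
    kept, dropped = [], []
    args = iter(argv)
    for arg in args:
        if arg == "-o":
            # "-o" takes its value from the next argument.
            value = next(args, None)
            if value is not None:
                kept.extend([arg, value])
        elif arg.startswith(("-l", "-L")) or not arg.startswith("-"):
            kept.append(arg)
        else:
            dropped.append(arg)
    return kept, dropped
```

Applied to the failing libtool command, this would drop `-g`, `-DDBUG_ON`, `-DSAFE_MUTEX`, `-O0`, `-g3`, and `-rdynamic`, and keep the output name, the object file, and the `-l` libraries. Whether silently dropping a flag such as `-rdynamic` is acceptable depends on the program being built.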

Tianwei <tianwei.sheng@gmail.com> writes:

> someone suggested me to use gold-plugin, I know nothing about it yet, I will
> have a try later. Does anyone have a good solution for this problem?

Afaik gold does not help here. I tried it and managed to only generate
native code.

I'm currently investigating an alternative approach to produce
whole-program bitcodes:

1) add /tmp/wrap to PATH
2) create /tmp/wrap/gcc with the following contents

#!/bin/sh
exec llvm-gcc -specs /tmp/wrap/gcc.specs "$@"

3) llvm-gcc -dumpspecs > /tmp/wrap/gcc.specs
4) modify /tmp/wrap/gcc.specs so that it always passes -emit-llvm to cc1
5) modify /tmp/wrap/gcc.specs so that it calls llvm-ld* instead of real
   ld and does not pass any unknown flags to it.

With this approach I was able to compile and run airstrike (a 2d
dogfighting game) in bitcode form very transparently with:

$ make-bitcode fakeroot apt-get --build source airstrike
$ sudo dpkg -i airstrike*.deb
$ airstrike

If you are interested I can try to rework my scripts to a shape where
they could be used by somebody else.

(*) I am not actually calling llvm-ld directly. Instead I have an
    "llvm-ld-exe" wrapper that calls llvm-ld and then uses "anytoexe" to
    pack the resulting bitcode to a shell script that can execute itself with
    lli and use the correct -load options.

Hi Terence,

I believe that this is in fact similar to an approach Coverity uses
(or used at one time) as a robust solution to determine what was done
during a build. I can imagine that one can build a robust system
following this technique, but it also seems like it might be quite a
bit of work.

Another possible alternative not mentioned is to teach the compiler
driver (clang, most likely) to understand how to deal with bitcode
files on platforms with no LLVM linker support. This isn't terribly
difficult, and would work as long as all access to the tools was done
through the driver (e.g., CC). There might still be problems with
build systems that call tools like ar/ld directly.

- Daniel

> Tianwei <tianwei.sheng@gmail.com> writes:
>
>> someone suggested me to use gold-plugin, I know nothing about it yet, I will
>> have a try later. Does anyone have a good solution for this problem?
>
> Afaik gold does not help here. I tried it and managed to only generate
> native code.

"Just" gold isn't quite good enough, because at the final link
step gold will still generate native code. However, it should be
possible to find a way to get gold to leave the merged bitcode around
somewhere, or perhaps do something like llvm-ld. Nicholas?

The advantage of this approach is that it will potentially work with
build systems that call ar/ld directly.

> I'm currently investigating an alternative approach to produce
> whole-program bitcodes:
>
> 1) add /tmp/wrap to PATH
> 2) create /tmp/wrap/gcc with the following contents
>
> #!/bin/sh
> exec llvm-gcc -specs /tmp/wrap/gcc.specs "$@"
>
> 3) llvm-gcc -dumpspecs > /tmp/wrap/gcc.specs
> 4) modify /tmp/wrap/gcc.specs so that it always passes -emit-llvm to cc1
> 5) modify /tmp/wrap/gcc.specs so that it calls llvm-ld* instead of real
>    ld and does not pass any unknown flags to it.
>
> With this approach I was able to compile and run airstrike (a 2d
> dogfighting game) in bitcode form very transparently with:
>
> $ make-bitcode fakeroot apt-get --build source airstrike
> $ sudo dpkg -i airstrike*.deb
> $ airstrike

Very clever though. :)

- Daniel

Hi Daniel,

Thanks for your reply.

Do we know if the LLVM developers intend to
address this problem in a comprehensive way?
The existing LLVM tools are not quite drop-in
replacements for their standard GCC counterparts;
that's the source of the problems that various
people have encountered when trying to develop
a fully general way to get whole-program bitcodes.

If the LLVM tools *were* fully compatible, I
think that would remove an impediment to much
wider usage of LLVM. Is full compatibility a
goal for the LLVM developers?

-- Terence

Daniel Dunbar wrote:

>> Tianwei <tianwei.sheng@gmail.com> writes:
>>
>>> someone suggested me to use gold-plugin, I know nothing about it yet, I will
>>> have a try later. Does anyone have a good solution for this problem?
>>
>> Afaik gold does not help here. I tried it and managed to only generate
>> native code.
>
> "Just" gold isn't quite good enough, because at the last final link
> steps gold will still generate native code. However, it should be
> possible to find a way to get gold to leave the merged bitcode around
> somewhere, or perhaps do something like llvm-ld. Nicholas?

It's easy. In gold-plugin.cpp all_symbols_read_hook() where lto_codegen_compile(cg, ...) is currently called, call lto_codegen_write_merged_modules(cg, "/path/to/output.bc") instead.

If someone were to rig this up to a command-line flag (search for LDPT_OPTION in the same file) then I would like to commit that change.

Nick

Hello everyone,

I'm working on passing parameters for gold/LTO plug-in and could add this one as well.
Just need an option name. Could anybody suggest one?

Viktor

--emit-llvm??

--emit-llvm, if it does not conflict

Paul Davey plmdvy@gmail.com 10/30/2009 11:11 AM >>>
> --emit-llvm??

Any news? Thanks.

"Guan Jun He" gjhe@novell.com 10/30/2009 2:22 PM >>>
> --emit-llvm, if not conflict