Compiling whole programs to bitcode

With Clang, it's reasonably straightforward to compile a C/C++ file to bitcode.

Is there a way to compile a program to that format together with all
the standard library functions it uses? That is, suppose you have
hello.c that calls printf. How would you go about generating the
bitcode representation of both the main function from hello.c and
printf itself (plus whatever other standard library functions printf
calls)?

Or put another way, given that one answer to the above question would
be 'download the source of GNU libc, manually run Clang to compile the
whole lot to bitcode, then manually extract the bitcode versions of
the functions you are interested in,' is there a more automated way
to do it (in the sense that a linker provides a more automated way to
do a similar job of building an executable)?

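You can link your bitcode together with glibc's bitcode by using the
llvm-link utility. Then you can run global DCE over the bitcode with
"opt -globaldce" and cull all the functions you don't need (they need
internal linkage first, since plain "opt -dce" only removes dead
instructions inside functions).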

But why are you trying to do this at all?

Chip

You can link your bitcode together with glibc's bitcode by using the
llvm-link utility. Then you can run global DCE over the bitcode with
"opt -globaldce" and cull all the functions you don't need.

Right, that's still a reasonably straightforward solution for C... I
think what I'm more concerned about is C++, where templates break the
simple model of source to object to linking. Trying to wrap my head
around what the pipeline would look like in the C++ case.

But why are you trying to do this at all?

I've got some ideas for automatic detection of bugs in C/C++ code,
looking into the feasibility of using Clang as a front-end in the hope
of only having to write a parser for the ASCII version of bitcode
(dead easy) instead of C (somewhat easy) and C++ (hard).

Don't do this. Seriously. Waste of time.

There's a parser in LLVM, and it's pretty good at reading and writing bitcode. Just do your analysis using the LLVM APIs.

-eric

Russell Wallace wrote:

You can link your bitcode together with glibc's bitcode by using the
llvm-link utility. Then you can run global DCE over the bitcode with
"opt -globaldce" and cull all the functions you don't need.

Right, that's still a reasonably straightforward solution for C... I
think what I'm more concerned about is C++, where templates break the
simple model of source to object to linking. Trying to wrap my head
around what the pipeline would look like in the C++ case.

Exactly the same as in C. Before we ever get to LLVM IR, the templates are instantiated and the names are mangled.

If a program uses a templated function from a C++ library, either the C++ library provides an implementation of the template instantiated to that type, or else the whole implementation was available in the header and ends up in the TU of the program, not the library (or else you'd get a link error when building normally). Either way, the rest proceeds the same as it would in C.

Nick
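
As a rough illustration of the header-only case described above, here is a minimal single-file sketch. The names twice, Meters, doubled_length and doubled_text are made up for this example and do not come from any real library; the template at the top merely stands in for a library header.

```cpp
#include <string>

// Stands in for a library header: the full template definition is
// visible to every translation unit (TU) that includes it.
template <typename T>
T twice(const T& value) {
    return value + value;
}

// A user-defined type that the hypothetical library has never seen.
struct Meters {
    double amount;
    Meters operator+(const Meters& other) const {
        return Meters{amount + other.amount};
    }
};

// Calling twice on a Meters makes the compiler instantiate
// twice<Meters> right here, in the program's own TU.
double doubled_length(double amount) {
    return twice(Meters{amount}).amount;
}

// Same mechanism for a standard type: twice<std::string> is
// instantiated in this TU as well.
std::string doubled_text(const std::string& text) {
    return twice(text);
}
```

Compiling a file like this with clang++ -S -emit-llvm should leave the mangled instantiations of twice in the program's own IR, since that is the TU where they were instantiated.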

Right, that's also an option. In that event, though, the question
still arises about what the overall pipeline looks like... is there
documentation anywhere on just how Clang/LLVM goes about handling C++
templates?

Ah, but that's not the case, not with modern C++ compilers - it is
perfectly possible for a templated function to be declared in a
header, defined in a .cc file, called on a user-defined type, and
nonetheless instantiated on the fly. Of course that indeed wouldn't
work in the simple C linkage model, which is why modern C++ compilers
use one or another sort of black magic behind the scenes to deal with
it. Is it documented how Clang deals with it?

If a program uses a templated function from a C++ library, either the C++
library provides an implementation of the template instantiated to that
type, or else the whole implementation was available in the header and ends
up in the TU of the program, not the library (or else you'd get a link
error when building normally).

Ah, but that's not the case, not with modern C++ compilers - it is
perfectly possible for a templated function to be declared in a
header, defined in a .cc file, called on a user-defined type, and
nonetheless instantiated on the fly.

Not this modern C++ compiler. We don't support the 'export' keyword. In
fact, C++0x doesn't even have 'export' anymore.

Of course that indeed wouldn't
work in the simple C linkage model, which is why modern C++ compilers
use one or another sort of black magic behind the scenes to deal with
it. Is it documented how Clang deals with it?

Only one compiler supports 'export', and that's EDG's compiler. And boy,
was it a pain for them to implement (so I've heard). Let's just leave it
at that.

Chip
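
Without 'export', the usual way to keep a template's body out of the header is explicit instantiation in the library's own source file. Below is a minimal sketch of that, squeezed into one file so it stands alone. The names largest and pick_larger are hypothetical, and in a real project the three parts would live in a header, a caller, and the library's .cc file respectively.

```cpp
// "Header" part: declaration only, callers never see the body.
template <typename T>
T largest(T a, T b);

// "Caller" part: uses largest<int> knowing only the declaration.
int pick_larger(int a, int b) {
    return largest(a, b);
}

// "Library .cc" part: the definition of the template.
template <typename T>
T largest(T a, T b) {
    return a < b ? b : a;
}

// Explicit instantiation: emits largest<int> as an ordinary function
// that a linker (or llvm-link) can resolve. In this single file the
// compiler could instantiate it anyway; with the parts split across
// files, this line is what supplies the symbol to the caller.
template int largest<int>(int, int);
```

If a caller in another file asked for a type the library did not instantiate, the build would be expected to fail at link time, which matches the "link error when building normally" remark earlier in the thread.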

Ah, good! So the same straightforward model applies to C++ as to C
here, okay, that's good news, thanks.