Compiling zlib to static bytecode archive

Hi,

I'm trying to compile zlib to produce a "libz.a" static library which is an
LLVM bytecode archive. I'm using this command line for "configure":

  AR="llvm-ar r" RANLIB=llvm-ranlib CC=llvm-gcc CFLAGS=--emit-llvm \
    ./configure

The creation of "libz.a" works, but after that, zlib's Makefile wants to
compile and link some example programs. The linking step fails:

  llvm-gcc --emit-llvm -DNO_vsnprintf -DUSE_MMAP -o example example.o \
    -L. libz.a
  example.o: file not recognized: File format not recognized
  collect2: ld returned 1 exit status

I can link it by hand using llvm-ld instead of llvm-gcc, like this:

  llvm-ld -o example example.o libz.a

However, it is not possible to let the zlib Makefile issue that command
without patching the Makefile, because the fragment that does the linking is
hardcoded to use the compiler command for linking:

  example$(EXE): example.o $(LIBS)
          $(CC) $(CFLAGS) -o $@ example.o $(LDFLAGS)
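
One way to avoid patching the Makefile might be a small CC wrapper that recognizes link steps and hands them to llvm-ld. The following is only a sketch under that assumption: the helper names are hypothetical, the argument list excludes the compiler name itself, and the option filtering is deliberately crude (it keeps -o, -L, -l and plain file names, and drops everything else).

```python
# Flags that make the compiler stop before the link step.
NON_LINK_FLAGS = {"-c", "-S", "-E"}


def is_link_step(argv):
    """Return True if this compiler command line would run the linker."""
    return not any(arg in NON_LINK_FLAGS for arg in argv)


def to_llvm_ld(argv):
    """Translate a link-step compiler command line into an llvm-ld one.

    Assumption: only -o, -L..., -l... and file names are passed through;
    every other option (for example --emit-llvm or -D...) is dropped.
    """
    out = ["llvm-ld"]
    args = iter(argv)
    for arg in args:
        if arg == "-o":
            out += ["-o", next(args)]      # keep the output name
        elif arg.startswith(("-L", "-l")):
            out.append(arg)                # library search path / library
        elif not arg.startswith("-"):
            out.append(arg)                # object file or archive
    return out


if __name__ == "__main__":
    link = ["--emit-llvm", "-DNO_vsnprintf", "-DUSE_MMAP",
            "-o", "example", "example.o", "-L.", "libz.a"]
    if is_link_step(link):
        print(" ".join(to_llvm_ld(link)))
```

A real wrapper would still have to forward compile steps to llvm-gcc unchanged, and would need more careful option handling than this.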

Would it be possible to make llvm-gcc call llvm-ld instead of the systemwide
ld? I tried setting the environment variables COMPILER_PATH=/usr/local/bin
and GCC_EXEC_PREFIX=llvm- but that had no effect.

I'm using LLVM 2.1-pre1 and the corresponding llvm-gcc 4.0.

Bye,
    Maarten

> However, it is not possible to let the zlib Makefile issue that command
> without patching the Makefile, because the fragment that does the linking is
> hardcoded to use the compiler command for linking:
>
>   example$(EXE): example.o $(LIBS)
>           $(CC) $(CFLAGS) -o $@ example.o $(LDFLAGS)

Right, unfortunately the current Link Time Optimization model requires the linker to "know" about LLVM.
http://llvm.org/docs/LinkTimeOptimization.html

> Would it be possible to make llvm-gcc call llvm-ld instead of the systemwide
> ld? I tried setting the environment variables COMPILER_PATH=/usr/local/bin
> and GCC_EXEC_PREFIX=llvm- but that had no effect.

I see two solutions to this. One is to have llvm-gcc call llvm-ld when it has some option passed to it. Another would be to enhance 'collect2' to know about LLVM files. 'collect2' is a GCC utility invoked at link time, so it would be the perfect place to add hooks.

The thing we're missing most right now is a volunteer to tackle this project :)

-Chris

> However, it is not possible to let the zlib Makefile issue that command
> without patching the Makefile, because the fragment that does the linking is
> hardcoded to use the compiler command for linking:
>
>   example$(EXE): example.o $(LIBS)
>           $(CC) $(CFLAGS) -o $@ example.o $(LDFLAGS)

Right, unfortunately the current Link Time Optimization model
requires the linker to "know" about LLVM.
http://llvm.org/docs/LinkTimeOptimization.html

That's the reason I want to try and build a bytecode lib: to see if link time
optimization of executable + libs has any effect on performance and on code
size. My guess is that performance won't improve much, since there aren't
that many calls per second which cross the app-lib boundary. But code size
could improve if unused optional features can be eliminated as dead code
because a function is only called in one particular way.

By the way, the example from that document does not work with the current
llvm-gcc (GCC 4.0, LLVM 2.1-pre1). The last command fails:

$ llvm-gcc a.o main.o -o main
a.o: file not recognized: File format not recognized
collect2: ld returned 1 exit status

Linking with llvm-ld does work:

$ llvm-ld a.o main.o -native -o main
$ ./main
$ echo $?
42

The link step combines one or more input files into one output file. The input
files can be all bytecode, all native or mixed. The output file can be
bytecode or native. Since it is only possible to convert from bytecode to
native and not vice versa, bytecode output requires all bytecode input. So
the combinations are:

bytecode input, bytecode output:
Can be handled by llvm-ld without invoking system compiler/linker.

native input, native output:
Handled by system compiler/linker.

bytecode or mixed input, native output:
According to the llvm-ld man page, llvm-ld will generate native code from the
bytecode files and invoke the system compiler to do the actual linking.
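
The three combinations above can be written down as a small decision function. This is only an illustrative sketch: the function name and the returned labels are made up here, and it encodes nothing beyond the cases listed.

```python
def link_strategy(inputs, output):
    """Pick who performs the link.

    inputs: iterable of "bytecode" / "native", one entry per input file.
    output: "bytecode" or "native".
    """
    kinds = set(inputs)
    if not kinds or not kinds <= {"bytecode", "native"}:
        raise ValueError("inputs must be a non-empty mix of bytecode/native")
    if output == "bytecode":
        # Native code cannot be converted back to bytecode.
        if kinds != {"bytecode"}:
            raise ValueError("bytecode output requires all-bytecode input")
        return "llvm-ld"
    if output != "native":
        raise ValueError("output must be 'bytecode' or 'native'")
    if kinds == {"native"}:
        return "system compiler/linker"
    # Bytecode or mixed input: llvm-ld generates native code, then the
    # system compiler does the actual linking.
    return "llvm-ld, then system compiler"


if __name__ == "__main__":
    print(link_strategy(["bytecode", "native"], "native"))
```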

> Would it be possible to make llvm-gcc call llvm-ld instead of the systemwide
> ld? I tried setting the environment variables COMPILER_PATH=/usr/local/bin
> and GCC_EXEC_PREFIX=llvm- but that had no effect.

I see two solutions to this. One is to have llvm-gcc call llvm-ld
when it has some option passed to it. Another would be to enhance
'collect2' to know about LLVM files. 'collect2' is a GCC utility
invoked at link time, it would be the perfect place to add hooks.

I found the documentation of collect2 here:
  Collect2 (GNU Compiler Collection (GCC) Internals)

Its purpose seems to be to act like ld and insert calls to initialization
routines (and exit routines) before calling the real ld. The comment at the
top of the source file describes it like this:

   Collect static initialization info into data structures that can be
   traversed by C++ initialization and finalization routines.

According to this comment in the collect2 source, having collect2 accept
options that ld does not accept will cause trouble:

  /* !!! When GCC calls collect2,
     it does not know whether it is calling collect2 or ld.
     So collect2 cannot meaningfully understand any options
     except those ld understands.
     If you propose to make GCC pass some other option,
     just imagine what will happen if ld is really ld!!! */

Originally I was under the impression that llvm-ld was just an LLVM-aware
version of ld, but that is not the case. For example, when creating an output
file in native format, it runs the system compiler on the generated native
code and that compiler automatically picks up libraries such as libc, which
must be specified explicitly to ld. Also, although llvm-ld accepts many of
the options accepted by ld, GCC uses some ld options that llvm-ld does not
accept.

Going back to the two options you mentioned, they would lead to the following
invocation chains. Let's use the "mixed input, native output" scenario: if we
can support that, we can support the rest as well.

llvm-gcc calling llvm-ld:
  llvm-gcc -> llvm-ld -> gcc -> collect2 -> ld

enhance collect2:
  llvm-gcc -> llvm-collect2 -> llvm-ld -> gcc -> collect2 -> ld

llvm-collect2 is the enhanced collect2, while plain collect2 is the one that
belongs to the system compiler. Note that this assumes the system compiler is
GCC, otherwise the "gcc -> collect2 -> ld" chain will be something else, but
will perform the same function.

Since llvm-ld invokes the system compiler to do the actual linking, the
executable it produces will already have the proper init/exit sequences. So
llvm-collect2 would not have anything to do.

To summarize:
- llvm-ld (currently) does not accept all flags that GCC passes to collect2
- an LLVM-aware collect2 would never perform the core function of collect2,
  which is generating init/exit code and data

Therefore, I think the scenario of llvm-gcc calling llvm-ld directly is
preferable.

The thing we're missing most right now is a volunteer to tackle this
project :)

Since this is all new terrain for me, I might get stuck before producing
anything useful. But I'm willing to try.

Bye,
    Maarten

However, it is not possible to let the zlib Makefile issue that command
without patching the Makefile, because the fragment that does the linking is
hardcoded to use the compiler command for linking:

  example$(EXE): example.o $(LIBS)
          $(CC) $(CFLAGS) -o $@ example.o $(LDFLAGS)

Right, unfortunately the current Link Time Optimization model
requires the linker to "know" about LLVM.
http://llvm.org/docs/LinkTimeOptimization.html

That's the reason I want to try and build a bytecode lib: to see if link time
optimization of executable + libs has any effect on performance and on code
size.

Right.

My guess is that performance won't improve much, since there aren't
that many calls per second which cross the app-lib boundary. But code size
could improve if unused optional features can be eliminated as dead code
because a function is only called in one particular way.

Makes sense!

By the way, the example from that document does not work with the current
llvm-gcc (GCC 4.0, LLVM 2.1-pre1). The last command fails:

$ llvm-gcc a.o main.o -o main
a.o: file not recognized: File format not recognized
collect2: ld returned 1 exit status

Again, this is because your native linker doesn't support liblto.

Linking with llvm-ld does work:

$ llvm-ld a.o main.o -native -o main
$ ./main
$ echo $?
42

The link step combines one or more input files into one output file. The input
files can be all bytecode, all native or mixed. The output file can be
bytecode or native. Since it is only possible to convert from bytecode to
native and not vice versa, bytecode output requires all bytecode input. So
the combinations are:

bytecode input, bytecode output:
Can be handled by llvm-ld without invoking system compiler/linker.

Yes, but note that this only works if you limit yourself to linker options known by llvm-ld. If you use funky stuff, llvm-ld won't be able to handle it. Also, llvm-ld may or may not handle archive resolution correctly (I don't remember).

native input, native output:
Handled by system compiler/linker.

bytecode or mixed input, native output:
According to the llvm-ld man page, llvm-ld will generate native code from the
bytecode files and invoke the system compiler to do the actual linking.

Yes.

Would it be possible to make llvm-gcc call llvm-ld instead of the systemwide
ld? I tried setting the environment variables COMPILER_PATH=/usr/local/bin
and GCC_EXEC_PREFIX=llvm- but that had no effect.

I see two solutions to this. One is to have llvm-gcc call llvm-ld
when it has some option passed to it. Another would be to enhance
'collect2' to know about LLVM files. 'collect2' is a GCC utility
invoked at link time, it would be the perfect place to add hooks.

I found the documentation of collect2 here:
  Collect2 (GNU Compiler Collection (GCC) Internals)

Its purpose seems to be to act like ld and insert calls to initialization
routines (and exit routines) before calling the real ld. The comment at the
top of the source file describes it like this:

   Collect static initialization info into data structures that can be
   traversed by C++ initialization and finalization routines.

Right, that is its intended purpose. It seems fairly straightforward to abuse it for our devious plans though :)

Originally I was under the impression that llvm-ld was just an LLVM-aware
version of ld, but that is not the case. For example, when creating an output
file in native format, it runs the system compiler on the generated native
code and that compiler automatically picks up libraries such as libc, which
must be specified explicitly to ld. Also, although llvm-ld accepts many of
the options accepted by ld, GCC uses some ld options that llvm-ld does not
accept.

Right.

Going back to the two options you mentioned, they would lead to the following
invocation chains. Let's use the "mixed input, native output" scenario: if we
can support that, we can support the rest as well.

llvm-gcc calling llvm-ld:
  llvm-gcc -> llvm-ld -> gcc -> collect2 -> ld

enhance collect2:
  llvm-gcc -> llvm-collect2 -> llvm-ld -> gcc -> collect2 -> ld

I'd rather enhance collect2 like this:

llvm-gcc -> llvm-collect2(liblto) -> ld

Where llvm-collect2 is just collect2 that dlopen's liblto to do the optimization work. This makes it work much more naturally than adding a whole new set of steps. Depending on llvm-ld will never get you to a world where LTO is transparent, because llvm-ld doesn't support a lot of options and features that native linkers do.

To summarize:
- llvm-ld (currently) does not accept all flags that GCC passes to collect2
- an LLVM-aware collect2 would never perform the core function of collect2,
  which is generating init/exit code and data

Therefore, I think the scenario of llvm-gcc calling llvm-ld directly is
preferable.

Ah, but if the llvm-collect2 version was enhanced to do everything it does now, and additionally interface with liblto, then everyone wins :)

The thing we're missing most right now is a volunteer to tackle this
project :)

Since this is all new terrain for me, I might get stuck before producing
anything useful. But I'm willing to try.

Yay! Many people will appreciate this!

-Chris

So the llvm-collect2 will combine the functionality of the original collect2
and of llvm-ld?

When it executes, it would take the following steps:
1. for each input, determine whether it is in bytecode or native format
2. if there are no bytecode inputs, go to step 6
3. link the bytecode inputs and optimize the resulting bytecode, using liblto
4. if bytecode output was requested, we are done
5. generate native object in a temporary file
6. perform the init/exit fixups that the original collect2 does
7. invoke system linker to link the generated native object (if any) and the
input native objects (if any)

Assuming those steps are correct, steps 6 and 7 could be implemented by using
the original collect2 and adding the generated native object to the list of
files to link. In other words, llvm-collect2 could be a separate process,
which is called instead of collect2, does some processing and then runs the
original, unmodified collect2:
  llvm-gcc -> llvm-collect2(liblto) -> collect2 -> ld
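
To make the control flow of the seven steps concrete, here is a rough sketch that only builds a plan and runs nothing. The step labels and the temporary file name "lto-output.o" are hypothetical, and the classification of each input is assumed to be known already.

```python
def plan_link(inputs, bytecode_output=False):
    """Return the ordered steps for one link.

    inputs: list of (path, kind) pairs, kind being "bytecode" or "native".
    """
    # Step 1: split the inputs by format.
    bytecode = [path for path, kind in inputs if kind == "bytecode"]
    native = [path for path, kind in inputs if kind == "native"]
    plan = []
    if bytecode:  # Step 2: with no bytecode inputs, skip ahead to step 6.
        # Step 3: link and optimize the bytecode inputs (liblto's job).
        plan.append(("lto-link-and-optimize", bytecode))
        if bytecode_output:
            return plan  # Step 4: bytecode was requested, so we are done.
        # Step 5: generate a native object in a temporary file.
        generated = "lto-output.o"  # hypothetical temporary file name
        plan.append(("codegen", [generated]))
        native = native + [generated]
    # Step 6: the init/exit fixups of the original collect2.
    plan.append(("init-exit-fixups", native))
    # Step 7: the system linker links all native objects.
    plan.append(("system-link", native))
    return plan


if __name__ == "__main__":
    for step in plan_link([("a.o", "bytecode"), ("main.o", "native")]):
        print(step)
```

In the separate-process variant, the last two entries would simply become a single invocation of the unmodified collect2 with the generated object appended to its file list.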

Bye,
    Maarten

llvm-gcc calling llvm-ld:
  llvm-gcc -> llvm-ld -> gcc -> collect2 -> ld

enhance collect2:
  llvm-gcc -> llvm-collect2 -> llvm-ld -> gcc -> collect2 -> ld

I'd rather enhance collect2 like this:

llvm-gcc -> llvm-collect2(liblto) -> ld

Where llvm-collect2 is just collect2 that dlopen's liblto to do the
optimization work. This makes it work much more naturally than adding
a whole new set of steps. Depending on llvm-ld will never get you to
a world where LTO is transparent, because llvm-ld doesn't support a
lot of options and features that native linkers do.

So the llvm-collect2 will combine the functionality of the original collect2
and of llvm-ld?

When it executes, it would take the following steps:
1. for each input, determine whether it is in bytecode or native format
2. if there are no bytecode inputs, go to step 6
3. link the bytecode inputs and optimize the resulting bytecode, using liblto
4. if bytecode output was requested, we are done
5. generate native object in a temporary file
6. perform the init/exit fixups that the original collect2 does
7. invoke system linker to link the generated native object (if any) and the
input native objects (if any)

Yep, exactly.

Assuming those steps are correct, steps 6 and 7 could be implemented by using
the original collect2 and adding the generated native object to the list of
files to link. In other words, llvm-collect2 could be a separate process,
which is called instead of collect2, does some processing and then runs the
original, unmodified collect2:
  llvm-gcc -> llvm-collect2(liblto) -> collect2 -> ld

Sure, this would also work. Is there any reason not to merge them together?

-Chris

Ease of maintenance, mainly. Having it in a separate file makes it easier to
migrate the code to new GCC releases. Also, collect2.c is already 2658 lines,
which is more than I typically like to have in a single source file.

I'd like to turn the question around: is there an advantage to merging them?

In any case, if I can make it work as a separate process, it shouldn't be hard
to merge it into collect2 later.

Bye,
    Maarten

In other words, llvm-collect2 could be a separate process,
which is called instead of collect2, does some processing and then runs the
original, unmodified collect2:
  llvm-gcc -> llvm-collect2(liblto) -> collect2 -> ld

Sure, this would also work. Is there any reason not to merge them together?

Ease of maintenance, mainly. Having it in a separate file makes it easier to migrate the code to new GCC releases. Also, collect2.c is already 2658 lines, which is more than I typically like to have in a single source file.

My impression is that collect2 doesn't change very much. In any case, the idea here would be that collect2 only has minimally invasive hooks to call into liblto. It seems like this would be much simpler than handling all the command line argument swizzling needed for forking subprocesses, and having the LTO app have to read all the .o files and analyze them (which collect2 is already doing).

I'd like to turn the question around: is there an advantage to merging them?

I think it would end up being simpler, and it would fit more naturally with liblto.

-Chris

After studying collect2.c a bit more, I see that quite a lot of it is for
option parsing and signal handling, so maybe merging is better indeed.

As far as I can see, collect2.c does not read the object files though: it
only runs "nm" on them, which is not what we need to determine which files
are bitcode files.

One thing I'm wondering is how to merge the C code of collect2 with the C++
code that uses liblto:
- convert collect2.c to collect2.cpp?
- put the C++ code in a separate source file and link the C object file and
the C++ object file together into a single collect2 executable?
- expose more functionality from include/llvm-c/LinkTimeOptimizer.h?
(meaning the code using liblto would be C, not C++)

I currently have something that links the example without errors. It is not
pretty though: a Python script intercepts the invocation of collect2,
splits the list of object files into bitcode and native, calls a process I
named "precollect" to link the bitcode objects into a single native object
and then calls the real collect2 with only native objects. The precollect
tool is based on the llvm-ld source.
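
The splitting step of such a script could look roughly like the sketch below. It is not the actual script: it assumes every input is either a raw LLVM bitcode file, which starts with the four magic bytes 'B', 'C', 0xC0, 0xDE, or something native. Archives such as libz.a and any other container formats are not looked into and simply end up in the native list.

```python
BITCODE_MAGIC = b"BC\xc0\xde"  # first four bytes of a raw LLVM bitcode file


def is_bitcode(path):
    """Return True if the file starts with the LLVM bitcode magic number."""
    with open(path, "rb") as f:
        return f.read(4) == BITCODE_MAGIC


def split_objects(paths):
    """Split input files into (bitcode, native), preserving their order."""
    bitcode, native = [], []
    for path in paths:
        (bitcode if is_bitcode(path) else native).append(path)
    return bitcode, native


if __name__ == "__main__":
    import sys
    bc, nat = split_objects(sys.argv[1:])
    print("bitcode:", bc)
    print("native: ", nat)
```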

What does not work yet is the actual optimization: precollect does not take
advantage of the fact that this is the final link step that will produce an
executable and all unreferenced symbols are unused. Therefore the dead code
elimination from the example is not performed. To make that possible,
precollect would have to know about all object files, including the native
ones, to determine which symbols are unused. Also, I should figure out how
to tell liblto "there are no symbol references that you do not know about";
I assume that option already exists, but I didn't look for it yet.

Bye,
    Maarten

Hello, Maarten

As far as I can see, collect2.c does not read the object files though: it
only runs "nm" on them, which is not what we need to determine which files
are bitcode files.

This can be identified via LLVM's sys::IdentifyFileType()

One thing I'm wondering is how to merge the C code of collect2 with the C++
code that uses liblto:
- convert collect2.c to collect2.cpp?
- put the C++ code in a separate source file and link the C object file and
the C++ object file together into a single collect2 executable?
- expose more functionality from include/llvm-c/LinkTimeOptimizer.h?
(meaning the code using liblto would be C, not C++)

There are C wrappers for liblto calls. Also, ask Chandler (CCed) about
his patches. Maybe he started to work on collect2 integration already.

After studying collect2.c a bit more, I see that quite a lot of it is for
option parsing and signal handling, so maybe merging is better indeed.

Ok.

As far as I can see, collect2.c does not read the object files though: it
only runs "nm" on them, which is not what we need to determine which files
are bitcode files.

Ok, well you can use the LLVM functions to do this, or just assume that if nm doesn't recognize a file, it is an LLVM file. The liblto methods should return an error code if the file isn't an LLVM bitcode file.

One thing I'm wondering is how to merge the C code of collect2 with the C++
code that uses liblto:
- convert collect2.c to collect2.cpp?
- put the C++ code in a separate source file and link the C object file and
the C++ object file together into a single collect2 executable?
- expose more functionality from include/llvm-c/LinkTimeOptimizer.h?
(meaning the code using liblto would be C, not C++)

I'd suggest using the C interfaces as Anton mentioned.

What does not work yet is the actual optimization: precollect does not take
advantage of the fact that this is the final link step that will produce an
executable and all unreferenced symbols are unused. Therefore the dead code
elimination from the example is not performed. To make that possible,
precollect would have to know about all object files, including the native
ones, to determine which symbols are unused. Also, I should figure out how
to tell liblto "there are no symbol references that you do not know about";
I assume that option already exists, but I didn't look for it yet.

If collect2 is already calling nm, you can use the info from nm to tell liblto about symbols defined and referenced by the native .o files. This will let a transparent mix and match of .o files work.
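
As a rough illustration of that idea, the sketch below extracts defined and undefined global symbols from nm's default text output and works out which bitcode symbols the native objects still need. It assumes lines of the form "address type name" (or "type name" for undefined symbols). Lowercase type letters, which cover local symbols and undefined weak ones, are ignored for simplicity, and how the resulting set would be handed to liblto is left open.

```python
def parse_nm(text):
    """Return (defined, undefined) global symbol names from nm output."""
    defined, undefined = set(), set()
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # blank lines and "file.o:" headers
        kind, name = fields[-2], fields[-1]
        if len(kind) != 1 or not kind.isalpha():
            continue  # not a symbol line in the assumed format
        if kind == "U":
            undefined.add(name)
        elif kind.isupper():
            defined.add(name)  # T, D, B, R, ... are global definitions
    return defined, undefined


def symbols_to_preserve(native_undefined, bitcode_defined):
    """Bitcode symbols referenced from native code must not be removed."""
    return native_undefined & bitcode_defined


if __name__ == "__main__":
    sample = (
        "main.o:\n"
        "                 U foo1\n"
        "0000000000000000 T main\n"
        "0000000000000010 t helper\n"
    )
    defined, undefined = parse_nm(sample)
    print(sorted(defined), sorted(undefined))
    print(sorted(symbols_to_preserve(undefined, {"foo1", "foo2"})))
```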

-Chris