Building libjpeg-turbo with LTO

Hi,

I have tried to build libjpeg-turbo with LTO in LLVM using clang, but I get many errors from lld that look like the following:

ld: error: undefined symbol: jpeg_std_error

referenced by jcstest.c:76
lto.tmp:(main)

ld: error: undefined symbol: jpeg_CreateCompress

referenced by jcstest.c:86
lto.tmp:(main)

ld: error: undefined symbol: jpeg_set_defaults

referenced by jcstest.c:88
lto.tmp:(main)

ld: error: undefined symbol: jpeg_default_colorspace

referenced by jcstest.c:90
lto.tmp:(main)
referenced by jcstest.c:114
lto.tmp:(main)

This only occurs when compiling with the -flto flag. Has anyone been able to build libjpeg-turbo with LTO? Are there any modifications I need to make to the makefile or other configuration in order to do so? Thanks for your help!

Best,
Shishir Jessu

To correct a typo: I am using both clang 6.0.0 and a local build of clang 10.0.0, and each results in the same error.

Best,
Shishir Jessu

Are the object files for jcstest.c and the source files defining these symbols being directly LTO linked together, or are the defs first LTO linked into a shared library? It would be helpful to see the build commands involved.
Teresa

Adding a couple of lld folks.

I helped Shishir debug this, the link line looked like:
/home/sjessu/build/bin/clang -O0 -flto -o jcstest jcstest.o ./.libs/libjpeg.a
and the issue was that libjpeg.a was created with the system ar instead of llvm-ar. It worked when recreating libjpeg.a with llvm-ar.

I noticed that lld has some special handling for a missing symbol table, which often happens when archives containing bitcode are created with the system ar. lld will sometimes emit an error, but it also contains a special hack for archives containing only bitcode objects, so that they are handled correctly even when the system ar left them without a symbol table. Unfortunately, in this case lld neither gave an error nor applied the special handling, because libjpeg.a also contains some native objects and thus had a non-empty symbol table. I created a version of libjpeg.a using the system ar and containing only the bitcode objects, and confirmed it links fine with lld (the native objects weren’t needed in this case). BTW this is the code in ELF/Driver.cpp LinkerDriver::addFile.

Would it be possible to extend the hack in lld to handle cases like this with some bitcode objects and some non-bitcode objects, so that the bitcode objects are not simply ignored?

Thanks,
Teresa

https://reviews.llvm.org/D63781 added the "archive has no index; run
ranlib to add one" error.

A clarification: the LinkerDriver::addFile code handles mix-and-match
ELF object members and bitcode members. A LazyObjFile can be
either an ELF object file or an LLVM bitcode file.

I guess what happened here is that the archive has an incomplete symbol table.
nm -s (--print-armap) can print the symbol table.

   % ar rc a.a a.bc a.o; nm -s a.a
   Archive index:
   _start in a.o
   nm: a.bc: file format not recognized
   a.o:
   0000000000000000 T _start

Currently lld trusts the archive symbol table. If the archive symbol table
actually misses some entries (GNU ar does not add bitcode definitions to the
symbol table), lld will not know that some lazy definitions are actually
missing.
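
To make the "archive index" concrete, here is a minimal Python sketch that lists the names recorded in a GNU-style index, roughly the first part of what nm -s prints. It is an illustration only, not lld's or nm's code: the function name archive_index is made up for this example, and it assumes a regular (non-thin) archive whose index is the 32-bit "/" member. The 64-bit "/SYM64/" index and BSD-style "__.SYMDEF" members are not handled.

```python
import struct

AR_MAGIC = b"!<arch>\n"   # global header of a regular ar archive
HEADER_SIZE = 60          # fixed size of every member header


def archive_index(data):
    """Return the symbol names listed in a GNU-style archive index.

    Hypothetical helper for illustration. Returns [] when the first
    member is not the index member "/".
    """
    if not data.startswith(AR_MAGIC):
        raise ValueError("not a regular ar archive")
    header = data[len(AR_MAGIC):len(AR_MAGIC) + HEADER_SIZE]
    if len(header) < HEADER_SIZE or header[:16].rstrip() != b"/":
        return []
    # Bytes 48..57 of the header hold the member size as decimal ASCII.
    size = int(header[48:58].decode("ascii"))
    start = len(AR_MAGIC) + HEADER_SIZE
    body = data[start:start + size]
    if len(body) < 4:
        return []
    # The index starts with a big-endian symbol count.
    (count,) = struct.unpack(">I", body[:4])
    # Skip the count and the per-symbol member offsets; the
    # NUL-terminated symbol names follow.
    names = body[4 + 4 * count:].split(b"\0")[:count]
    return [name.decode("ascii", "replace") for name in names]
```

Calling archive_index(open("a.a", "rb").read()) on an archive like the one above would be expected to return only the names that ar recorded, so definitions from members that ar could not parse would not appear in the list.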

It seems that to make the GNU ar scenario work, lld has to distrust the archive symbol table when the archive contains bitcode files.
To avoid pessimizing the case with only bitcode members and no ELF object members, we need to refine the hack to "distrust" the archive symbol table only
if (the archive symbol table exists && an ELF object member exists && a bitcode member exists).
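
The proposed condition can be written down as a small predicate. The Python sketch below only restates the three-way "&&" from the paragraph above; the function name and the "elf"/"bitcode" labels are made up for illustration and do not correspond to anything in lld.

```python
def should_distrust_index(has_index, member_kinds):
    """Hypothetical check for the scheme proposed above.

    has_index:    True if the archive has a symbol table.
    member_kinds: iterable of labels such as "elf" or "bitcode",
                  one per archive member.
    """
    kinds = set(member_kinds)
    # Distrust only when an index exists AND the archive mixes ELF and
    # bitcode members, since the index may then cover only the ELF ones.
    return bool(has_index) and "elf" in kinds and "bitcode" in kinds


if __name__ == "__main__":
    cases = [
        (True, ["elf", "bitcode"]),   # mixed archive with an index
        (True, ["elf"]),              # native-only archive
        (True, ["bitcode"]),          # bitcode-only archive with an index
        (False, ["elf", "bitcode"]),  # no index at all
    ]
    for has_index, kinds in cases:
        print(has_index, kinds, should_distrust_index(has_index, kinds))
```

Only the first case returns True. The other three keep whatever behavior applies today, which is the point of the refinement: archives without a bitcode/ELF mix pay no extra cost.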

Does this scheme sound good?

I don’t think there really ought to be an expectation that this works with an ar implementation which can’t parse the LTO files.

The only way it works with GCC is that they ship /usr/lib/bfd-plugins/liblto_plugin.so which “claims” the LTO object files and tells ar about the symbol table.

Either users should be using llvm-ar, or LLVM should be shipping a gnu binutils plugin.

I believe the system ar will work in combination with the LLVM gold plugin, btw.

The confusing thing here is that it fails silently. If you don’t know what you are looking for (I didn’t even remember this when initially helping Shishir, and I spend a lot of time looking at LTO behavior), it’s impossible to figure out why the link is failing. It would be friendliest to users if lld either consistently gave a meaningful error, or consistently just worked (like it does in the all bitcode case, even without a symbol table).

Fangrui - I am not sure I followed your suggestion. But if it means that a mixed bitcode/native object case would just be handled with or without a complete symbol table, that would be awesome. In this case the symbol table is incomplete (only has symbols for the native objects).

Teresa

Teresa,

Interesting suggestion. So, as you summarized, lld has a special hack for LTO in its archive file handling. That is, if an archive file’s symbol table is completely empty, we take that as a sign that the system ar (which doesn’t understand the LLVM bitcode file format) was wrongly used on bitcode files. However, if at least one member object file is in the native ELF format, the archive file will have some symbols in its symbol table, so the hack won’t kick in.

I think one approach to fixing the issue is to not trust the archive file symbol table for bitcode files at all. Instead, we can read symbols directly from the symbol table of each bitcode archive member. That shouldn’t be technically difficult. I’m a bit worried about the performance penalty, though: to read bitcode file symbol tables, we have to identify which members are bitcode files and which are native ELF files, which means reading the file magic of every archive member. That might be noticeably slow, particularly if thin archives are in use, but it’s highly dependent on the filesystem where the input files are laid out.
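
To show what "reading the file magic of every archive member" involves, here is a minimal Python sketch that walks a regular ar archive and labels each member by its first four bytes. It is not lld's implementation: the names classify and member_kinds are made up for this example, GNU long names and BSD extended names are not resolved, and thin archives are rejected. In a thin archive the members live in separate files, so the same check would mean touching every one of those files, which is where the filesystem cost comes from.

```python
AR_MAGIC = b"!<arch>\n"                   # regular (non-thin) archive
HEADER_SIZE = 60                          # fixed member header size
SPECIAL_MEMBERS = {"/", "//", "/SYM64/"}  # index and long-name tables


def classify(magic):
    """Label a member by its leading bytes (hypothetical helper)."""
    if magic == b"\x7fELF":
        return "elf"
    # Raw LLVM bitcode ('BC' 0xC0DE) or the bitcode wrapper header.
    if magic in (b"BC\xc0\xde", b"\xde\xc0\x17\x0b"):
        return "bitcode"
    return "other"


def member_kinds(data):
    """Return [(name, kind)] for each regular member of the archive."""
    if not data.startswith(AR_MAGIC):
        raise ValueError("not a regular ar archive")
    result = []
    pos = len(AR_MAGIC)
    while pos + HEADER_SIZE <= len(data):
        header = data[pos:pos + HEADER_SIZE]
        if header[58:60] != b"`\n":
            raise ValueError("corrupt member header at offset %d" % pos)
        name = header[:16].decode("ascii", "replace").rstrip()
        size = int(header[48:58].decode("ascii"))
        start = pos + HEADER_SIZE
        if name not in SPECIAL_MEMBERS:
            # GNU short names carry a trailing "/"; drop it for display.
            shown = name[:-1] if name.endswith("/") else name
            result.append((shown, classify(data[start:start + min(size, 4)])))
        # Member data is padded to an even offset.
        pos = start + size + (size & 1)
    return result
```

Only four bytes per member are inspected, so for a regular archive that is already mapped into memory the scan itself is cheap. The cost concern raised above is mainly about archives whose members are not stored inline.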