Running distributed ThinLTO without thin archives

I’m trying to run distributed ThinLTO without thin archives.
When I do, I get an error in the optimizer when clang tries to open a nonexistent file:

clang++ -flto=thin -Xclang -fno-lto-unit -O3 -c main.cpp -o main.o
clang++ -flto=thin -Xclang -fno-lto-unit -O3 -c lib/lib.cpp -o lib/lib.o
clang++ -flto=thin -Xclang -fno-lto-unit -O3 -c src/lib.cpp -o src/lib.o
llvm-ar -format gnu qcs lib.a lib/lib.o src/lib.o
clang++ -flto=thin -o index -O3 -Wl,-plugin-opt,thinlto-index-only=thinlto.objects -Wl,-plugin-opt,thinlto-emit-imports-files main.o lib.a
clang++ -c -x ir main.o -O3 -flto=thin -o main-native.o -fthinlto-index=main.o.thinlto.bc
Error loading imported file ‘lib.a.llvm.2596.lib.cpp’: No such file or directory

In this case, gold has registered the modules within my archive with ThinLTO.
The string “lib.a.llvm.2596.lib.cpp” is generated with the archive in question, plus an offset indicating where in the archive the particular object file is.
Unfortunately, when the optimizer tries to include the proper modules,
it’s naively looking for a bitcode file with the name of the string provided, but there’s obviously no “lib.a.llvm.2596.lib.cpp” for it to open.
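
One way to see which imports the backend will fail to open is to check every path listed in the imports file written by thinlto-emit-imports-files. This is only a sketch: it assumes the imports file (main.o.imports here) holds one module path per line, and the helper name missing_imports is made up for illustration.

```shell
# Hypothetical helper: print every path listed in an .imports file
# that does not exist on disk. Assumes one module path per line.
missing_imports() {
    while IFS= read -r path || [ -n "$path" ]; do
        [ -z "$path" ] && continue
        [ -e "$path" ] || printf '%s\n' "$path"
    done < "$1"
}

# Example: missing_imports main.o.imports
```

Any path it prints, such as an archive-derived identifier, is one the backend compile would not be able to load.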

Has anyone else tried to get clang to understand distributed ThinLTO when using non-thin archives?
Is there some way to get clang to understand these out of the box?

I’m actually a little confused about the “.cpp” in “lib.a.llvm.2596.lib.cpp”.
Seems like it should be a “.o”?

It didn’t seem like there was anything out of the box that supported this.
I was looking at having clang actually read in the archive file and register the correct bitcode module.
I wanted to run it by the list to get some second opinions before I started that.

Hi Tanoy,

You can’t use distributed ThinLTO with archives (thin or not), at least not today. The reason is that we need to be able to identify specific bitcode object files to import from in the backends, and that logic does not know how to deal with objects within archives. We do distributed ThinLTO in our builds but don’t use .a files. Instead, we use --start-lib/--end-lib around the files that would be in the same archive when performing the thin link. I.e., if you change your thin link to be:

clang++ -flto=thin -o index -O3 -Wl,-plugin-opt,thinlto-index-only=thinlto.objects -Wl,-plugin-opt,thinlto-emit-imports-files main.o -Wl,--start-lib lib/lib.o src/lib.o -Wl,--end-lib

things should work.

Note you also need to do the ThinLTO backend compile for each of the archive constituents anyway, e.g. something like:
clang++ -c -x ir lib/lib.o -O3 -flto=thin -o lib/lib-native.o -fthinlto-index=lib/lib.o.thinlto.bc

etc
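
Since every bitcode object needs its own backend compile, the per-file commands can be generated rather than typed out. The sketch below only prints the commands, following the pattern of the invocation above; the function name backend_cmd and the -native.o output naming are illustrative assumptions.

```shell
# Hypothetical helper: print the ThinLTO backend command for one bitcode
# object, mirroring the invocation pattern used in this thread.
# Assumes the thin link wrote <object>.thinlto.bc next to each object.
backend_cmd() {
    obj=$1
    native="${obj%.o}-native.o"
    printf 'clang++ -c -x ir %s -O3 -flto=thin -o %s -fthinlto-index=%s.thinlto.bc\n' \
        "$obj" "$native" "$obj"
}

# Print one command per bitcode object; nothing is executed here.
for obj in main.o lib/lib.o src/lib.o; do
    backend_cmd "$obj"
done
```

The printed lines can then be handed to whatever parallel or distributed mechanism runs the backends.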

HTH,
Teresa

Thanks!

Question about the final link step:

Do I provide all the object files to the link step, i.e. something like:
clang++ -o thinlto main-native.o lib/lib-native.o src/lib-native.o

Do I need to provide --start-lib markers on that final link step as well?

Tanoy

No and No. After the thin link the linker has already done its symbol resolution, and using --start-lib/--end-lib in the final link can muck with that. However, the list of files the linker selected in the right order is emitted in the argument given to thinlto-index-only (thinlto.objects in your case below). You can pass that to the native link via the “@” option:
clang++ -o thinlto @thinlto.objects
However, you need to deal with the fact that this file contains the original bitcode names (not the names you gave it in your backend step, like main-native.o, etc.).
There are two options for correcting the names:

  1. Manually rename in thinlto.objects
  2. Use the thinlto-prefix-replace=oldprefix;newprefix plugin option to replace the old path prefix of the input bitcode files with a new path prefix. In your case the old prefix is “”, so you could do something like “-Wl,-plugin-opt,thinlto-prefix-replace=;native/” (quote the argument so that the shell does not treat the semicolon as a command separator). This should do two things: 1) the generated .thinlto.bc index files and the .imports files will be put under a “native/” subdirectory; 2) the paths in thinlto.objects should also have the “native/” prefix. If you use that prefix in your LTO backend clang invocations (e.g. -o native/main.o instead of main-native.o), then thinlto.objects can just be passed directly to the final link via “@” without any modification.

Note that if your thin link included any already-native files or libraries, those still need to be passed to the final link explicitly, since thinlto.objects only includes the files that were originally bitcode.
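
The renaming in option 1 does not have to be done by hand. A minimal sketch, assuming thinlto.objects lists one bitcode path per line and that the backend outputs follow the -native.o naming used earlier in this thread (the function name and the native.objects file name are hypothetical):

```shell
# Hypothetical helper: map each bitcode name in thinlto.objects to the
# native object produced by the backend step (foo.o -> foo-native.o).
# Assumes one path per line, each ending in ".o".
to_native_names() {
    sed 's/\.o$/-native.o/' "$1"
}

# Example use for the final link (native.objects is an assumed file name):
#   to_native_names thinlto.objects > native.objects
#   clang++ -o thinlto @native.objects
```

Because the order of lines is preserved, the link order chosen by the thin link carries over to the final link.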

Teresa

Thanks very much!

One more question:

If I wanted to implement archive support for distributed ThinLTO, what all would I need to do?

I know I need to pull out the bitcode module during the optimizer step, which I’ve looked at.

What else would I need to change?

Hi Tanoy,

Sorry for the slow response. I haven’t thought through what would need to be done here very closely, but here are a couple of thoughts.
Somehow, the module identifier for each constituent object would need to be unique (we currently generate this from the name of the archive plus the offset in the archive plus the name of the source file, IIRC) and would also need to correctly identify the extracted bitcode object used in the post-thin-link backend invocation. This is so that the thin link can write out the distributed index file with a filename that gets consumed by the associated backend invocation (passed to -fthinlto-index=), and so that the module paths emitted in those index files correctly identify where we can import functions from. Since the bitcode objects need to be extracted for the corresponding backend clang invocations, a couple of possibilities come to mind:

  1. Do it outside the compiler/linker: wrap the whole thing in a script that does the extraction, invokes the link with extracted constituents surrounded by --start-lib/--end-lib pairs, invokes each backend through some parallel or distributed mechanism, and then invokes the final link; or
  2. Add support to pass some kind of mapping file into LTO that maps from each archive constituent to the extracted filename, including the path, that the corresponding ThinLTO backend clang invocation will use, and have LTO set the module identifiers accordingly so that everything “just works” (in theory). We already support some munging of these names (see the thinlto-object-suffix-replace plugin option in either gold-plugin.cpp or in lld), but what you need here is a bit more complicated than a simple suffix change. But since there is already support for adjusting the name, it might not be too bad to add this support.
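
As a rough illustration of the first possibility, a wrapper could extract the archive and build the bracketed part of the thin-link command line. The sketch below only constructs the argument string. The function name is hypothetical, and it assumes member names are unique within the archive. That assumption does not hold for the lib.a in this thread, which contains two members named lib.o, so a real wrapper would have to rename members on extraction.

```shell
# Hypothetical helper: given a directory and archive member names, print the
# linker arguments that bracket the extracted objects with
# --start-lib/--end-lib (passed through the compiler driver via -Wl,).
# Assumes member names are unique within the archive.
start_lib_args() {
    dir=$1
    shift
    printf '%s' '-Wl,--start-lib'
    for member in "$@"; do
        printf ' %s/%s' "$dir" "$member"
    done
    printf ' %s\n' '-Wl,--end-lib'
}

# Example use (extract first, then splice the result into the thin link):
#   mkdir -p extracted && (cd extracted && llvm-ar x ../lib.a)
#   start_lib_args extracted $(llvm-ar t lib.a)
```

The backends and the final link would then proceed exactly as in the non-archive flow, operating on the extracted files.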

Hope that helps,
Teresa