A pitch for a future RFC proposing an Integrated Distributed ThinLTO concept

Hello LTO folks,

This is a pitch for a future RFC proposing an Integrated Distributed ThinLTO concept. We have been working on this project for a while and we’d love to upstream it sooner rather than later. If we get a positive response, I will send a complete RFC in a couple of days.

Motivation in brief:

  • Sony has customers with LLVM-based toolchains using a variety of build and distribution systems, including but not limited to our own (called SN-DBS). We’d like to implement support for an integrated Distributed ThinLTO (DTLTO) approach to improve their build performance.
  • This approach should be very easy for the customers to adopt (as simple as adding a couple of options to the linker command line) and shouldn’t require any additional makefile or build-script modifications.
  • There’s an Open Source DTLTO approach already, but it requires a build system capable of handling dynamic dependencies (such as Bazel) and there are plenty of projects where such a system is not in use (LLVM itself being a case in point).

Basic idea:

  • The idea behind an integrated Distributed ThinLTO approach, very roughly, would be to have the linker (LTO library, really) orchestrate the distribution.
  • There would be an interface of some kind to allow a variety of distribution systems to be implemented by either LLVM contributors or end users.
  • The main benefit of our approach for users is ease of adoption of DTLTO. If support for a particular distribution system is already implemented, the user only needs to tell the linker that they want to do DTLTO and which distribution system to use. Jobs are then pushed out via the appropriate implementation (assuming a distribution system is already available on the user’s network).
  • So, from the user’s perspective, adopting Distributed ThinLTO will only require adding a couple of linker switches to existing build scripts.

Current status of the project:

  • We have already implemented support for two distribution systems:
    • Sony’s proprietary distribution system (SN-DBS). We have a production-level implementation.
    • Icecream. Currently we have a proof of concept. We assumed that the open-source community would not be interested in supporting Sony’s proprietary distribution system.

We have some concrete design ideas and, as I mentioned earlier, working prototypes with support for two systems (SN-DBS and Icecream). If this doesn’t sound too crazy, we’d like to send a complete RFC this week and start sending patches in order to get our work into LLVM!

Note: We gave a presentation at the 2021 LLVM Dev Meeting about Integrated Distributed ThinLTO with Icecream: 2021 LLVM Dev Mtg “Integrated Distributed ThinLTO with Icecream” - YouTube

This presentation can provide additional details about our approach.

Regards, Katya Romanova.


Hi!

This is very interesting for us (Ubisoft) since we want to deploy ThinLTO in the future, but distribution has been a worry for us.

I think a prime candidate for integration would be FastBuild, considering how prevalent it is for other game companies.

I am interested in the technical details to better understand if this is something we could adopt with FastBuild.

The actual proposals are pretty different, but the high-level idea of having tighter toolchain/build system integration reminds me of RFC: Add an LLVM CAS library and experiment with fine-grained caching for builds. A lot of that code is now in Apple’s LLVM fork if you wanted to reference it or coordinate.

@aganea might be interested as well?

There’s an Open Source DTLTO approach already, but it requires a build system capable of handling dynamic dependencies (such as Bazel)

My understanding is that Bazel doesn’t actually have dynamic dependencies - it’s important that at the time you invoke the build it knows about all actions and all inputs to those actions.

What it does have, which I guess is a limited form of dynamic dependencies, is the ability to dynamically prune dependencies - so it starts off with every LTO backend compile depending on every input module, but after the thin link step it uses some of the info there to prune down those lists.

Is that a deal breaker for you/your build system? (could try the performance without the pruning & see if it’s adequate, or implement that sort of pruning?)

Would be nicer to avoid having more combinations/ways of doing things, if practical.

Hello,

Yes, this is something that will be quite possible to adopt, but some development work will be needed both on the LLVM side and on the FastBuild side.

  • On the LLVM side, not many changes will be needed. It will be necessary to add a derived class of an abstract “build script generator” class. There will be a couple of derived classes (for the SN-DBS build-script generator and for the Icecream Makefile generator) implemented by us that could be used as examples. Something similar would need to be done for a FastBuild build-script generator derived class (see the sketch after this list).

  • On the FastBuild side, several things need to be done. Some of them might already be supported by FastBuild; I just don’t know.
    (a) FastBuild needs to be aware of bitcode files and how to handle them as input (e.g. it needs to know that it doesn’t need to invoke the C/C++ preprocessor when it sees a bitcode file).
    (b) It needs to be aware of the command-line options, such as -fthinlto-index=, -x ir, -flto, and potentially other LTO-related options.
    (c) FastBuild needs to know how to handle a compiler command with several input files.
    (d) FastBuild needs to know that if several import files are passed on the command line, these files need to be sent to the remote node together with the main bitcode file.
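
For illustration, here is a rough sketch of the shape the LLVM-side abstraction could take. All of the names below are hypothetical placeholders rather than the actual classes from our patches; the point is only that each distribution system gets its own derived generator:

#include <string>
#include <vector>

// One distributable ThinLTO backend compilation (codegen) job.
struct BackendJob {
  std::string BitcodeFile;                // the module to codegen
  std::string IndexFile;                  // its <module>.thinlto.bc shard
  std::vector<std::string> ImportFiles;   // modules it imports from
  std::vector<std::string> CommandLine;   // the backend compile command
};

// Abstract "build script generator"; one derived class per distribution system.
class BuildScriptGenerator {
public:
  virtual ~BuildScriptGenerator() = default;
  // Write a script that runs every job through the distribution system.
  virtual void writeScript(const std::vector<BackendJob> &Jobs,
                           const std::string &OutputPath) = 0;
};

// A FastBuild generator would simply be one more derived class, e.g.:
class FastBuildScriptGenerator : public BuildScriptGenerator {
public:
  void writeScript(const std::vector<BackendJob> &Jobs,
                   const std::string &OutputPath) override {
    // Emit whatever input FastBuild expects: one remote compile per
    // BackendJob, with ImportFiles listed as extra inputs to ship to the
    // remote node alongside the main bitcode file.
  }
};

The SN-DBS and Icecream generators in our prototype play exactly this role.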

If we are approved to upstream our project and the upstreaming work is complete, we could potentially look into supporting DTLTO with FastBuild too. But the major part of the work will need to be done in the FastBuild project, and it could be done independently by anyone who is interested.

Hi David,

Pruning dependencies after they are calculated at the completion of the ThinLink phase will produce exactly the same results as we achieve with our project. After such pruning, Bazel doesn’t copy all the input bitcode files to every remote machine for each compilation, just the ones that are needed for importing.
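
To make that concrete: after the thin link, each module’s import list tells the orchestrator exactly which files have to travel with that module to a remote node. A minimal sketch, assuming the one-path-per-line format produced by --thinlto-emit-imports-files (the helper name is made up, and the file locations are simplified; prefix-replace options can relocate them):

#include <fstream>
#include <string>
#include <vector>

// Files that must be shipped to a remote node for one backend compile:
// the module itself, its summary index shard, and every module it imports
// from (read from the .imports file, one path per line).
std::vector<std::string> filesForRemoteJob(const std::string &Module) {
  std::vector<std::string> Files = {Module, Module + ".thinlto.bc"};
  std::ifstream Imports(Module + ".imports");
  for (std::string Line; std::getline(Imports, Line);)
    if (!Line.empty())
      Files.push_back(Line);
  return Files;
}

A distribution system such as SN-DBS or Icecream never computes a list like this itself; in our approach the linker computes it and hands the distribution system a fully specified job.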

Unfortunately, neither our proprietary distribution system (SN-DBS) nor any other commonly used distribution system that I’m aware of (such as Icecream or distcc) is nearly as advanced as Bazel: they do not have this “pruning” feature and cannot handle dependencies dynamically.

I wanted to emphasize the main difference between Bazel and SN-DBS, Icecream, and distcc. Please correct me if I’m wrong… My understanding is that Bazel is a distributed build system (so it’s more closely related to build systems such as ‘make’ or ‘ninja’), while SN-DBS, Icecream, and distcc are simply distribution systems (you give them a list of jobs and they distribute them). I assume that Bazel constructs a build execution graph and, when a node that requires linking with distributed ThinLTO is being processed, ThinLink is invoked and calculates the exact list of dependencies. After that, Bazel can prune some edges from the build execution graph, because now it knows exactly which files are needed/not needed for importing. Distribution systems simply cannot do that. Unlike make, ninja, or Bazel, they have no knowledge of build rules and dependencies. They simply distribute the jobs they are given.

To answer your question about “pruning”: I think it is simply impossible for distribution systems to do, for the reasons described above; it’s only possible within distributed build systems.

My intuition is that for a huge project (let’s say LLVM), if each module that needs to be codegen-ed depends on every input module in the project, we will create enormous network traffic by copying tens of thousands of files to the remote nodes, and as a result we expect a significant increase in link time. How much slower it will be, I honestly don’t know; we haven’t done an experiment like that before.

If you need these performance numbers, it’s possible to get them, but it’s several days of work for us. We would have to create a mock project that does this “poor man’s” dependency evaluation (i.e. the list of dependencies is simply all input files) and compare its performance with the integrated approach that we proposed (where a minimal set of dependencies is calculated). Let me know if you need this data to make the right decision.

Hello,
These are actually two orthogonal projects. Apple’s project is about caching computational results for builds; our project is all about ease of use/adoption of Distributed ThinLTO for existing software projects (basically, you add one option to the linker command line and you have Distributed ThinLTO up and running).

Could you provide some example commands to help understand the proposal? For the record, the below is how distributed ThinLTO currently works with Bazel.

Let’s say we want to compile a.c, b.c, and c.c with LTO and link them with two ELF relocatable files elf0.o and elf1.o.
We link LLVM bitcode files b.o and c.o as lazy files, which have archive semantics (surrounded by --start-lib and --end-lib).

echo 'int bb(); int main() { return bb(); }' > a.c
echo 'int elf0(); int bb() { return elf0(); }' > b.c
echo 'int cc() { return 0; }' > c.c
echo 'int elf0() { return 0; }' > elf0.c && clang -c elf0.c
echo '' > elf1.c && clang -c elf1.c

clang -c -O2 -flto=thin a.c b.c c.c
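# Thin link: writes the per-module index shards lto/*.o.thinlto.bc, the lto/*.o.imports lists, and the response file a.rsp.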
clang -flto=thin -fuse-ld=lld -Wl,--thinlto-index-only=a.rsp,--thinlto-emit-imports-files -Wl,--thinlto-prefix-replace=';lto/' elf0.o a.o -Wl,--start-lib b.o c.o -Wl,--end-lib elf1.o
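# ThinLTO backend compiles; these are the independent jobs a build system can distribute.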
clang -c -O2 -fthinlto-index=lto/a.o.thinlto.bc a.o -o lto/a.o
clang -c -O2 -fthinlto-index=lto/b.o.thinlto.bc b.o -o lto/b.o
clang -c -O2 -fthinlto-index=lto/c.o.thinlto.bc c.o -o lto/c.o
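# Final native link, consuming the backend outputs listed in a.rsp.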
clang -fuse-ld=lld @a.rsp elf0.o elf1.o  # a.rsp contains lto/a.o and lto/b.o

--thinlto-index-only (--plugin-opt=thinlto-index-only) performs a thin link.
Here we use a variant --thinlto-index-only=a.rsp which additionally creates a response file.
The response file lists ELF relocatable files whose names are derived from the input file names. Unextracted lazy LLVM bitcode files are omitted.

c.o is an unextracted lazy LLVM bitcode file. It gets a nearly empty .thinlto.bc.

If --thinlto-emit-imports-files is specified, ld.lld will create import files lto/[abc].o.imports.
lto/a.o.imports lists files from which compiling a.o will import.
lto/c.o.imports will be empty: the build system does not need to know whether a lazy LLVM bitcode file is extracted or not.

clang -fthinlto-index= calls clang::EmitBackendOutput (clang/lib/CodeGen/BackendUtil.cpp), then runThinLTOBackend, which calls lto::thinBackend (llvm/lib/LTO/LTOBackend.cpp).

The response file @a.rsp is reordered before all ELF relocatable files. This may cause strange behaviors in the presence of ODR violations.

(I’ll be out of town for about 2 weeks and may not reply in time.)


Hello,
I have posted a complete RFC to LLVM Discourse today. Here is the link. It has a lot of details and usage examples.

The main idea behind this project is that if you already have a distribution system up and running, the only things you need to do in order to switch from ThinLTO to distributed ThinLTO are:

  • tell the linker that you want Distributed ThinLTO (option --thinlto-distribute)
  • point to the location of the distribution system executable (option --thinlto-distribution-tool=$(DIST_CC))

So, if you have a Makefile rule for ThinLTO that looks like this (libsupport.a is an archive containing bitcode files):

program.elf: main.o file1.o libsupport.a
        $(LD) --lto=thin main.o file1.o -lsupport -o program.elf

in order to enable DTLTO, the user simply needs to change the rule like this:

program.elf: main.o file1.o libsupport.a
        $(LD) --lto=thin --thinlto-distribute --thinlto-distribution-tool=$(DIST_CC) main.o file1.o -lsupport -o program.elf

In the example that you showed above, you would not be able to switch from a regular ThinLTO build to a DTLTO build by adding just a couple of options. You would have to write a set of new commands. Modifying complex build scripts/makefiles is not convenient for users and deters them from using distributed ThinLTO.

We already have DTLTO (Distributed ThinLTO) integrated with Sony’s proprietary distribution system (SN-DBS), and this project is production level. Since integration with a proprietary distribution system alone will not be appealing to the LLVM community, we also created a prototype integrating DTLTO with the open-source distribution system Icecream. In order to do this, we had to add an Icecream makefile generator to LLVM as well as make some changes to Icecream’s sources (though we haven’t asked for permission to upstream our changes to Icecream yet).

Another thing that I wanted to emphasize is that with our integrated DTLTO approach the users don’t have to convert archives into thin archives, or unpack archives and surround their members with --start-lib … --end-lib. All that work is done by the linker.

The only thing the user needs to do to switch from regular ThinLTO to distributed ThinLTO is add two options to the linker command line.