A few days ago I posted a pitch for our Integrated Distributed ThinLTO project, and now I'm posting the complete RFC.
Integrated Distributed ThinLTO
Goal
We have customers with LLVM-based toolchains that use a variety of build and distribution systems, including but not limited to our own. We'd like to implement support for an integrated Distributed ThinLTO (DTLTO) approach to improve their build performance, while making adoption of DTLTO seamless: as easy as adding a couple of options on the command line.
1. Challenges
DTLTO is harder to integrate into existing build systems because build rule dependencies are not known in advance. They become available only after DTLTO's ThinLink phase completes, at which point the list of import files for each bitcode file becomes known.
2. Dynamic dependency problem solution
Let’s take an example and assume that we have a Makefile rule for performing the regular ThinLTO step that looks like this:
program.elf: main.o file1.o libsupport.a
$(LD) --lto=thin main.o file1.o -lsupport -o program.elf
Let’s use the following example:
- main.o, file1.o are bitcode files
- libsupport.a is an archive containing two bitcode files: file2.o and file3.o
- main.o imports from file1.o and file2.o (part of libsupport.a)
- file1.o imports from file2.o
- file2.o has no dependencies (it doesn’t import from any other bitcode file)
- file3.o (part of libsupport.a) is not referenced
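The import relationships above can be summarized as a small dependency graph. A minimal Python sketch (module names taken from the example; the helper function is purely illustrative):

```python
# Import graph from the example: module -> modules it imports from.
imports = {
    "main.o":  ["file1.o", "file2.o"],
    "file1.o": ["file2.o"],
    "file2.o": [],            # imports nothing
    # file3.o is never referenced, so it does not appear here
}

# A DTLTO codegen job for a module needs the module itself plus
# every module it imports from.
def codegen_inputs(module):
    return [module] + imports[module]

print(codegen_inputs("main.o"))   # ['main.o', 'file1.o', 'file2.o']
```

These per-module input lists are exactly what is not known until the ThinLink phase has run.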
For Distributed ThinLTO, we will need to find a way to overcome the problem of the dynamic dependencies described in section 1.
In the case of a build system based on Makefiles (such as Icecream or DistCC), we need the linker to produce an additional makefile containing build rules with dynamically calculated dependency lists.
So, in the first rule, the linker, when given the special option --dtlto=<makefile_name>, will generate an additional makefile:
distr.makefile: main.o file1.o libsupport.a
$(LD) main.o file1.o --dtlto=distr.makefile -lsupport -o program.elf
It will also implicitly generate individual module summary index files <filename>.thinlto.bc corresponding to each of the input bitcode files.
Let’s use the following conventions:
- .o – bitcode file
- .native.o – native object file
- .thinlto.bc – individual summary index file
- --dtlto=<makefile_name> – option for producing an additional makefile for the second makefile rule; it also implicitly produces the set of individual module summary index files
Here is the body of distr.makefile file that the linker generates in the first rule described above:
DIST_CC := <path to a tool that can distribute ThinLTO codegen job>
main.native.o : main.thinlto.bc main.o file1.o file2.o
$(DIST_CC) clang --thinlto-index=main.thinlto.bc main.o file1.o file2.o -o main.native.o
file1.native.o : file1.thinlto.bc file1.o file2.o
$(DIST_CC) clang --thinlto-index=file1.thinlto.bc file1.o file2.o -o file1.native.o
file2.native.o : file2.thinlto.bc file2.o
$(DIST_CC) clang --thinlto-index=file2.thinlto.bc file2.o -o file2.native.o
program.elf: main.native.o file1.native.o file2.native.o
$(LD) main.native.o file1.native.o file2.native.o -o program.elf
In the second rule, the linker itself could invoke $(MAKE) with this additional makefile, distr.makefile, to produce the target executable:
program.elf: distr.makefile
$(MAKE) -j<N> -f distr.makefile
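Inside the linker, that second rule amounts to shelling out to make. A Python sketch of the idea (the function names and defaults are illustrative, not actual lld behavior):

```python
import os
import subprocess

def make_command(makefile, jobs):
    # The command the linker would run: make -j<N> -f <makefile>
    return ["make", "-j", str(jobs), "-f", makefile]

def run_distributed_codegen(makefile="distr.makefile", jobs=None):
    jobs = jobs or os.cpu_count() or 1
    subprocess.run(make_command(makefile, jobs), check=True)
```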
The option --dtlto=<makefile_name> was introduced in this RFC only to simplify the explanation of the semantics of these rules. Since the linker performs both rules, there is no need for an option to choose the makefile name; the linker can take care of it.
From the user’s perspective, the original make rule requires only two small modifications: one command-line option to tell the linker to do the distribution (--thinlto-distribute) and one to select the distribution system (--thinlto-distribution-tool="path to a tool that distributes ThinLTO codegen jobs"). These options need to be implemented in the linker.
So, if the original rule for ThinLTO looked like this:
program.elf: main.o file1.o libsupport.a
$(LD) --lto=thin main.o file1.o -lsupport -o program.elf
in order to enable DTLTO, the user simply needs to change the rule like this:
program.elf : main.o file1.o libsupport.a
$(LD) --lto=thin --thinlto-distribute --thinlto-distribution-tool=$(DIST_CC) main.o file1.o -lsupport -o program.elf
Note that no additional work needs to be done by the user or the build system to handle archives. All this work is done by the linker.
3. Overview of existing popular Open Source & Proprietary systems that could be used for ThinLTO codegen distribution
Some or all of these systems could be potentially supported, bringing a lot of value for the ThinLTO customers who have already deployed one of these systems.
- Distcc
- Icecream
- FastBuild
- Incredibuild; Incredibuild is one of the most popular proprietary build systems.
- SN-DBS; SN-DBS is a proprietary distributed build system developed by SN Systems, which is part of Sony. SN-DBS uses job description documents in the form of JSON files for distributing jobs across the network. In Sony, we already have an internal production level DTLTO implementation using SN-DBS. In our implementation, the linker is responsible for generating the JSON build files.
4. Challenges & problems
This section describes the challenges that we encountered when implementing DTLTO integration with our proprietary distributed build system, SN-DBS. All of these problems apply to DTLTO integration with any distributed build system in general. The solutions are described in detail in Section 6.
4.1 Regular archives handling
Archive files can be huge. It would be too time-consuming to send the whole archive to a remote node. One of the solutions is to convert regular archives into thin archives and access individual thin archive members.
4.2 Synchronizing file system access across processes
Since several linker processes can be active on a given machine at any moment, they can access the same files at the same time. We need to provide a reliable synchronization mechanism. The existing LLVM file access mutex is not adequate, since it does not have a reliable way to detect abnormal process failure.
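One common way to make such a lock robust is to record the holder's PID in the lock file and reclaim the lock once that process no longer exists. A hedged Python sketch of the idea (this is not LLVM's mutex; the lock-file name and the recovery policy are assumptions, and the liveness probe is POSIX-specific):

```python
import errno
import os

LOCK = "archive.lock"   # illustrative lock-file name

def pid_alive(pid):
    # Signal 0 probes for existence without affecting the process (POSIX).
    try:
        os.kill(pid, 0)
        return True
    except OSError as e:
        return e.errno != errno.ESRCH

def try_lock():
    try:
        # O_EXCL makes creation atomic across processes.
        fd = os.open(LOCK, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        os.write(fd, str(os.getpid()).encode())
        os.close(fd)
        return True
    except FileExistsError:
        try:
            owner = int(open(LOCK).read() or "0")
        except (ValueError, OSError):
            owner = 0
        if owner and not pid_alive(owner):
            os.unlink(LOCK)   # stale lock left by a crashed process
            return try_lock()
        return False

def unlock():
    os.unlink(LOCK)
```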
4.3 File name clashes
We can have situations where file names can clash with each other. We need to provide file name isolation for each individual link target.
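One simple isolation scheme is to derive a scratch directory name from the link target's absolute path, so that concurrent links cannot collide even when output names repeat. A sketch under that assumption (the naming scheme is illustrative, not our actual implementation):

```python
import hashlib
import os
import tempfile

def isolated_dir(link_target):
    # Hash the absolute target path so two links of identically named
    # outputs in different directories get distinct scratch areas.
    tag = hashlib.sha256(os.path.abspath(link_target).encode()).hexdigest()[:12]
    path = os.path.join(tempfile.gettempdir(), "dtlto-" + tag)
    os.makedirs(path, exist_ok=True)
    return path
```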
4.4 Remote execution is not reliable and can fail at any time
We need to provide a fallback system that can do code generation on a local machine for those modules where remote code generation failed.
5. Linker execution flow
5.1 Linker options
The following options need to be added:
- An option that tells the linker to use the integrated DTLTO approach.
- An option that specifies what kind of distribution system to use.
- Options for debugging and testing.
5.2. Linker SCAN Phase algorithm:
If an input file is a regular archive:
- Convert regular archive into a thin archive. If the regular archive contains another regular archive, it will be converted to a thin archive during the next linker scan pass.
- Replace the path to the regular archive with a path to the thin archive.
After the scan phase has completed, the linker has determined a list of input bitcode modules that will participate in the final link. Also, by now, the linker has collected all symbol resolution information.
5.3. LINK Phase:
The linker uses symbol resolution information for producing individual module summary index files and cross module import lists.
The linker performs code generation on each of the input bitcode modules. The exact steps of this pseudo-algorithm depend on the type of job distribution system used:
- Check whether any of the input bitcode files has a corresponding cache entry. If a cache entry exists, that bitcode file is excluded from code generation.
- Generate the build script specific to the job distribution system.
- Invoke the generated build script.
- Check that the list of expected native object files matches the list of the files returned after build script execution. If any of the native object files are missing, the linker uses the fallback system to perform code generation locally for all of these missing native object files.
- Place native object files into corresponding cache entries.
- Perform the final link and produce an executable.
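The steps above can be sketched as a small driver. Everything here is simulated (the cache is a dict, and the codegen callbacks stand in for the generated build script and for local clang invocations); it only illustrates the control flow:

```python
def link_phase(modules, cache, remote_codegen, local_codegen):
    # 1. Skip modules that already have a cache entry.
    todo = [m for m in modules if m not in cache]
    # 2-3. Generate and invoke the distribution-specific build script;
    #      remote_codegen returns the subset it actually produced.
    produced = remote_codegen(todo)
    # 4. Fall back to local codegen for anything missing.
    missing = [m for m in todo if m not in produced]
    produced.update(local_codegen(missing))
    # 5. Place new native objects into the cache.
    cache.update(produced)
    # 6. Final link consumes one native object per input module.
    return [cache[m] for m in modules]

# Simulated distribution that "loses" file1.o (e.g. a node went down):
flaky_remote = lambda mods: {m: m + ".native" for m in mods if m != "file1.o"}
local = lambda mods: {m: m + ".native" for m in mods}

cache = {"file2.o": "file2.o.native"}   # pre-existing cache entry
print(link_phase(["main.o", "file1.o", "file2.o"], cache, flaky_remote, local))
# ['main.o.native', 'file1.o.native', 'file2.o.native']
```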
6. Implementation details
6.1 Regular to Thin Archive converter
In section 4.1 we explained why dealing with regular archives is inefficient and proposed converting regular archives into thin archives, later copying only individual thin archive members to remote nodes.
We implemented a regular to thin archive converter based on llvm/Object/Archive.h
- The regular to thin archive converter creates or opens an inter-process sync object.
- It acquires sync object lock.
- It determines which directory to unpack the regular archive members into. This decision is based on a command line option, the system temp directory, or the current process directory (in that priority order).
- If the thin archive doesn’t exist:
- Unpack the regular archive
- Create the thin archive from regular archive members
- Else:
- Check the thin archive file modification time
- If (the thin archive is newer than the regular archive) && (the thin archive integrity is good):
- Use existing thin archive
- Else:
- Unpack the regular archive
- Create the thin archive from regular archive members.
Note: all thin archive members match regular archive members
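The reuse check from the steps above can be sketched as follows. The conversion itself is shown by shelling out to llvm-ar (its `x` operation extracts members and its `T` modifier creates a thin archive), although the real converter is built on llvm/Object/Archive.h rather than a subprocess:

```python
import os
import subprocess

def thin_archive_up_to_date(thin, regular, integrity_ok=lambda p: True):
    # Reuse the thin archive only if it exists, is newer than the
    # regular archive, and passes an integrity check.
    return (os.path.exists(thin)
            and os.path.getmtime(thin) > os.path.getmtime(regular)
            and integrity_ok(thin))

def convert_to_thin(regular, thin, workdir):
    if thin_archive_up_to_date(thin, regular):
        return thin
    # Unpack the regular archive, then build a thin archive from the
    # extracted members.
    subprocess.run(["llvm-ar", "x", os.path.abspath(regular)],
                   cwd=workdir, check=True)
    members = sorted(os.listdir(workdir))
    subprocess.run(["llvm-ar", "rcT", thin, *members], cwd=workdir, check=True)
    return thin
```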
6.2 Fallback code generation
In section 4.4 we described the problem that remote execution is not as reliable as local execution and can fail at any time (e.g. the network is down, remote nodes are not accessible, etc.). So, we need to implement a reliable fallback mechanism that can perform code generation on the local machine for all modules that failed to build remotely.
- Check whether the list of missing native object files is non-empty.
- Create a queue of commands for performing codegen for the missing native object files.
- Use a process pool to execute the queue of commands.
- Report a fatal error if some native object files are still missing.
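A minimal sketch of the fallback runner, assuming the queued commands are plain argv lists (a thread pool is used here for brevity; the production implementation uses a process pool):

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def fallback_codegen(commands, workers=4):
    # Execute the queued local-codegen commands a few at a time and
    # collect the ones that failed.
    def run(cmd):
        return cmd, subprocess.run(cmd).returncode
    failed = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for cmd, rc in pool.map(run, commands):
            if rc != 0:
                failed.append(cmd)
    if failed:
        # Report a fatal error: some native objects are still missing.
        raise RuntimeError("local codegen failed for %d command(s)" % len(failed))
```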
6.3 Build script generators
We have created an abstract class that allows adding implementations of build script/makefile generators for different distributed build systems.
We have already added two derived classes that implement an SN-DBS build script generator and an Icecream Makefile generator. If an LLVM contributor would like to add support for a new distributed build system (e.g. FastBuild), they will have to implement a derived class for that particular distributed build system, using our classes as an example.
6.3.1. Makefile generator
Makefile is used for Distcc, Icecream, Goma, IncrediBuild.
6.3.2. SN-DBS JSON document generator.
SN-DBS is Sony’s proprietary distributed build system.
SN-DBS job description allows multiple link targets in one job.
The job description generator takes a list of input files, creates a corresponding list of commands to perform code generation for each of these files, and writes it to the JSON files for SN-DBS to use.
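The shape of that generator can be sketched as follows. The JSON schema here is purely illustrative: the real SN-DBS job document format is proprietary and not specified in this RFC, so every field name below is an assumption.

```python
import json

def write_job_description(bitcode_files, out_path):
    # Illustrative schema only; the field names are assumptions, not
    # the actual SN-DBS job document format.
    jobs = []
    for m in bitcode_files:
        stem = m.rsplit(".", 1)[0]
        jobs.append({
            "command": ["clang", "--thinlto-index=" + stem + ".thinlto.bc",
                        m, "-o", stem + ".native.o"],
            "inputs": [m, stem + ".thinlto.bc"],
            "output": stem + ".native.o",
        })
    doc = {"jobs": jobs}
    with open(out_path, "w") as f:
        json.dump(doc, f, indent=2)
    return doc
```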