Question about ThinLTO

Hello,

My impression of ThinLTO when I first heard of it (EuroLLVM 2015) was that it was about achieving cross-module optimization (CMO) at the IR level.

That is: parallel front-end compilation and initial optimization first, then a thin link of the individual input units, more optimization by calling opt again on the combined IR, and finally target codegen using llc.

A transformation similar to the following:

Input File 1: clang + opt (with ThinLTO) --\
Input File 2: clang + opt (with ThinLTO) ---+--> llvm-link --> opt (for CMO) --> llc for target codegen
    ...                                     |
Input File n: clang + opt (with ThinLTO) --/
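
In other words, something along the lines of the rough command sequence below (just my mental model, with placeholder file names, not a tested recipe):

    clang -O2 -c -emit-llvm file1.c -o file1.bc
    clang -O2 -c -emit-llvm file2.c -o file2.bc
    ...
    clang -O2 -c -emit-llvm filen.c -o filen.bc
    llvm-link file1.bc file2.bc ... filen.bc -o combined.bc
    opt -O2 combined.bc -o combined.opt.bc       # cross-module optimization on the merged IR
    llc combined.opt.bc -o combined.s            # target codegen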

But from the presentation at the 2016 LLVM Developers' Meeting, I believe ThinLTO is more than that. Taking full advantage of this optimization should require significant changes in the backend as well (I suppose).

Before I post my question, kudos to the entire team behind the ThinLTO optimization - Teresa Johnson, Mehdi Amini, Xinliang David Li, and the other developers and test engineers across the globe.

I am working on a compiler for a target where code size is a critical factor. We are still on the LLVM 3.5 code base. We assume that by moving to LLVM 4.0.0 (with ThinLTO and other recent target-independent optimizations) we would be able to improve the generated code considerably.

With ThinLTO in the LLVM 4.0.0 compiler, when we build an application with multiple compilation units, is it possible to achieve any benefit purely with LLVM IR passes (without really involving the compiler backend)?

If yes, can anyone provide information about the command-line options and the sequence in which to call the LLVM components (clang, opt, etc.) to achieve it?

I truly value any input in this regard.

Regards,

Christu

Hi Christu,

Thanks for the note!


Right, the model in the EuroLLVM talk was just an initial prototype that used llvm-link/opt, and yes, we now do whole-program optimizations during the thin link, beyond just linking in additional IR for inlining, etc.


Do you just want the bitcode out after all optimization passes and before codegen? It is doable with llvm-lto -thinlto-action=run -save-temps, I believe (although you will get more bitcode output files than you want).

Sorry, that last option should be -thinlto-save-temps=foo, where "foo" will be the prefix of the generated temp files (it can include a path). You want the foo*.opt.bc files for the output of the opt pipeline in the backends.
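
Roughly, a sketch of the whole flow with those options (I am writing the temp-file names from memory, so please check the actual output):

    clang -O2 -flto=thin -c a.c -o a.o           # bitcode objects with ThinLTO summaries
    clang -O2 -flto=thin -c b.c -o b.o
    llvm-lto -thinlto-action=run -thinlto-save-temps=./tmp. a.o b.o
    # tmp.*.opt.bc should hold the IR after the ThinLTO backend opt pipeline;
    # llvm-dis can turn those into .ll, and llc can be run on them for codegen.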

Thank you Teresa.

Yes, I would like to save the IR (*.bc and/or *.ll) after all optimizations (especially ThinLTO) are done and then call llc separately.
Is there any document available online that describes this feature and the various command-line switches a compiler developer can take advantage of?
It would help us enable this feature for a custom architecture.

Regards,
Christu


Unfortunately I don't see any online documentation for llvm-lto. I guess the best bet for now is to look at llvm-lto.cpp, or at some of the ThinLTO tests
in the tree that use it with those options. I assume you plan to use the
internal tools just for testing, and eventually hook up your own code
generator to the compiler directly? Which linker do you use? Both gold and
lld have support for ThinLTO and a newer LTO interface that uses linker
resolution information.
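
For reference, the normal linker-driven flow with gold or lld is just the usual compile and link with -flto=thin; a minimal sketch, assuming an ELF target:

    clang -O2 -flto=thin -c a.c b.c               # emits ThinLTO bitcode objects
    clang -flto=thin -fuse-ld=lld a.o b.o -o app  # thin link + ThinLTO backends run inside the linker
    # (with gold: -fuse-ld=gold, and the LLVM gold plugin must be installed)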

Teresa

Hi Teresa,

Yes, we plan to hook our code generator in directly.

We use our own linker; that's the painful part. We might only get the partial benefit of ThinLTO that occurs at compile time.

I will have a close look at the LTO tests for any useful command-line options.

Thanks for the quick response.

Regards,
Christu


There is no compile-time-only benefit of ThinLTO. You'll need the linker to interface with the LTO API for either ThinLTO or LTO to work. Unless you use internal tools to get native objects from ThinLTO and feed those to your linker; but that is not a supported model, the internal tools are just developer tools. Or you could use gold or lld but have them save the temp files, which will give you the native .o files after ThinLTO, and feed those through your linker. But in either case (llvm-lto or gold/lld), you may not get the same symbol resolution as with your own linker, which would be a problem.
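
To sketch the save-temps variant (untested here, and the exact spelling of the option differs a bit between the gold plugin and lld versions):

    clang -O2 -flto=thin -c a.c b.c
    clang -flto=thin -fuse-ld=gold -Wl,-plugin-opt,save-temps a.o b.o -o app
    # Among the saved temporaries are the native objects produced by the ThinLTO
    # backends; those are what you would try feeding through your own linker.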

Do you support regular LTO right now? If so, how do you get that to work with your linker?

Teresa


I understand the potential risk of having ThinLTO in our model now. No, we don't support LTO with our custom linker. We are moving to gold soon; it is in progress. I hope this feature can be enabled at that point.

Thanks,
Christu

Christu,

Just a heads-up: if code size is your main concern, you'll probably see better results with Regular LTO (if that's possible in your scenario). ThinLTO's main optimization, for now, is cross-module inlining. While this is great for performance, it's quite possible that you'll actually see an increase in code size.

Tobias

Hi Tobias,

The calling convention in our architecture is quite different from that of a regular processor. It introduces extra instructions at the call site and in the callee's prologue. On average, a call occupies 7-10 extra instructions in code memory (I am not allowed to say more about it). For that reason, inlining a function can sometimes give us better code even with a higher inline threshold.
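
To give a rough feel with made-up numbers: with ~9 instructions of call overhead, a small callee of 12 instructions called from 3 sites costs about 12 + 3*9 = 39 instructions out of line, versus 3*12 = 36 instructions when fully inlined, so inlining is already a slight size win even before any follow-on simplification.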

As you mentioned, cross-module inlining is the main transformation in ThinLTO. Though it increases code size after inlining, it also opens up opportunities to optimize the combined code with regular IR passes (constant and copy propagation, DCE, CSE, etc.), thereby reducing the code size further. That was the key motivation for us to consider this optimization.

We have a simplified LTO (removing dead functions from the linked binary) implemented in the custom linker now. But that won't be sufficient to address the linker-side requirements of ThinLTO.

Thanks,
Christu