Clarifying the semantics of LTO post-link optimization

A recent PR triggered some test failures, leading to this discussion about the semantics of LTO post-link optimization. Let’s demonstrate this with a simple example. Suppose we have three source files: a.cpp, b.cpp, and c.cpp. We want to take advantage of LTO. First, we compile each of them individually into bitcode: a.bc, b.bc, and c.bc. At link time, instead of linking all three .bc files at once, we first link a.bc and b.bc, running the full LTO pipeline (including the post-link optimization pipeline), and emit a bitcode file ab.bc. Next, we link ab.bc with c.bc, running the full LTO pipeline, including post-link optimization once again. The question here is: Is this a valid usage of LTO?

A more general question from this scenario is: What assumptions can a pass running in the LTO post-link optimization pipeline reasonably make? Specifically, can such a pass assume the availability of the IR from the entire program? I understand the existence of the lto-whole-program-visibility flag, but my understanding is that it is primarily for ABI-level interactions (such as those with external libraries), rather than IR-level assumptions. If no such assumptions can be made, how is running the LTO pipeline meaningfully different from simply linking bitcode files and subsequently running the regular optimization pipeline?

Here is additional context and motivation for this discussion: A recent module optimization pass adds a module flag after it runs, to make sure it doesn’t run multiple times. This pass was inserted into the full LTO post-link pipeline via registerFullLinkTimeOptimizationLastEPCallback. (Whether this pass could be modified to allow multiple executions is outside the scope of this discussion.) There is a case where it first compile source files individually to bitcode, linked together, optimized using the full LTO pipeline, generating a bitcode library. Later, this bitcode library is linked with other user code, also in bitcode format compiled with LTO enabled. However, because the previously added module flag is also linked into the final module during this second link, the optimization pass in question no longer runs, leading to test failures.

I’d appreciate feedback on the validity of these assumptions, as well as clarity regarding the expected semantics of the LTO post-link optimization pipeline.

1 Like

manual bump :slight_smile:

Let me give a try to an answer, even though it is hard to answer clearly to your question, because “link-time optimization” is very coupled to the linking process itself.
There is a dance happening between the linker and the compiler to set the right expectation in place.

From an LLVM IR point of view, there is in general no notion of “LTO” or semantics of “before” and “after”. There should be a valid, self-contained, description of a program according to LangRef.
This is the most consistent answer to your description matching

At link time, instead of linking all three .bc files at once, we first link a.bc and b.bc , running the full LTO pipeline (including the post-link optimization pipeline),

Here this is the traditional LTO way of just merging the IR to increase the scope that the optimization passes can take on.

Now however, some LTO specific passes will go beyond what is in LangRef to take advantage of information that the linker can provide. One such example is the internalize pass which can takes a list of symbols the linker is asking to retain, and make everything else private (which benefits a bunch of other optimizations).

You won’t exercise this through “merging and running the LTO pipeline”, because the “LTO pipeline” isn’t “special”: it’s very similar to the regular optimization pipeline.

LLVM also provide an API defined in llvm/include/llvm/LTO so that a linker can control better the information to inject in LLVM that are only known from the linker itself (like providing symbols to internalize / retain).

There are also some other transformations that are a bit more tricky, like the “LTO visibility” mechanisms, which is based on a complicated setup of metadata and assumptions about the current state of the program, in particular from this documentation:

LTO visibility is a property of an entity that specifies whether it can be referenced from outside the current LTO unit. A linkage unit is a set of translation units linked together into an executable or DSO, and a linkage unit’s LTO unit is the subset of the linkage unit that is linked together using link-time optimization; in the case where LTO is not being used, the linkage unit’s LTO unit is empty. Each linkage unit has only a single LTO unit.

In this case there is a subtle way to consider what is a “LTO unit” and the fact that a particular set of input .bc will participate in “LTO” exactly once.
This is all setup very specifically for C++ rules and semantics, and is trying to take advantage of the higher-level language properties, with the added assumption of this concept of a “LTO unit”, to achieve devirtualization and similar C+±oriented optimizations.
I wouldn’t call these “LLVM LTO” in the most general sense, and the --lto-whole-program-visibility is a non-default lld option that translates to enabling:

  /// Asserts whether we can assume whole program visibility during the LTO
  /// link.
  bool HasWholeProgramVisibility = false;

in llvm/include/llvm/LTO/Config.h (which I find very underwhelmingly documented here by the way). This turns a whole set of things in the IR to help the optimizer before running it, with all the expected invariants of the linker-specific API defined in llvm/include/llvm/LTO.

A recent module optimization pass adds a module flag after it runs, to make sure it doesn’t run multiple times

This is a bit iffy to me, unless it is something like the global vtable analysis or something similar as part of the “HasWholeProgramVisibility” kind of expectations. But even then I’m not sure why a module flag is warranted (other than to detect invalid future module merges maybe?).

1 Like

Thanks for the detailed response. If I understand correctly, there’s generally nothing specific that can be assumed in the post-link pipeline beyond what the language reference guarantees by default. And if there is anything additional, it would need to be part of an explicit “agreement” between the high-level language and the linker, likely enabled via specific switches or options when invoking the linker.

As for the linking order, are you suggesting that a “reduction”-style linking, like (a + b) + c instead of a + b + c, is entirely valid?

Yes, as far as I know, it is something that LLVM support, you can merge LLVM module and re-optimize them over and over.

2 Likes