A recent PR triggered some test failures, leading to this discussion about the semantics of LTO post-link optimization. Let’s demonstrate this with a simple example. Suppose we have three source files: a.cpp, b.cpp, and c.cpp. We want to take advantage of LTO. First, we compile each of them individually into bitcode: a.bc, b.bc, and c.bc. At link time, instead of linking all three .bc files at once, we first link a.bc and b.bc, running the full LTO pipeline (including the post-link optimization pipeline), and emit a bitcode file ab.bc. Next, we link ab.bc with c.bc, running the full LTO pipeline, including post-link optimization once again. The question here is: Is this a valid usage of LTO?
A more general question from this scenario is: What assumptions can a pass running in the LTO post-link optimization pipeline reasonably make? Specifically, can such a pass assume the availability of the IR from the entire program? I understand the existence of the lto-whole-program-visibility flag, but my understanding is that it is primarily for ABI-level interactions (such as those with external libraries), rather than IR-level assumptions. If no such assumptions can be made, how is running the LTO pipeline meaningfully different from simply linking bitcode files and subsequently running the regular optimization pipeline?
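To make the concern concrete, here is a minimal toy sketch of how a pass that assumes whole-program IR could misbehave in the two-stage flow. It does not use the LLVM API: `ToyModule`, `dropUnreferenced`, and all function names are hypothetical, and the "pass" is an invented one that deletes any function that is neither a declared root nor called from inside the module it sees.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Toy stand-in for an IR module (NOT the LLVM API):
// maps each defined function to the set of functions it calls.
using ToyModule = std::map<std::string, std::set<std::string>>;

// Toy "link": the union of the definitions of both inputs.
ToyModule linkModules(ToyModule a, const ToyModule &b) {
    a.insert(b.begin(), b.end());
    return a;
}

// Hypothetical pass that ASSUMES it sees the whole program: it keeps only
// the given roots and the functions called from within this module.
ToyModule dropUnreferenced(const ToyModule &m, const std::set<std::string> &roots) {
    std::set<std::string> called;
    for (const auto &entry : m)
        called.insert(entry.second.begin(), entry.second.end());
    ToyModule out;
    for (const auto &entry : m)
        if (roots.count(entry.first) || called.count(entry.first))
            out.insert(entry);
    return out;
}

// Collects callees that have no definition in the module.
std::set<std::string> undefinedCallees(const ToyModule &m) {
    std::set<std::string> missing;
    for (const auto &entry : m)
        for (const auto &callee : entry.second)
            if (!m.count(callee))
                missing.insert(callee);
    return missing;
}

// Two-stage flow: link a+b and run the pass, then link the result with c
// and run the pass again. Returns the callees left without a definition.
std::set<std::string> missingAfterTwoStageLink() {
    ToyModule a{{"a_entry", {"b_used"}}};
    ToyModule b{{"b_used", {}}, {"b_helper", {}}};
    ToyModule c{{"c_entry", {"b_helper"}}};

    // Stage 1: b_helper looks dead because c is not visible yet.
    ToyModule ab = dropUnreferenced(linkModules(a, b), {"a_entry"});
    assert(ab.count("b_helper") == 0);

    // Stage 2: c_entry still calls b_helper, which no longer exists.
    ToyModule abc = dropUnreferenced(linkModules(ab, c), {"a_entry", "c_entry"});
    return undefinedCallees(abc);
}
```

Calling `missingAfterTwoStageLink()` from a `main` function yields a set containing only `b_helper`. This is only a model of the assumption being asked about; whether any real post-link pass is allowed to behave this way is exactly the open question.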
Here is additional context and motivation for this discussion: A recent module optimization pass sets a module flag after it runs, so that it does not run more than once on the same module. This pass was inserted into the full LTO post-link pipeline via registerFullLinkTimeOptimizationLastEPCallback. (Whether this pass could be modified to allow multiple executions is outside the scope of this discussion.) In one use case, source files are first compiled individually to bitcode, linked together, and optimized with the full LTO pipeline to produce a bitcode library. Later, this bitcode library is linked with other user code, which is also bitcode compiled with LTO enabled. However, because the module flag added during the first link is carried into the final module by the second link, the pass in question no longer runs, so the newly linked user code is never processed by it, leading to test failures.
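The failure mode can be sketched with a small self-contained model. This is not LLVM code: `ToyModule`, `linkModules`, `runOncePass`, and the flag name `my-pass-has-run` are all hypothetical, and the toy linker simply takes the union of the inputs' flags, which is how the flag is described as behaving above.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy stand-in for an IR module (NOT the LLVM API).
struct ToyModule {
    std::vector<std::string> functions;  // functions defined in the module
    std::set<std::string> flags;         // module-level flags
    std::set<std::string> processed;     // functions the pass has transformed
};

// Hypothetical name of the "already ran" marker set by the pass.
const std::string kRanFlag = "my-pass-has-run";

ToyModule makeModule(const std::string &fn) {
    ToyModule m;
    m.functions.push_back(fn);
    return m;
}

// Toy "link": concatenate functions and take the union of flags, so a flag
// set on either input survives into the linked module.
ToyModule linkModules(const ToyModule &a, const ToyModule &b) {
    ToyModule out = a;
    out.functions.insert(out.functions.end(), b.functions.begin(), b.functions.end());
    out.flags.insert(b.flags.begin(), b.flags.end());
    out.processed.insert(b.processed.begin(), b.processed.end());
    return out;
}

// Run-once pass: does nothing if the flag is present; otherwise processes
// every function and sets the flag. Returns true if it did any work.
bool runOncePass(ToyModule &m) {
    if (m.flags.count(kRanFlag))
        return false;
    for (const auto &f : m.functions)
        m.processed.insert(f);
    m.flags.insert(kRanFlag);
    return true;
}

// Reproduces the two-stage flow and returns the functions the pass never saw.
std::vector<std::string> unprocessedAfterTwoStageLink() {
    ToyModule ab = linkModules(makeModule("a_fn"), makeModule("b_fn"));
    bool ranFirst = runOncePass(ab);    // first post-link pipeline: pass runs
    assert(ranFirst);
    (void)ranFirst;

    ToyModule abc = linkModules(ab, makeModule("c_fn"));
    runOncePass(abc);                   // second post-link pipeline: skipped

    std::vector<std::string> missed;
    for (const auto &f : abc.functions)
        if (!abc.processed.count(f))
            missed.push_back(f);
    return missed;
}
```

Calling `unprocessedAfterTwoStageLink()` from a `main` function returns a vector containing only `c_fn`: the flag inherited from the first stage makes the pass skip the second link entirely, so the code that arrived with the later module is left untouched.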
I’d appreciate feedback on the validity of these assumptions, as well as clarity regarding the expected semantics of the LTO post-link optimization pipeline.