Link-Time Optimization (LTO) has been supported in LLVM for a long time, and for most user-level applications, it functions well. However, LTO has many deficiencies with respect to linker semantics, which show up more frequently in kernel or embedded code bases. For now, we will focus only on ELF linking, and on how LTO breaks many of the subtle but load-bearing cases for ELF. All of the issues we discuss here affect Full LTO, though several also impact Thin LTO.
In this post we will attempt to highlight several cases where things break down, many of which are well known to the community. When possible, we’ve linked related issues in the GitHub issue tracker and tried to provide some concrete examples here for discussion. Our goal is to bring this issue to the broader community, and hopefully agree on the next steps in the process for addressing these problems. In a few cases, there has already been significant discussion or RFCs on how to address the issue. We’d like to avoid rehashing all of those here, but also don’t want to ignore the issues, since they are at least somewhat related and in some cases may have the same root cause. We’ve specifically called out issues where we think a dedicated discussion or new RFC is more appropriate, and hope that we can delay a detailed discussion of those particular issues until then.
For a few cases, we’ve outlined some steps we think may be able to address the problem, and are providing details about our rationale. Based on feedback here, we plan to make dedicated RFCs for these as appropriate, but given the degree of overlap between these problems, it seems prudent to have a broader discussion about some of our design and implementation decisions regarding LTO.
Problems
Incomplete IR Symbol Table
Problem 1. Because the compiler can introduce new symbol references (e.g. function calls) through libcalls or other optimizations (e.g. replacing a call to memcmp with a call to bcmp), it is possible for the compiler to delete a function through DCE only to later reintroduce a call to that function. This happens more frequently when things like libc or compiler-rt are provided as bitcode, since they frequently supply symbols that are used for libcalls or about which the compiler assumes knowledge. Similar problems may arise if something in the ABI is changed, for example during partial inlining, because something was misclassified as safe to internalize, for example due to something lazily brought into the link. A similar problem can also happen when the main binary provides a definition for something brought in lazily from an archive, but that definition was removed by DCE.
An example of this can be found in Lazy files extracted post LTO compilation might reference other lazy bitcode files, leading to incorrect absolute defined symbols · Issue #127284 · llvm/llvm-project · GitHub and some related tests in [ELF][LTO] Add baseline test for invalid relocations against runtime calls by arichardson · Pull Request #127286 · llvm/llvm-project · GitHub.
As another example, consider the case where your libc is participating in LTO. This means that symbols like memcmp and bcmp are being provided as bitcode. If during the link all the calls to bcmp are inlined, and bcmp is marked as internal, then Global DCE will remove bcmp’s definition. This is the right thing to do, except that later passes, like InstCombine, can transform a call to memcmp into a call to bcmp, which can no longer be satisfied, since its definition has been deleted. Here, neither memcmp nor bcmp is a libcall; both are instead classified as Lib_Func, which means they are not treated the same way as other functions we could emit calls to later in the optimization pipeline.
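To make the ordering problem concrete, here is a minimal sketch in Python. It is a toy model, not LLVM code: the call-graph dictionary, the helper names (`global_dce`, `rewrite_memcmp_to_bcmp`, `undefined_refs`), and the assumption that every memcmp result is only tested against zero are all made up for illustration.

```python
# Toy model only: the data structures and helper names are hypothetical,
# not LLVM's actual passes.

def global_dce(defs, calls, roots):
    """Return the definitions reachable from `roots` through `calls`."""
    live, work = set(), list(roots)
    while work:
        sym = work.pop()
        if sym in live or sym not in defs:
            continue
        live.add(sym)
        work.extend(calls.get(sym, ()))
    return live

def rewrite_memcmp_to_bcmp(calls):
    """Late rewrite; assumes every memcmp result is only tested against zero."""
    return {caller: ["bcmp" if callee == "memcmp" else callee for callee in callees]
            for caller, callees in calls.items()}

def undefined_refs(defs, calls):
    """Callees of surviving definitions that no longer have a definition."""
    return {callee for caller in defs for callee in calls.get(caller, ())
            if callee not in defs}

# libc is bitcode, so both functions start out defined in the merged module.
defs = {"main", "memcmp", "bcmp"}
# Every original call to bcmp was inlined, so nothing references it any more.
calls = {"main": ["memcmp"], "memcmp": [], "bcmp": []}

# bcmp was internalized, so only main is a root for DCE.
after_dce = global_dce(defs, calls, roots={"main"})
late_calls = rewrite_memcmp_to_bcmp(calls)
missing = undefined_refs(after_dce, late_calls)
print(sorted(after_dce), sorted(missing))
```

In this model DCE is locally correct, since nothing references bcmp when it runs. The dangling reference only appears because the rewrite happens afterwards, which is why `missing` ends up containing bcmp.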
Problem 2. Any libcall in a bitcode archive is eagerly extracted and code-generated, thereby breaking traditional archive semantics, because symbols that should never participate in the link are now brought in. This can lead to undefined symbols being brought into shared libraries, for example.
A concrete example of this is when building a shared library from a single TU with no unresolved symbol references (e.g. the TU provides all definitions required itself, which could be a single function). However, libunwind is one of the linker inputs, provided as a bitcode archive via --as-needed -lunwind --no-as-needed, which means it should only participate in the link if required. In the course of normal ELF linking, the archive is never consulted, since there are no unresolved symbols needed by the shared library. However, we found that when libunwind is bitcode, libcalls like _Unwind_Resume are eagerly extracted from the archive, which in turn brings in several other symbols, none of which were needed by the original library, and which should not have participated in the link based on normal ELF archive semantics.
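The difference between the two behaviors can be sketched with a small Python model of archive extraction. This is only an illustration under stated assumptions: the `link` function, the member names, and the helper symbol `unw_helper` are hypothetical, and the `eager_symbols` parameter simply stands in for libcall names being treated as if they were referenced.

```python
# Toy model only: not LLD's resolver. Each input maps a file name to
# (defined symbols, referenced symbols).

def link(objects, archive, eager_symbols=()):
    defined, referenced, extracted = set(), set(), []
    for defs, refs in objects.values():
        defined |= set(defs)
        referenced |= set(refs)
    # Assumption: libcall names are treated as references up front.
    referenced |= set(eager_symbols)

    changed = True
    while changed:
        changed = False
        for name, (defs, refs) in archive.items():
            if name in extracted:
                continue
            # Classic rule: extract a member only if it defines a symbol
            # that is currently undefined.
            if set(defs) & (referenced - defined):
                extracted.append(name)
                defined |= set(defs)
                referenced |= set(refs)
                changed = True
    return extracted, sorted(referenced - defined)

# A shared library built from one TU with no unresolved references.
objects = {"lib.o": ({"api"}, set())}
# Hypothetical archive layout; only _Unwind_Resume is a real symbol name.
archive = {
    "resume.o": ({"_Unwind_Resume"}, {"unw_helper"}),
    "helper.o": ({"unw_helper"}, {"abort"}),
}

classic = link(objects, archive)
eager = link(objects, archive, eager_symbols={"_Unwind_Resume"})
print(classic, eager)
```

With the classic rule nothing is extracted and nothing is undefined. With the eager variant both members are pulled in, and the library gains an undefined reference (abort in this model) that it never needed.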
A related problem was brought up in [RFC][LTO] Handling math libcalls with LTO, where Joseph Huber outlines issues they run into using LTO on the GPU, since all math functions from libm are considered libcalls. This causes code generation for a significant number of functions that are never used/referenced, and they often survive into the final binary. Since the libcall is introduced late in codegen, they also do not get a chance to be inlined.
Problem 3. Inline assembly can generate symbols that won’t be in the LTO symbol table. This is an issue because the compiler/linker could mistakenly assume that the symbol reference doesn’t exist. There are at least a few instances of this issue: LTO scan of module-level inline assembly does not respect CPU · Issue #67698 · llvm/llvm-project · GitHub and Incorrect branch targets in RISC-V executables built with LTO · Issue #65090 · llvm/llvm-project · GitHub.
Target Features and ABI
Problem 4. Module level inline assembly is parsed to search for symbol references, but the correct target features may not be used.
Under LTO, there isn’t a very good mechanism for ensuring target features are used correctly at link time. This has been problematic for RISC-V, which has made more use of these features. Patches like https://github.com/llvm/llvm-project/pull/73721, [RISCV] Use 'riscv-isa' module flag to set ELF flags and attributes. by topperc · Pull Request #85155 · llvm/llvm-project · GitHub, and https://github.com/llvm/llvm-project/pull/100833 are good examples of ways we’ve tried to address this issue.
While a lot of these problems show up in RISC-V, they are potentially an issue for all target architectures.
Module Boundaries
Problem 5. Full LTO breaks file globbing rules in linker scripts, since the merged module is only operated on via a monolithic LTO.o file, while the globbing rules are written to reference the original object files. This can be worked around via section attributes, but it is not always feasible to migrate a codebase to use these attributes. Previous RFCs on the topic: [RFC] (Thin)LTO with Linker Scripts and LTO with Linker Scripts.
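As a rough illustration of why the globs stop matching, the sketch below uses Python's `fnmatch` as a stand-in for linker script pattern matching. The rule list, the object file names, the output section names, and the merged object name `lto.o` are all hypothetical, and real linker script matching differs in its details.

```python
from fnmatch import fnmatchcase

# Hypothetical placement rules, in the spirit of a linker script entry
# such as `*drivers/*.o(.text*)` followed by a catch-all.
rules = [
    ("*drivers/*.o", ".fast_text"),
    ("*", ".text"),
]

def place(file_name):
    """Return the output section chosen by the first matching pattern."""
    for pattern, out_section in rules:
        if fnmatchcase(file_name, pattern):
            return out_section
    return None

# Without LTO, each object file is matched under its own name.
per_object = {name: place(name) for name in ["drivers/uart.o", "app/main.o"]}

# With Full LTO, all code arrives in one merged object (name assumed here),
# so the driver-specific pattern never fires.
merged = place("lto.o")
print(per_object, merged)
```

In this model the driver code is routed to `.fast_text` only in the non-LTO link. Once everything is merged, it silently falls through to the catch-all rule.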
Potential Solutions
Fixing the Symbol Table
Problems 1-3 can be considered to have the same root cause: the symbol table is not fixed when performing LTO, but LTO code generation needs to act as though it is fixed. To address the underlying problem here, we could modify how both the compiler and linker behave w.r.t. emitting new references to symbols and when it is safe to remove them. To begin, LLVM already has a method for marking functions to be preserved via libcalls. This should be leveraged and used as the only source of truth for emitting new function calls to code outside of the module, and should probably be extended to include things that could be emitted as an optimization, such as replacing a call to memcmp with a call to bcmp. This is an important detail if we want to avoid incorrectly removing function definitions.
We can use this conservative list of APIs to preserve these symbols, thereby ensuring that they won’t be removed if a later libcall references them. This would also require linker support, since bringing in a TU due to a libcall may transitively cause more symbols to be referenced. The linker would need to supply a list of all symbols that might become referenced as a consequence of a libcall being emitted. There is a potential here to scan each function and annotate it with a conservative list of any callees that could be introduced.
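One way to picture the conservative list is as a transitive closure over the linker's view of symbol references, which is then added to the DCE roots. The Python sketch below is a toy model under our own assumptions: `may_become_referenced`, the reference graph, and the helper symbol `bcmp_slow_path` are hypothetical, and bcmp is assumed to be on the extended libcall list.

```python
# Toy model only: symbol names and the reference graph are hypothetical.

def may_become_referenced(start, refs):
    """Transitive closure over `refs`, starting from the symbols in `start`."""
    seen, work = set(), list(start)
    while work:
        sym = work.pop()
        if sym in seen:
            continue
        seen.add(sym)
        work.extend(refs.get(sym, ()))
    return seen

# Symbols the compiler may start calling late, including optimization
# targets such as bcmp (assumed to be part of the extended list).
libcalls = {"memcmp", "bcmp"}

# Linker-provided view: which symbols each definition references.
refs = {
    "main": ["memcmp"],
    "memcmp": [],
    "bcmp": ["bcmp_slow_path"],  # hypothetical helper in the same TU
    "bcmp_slow_path": [],
    "unused_helper": [],
}

preserved = may_become_referenced(libcalls, refs)
# DCE roots are the usual exported roots plus the preserved set.
live = may_become_referenced({"main"} | preserved, refs)
removable = set(refs) - live
print(sorted(preserved), sorted(removable))
```

Here bcmp and its helper survive even though nothing calls them yet, while genuinely unused code can still be removed. The cost is that the preserved set is conservative, so some definitions may be kept that no late call ever uses.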
Linking with LTO can continue without modification until we reach the linking phase that could bring in new bitcode via new libcalls, or via lazy evaluation from archive members. At this point any new bitcode object being brought into the link would be compiled as a relocatable link (ld -r), which will generate a new .o for the entire TU being brought in from the archive. After the new objects are generated, linking can continue normally, as we’ve completed all potential code generation. This approach may lose out on some performance, through missed inlining, but should be safe and fairly straightforward to implement in LLD.
While we’ve thought a great deal about how this works for ELF, we are not sure if this should/would apply to other object file formats, or needs to. Our intuition is that these are still problems regardless of the target, but we are unsure if that is an accurate assessment, given that there are many subtle details to ELF linking that would not hold true for COFF or Mach-O.
Module level assembly
Problem 4 requires a different approach that allows the module level flags and features to be used with the inline assembly. In RISC-V, we’ve run into similar problems with LTO builds not getting the correct target features. Today we have an ad-hoc solution that plumbs the arch string to the Target machine, so that it can be instantiated with the same flags as the surrounding module. We can either follow suit for other architectures, or we can design a new mechanism. @jrtc27 has mentioned that they have ideas in this area, and I don’t have a good sense for what a good long term solution would look like, so feedback here is most welcome.
We should note that https://github.com/llvm/llvm-project/pull/100833 may help, but it’s unclear whether module level ASM will be, or can be, addressed this way. Others have suggested it could be solved by moving the asm parsing logic into frontends: https://discourse.llvm.org/t/rfc-target-cpu-and-features-for-module-level-inline-assembly/74713/2.
Linker scripts
Addressing the lack of LTO support for file name globbing in linker scripts requires more extensive changes to the compiler and/or linker, and we’re considering addressing those details here to be out of scope for this discussion. This is an issue our team has considered for quite some time to better support our embedded users and is sufficiently complex as to require its own RFC and dedicated discussion, especially given that there are at least two previous RFCs trying to address this specific problem ([RFC] (Thin)LTO with Linker Scripts and LTO with Linker Scripts).
Related GitHub issues/errata
LTO plugin uses wrong ABI for LTO objects on riscv · Issue #50591 · llvm/llvm-project · GitHub