Summary
Currently we cannot ensure safe whole program class hierarchy analysis in the presence of native objects/libaries during the (Thin)LTO link, for whole program devirtualization (WPD) and future related optimizations, as native objects do not always contain symbols identifying overridden base classes. To address this issue we propose adding the necessary information to a special section in the native object (ELF initially) that will be communicated from lld to LTO for analysis. Specifically, for WPD this would be the ThinLTO bitcode summary records identifying the type metadata and vtables in the native object. In the future, this can be extended with additional information as needed to support whole program analysis beyond WPD.
Background
Both ThinLTO and regular LTO optionally perform whole program devirtualization (WPD) based on class hierarchy analysis. Regular LTO uses the IR, whereas ThinLTO uses summary information, and an additional hybrid LTO mode (with split LTO modules) uses a combination.
It is only safe to perform WPD when we know that the LTO unit contains the full class hierarchy for the class type containing the virtual call target of an indirect call. Otherwise, we may incorrectly devirtualize a call that could actually target an unseen virtual function override.
In LLVM we employ several mechanisms to detect situations where the entire class hierarchy may not be visible during the whole program LTO link, and avoid devirtualizing in those cases, as described below.
Hidden visibility class types
Whole program devirtualization is generally enabled for types with hidden LTO visibility. By default this is just for those types that have hidden symbol visibility (e.g. those declared in anonymous spaces). It is always safe to assume that these cannot be overridden outside the LTO unit.
This is also the case when compiling with -fvisibility=hidden, which asserts that all symbols should have hidden visibility. However, assertion that must be made carefully, when we know that the resulting IR object files will only be linked into a statically linked binary that does not link any shared libraries or other native objects that would need the symbols in the LTO unit or contain any overrides of its virtual functions.
Refining LTO visibility for statically linked binaries
For statically linked targets that do not link in shared libraries, other than e.g. system libraries that we know will not override user code types, it is generally safe to apply hidden visibility to all sources. However, in a build system where we may build multiple targets in a single build invocation, each C++ source file may feed multiple LTO links with varying visibility requirements.
To enable more aggressive use of hidden LTO visibility where appropriate, a mechanism was implemented to enable refinement of the LTO visibility on vtables at LTO link time under a new –lto-whole-program-visibility option. We use this mechanism to automatically enable WPD more aggressively for binaries linked statically (no shared libraries except for some system libraries).
By delaying the visibility refinement to hidden until the LTO link action, we can also take better advantage of build system caching, so that the bitcode objects generated by ThinLTO compiles can be shared by more LTO links across build invocations.
However, like -fvisibility=hidden, this is also an assertion, and must be applied only when appropriate.
Problem Description
Problems can arise if ThinLTO is disabled for any source files that may contain virtual function overrides. Occasionally it might be disabled for certain libraries or individual source files to work around a compiler or build system limitation, for example.
This can lead to incorrect devirtualizations if the –lto-whole-program-visibility LTO link option is applied, as shown in the following example:
base.h
class Base {
virtual int foo();
};
base.cc
#include "base.h"
int Base::foo() { return 1; }
int bar(Base *B) { return B->foo(); }
derived.cc
#include "base.h"
class Derived : public Base {
int foo() { return 2; }
};
int baz() {
Derived d;
return bar(&d);
}
main.cc
#include "base.h"
#include <stdio.h>
extern int baz();
int main() {
Base b;
printf("%d %d\n", bar(&b), baz());
}
# No LTO, correct output
$ clang++ *.cc -O
$ a.out
1 2
# ThinLTO+WPD all files, correct output
$ clang++ -c *.cc -flto=thin -fwhole-program-vtables -O
$ clang++ *.o -flto=thin -Wl,--lto-whole-program-visibility -fuse-ld=lld
$ a.out
1 2
# Recompile derived.cc without ThinLTO and link with ThinLTO objects
$ clang++ -c derived.cc -O
$ clang++ *.o -flto=thin -Wl,--lto-whole-program-visibility -fuse-ld=lld
# Incorrect output!
$ a.out
1 1
The reason for the incorrect output in the last case is that the thin link (ThinLTO whole program step) has no idea that class Base is overridden in derived.cc. Therefore we think there is a single implementation and incorrectly devirtualize the callsite in bar().
We cannot simply look at uses of base class vtables in native objects. The linker sees which symbols are referenced from native objects, and this information is already communicated to LTO which needs to preserve any defintions for uses outside of the IR being LTO linked. Unfortunately, it is not required that the vtable symbol be referenced from another object in order for that type’s virtual functions to be overridden. In the above example, derived.o does not contain any reference to Base’s vtable.
Even if the base class vtable is not referenced by the native object, by default the typeinfo symbol for the base class will be. However, when compiling with -fno-rtti, typeinfo symbols are not created, so we cannot rely on looking for references to those symbols in native objects.
For example, when compiling derived.cc by default (-frtti), the vtable for Derived (_ZTV7Derived) contains a reference to its typeinfo object _ZTI7Derived which in turn contains a reference to the typeinfo object of Base _ZTI4Base:
@_ZTV7Derived = linkonce_odr dso_local unnamed_addr constant { [3 x ptr] } { [3 x ptr] [ptr null, ptr @_ZTI7Derived, ptr @_ZN7Derived3fooEv] }, comdat, align 8, !type !0, !type !1, !type !2, !type !3
@_ZTVN10__cxxabiv120__si_class_type_infoE = external global ptr
@_ZTS7Derived = linkonce_odr dso_local constant [9 x i8] c"7Derived\00", comdat, align 1
@_ZTI4Base = external constant ptr
@_ZTI7Derived = linkonce_odr dso_local constant { ptr, ptr, ptr } { ptr getelementptr inbounds (ptr, ptr @_ZTVN10__cxxabiv120__si_class_type_infoE, i64 2), ptr @_ZTS7Derived, ptr @_ZTI4Base }, comdat, align 8
When this is compiled to a native object, by default it therefore contains a reference to the typeinfo object of base class Base (_ZTI4Base):
$ nm derived.o c++filt
…
U typeinfo for Base
…
However, if we add -fno-rtti, this reference disappears both in the vtable of Derived, which now has a null pointer in that slot as shown below, as well as in the resulting native object.
@_ZTV7Derived = linkonce_odr dso_local unnamed_addr constant { [3 x ptr] } { [3 x ptr] [ptr null, ptr null, ptr @_ZN7Derived3fooEv] }
Solution Alternatives
At a high level, we want a way to perform WPD, and any future whole program class hierarchy analysis optimizations, as broadly as possible with automatic detection and handling of cases where it cannot be applied safely.
Suppressing --lto-whole-program-visibility
Ideally we want to apply –lto-whole-program-visibility automatically in the build system for all statically linked targets, to get the widest benefit from WPD. One solution is to try to detect when we are linking with native objects that might have virtual function overrides and automatically prevent use of this option. However, automatically detecting and preventing use of this linker option when LTO is disabled for source files with potential virtual function overrides is difficult and suboptimal for a few reasons.
First, while the linker invoked during this action does know what native objects are being linked, as described above it is generally always the case that a link involves at least some native libraries/objects as we have system libraries and an occasional assembly file compiled to native objects being linked. So disabling whole program visibility when linking any native code would shut off the WPD refinement of LTO visibility described earlier for all binaries. We would need to maintain an allowlist of native objects that don’t require disabling WPD, which is a brittle and manual solution.
Additionally, even if we could automatically detect that LTO has been disabled for source files containing potential overrides, suppressing --lto-whole-program-visibility on all targets linking that source file would essentially completely disable WPD for all of those binaries (except on statically hidden visibility classes as described earlier), even though only a single virtual function may be impacted. This can result in unexpected or undesirable performance impacts.
Advantages:
- If technically feasible, would prevent correctness issues by disallowing WPD (and any future optimizations) on binaries linking native code that breaks whole program visibility assumptions
Disadvantages:
- Brittle as this would require some kind of allowlist for non-problematic native objects
- Large indescriminate hammer: disables most WPD for all dependent binaries, leading to unexpected/negative performance impacts
Speculative Devirtualization
One possibility is to perform all WPD as a speculative devirtualization (i.e. what we do for profile-based Indirect Call Promotion). Specifically, instead of transforming a virtual call unconditionally to the devirtualized direct call, turn it into a check of the virtual function pointer against the expected direct call target, falling back to the original indirect call in case of a mismatch.
Unfortunately, the performance overhead vs devirtualization without the speculative check was noticeable when this solution was tested.
Advantages:
- Foolproof solution to this and any future analysis blind spots found
- Avoids failures due to undefined behavior in code (incorrect casts) which are sometimes difficult to track down and fix
Disadvantages:
- Performance impact
- Affects every single devirtualization, as we cannot tell which might be impacted
Fatter Objects
In order to enable visibility into the native object for WPD and any future whole program optimizations, one possibility is to include additional information from the original IR into a special section in the native objects. For ThinLTO, which operates on the summary during the whole program step, this could include all or some of the summary bitcode. For regular LTO, which mostly operates on IR, this would need to include some or all of the IR, unless we change regular LTO whole program based transformations to utilize a summary.
We would need to teach the linker and LTO how to handle this bitcode. We would presumably want lld to do its normal symbol resolution based on the symbols in the ELF portion, but then still hand off the bitcode to LTO. And in LTO we would need to know to treat the bitcode (or its summary, in the case of ThinLTO) as read-only. Specifically, we cannot do any whole program optimizations that require changes (e.g. linkage types) on anything from the fat object’s bitcode. This is likely easier in ThinLTO where we are just dealing with summaries than in regular LTO, which operates directly on IR.
Additionally, for the saved bitcode (summary or IR) to be usable for WPD, we need to change clang to add type metadata to vtables even when not compiling for LTO units (which currently requires an -flto* option). It is possible that this change may require other modifications to avoid side-effects (although initial testing did not surface any issues).
The two extremes of this approach are to include either all of the module (IR for regular LTO / summary for ThinLTO), or to include only what we need for WPD class hierarchy analysis (the type metadata or its summary). These two are discussed in more detail below, but an intermediate option between these two extremes is also possible, and can be driven by whole program analyses needs. This will be touched on later.
Full Module Summary/IR Bitcode Section
At one extreme is a fat object including all of the summary for ThinLTO and all of the IR for regular LTO. The new bitcode section would need to be provided to LTO by the linker when these native objects are LTO linked, to allow visibility into that object’s class hierarchy (for WPD) and other information (for future whole program analyses). While LLVM does not currently have support for fat objects, it was recently proposed.
For handling the missing whole program visibility, creation of these fat objects would only be required for files being compiled to native objects without LTO, that might be linked with LTO objects. However, in many/most build systems it will not be possible to determine what type of objects a file being compiled will be linked with, which means we would likely need to conservatively always produce fat objects for all non-LTO compiles. This has an object file and linker input size cost.
Advantages:
- Enables LTO to see the original IR (or just the summary for ThinLTO) in cases where LTO is disabled for some objects
- Allows disabling WPD only when necessary: for overridden virtual functions (leaving it on for the rest of the binary)
- Supports whole program analysis beyond just WPD
- Fat objects can be emitted by other compilers (e.g. -ffat-lto-objects in gcc) and were recently proposed for Clang/LLVM
Disadvantages:
- Expensive from an object file size and linker input standpoint due to limitations in detecting at compile time where the fat objects are required
- Non-trivial changes to lld and LTO to treat the bitcode specially (albeit likely easier for ThinLTO where we are operating on summaries, as described earlier)
Type Metadata In ELF Section
A more limited version of the above idea is to simply include some additional class hierarchy information in a special section of the native ELF objects. This could be the bitcode summary for just the type metadata and vtable definitions, which would be sufficient for ThinLTO (and regular LTO WPD could also be taught to utilize this summary when available for native objects). The vtable type metadata will expose any overrides of virtual functions in the corresponding native objects.
This would be a much smaller set of additional information to place in the native objects, helping address overhead concerns described above for fat objects.
The linker would need to be taught to provide the bitcode summary from the special section of any native objects to the LTO handling. The LTO side changes, at least for ThinLTO, should be fairly minimal as the vtable type metadata summary information is read-only to ThinLTO WPD analysis, although as described later we may need to teach the other summary based analyses not to try to apply any optimizations to the corresponding vtable definition in the summary.
Similar to fat objects, we will need to ensure type metadata is applied to vtable definitions even for non-LTO compiles.
Advantages:
- Enables LTO to see any virtual function overrides in native code where LTO was disabled
- Allows disabling WPD only when necessary: for overridden virtual functions (leaving it on for the rest of the binary)
- Less expensive to have always-on for source files being compiled without LTO
- Less invasive changes required in LTO than for fat objects
Disadvantages:
- Only (initially) addresses class hierarchy analysis related visibility issues (although this could be expanded over time, see below)
Intermediate Solution
We could also start with the minimal amount of summary required for solving WPD and other class hierarchy issues, and expand it is needed for any new whole program analyses in the future. Note that the vast majority of LTO whole program optimizations are symbol resolution driven, and we have all the information required to suitably drive aggressiveness based on the native object symbol tables and the resulting linker symbol resolution information.
Alternate Non-Bitcode Solution
A variation of this suggested by Peter Collingbourne is to add a section that simply lists the strings of all type ids from type metadata in that file into a new (non-bitcode) section. WPD would then need to treat all type tests using that type identifier conservatively (block devirtualization). This makes it easier for other compilers to support emission of this section (e.g. linking with native objects produced with a different toolchain), although we will likely need an escape hatch to handle this case anyway as other compilers will not emit this section in the near term (i.e. the linker can communicate to LTO that it is linking a native object without the required section, and LTO can optionally disable all WPD conservatively). It also simplifies “ld -r” handling (which will concatenate the contents of the same section from the merged objects). With “ld -r” the bitcode would also be concatenated into the merged object section, but some bitcode reader support may need to be added to split them apart or interpret it correctly after merging.
Some of the advantages and disadvantages are the same as for including the type metadata related bitcode as described above, but here are the tradeoffs for listing type id strings in a string section vs type metadata related bitcode:
Advantages:
- Easier for other compilers to generate this new section
- Potentially more compact than type metadata summary
- More natural handling with “ld -r”
Disadvantages:
- Not (easily) extensible to include additional information needed by TBD future whole program optimizations.
- Will not be able to determine which virtual functions within the corresponding vtable are overridden (need to conservatively block devirtualization for all virtual calls related to that type). The more fine grained handling could be achieved by including the offsets as well, which then becomes essentially the bitcode summary entries but in a different custom format.
- Requires more custom WPD handling (vs the type metadata summary merged into the combined index which should fit seamlessly into the existing analysis), although this shouldn’t be too onerous.
Recommendation
In order to provide the most targeted but accurate fix for the visibility issue when disabling LTO for some source files, either fat objects or the more limited type metadata ELF section are ideal. To address overhead and complexity concerns around fat objects, a type metadata ELF section is likely to be the more attractive solution, unless fat objects are supported and enabled in the build for other needs. It is also possible to extend the new ELF section to include additional ThinLTO summary entries as we add new whole program analyses that need to see into the native object code. Using bitcode rather than a custom format is more LLVM-centric, however, simplifies extending the section to contain additional summary info for LLVM’s whole program analysis, and using fat objects when they are enabled in the build.
For the above reasons, I recommend that we pursue the approach of adding type metadata ThinLTO summary bitcode into a special ELF section for non-ThinLTO compiles, and utilizing it when fed to an LTO link.
Design Overview
Clang changes to always add type metadata
As mentioned earlier, type metadata insertion currently requires building for LTO, which would need to be changed. A simple change to clang to suppress this requirement only resulted in a couple of LLVM test failures that would be expected from the change due to type data simply appearing where it hadn’t before.
Definition of new summary bitcode ELF section
LLVM supports wrapping bitcode in a native object file in the .llvmbc section. It’s possible that this section name can be utilized for the summary bitcode, rather than it’s current purpose of holding the full LLVM IR bitcode. However, this section is used for other specialized purposes, such as MLGO, which utilizes the bitcode embedded in this section under the -fembed-bitcode option for non-LTO compiles. Overloading this section to hold some summary bitcode instead of or in addition to full IR bitcode may confuse existing users of the section.
The recent -ffat-lto-objects proposal for LLVM appears to use a new .llvm.lto section in the WIP patch. If fat objects are enabled, we can simply utilize the information from that section, which will include IR and ThinLTO summary. If fat objects are not enabled, one option is to place the minimal summary information required for WPD in a section with the same name, which only requires the linker to look for and handle one special section, regardless of whether that includes partial summary, full summary, or full IR + summary.
Linker changes to pass new type summary section to LTO
Currently, lld and gold ignore the .llvmbc section when LTO linking. They would need to be taught to look at this section (or a new TBD section) in any linked native objects during LTO linking. Specifically, the linker will need to add the summary to the LTO unit via a likely custom version of LTO::add().
LTO changes to utilize vtable type summary
As described earlier, the vtable type metadata summary is read only. The only effect it should have is to expand and complete the set of types descriptions compatible with a given type id, which is sufficient for appropriately guiding any WPD from virtual calls in IR modules. So in theory we should be able to simply add the summary entries into the module summary index and allow WPD to proceed as usual.
A complication is that we will need to include in the summary section the vtable symbol definition summaries. We will need to ensure that these are not used for optimizations like symbol importing or anything requiring a linkage type change. I.e. anything that relies on being able to modify the IR of the associated native object, since we do not have IR at that point. The GlobalValueSummary object already has a NotEligibleToImport flag which we can leverage for this purpose.