RFC: Safe Whole Program Devirtualization Enablement

Please send any comments. As mentioned at the end, I will follow up with patches as soon as they are cleaned up and I have written some test cases.

Specifically, what we want to know at LTO time is whether the vtable has hidden LTO visibility or not

I may be missing something, but why can’t we use type metadata instead of !vcall_visibility to identify vtable pointers? We can skip emitting !type for vtables that have the [[clang::lto_visibility_public]] attribute and postpone the decision on other vtables in the way you suggested.

I’m not sure if you mean the vtables that have received this attribute manually, or just the ones that by default would get public LTO visibility (the latter is the vast bulk of the interesting cases). Regardless, the reason is the same. At LTO link time we want to optionally treat these as hidden (i.e. delay the effects of what would have been done at compile time under -fvisibility=hidden). If we don’t emit the !type metadata, we cannot do this, because we lose the class hierarchy information that WPD needs. The vcall_visibility metadata just tells us which vtables we must conservatively treat as public unless the proposed new link option asserts, at LTO link time, that the link mode makes it safe to treat public classes as hidden.

Teresa
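
For readers following along, here is a minimal C++ sketch of how a class declaration might end up with public or hidden LTO visibility, based on the rules described in this thread. The class names are hypothetical, the visibility attribute assumes GCC or Clang on a non-Windows target, and the comments describe the expected classification rather than anything the code itself checks.

```cpp
// [[clang::lto_visibility_public]] is Clang-specific; expand to nothing elsewhere.
#if defined(__clang__)
#define LTO_VISIBILITY_PUBLIC [[clang::lto_visibility_public]]
#else
#define LTO_VISIBILITY_PUBLIC
#endif

// Default symbol visibility: without -fvisibility=hidden this class is
// expected to get public LTO visibility, so unseen overriders may exist.
struct Shape {
  virtual ~Shape() = default;
  virtual int sides() const { return 0; }
};

// Internal at the source level: a class in an anonymous namespace cannot be
// derived from in another translation unit, so it gets hidden LTO visibility.
namespace {
struct Triangle final : Shape {
  int sides() const override { return 3; }
};
} // namespace

// Explicitly hidden symbol visibility, which is expected to imply hidden
// LTO visibility even without -fvisibility=hidden.
struct __attribute__((visibility("hidden"))) Square : Shape {
  int sides() const override { return 4; }
};

// Explicitly marked public: keeps public LTO visibility even when the
// translation unit is compiled with -fvisibility=hidden.
struct LTO_VISIBILITY_PUBLIC PluginBase {
  virtual ~PluginBase() = default;
  virtual int version() const { return 1; }
};

// A virtual call site: only safe to devirtualize if every class derived
// from Shape is known to the optimizer.
int countSides(const Shape &s) { return s.sides(); }

int triangleSides() { return countSides(Triangle{}); }
```

Under the proposal, Shape and PluginBase would still carry type metadata; what changes is only when the decision to treat Shape as hidden is made.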

(cc list this time)

Hi Teresa,

Apologies if this has been discussed before but ...

The LTO visibility of a class is derived at compile time from the class’s symbol visibility. Generally, only classes that are internal at the source level (e.g. declared in an anonymous namespace) receive hidden LTO visibility. Compiling with -fvisibility=hidden tells the compiler that, unless otherwise marked, symbols are assumed to have hidden visibility, which also implies that all classes have hidden LTO visibility (unless decorated with a public visibility attribute). This results in much more aggressive devirtualization.

Note that by default, unlike GCC, LLVM is liberal with visibility-constrained optimizations. In particular, it freely performs inlining, IPA and cloning on them (see https://lists.llvm.org/pipermail/llvm-dev/2016-November/107625.html which also suggested adding -fsemantic-interposition to actually respect visibility in optimizations). It's unclear why devirtualization should behave differently from other optimizations (at least by default).

-I

Are you suggesting that we should be more aggressive by default (i.e. without -fvisibility=hidden or any new options)? I believe that will be too aggressive for class LTO visibility. It is common to override virtual functions across shared library boundaries (e.g. a test may override a virtual function from a shared library with a mock class). But with what I am proposing, we will assume it is safe under the proposed LTO link option, which should be applied when everything other than, e.g., system libraries is linked statically.

Thanks,
Teresa
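
The mock-class scenario above can be sketched in a single file, under the assumption that Clock and elapsedSince stand in for code built into a shared library and MockClock for test code linked separately. All names are hypothetical.

```cpp
// Stand-in for a class defined in a shared library.
class Clock {
public:
  virtual ~Clock() = default;
  virtual long now() const { return 1000; } // the "real" implementation
};

// Library code that calls through the base class. If the library's LTO link
// wrongly assumed Clock has no unseen derived classes, this call could be
// rewritten into a direct call to Clock::now().
long elapsedSince(const Clock &clock, long start) {
  return clock.now() - start;
}

// Stand-in for test code in another linkage unit: a mock that overrides the
// virtual function. The library's LTO link never sees this class.
class MockClock : public Clock {
public:
  explicit MockClock(long fixed) : fixed_(fixed) {}
  long now() const override { return fixed_; }

private:
  long fixed_;
};
```

With normal virtual dispatch, elapsedSince(MockClock(42), 40) uses the mock's value. A devirtualized call inside the library would silently use Clock::now() instead, which is why the guarantee has to come from the user.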

Are you suggesting that we should be more aggressive by default (i.e. without -fvisibility=hidden or any new options)?
I believe that will be too aggressive for class LTO visibility.
It is common to override virtual functions across shared library boundaries

I'm a fan of GCC's conservative approach myself, so I'd agree with you on that. But regardless of my personal opinions, LLVM already optimizes normal functions in shared libraries (thus mercilessly breaking overrides via LD_PRELOAD, etc.) and I don't see how virtual functions are any different. I think default LLVM behavior needs to be consistent for all inter-procedural optimizations (be it inlining, devirtualization or cloning).

Maybe it's time to resurrect Hal's -fsemantic-interposition flag and use it consistently throughout the compiler? Users who need GCC-like semantics will be able to employ this flag to prevent unsafe optimizations.

-I

(Readding Hal)

-fsemantic-interposition controls whether the compiler may assume that symbols are not interposed, and it has nothing to do with the optimization proposed here. The concern here is whether the user may derive from a class defined in another shared object, and that does not involve symbol interposition.

Peter

-fsemantic-interposition controls whether the compiler may assume that symbols are not interposed,
and it has nothing to do with the optimization proposed here.

Thanks Peter, you are probably right. I've overestimated the information that's available to the LTO optimizer at link time.

Leaving shared libraries aside, one might argue that when the main executable is LTO-optimized, the compiler and linker are aware of all participating libraries and can therefore determine whether a class has any derived classes and devirtualize accordingly. (This assumes that run-time library implementations or dlopen calls do not introduce new derived classes at runtime; that's where the -fno-semantic-interposition assumption comes into play.) But the "determine whether a class has any derived classes" part is problematic: derived classes in libraries may have hidden visibility and will go undetected.

-I

Right, this is why we need some guarantee from the user that the LTO link will see all derived classes. Currently, at head, the only way to give that guarantee is via -fvisibility=hidden when compiling to bitcode. This proposal adds another mechanism: the bitcode is pre-enabled for WPD at compile time, and the guarantee is deferred to an option supplied at LTO link time.

Thanks,
Teresa
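
As a rough illustration of the two mechanisms, the decision can be modeled as a small function. This is a toy model with hypothetical names (VtableVisibility, LinkOptions, assumeAllDerivedClassesVisible), not LLVM's data structures or the actual option proposed in the RFC.

```cpp
// Visibility recorded for a vtable when compiling to bitcode (toy model).
enum class VtableVisibility {
  Public, // classes may be derived from outside the LTO unit
  Hidden  // e.g. compiled with -fvisibility=hidden or internal to a TU
};

// Hypothetical stand-in for the proposed LTO link-time option.
struct LinkOptions {
  bool assumeAllDerivedClassesVisible = false;
};

// Visibility the LTO link acts on: a public vtable is upgraded to hidden
// only when the user asserts at link time that all derived classes are seen.
inline VtableVisibility effectiveVisibility(VtableVisibility compiled,
                                            const LinkOptions &opts) {
  if (compiled == VtableVisibility::Public &&
      opts.assumeAllDerivedClassesVisible)
    return VtableVisibility::Hidden;
  return compiled;
}

// WPD may only consider vtables whose effective visibility is hidden.
inline bool mayDevirtualize(VtableVisibility compiled,
                            const LinkOptions &opts) {
  return effectiveVisibility(compiled, opts) == VtableVisibility::Hidden;
}
```

The point of the sketch is that the compile-time result is kept in the bitcode and can still be upgraded at link time, which is only possible if the type metadata was emitted for public vtables in the first place.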

FYI I mailed 3 patches this morning that together implement the RFC. PTAL:

D71907: [WPD/VFE] Always emit vcall_visibility metadata for -fwhole-program-vtables

D71911: [ThinLTO] Summarize vcall_visibility metadata

D71913: [LTO/WPD] Enable aggressive WPD under LTO option

Teresa