RFC: dynamic_cast optimization in LTO

Hi,
There was a mention of optimizing away C++ dynamic_casts in LTO in this presentation: https://www.youtube.com/watch?v=Fd3afoM3UOE&t=1306
I couldn’t find any discussion on llvm-dev pertaining to this optimization.

What is the optimization (TL;DR version):
The transformation tries to convert a __dynamic_cast function call into an address comparison and a VFT-like lookup when the following conditions are met:

  1. the destination type is a leaf type, i.e. is never derived from (similar to C++ final semantics) in the entire program.
  2. the static type of the expression being cast is a public base class of the destination type (possibly appearing as multiple base subobjects, but never as a private one).

Example:
Given the C++ expression:
    NULL != dynamic_cast<A*>(ptr) // where B* ptr;
which, coming out of clang, would look like so:
    NULL != __dynamic_cast(ptr,
                           &_ZTI1B, // typeinfo of B, the static type of ptr.
                           &_ZTI1A, // typeinfo of A, the destination type.
                           hint)    // a static hint about the location of the source subobject w.r.t. the complete object.

If the above conditions can be proven to be true, then an equivalent expression is:
(destType == dynamicType) where:
    std::type_info *destType = &_ZTI1A;
    std::type_info *dynamicType = (*(void ***)ptr)[-1]; // slot -1 of the VFT that the object at ptr points to
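
To make the rewritten check concrete, here is a minimal C++ sketch of the same idea at source level. It is not the pass itself: it assumes the Itanium C++ ABI (the first word of a polymorphic object is the vptr, and slot -1 relative to the address point holds the type_info pointer), a non-virtual public base, and that A is truly a leaf (marked final here). The classes and the helper names dynamicTypeOf and fastCastToA are made up for illustration.

```cpp
#include <cassert>
#include <cstring>
#include <typeinfo>

struct B { virtual ~B() = default; };
struct A final : B { int payload = 0; };  // leaf: nothing derives from A
struct C : B {};                          // an unrelated sibling of A

// ASSUMPTION: Itanium C++ ABI layout. Reads the vptr stored at the start of
// the object, then the type_info pointer kept in slot -1 of that vtable.
inline const std::type_info *dynamicTypeOf(const B *ptr) {
  const void *const *vtable = nullptr;
  std::memcpy(&vtable, static_cast<const void *>(ptr), sizeof(vtable));
  return static_cast<const std::type_info *>(vtable[-1]);
}

// Hypothetical replacement for dynamic_cast<A*>(ptr): because A is a leaf,
// the cast succeeds exactly when the dynamic type is A itself.
inline A *fastCastToA(B *ptr) {
  if (ptr == nullptr)
    return nullptr;
  return dynamicTypeOf(ptr) == &typeid(A) ? static_cast<A *>(ptr) : nullptr;
}
```

Comparing type_info addresses is only valid while a single copy of A's typeinfo exists in the program, which is the same whole-program assumption the transformation relies on.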

Detailed description:
A C++ dynamic_cast<A*>(ptr) expression can either
(1) be folded by the FE into a static_cast, or
(2) be converted to a runtime call to __dynamic_cast if the FE does not have enough information (which is the common case for dynamic_cast).

The crux of the transformation is trying to prove that a type is a leaf.
We utilize the !type metadata (https://llvm.org/docs/TypeMetadata.html) that is attached to the virtual function table (VFT) globals to answer this question.
For each VFT, the !type MD lists the other VFTs that are “compatible” with it. In general, the VFT of a class B is considered to be “compatible” with the VFT of a class A, iff A derives (publicly or privately) from B.
This means that the VFT of a leaf class type is never compatible with any other VFT, and we use this fact to decide which type is a leaf.
The second fact that we need to prove is the accessibility of the base type in the derived object.
Unfortunately, we couldn’t find a way to compute this information from the existing IR, and had to introduce a custom attribute that the frontend places on the __dynamic_cast call. The presence of the attribute implies that the static type (B in our example) is a public base class of the destination type (A in our example) and never a private one (in case there are multiple subobjects of the static type inside the complete object). Hence, if the attribute gets deleted by some pass, our transformation will simply do nothing for that __dynamic_cast call.
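
The leaf test described above can be sketched with a small standalone model. This is not LLVM API code: VTableInfo and findLeafTypes are hypothetical names, and the model simply assumes that each VFT carries the type ids of its own class and of every base class, which is roughly how the !type metadata behaves.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a VFT global and its !type metadata.
struct VTableInfo {
  std::string owner;                 // class this VFT belongs to
  std::vector<std::string> typeIds;  // type ids attached to the VFT
};

// A class is a leaf when its type id shows up on no VFT other than its own,
// i.e. no other class in the program lists it as a base.
inline std::set<std::string>
findLeafTypes(const std::vector<VTableInfo> &vtables) {
  std::map<std::string, int> usesByOtherVTables;
  for (const VTableInfo &vt : vtables) {
    usesByOtherVTables[vt.owner];  // make sure every owner has an entry
    for (const std::string &id : vt.typeIds)
      if (id != vt.owner)
        ++usesByOtherVTables[id];
  }
  std::set<std::string> leaves;
  for (const VTableInfo &vt : vtables)
    if (usesByOtherVTables[vt.owner] == 0)
      leaves.insert(vt.owner);
  return leaves;
}
```

For a hierarchy where A and C derive from B, and D derives from C, the model reports A and D as leaves. If the metadata on D's VFT were dropped, C would wrongly look like a leaf, which is the first issue listed below.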

There are two issues that I could think of that might cause a problem in our approach:

  1. the !type MD gets removed by some pass which will erase the evidence that class types, corresponding to the VFTs that were listed in the MD, are non-leaf.
  2. the supposedly leaf class is actually derived from in a shared library, and the transformation would become invalid.
    I’m hoping this problem is not unique to my situation and that there is an existing solution for such a scenario. For example, we could bail out if we know we’re linking any shared libraries or if we’re producing a shared library.

Questions:

  1. Is there interest in adding such an optimization pass to the LTO pipeline?
  2. We implemented the optimization locally and are interested in upstreaming it. However, from what I read the community prefers that we don’t just post a patch and expect it to be reviewed and approved. So this RFC is to get comments on the approach we’ve taken and whether there’s room for improvement (if the approach was correct).
    Specifically, I would appreciate comments from people on the AMD compiler team, since they are the ones who presented the optimization.

Thanks.

Wael Yehia
Compiler Development
IBM Canada Lab

Quiet Ping (thanks)


Hi Wael,

Sorry for the slow reply. +Peter who is a good person to comment as well.

This sounds interesting, and very related to the analysis needed for WholeProgramDevirt. What kind of gains do you see from the optimization in practice?

This is essentially the same type of analysis with some of the same constraints on type metadata etc we need to do for WPD. Is your patch implementing this for regular LTO or Thin LTO or both? You could take a look at WPD to see how it implements this. It handles WPD for pure regular LTO, for pure ThinLTO (the single implementation devirt only), and for a hybrid mode where modules are split and the vtables are all in a regular LTO module.

Specifically, on the two issues you mention:

  1. the !type MD gets removed by some pass which will erase the evidence that class types, corresponding to the VFTs that were listed in the MD, are non-leaf.

We also have this constraint for WPD, so it shouldn’t be a problem here.

  2. the supposedly leaf class is actually derived from in a shared library, and the transformation would become invalid.
    I’m hoping this problem is not unique to my situation, and there must be an existing solution to such a scenario. For example, bail out if we know we’re linking any shared libraries or if we’re producing a shared library.

We have this issue as well for WPD. It is handled a couple of ways. The first is that by default only vtables with hidden LTO visibility are considered. See https://clang.llvm.org/docs/LTOVisibility.html for more info on that. Secondly, I recently added a mechanism to allow refining the LTO visibility to hidden at link time if it is known then that the LTO link is safe from this constraint, leveraging some vcall_visibility metadata added for another whole program vtable related optimization (Dead Virtual Function Elimination). See the discussion on the RFC: http://lists.llvm.org/pipermail/llvm-dev/2019-December/137543.html, which was subsequently implemented upstream with these patches:

D71907: [WPD/VFE] Always emit vcall_visibility metadata for -fwhole-program-vtables

D71911: [ThinLTO] Summarize vcall_visibility metadata

D71913: [LTO/WPD] Enable aggressive WPD under LTO option
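
As a rough source-level illustration of the hidden LTO visibility mentioned above, based on my reading of the LTOVisibility page: a class with hidden symbol visibility gets hidden LTO visibility, while [[clang::lto_visibility_public]] opts a class out. The class names here are invented, and compilers other than clang may ignore the clang attribute with a warning.

```cpp
#include <cassert>

struct Shape {
  virtual ~Shape() = default;
  virtual int sides() const = 0;
};

// Hidden symbol visibility: the whole hierarchy below this class is promised
// to live inside the LTO unit, so it is a candidate for leaf analysis.
struct __attribute__((visibility("hidden"))) Triangle final : Shape {
  int sides() const override { return 3; }
};

// Public LTO visibility: code outside the LTO unit (for example a shared
// library) may derive from this class, so it must not be treated as a leaf.
struct [[clang::lto_visibility_public]] PluginShape : Shape {
  int sides() const override { return 0; }
};
```

Under this scheme a dynamic_cast to Triangle could be considered for the transformation, while a dynamic_cast to PluginShape would have to be left alone.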

Thanks,
Teresa

Hi Teresa, thank you for your reply and for the valuable pointers (reading through them now)

What kind of gains do you see from the optimization in practice?
The gains are in the few percent range for benchmarks that heavily use dynamic_cast.

Is your patch implementing this for regular LTO or Thin LTO or both?
I only did it for regular LTO for now. I did not attempt Thin LTO, mainly because I haven’t dabbled in that part of LLVM yet :)

-----Teresa Johnson <tejohnson@google.com> wrote: -----