Optimizing vcalls from structors and virtual this-adjusting thunks

Hi John,

I’ve noticed Clang doesn’t devirtualize all vcalls in ctors/dtors.

This sounds bad, please file a bugzilla!

-Chris

Hi John,

I've noticed Clang doesn't devirtualize all vcalls in ctors/dtors.

e.g. for this code:
--------------------------
struct A { virtual void a(); };
struct B { virtual void b(); };
struct C : virtual A, virtual B {
  C();
  virtual void key_function();
  virtual void a();
  virtual void b();
};

C::C() { a(); b(); }
void C::key_function() {}
--------------------------
the assembly for C::C() at -O3 is
--------------------------
_ZN1CC1Ev: # complete ctor
        pushq %rbx
        movq %rdi, %rbx
        movq $_ZTV1C+40, (%rbx)
        movq $_ZTV1C+88, 8(%rbx)
        callq _ZN1C1aEv # call to C::a is devirtualized
        movq (%rbx), %rax
        movq %rbx, %rdi
        popq %rbx
        jmpq *16(%rax) # call to C::b is not!

This looks like it was just standard LLVM optimizations forwarding the vptr
store and evaluating the load from the constant global. Because C::a is
external, we think it could have modified the vptr, so we fail to
devirtualize b.

_ZN1CC2Ev: # base ctor
        pushq %rbx
        movq %rdi, %rbx
        movq (%rsi), %rax
        movq %rax, (%rbx)
        movq 8(%rsi), %rcx
        movq -32(%rax), %rax
        movq %rcx, (%rbx,%rax)
        movq 16(%rsi), %rax
        movq (%rbx), %rcx
        movq -40(%rcx), %rcx
        movq %rax, (%rbx,%rcx)
        movq (%rbx), %rax
        callq *(%rax) # looks like even C::a is not devirtualized
        movq (%rbx), %rax
        movq %rbx, %rdi
        popq %rbx
        jmpq *16(%rax) # call to C::b is not devirtualized

I don't fully understand how VTTs are supposed to work here, but it looks
like we don't have a vptr store to forward, so LLVM can't devirtualize. It
would have to be a clang IRGen optimization.

Nick sent some patches to try to teach LLVM that the vptr is usually
constant across most calls, but they failed to handle certain corner cases
involving placement new that John raised.
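To make the placement-new concern concrete, here is a minimal sketch of how an opaque call can legitimately leave a different dynamic type at the same address. The names (Base, First, Second, recycle, kinds_before_and_after) are hypothetical and are not taken from Nick's patches or from the cases John raised; it only illustrates why "the vptr is unchanged across a call" cannot be assumed without further conditions.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical types for illustration only.
struct Base {
    virtual ~Base() = default;
    virtual int kind() const = 0;
};
struct First : Base {
    int kind() const override { return 1; }
};
struct Second : Base {
    int kind() const override { return 2; }
};

// Raw storage that is large and aligned enough for either object.
constexpr std::size_t kSize =
    sizeof(First) > sizeof(Second) ? sizeof(First) : sizeof(Second);
alignas(std::max_align_t) static unsigned char storage[kSize];

// Imagine this is defined in another translation unit, so the optimizer
// cannot see its body. It ends the lifetime of the old object and creates
// a new one, with a different dynamic type, in the same storage.
Base* recycle(Base* old_object) {
    old_object->~Base();
    return new (storage) Second();
}

int kinds_before_and_after() {
    Base* p = new (storage) First();
    int before = p->kind();   // dispatches to First::kind
    Base* q = recycle(p);     // same storage, new object; p must not be reused
    int after = q->kind();    // dispatches to Second::kind
    q->~Base();
    return before * 10 + after;
}
```

The caller only uses the pointer returned by recycle, which is what makes the second call well-defined. An optimization that forwarded the first vptr load to the second call would pick the wrong function here.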

I’m sure GCC is just statically recognizing that it’s a virtual call on 'this' occurring within the constructor.

John.
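For readers following along, the language rule that makes such a static approach valid is that a virtual call made on 'this' inside a constructor resolves to the final overrider in the class whose constructor is running, never to an override in a more derived class. A small sketch with hypothetical names (Base, Derived, ctor_saw, after_ctor):

```cpp
#include <cassert>

// Hypothetical types for illustration only.
struct Base {
    int seen_in_ctor = 0;
    Base() { seen_in_ctor = who(); }  // virtual call on 'this' inside the ctor
    virtual ~Base() = default;
    virtual int who() const { return 1; }
};

struct Derived : Base {
    int who() const override { return 2; }
};

// While Base::Base() runs, the object's dynamic type is Base, so the
// call inside the constructor reaches Base::who.
int ctor_saw() {
    Derived d;
    return d.seen_in_ctor;
}

// Once construction has finished, the same call reaches Derived::who.
int after_ctor() {
    Derived d;
    const Base& b = d;
    return b.who();
}
```

Because the target is fixed by the constructor's own class, a front end can in principle emit a direct call without looking at the vptr at all.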

This sounds bad, please file a bugzilla!

Sure, filed http://llvm.org/PR17863

Why generate a virtual call (in LL) in the first place?

Hi John,
...
I also have a somewhat related ABI question.
Is there any reason to keep virtual this-adjusting thunks in the
vtable when the class is fully constructed?
I think all the offsets between bases are known statically at the end
of the complete object constructor, so couldn't a special "final vtable"
with only static this-adjusting thunks be used instead of a regular
vtable?
Am I missing something?

John, do you have any opinions on the most derived class vtable question?

This is a special case of the general optimization called customization, described e.g. here:
  http://dl.acm.org/citation.cfm?id=74831
The idea is to reduce dynamism in a function by emitting a version of it that's only valid when a parameter (typically "this") has an exact dynamic type.
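As a rough source-level illustration of customization (a compiler would do this on generated code, not in C++ source), the sketch below pairs a generic function with a version that is only valid when the argument's dynamic type is exactly Rect. All names here (Shape, Rect, Doubled, area_generic, area_for_exact_rect, area) are hypothetical.

```cpp
#include <cassert>
#include <typeinfo>

// Hypothetical types for illustration only.
struct Shape {
    virtual ~Shape() = default;
    virtual int width() const { return 1; }
    virtual int height() const { return 1; }
};

struct Rect : Shape {
    int w, h;
    Rect(int w, int h) : w(w), h(h) {}
    int width() const override { return w; }
    int height() const override { return h; }
};

struct Doubled : Rect {
    using Rect::Rect;
    int width() const override { return 2 * w; }
};

// Generic version: valid for any Shape, every call is dynamically dispatched.
int area_generic(const Shape& s) { return s.width() * s.height(); }

// Customized version: only valid when the dynamic type is exactly Rect.
// The qualified calls bypass the vtable and can be inlined.
int area_for_exact_rect(const Rect& r) {
    return r.Rect::width() * r.Rect::height();
}

// Dispatcher: use the customized version only on an exact type match.
int area(const Shape& s) {
    if (typeid(s) == typeid(Rect))
        return area_for_exact_rect(static_cast<const Rect&>(s));
    return area_generic(s);
}
```

The Doubled case shows why the exact-type condition matters: calling area_for_exact_rect on a Doubled would silently ignore its override, so it has to take the generic path.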

In general, customization is both valid and useful, with the caveat that it’s a trade-off between dynamic performance and code size. For the specific case of thunks, though, the code size cost is probably tiny enough to not be worth worrying about, especially if you make a point of sharing customized thunks when possible.

However, you do have to worry about the thing that thunks always have to worry about: you cannot always faithfully forward a function call. If the call ABI relies on the exact position of arguments on the stack (e.g. if the call is variadic or has non-trivially-copyable arguments under the MS C++ ABI), then it can only be forwarded with a tail call. That’s not always possible (e.g. if your thunk is also a covariant return thunk), and even when it is, LLVM does not currently provide a guaranteed replace-this-argument-and-tail-call operation.

As a special case, when emitting v-base adjustment thunks for a method defined in a final class, we should always be using a static adjustment.
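A small sketch of why the final-class case permits a static adjustment: an object whose type is a final class is always a most-derived object, so the distance between it and its virtual base is the same for every instance. The names (A, C, vbase_offset, offsets_agree) are hypothetical, and the pointer arithmetic is for illustration only; it is not how a compiler computes the adjustment.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical types for illustration only.
struct A {
    virtual ~A() = default;
    virtual int a() { return 1; }
    long pad_a = 0;
};

// C is final, so no further-derived class can place the A subobject
// at a different offset.
struct C final : virtual A {
    int a() override { return 2; }
    long pad_c = 0;
};

// Byte distance from the start of a C object to its virtual A subobject.
std::ptrdiff_t vbase_offset(C& c) {
    A* base = &c;  // derived-to-virtual-base conversion
    return reinterpret_cast<char*>(base) - reinterpret_cast<char*>(&c);
}

// Every C object has the same layout, so the offsets match.
bool offsets_agree() {
    C first;
    C second;
    return vbase_offset(first) == vbase_offset(second);
}
```

Without final, a class derived from C could lay out the A subobject elsewhere, which is why the general case has to read the offset from the vtable.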

John.