C++ devirtualization

TL;DR: where I wonder about the viability of a seemingly simple change to the code generation in Clang that seem to trick LLVM in doing a lot more devirtualization work than it’s doing today… without any change to LLVM itself.

Hello,

I was reading Jan Hubicka’s serie of articles on devirtualization in gcc, of which I posted the latest opus to Reddit [1] where you can find links to the 5 parts.

In Part 2, Jan brings up a discussion on llvm-dev [2] where were debated the best ways to bring devirtualizations in the backend, for example in the presence of opaque functions:

#include

struct Base { virtual void virtualCall() = 0; };
struct Derived: Base { virtual void virtualCall() override { printf(“Hello, World!\n”); } };

void inlinable(Base& b) { b.virtualCall(); }
void opaque(Base& b);

void func() {
Derived d;

inlinable(d); // statement A
opaque(d); // Unknown implementation, LLVM assumes it may change the virtual pointer

inlinable(d); // statement B
}

Now, the difference of treatment between the statement A and the statement B is that, after inlining, LLVM will de-virtualize the call in A but not in B, as can be seen thanks to Coliru [3]:

define void @_Z4funcv() #0 {
; Derived d;
%d = alloca %struct.Derived, align 8
%1 = getelementptr inbounds %struct.Derived* %d, i64 0, i32 0, i32 0
store i32 (…)** bitcast (i8** getelementptr inbounds ([3 x i8*]* @_ZTV7Derived, i64 0, i64 2) to i32 (…)), i32 (…)* %1, align 8, !tbaa !1
%2 = getelementptr inbounds %struct.Derived* %d, i64 0, i32 0
%3 = bitcast %struct.Derived* %d to void (%struct.Base*)***

; inlinable(d); // statement A
%puts.i = call i32 @puts(i8* getelementptr inbounds ([14 x i8]* @str, i64 0, i64 0))

; opaque(d);
call void @_Z6opaqueR4Base(%struct.Base* %2)

; inlinable(d); // statement B
%4 = load void (%struct.Base*)*** %3, align 8, !tbaa !1
%5 = load void (%struct.Base*)** %4, align 8
call void %5(%struct.Base* %2)

ret void
}

This is typical of the rift between the C+±aware front-end (Clang) and the C+±unaware middle-end (LLVM):

  • Clang knows that C++ does not allow a change of dynamic type, and thus of v-ptr, during the lifetime of an object (after construction ended and before destruction begins); but cannot do anything because it knows not how to inline

  • LLVM knows how to inline but is unaware of C++ rules and that those obscure bitcasts are all about virtual calls

Now, as mentioned in [2] the discussion on how to make LLVM more “aware” of what is going on is complicated; I have seen several proposals over time, the merit of various intrinsics being discussed and no approach being taken: it’s a hard problem.

In this very special case, however, it seems to me that Clang could reach out to LLVM. Suppose that Clang introduced a simple store instruction right after the “opaque” call ?

If we simulate the Itanium ABI using a more C-ish approach [4]:

#include

struct Base;

struct BaseVTableLayout {
void (virtualCall)(Base);
};

struct Base { BaseVTableLayout const* vptr; };
struct Derived: Base {};

void Derived_virtualCall(Base*) { printf(“Hello, World!\n”); }

BaseVTableLayout const BaseVTable = { 0 };
BaseVTableLayout const DerivedVTable = { &Derived_virtualCall };

void inlinable(Base* b) { b->vptr->virtualCall(b); }

void opaque(Base* b);

void func() {
Derived d;
d.vptr = &DerivedVTable;

inlinable(&d); // statement A

opaque(&d); // Unknown implementation, LLVM assumes it may change the virtual pointer

inlinable(&d); // statement B
}

We see that the exact same behaviour is exhibited by LLVM: in fact the IR is nearly a carbon-copy, except for the name of the globals.

In this C-ish approach, though, introducing the “redundant” store is dead-easy. In fact, we can even be completely naive about it:

void func() {
Derived d;
d.vptr = &DerivedVTable;

inlinable(&d); // statement A
d.vptr = &DerivedVTable; // inform LLVM that the function is not allowed to change the virtual pointer under our feet

opaque(&d); // Unknown implementation, LLVM assumes it may change the virtual pointer
d.vptr = &DerivedVTable; // inform LLVM that the function is not allowed to change the virtual pointer under our feet

inlinable(&d); // statement B
d.vptr = &DerivedVTable; // inform LLVM that the function is not allowed to change the virtual pointer under our feet
}

And let’s look at what LLVM generates:

define void @_Z4funcv() #1 {
; Derived d;
%d = alloca %struct.Base, align 8
%1 = getelementptr inbounds %struct.Base* %d, i64 0, i32 0

; inlinable(d); // statement A
%puts.i = call i32 @puts(i8* getelementptr inbounds ([14 x i8]* @str, i64 0, i64 0)) #3

; opaque(d);
store %struct.BaseVTableLayout* bitcast ({ void (%struct.Base*)* }* @_ZL13DerivedVTable to %struct.BaseVTableLayout*), %struct.BaseVTableLayout** %1, align 8, !tbaa !1

call void @_Z6opaqueP4Base(%struct.Base* %d)

store %struct.BaseVTableLayout* bitcast ({ void (%struct.Base*)* }* @_ZL13DerivedVTable to %struct.BaseVTableLayout*), %struct.BaseVTableLayout** %1, align 8, !tbaa !1

; inlinable(d); // statement B
%puts.i1 = call i32 @puts(i8* getelementptr inbounds ([14 x i8]* @str, i64 0, i64 0)) #3

ret void

}

Bingo! The indirect virtual call in statement B has been inlined!

But let’s not stop there. There are plenty of occasions where Clang does not know the dynamic type and yet that can benefit from being simple-minded. Suppose that we start from:

void funky(Base* b) {
inlinable(b); // statement A

opaque(b); // Unknown implementation, LLVM assumes it may change the virtual pointer

inlinable(b); // statement B
}

void func() {
Derived d;
d.vptr = &DerivedVTable;

funky(&d);
}

Then let’s apply this “the dynamic type will not change” rule [5]:

void funky(Base* b) {
BaseVTableLayout const* vptr = b->vptr;
inlinable(b); // statement A
b->vptr = vptr; // inform LLVM that the function is not allowed to change the virtual pointer under our feet

opaque(b); // Unknown implementation, LLVM assumes it may change the virtual pointer
b->vptr = vptr; // inform LLVM that the function is not allowed to change the virtual pointer under our feet

inlinable(b); // statement B
b->vptr = vptr; // inform LLVM that the function is not allowed to change the virtual pointer under our feet
}

void func() {
Derived d;
d.vptr = &DerivedVTable;

funky(&d);
}

And behold:

define void @_Z4funcv() #1 {
; Derived d;
%d = alloca %struct.Base, align 8
%1 = getelementptr inbounds %struct.Base* %d, i64 0, i32 0

; inlinable(d); // statement A
%puts.i = call i32 @puts(i8* getelementptr inbounds ([14 x i8]* @str, i64 0, i64 0)) #3

; opaque(d);
store %struct.BaseVTableLayout* bitcast ({ void (%struct.Base*)* }* @_ZL13DerivedVTable to %struct.BaseVTableLayout*), %struct.BaseVTableLayout** %1, align 8, !tbaa !1

call void @_Z6opaqueP4Base(%struct.Base* %d)

store %struct.BaseVTableLayout* bitcast ({ void (%struct.Base*)* }* @_ZL13DerivedVTable to %struct.BaseVTableLayout*), %struct.BaseVTableLayout** %1, align 8, !tbaa !1

; inlinable(d); // statement B
%puts.i1 = call i32 @puts(i8* getelementptr inbounds ([14 x i8]* @str, i64 0, i64 0)) #3

ret void
}

For those wondering, in the back row, yes that is the exact same LLVM IR than before; it just saw throw our the call to “funky”.

And for more fun, trying it in a loop:

int add(Base** array, size_t size) {
int total = 0;
for (size_t i = 0; i != size; ++i) {
Base* b = array[i];

opaque(b);

total += b->vptr->virtualCall(b);
}
return total;
}

int func() {
Derived d0;
d0.vptr = &DerivedVTable;
Derived d1;
d1.vptr = &DerivedVTable;
Derived d2;
d2.vptr = &DerivedVTable;

Base* b[3] = { &d0, &d1, &d2 };

return add(b, 3);
}

The IR looks ugly [6]: “add” is inlined in “func” and the loop is unrolled, but LLVM does not know that “opaque” cannot change the v-ptr, and thus a lot of virtual calls ensue. Fun fact: I started by forgetting to put the call to “opaque” and LLVM directly optimized “func” to “ret 126”.

So, let’s apply our simple “the dynamic type will not change” rule like before [7]:

int add(Base** array, size_t size) {
int total = 0;
for (size_t i = 0; i != size; ++i) {
Base* b = array[i];
BaseVTableLayout const* vptr = b->vptr;

opaque(b);
b->vptr = vptr;

total += b->vptr->virtualCall(b);
b->vptr = vptr;
}
return total;
}

And now I dare present the IR:

define i32 @_Z4funcv() #1 {
.lr.ph.i:
%d0 = alloca %struct.Base, align 8
%d1 = alloca %struct.Base, align 8
%d2 = alloca %struct.Base, align 8

%0 = getelementptr inbounds %struct.Base* %d0, i64 0, i32 0

store %struct.BaseVTableLayout* bitcast ({ i32 (%struct.Base*)* }* @_ZL13DerivedVTable to %struct.BaseVTableLayout*), %struct.BaseVTableLayout** %0, align 8, !tbaa !5

%1 = getelementptr inbounds %struct.Base* %d1, i64 0, i32 0

store %struct.BaseVTableLayout* bitcast ({ i32 (%struct.Base*)* }* @_ZL13DerivedVTable to %struct.BaseVTableLayout*), %struct.BaseVTableLayout** %1, align 8, !tbaa !5

%2 = getelementptr inbounds %struct.Base* %d2, i64 0, i32 0

store %struct.BaseVTableLayout* bitcast ({ i32 (%struct.Base*)* }* @_ZL13DerivedVTable to %struct.BaseVTableLayout*), %struct.BaseVTableLayout** %2, align 8, !tbaa !5

call void @_Z6opaqueP4Base(%struct.Base* %d0)

store %struct.BaseVTableLayout* bitcast ({ i32 (%struct.Base*)* }* @_ZL13DerivedVTable to %struct.BaseVTableLayout*), %struct.BaseVTableLayout** %0, align 8, !tbaa !5

call void @_Z6opaqueP4Base(%struct.Base* %d1)

store %struct.BaseVTableLayout* bitcast ({ i32 (%struct.Base*)* }* @_ZL13DerivedVTable to %struct.BaseVTableLayout*), %struct.BaseVTableLayout** %1, align 8, !tbaa !5

call void @_Z6opaqueP4Base(%struct.Base* %d2)

ret i32 126
}

Where one note that the additions of 42 were folded together.

Conclusion/Digest:

The attempt to implement devirtualization in CodeGenFunction::CanDevirtualizeMemberFunctionCall only get us so far. It seems reasonable to take advantage of the “final” keyword and other language-level constructs, but Clang does not (and should not) implement inlining and constant propagation, and thus does not cross the bridge. Meta-data proposals are still pending, and I still have hope for them, in the mean time though a stop-gap alternative seems interesting.

A simple addition to the CodeGen process, expressing the rule the dynamic type of an object is not changed by a function call in LLVM IR instructions (caching the vptr of an instance as soon as it comes into being and restoring it after each function call involving this instance), seems to be sufficient to enable the devirtualization of calls by LLVM whenever inlining and constant propagation make it possible.

The only downside that I could fathom are the redundant store instructions: they are the witnesses that LLVM did not consider them redundant and therefore could not have devirtualized calls without them. Proper meta-data would solve this issue I suppose.

Still, in my naivete, I believe those redundant stores to cost less than the devirtualization opportunities currently being missed.

I would be very interested in hearing the community feedback on this.

[1] http://www.reddit.com/r/cpp/comments/23rjda/honza_hubi%C4%8Dkas_blog_devirtualization_in_c/
[2] http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-December/046072.html
[3] http://coliru.stacked-crooked.com/a/0dc6a12cb5f27543
[4] http://coliru.stacked-crooked.com/a/705f607005017c43
[5] http://coliru.stacked-crooked.com/a/a34b6674255ad66c
[6] http://coliru.stacked-crooked.com/a/93a493d80e7bcc3b
[7] http://coliru.stacked-crooked.com/a/0f0d8470c1b066d0

Have you tried using llvm.invariant.start / llvm.invariant.end markers
instead of the stores?

Joerg

The last time we discussed this, this test case came up:

char alignas(A, B) buffer[max(sizeof(A), sizeof(B))];
A a = reinterpret_cast<A>(buffer);
B b = reinterpret_cast<B>(buffer);

new(buffer) A;
a->vfn();
new(buffer) B;
b->vfn();

Imagine those placement news are buried in an external function call.

Applying your transformation of adding stores breaks the program, if I’m not mistaken:

char alignas(A, B) buffer[max(sizeof(A), sizeof(B))];
A a = reinterpret_cast<A>(buffer);
B b = reinterpret_cast<B>(buffer);

new(buffer) A;
void *vptr = a->vptr;
a->vfn();
new(buffer) B;
a->vptr = vptr; // Nope, we changed it. a is dead, long live b.

b->vfn();

When Clang finally adds support for (something like) MSVC’s __assume perhaps you could replace the store with something like __assume( vfptr == something )…

“What Huxley teaches is that in the age of advanced technology, spiritual devastation is more likely to come from an enemy with a smiling face than from one whose countenance exudes suspicion and hate.” Neil Postman