"load groups" IR feature to improve C++ devirtualization

I’m looking into how we can improve devirtualization in clang, and there a language in C++ feature I’d like to take advantage of which would let us perform elimination of more vptr loads. In this code:

Cls *p = new Cls;
p->virtual_method1();
p->method_changing_vptr(); // uses placement new to legally change the vptr
p->virtual_method2(); // invalid!
Cls *q = p;
q->virtual_method2(); // this must get a new vptr lookup.

there is no need to reload p’s vptr, even if the method did update the vptr.

C++ [basic.life] gives us a guarantee that a pointer will only update to point to a new object allocated at the same place in memory under certain circumstances. If the C++ code uses the same pointer, reference or name to refer to the object, then we can prove that the vptr and any const members did not change.

I’d like clang to compute whether a load is eligible for this treatment per the rules in C++, and encode that in LLVM IR for further optimization. (Note that this is different from @llvm.invariant because method_changing_vptr_through_placement_new may be inlined and is required to see the updated vptr.)

To implement this, I propose a new intrinsic in LLVM:

declare {}* @llvm.load.group()

and new metadata on loads:

!load.group %group

where %group must be the result of a call to llvm.load.group. Any two loads with the same %group value are known to produce the same value. We can then choose to eliminate the latter of the loads as redundant in GVN, or in the event of high register pressure we could choose to reload without spilling to the stack.

For clang, let us say that two expressions E1 and E2 of the same type denote the same value if:

  • E1 and E2 both name the same variable, which is either of class or reference-to-class type or const pointer-to-class type, or is of non-const pointer type and is known to have not changed between the evaluations of E1 and E2. (By introductory text in [basic.life]).
  • E1 and E2 are of the form E3.x and E4.x, or E3->x and E4->x, or E3[x] and E4[x], for the same x, and E3 and E4 denote the same value, and the denoted subobject either has a const-qualified type or is a reference. (By bullet 3, ibid).
  • (Fudging a bit on E1 and E2 being expressions…) E1 and E2 are both references to the same vptr slot. (By bullet 2 or 4, ibid).

Let me know if you think this design can be improved, or if there are cases it doesn’t handle (or gets wrong). An explicit non-goal is the “constructor not defined in TU” problem.

Nick

This is not how I understand the [basic.life] rules. The question is whether a pointer value, reference, or name is formally forwarded to point to the new object. Because the dynamic type is different, the pointer value held in 'p' is not updated. Copying that value into 'q' does not change the fact that the pointer value still refers to a non-existent object.

It is unclear what, exactly, under the rules constitutes forming a valid pointer to the newly-constructed object except using the result of the new-expression itself. I think an explicit cast might, ignoring all "object-ness" of the source pointer and simply treating it formally as a pointer to some storage that you are casting to the type of an object stored there?

John.

> I'm looking into how we can improve devirtualization in clang, and there
a language in C++ feature I'd like to take advantage of which would let us
perform elimination of more vptr loads. In this code:
>
> Cls *p = new Cls;
> p->virtual_method1();
> p->method_changing_vptr(); // uses placement new to legally change
the vptr
> p->virtual_method2(); // invalid!
> Cls *q = p;
> q->virtual_method2(); // this must get a new vptr lookup.

I bungled my example, and I want to fix that first. I was thinking:

  Derived *p = new Derived;
  p->virtual_method1();
  p->method_changing_vptr(); // uses placement new to legally change the
vptr to Base
  p->virtual_method2(); // invalid!
  Base *q = p;
  q->virtual_method2(); // this must get a new vptr lookup.

which doesn't address your concerns.

This is not how I understand the [basic.life] rules. The question is

whether a pointer value, reference, or name is formally forwarded to point
to the new object. Because the dynamic type is different, the pointer
value held in 'p' is not updated. Copying that value into 'q' does not
change the fact that the pointer value still refers to a non-existent
object.

I'm actually okay with the simple copy not forming a new object pointer.
However, "Base *q = reinterpret_cast<Base*>(p);" really ought to.

It is unclear what, exactly, under the rules constitutes forming a valid

pointer to the newly-constructed object except using the result of the
new-expression itself. I think an explicit cast might, ignoring all
"object-ness" of the source pointer and simply treating it formally as a
pointer to some storage that you are casting to the type of an object
stored there?

I want to make optimizations to the program that people can't object to
through a cursory reading of the standard, which is made difficult by the
standard being contradictory on many relevant points here. Ultimately I've
chosen to be very liberal about what I'm allowing to be considered a newly
formed valid pointer.

BTW, Richard came up with a wonderful example. What do you make of this?:

  char alignas(A, B) buffer[max(sizeof(A), sizeof(B))];
  A *a = reinterpret_cast<A*>(buffer);
  B *b = reinterpret_cast<B*>(buffer);
  new(buffer) A;
  a->vfn();
  new(buffer) B;
  b->vfn();

Valid?

Nick

Hi Nick,

Won’t this approach run into problems with inlining, unrolling, or just about anything that duplicates code? Suppose I have a bunch of loads of the same load group in the body of the loop, but update the pointer they’re dereferencing on the backedge. If that loop gets unrolled, all the loads in the unrolled copies will have the same load group, but it would not be correct to replace loads from the first iteration with loads from the second iteration.

–Owen

Hi Nick,

Won't this approach run into problems with inlining, unrolling, or just
about anything that duplicates code? Suppose I have a bunch of loads of
the same load group in the body of the loop, but update the pointer they're
dereferencing on the backedge. If that loop gets unrolled, all the loads
in the unrolled copies will have the same load group, but it would not be
correct to replace loads from the first iteration with loads from the
second iteration.

No, @llvm.load.group returns a new group each time it's run. We'll
unroll/inline/duplicate that call with the rest of the loads.

Nick

Yes, I agree that it ought to.

Being conservative is fair. My point was just that it has absolutely nothing to do with being held in a named pointer variable. Also, this is C++, so you almost certainly need to be able to track the object-ness of values across inlining in order to have much hope of meaningful optimization. And potentially not just across inlining, but through memory — who actually allocates polymorphic values and doesn’t use smart pointers these days?

Let’s answer a simpler question. Why is this valid?

A a = reinterpret_cast<A>(buffer);

new(buffer) A;
a->vfn();

My initial interpretation is that the initial value of ‘a’ is a type-punned pointer that refers to an object of type char[n]. The lifetime of that object ends when we reuse its storage for a new object of type A. (This is okay to do to any type; additionally, since the char[n] object does not have a non-trivial destructor, we are not required to put a char[n] object back before it goes out of scope.) This leaves the name ‘buffer’ referring only to allocated storage. The lifetime of the A object begins after the constructor completes. ‘a’ does not automatically refer to the new object because the type of the object it refers to does not match the type of the object now created there. So, what’s the theory that allows us to use ‘a’ here as if it referred to the new object?

To be clear, I think we obviously have to permit both of these examples.

The only additional complication with Richard’s example is that yet another object comes into existence. I don’t see that as fundamentally changing anything.

John.

This is not how I understand the [basic.life] rules. The question is
whether a pointer value, reference, or name is formally forwarded to point
to the new object. Because the dynamic type is different, the pointer
value held in 'p' is not updated. Copying that value into 'q' does not
change the fact that the pointer value still refers to a non-existent
object.

I'm actually okay with the simple copy not forming a new object pointer.
However, "Base *q = reinterpret_cast<Base*>(p);" really ought to.

Yes, I agree that it ought to.

It is unclear what, exactly, under the rules constitutes forming a valid

pointer to the newly-constructed object except using the result of the
new-expression itself. I think an explicit cast might, ignoring all
"object-ness" of the source pointer and simply treating it formally as a
pointer to some storage that you are casting to the type of an object
stored there?

I want to make optimizations to the program that people can't object to
through a cursory reading of the standard, which is made difficult by the
standard being contradictory on many relevant points here. Ultimately I've
chosen to be very liberal about what I'm allowing to be considered a newly
formed valid pointer.

Being conservative is fair. My point was just that it has absolutely
nothing to do with being held in a named pointer variable. Also, this is
C++, so you almost certainly need to be able to track the object-ness of
values across inlining in order to have much hope of meaningful
optimization. And potentially not just across inlining, but through memory
— who actually allocates polymorphic values and doesn't use smart pointers
these days?

You essentially want to have some optimizations fire to clean things up
before we devirtualize? And you want to devirtualize more often? So
demanding. It's not unreasonable, it's just hard. I don't have a way to
salvage the existing proposal.

Let me propose to extend LLVM's pointers by defining that they are a pair
of object ID and memory address. As a matter of lowering to target ABI, we
only pass the memory addresses not the object IDs, ptrtoint produces an
integer from only the memory address, and inttoptr creates a pointer with
some object ID, but we don't know which one. icmp only compares the memory
address part of these pointers. Bitcast and GEP preserve the object ID.

To create a new object ID, a new intrinsic @llvm.newobject takes and
returns pointers of the same type, where the memory address is the same,
but with a newly allocated object ID. Given "%x = i8* call
@llvm.newobject(i8* %y)", %x and %y are mayalias in BasicAA (TBAA may
decide they're NoAlias anyhow, and that's fine).

The only observer of these object IDs are load instructions annotated with
!llvm.invariant.object metadata. Loads with this annotation that load the
same pointer (including both object ID and memory address) may be assumed
to load the same value no matter what other operations happen in between.

At the C++ level, we state that a pointer's associated vptr is fixed on
first use. Things which break that association emit a call to
@llvm.newobject. That includes pointer casts, and calls to placement new.
Yes, if you call placement new you need to use the newly returned pointer,
or else your program has undefined behaviour.

This proposal is much more rough than the last one and I'd appreciate any
help identifying the problems. Please review!

Nick

where %group must be the result of a call to llvm.load.group. Any two loads
with the same %group value are known to produce the same value. We can then
choose to eliminate the latter of the loads as redundant in GVN, or in the
event of high register pressure we could choose to reload without spilling
to the stack.

What mark would GVN leave on the IR for the register allocator to use
when rematerializing the load?

Nick

Cheers,
Rafael

It's something novel we would have to create, probably by slapping metadata
on the load.

This proposal is dead, in favour of my newer proposal for @llvm.newobject
(in the same thread). That proposal is stalled because it relies on picking
a certain interpretation of the C++ standard in an area where the standard
is self-contradictory.

There are other issues in devirtualization which I can make progress on
(PR11331 is one, and "knowing the initial value of the vtable
post-constructor where the constructor is not defined in this translation
unit" is another), but I think I need to drop the category of "same pointer
implies same vtable" optimizations until there are language changes that
support it.

Nick