C++11 and enhacned devirtualization

Hi everyone,

C++11 added features that allow for certain parts of the class hierarchy to be closed, specifically the 'final' keyword and the semantics of anonymous namespaces, and I think we take advantage of these to enhance our ability to perform devirtualization. For example, given this situation:

struct Base {
  virtual void foo() = 0;
};

void external();
struct Final final : Base {
  void foo() {
    external();
  }
};

void dispatch(Base *B) {
  B->foo();
}

void opportunity(Final *F) {
  dispatch(F);
}

When we optimize this code, we do the expected thing and inline 'dispatch' into 'opportunity' but we don't devirtualize the call to foo(). The fact that we know what the vtable of F is at that callsite is not exploited. To a lesser extent, we can do similar things for final virtual methods, and derived classes in anonymous namespaces (because Clang could determine whether or not a class (or method) there is effectively final).

One possibility might be to @llvm.assume to say something about what the vtable ptr of F might be/contain should it be needed later when we emit the initial IR for 'opportunity' (and then teach the optimizer to use that information), but I'm not at all sure that's the best solution. Thoughts?

Thanks again,
Hal

There a quite a few open bugs about devirtualization. A quick search finds

pr18972, pr3100, pr13227, pr15961, pr15963, pr16984, pr17863, pr19545, pr6747.

A fairly important restriction to keep in mind is that the itanium abi
requires some vtables to be exported from a given file (the one with
the key function), but the functions in those vtables don't have to be
exported.

That means that to devirtualize we have to produce a copy, which hits
issues with code that wants avoid #including definitions.

See the commit message of r189852 for more details.

Cheers,
Rafael

Hi Rafael,

Thanks for the list of bug reports. The situations I've highlighted are indeed PR16984 and PR13227. Do you have any thoughts on a design to fix them? PR16984 mentions that it is likely best to put the necessary information into the IR and let the optimizer take care of the constant propagation to get rid of the indirect call. I agree, this sounds appealing. The question then is what form should that information take. I could do this with @llvm.assume, but I'm somewhat afraid of littering the IR with them in response to a common core language property.

-Hal

My preference for cases like PR16984 would be to change how clang
finds the vtable. Instead of doing a load from this, have clang
directly use the _ZTV variable when it is known at compile time.

I believe Piotr Prazek (cc’d) is working with Richard Smith on a proposal/plan for a general device for devirtualization (something like a restricted assume for pointer loads, if I understand it correctly - as we have nonnull and other attributes for other special cases of assume)

From: "Rafael Espíndola" <rafael.espindola@gmail.com>
To: "Hal Finkel" <hfinkel@anl.gov>
Cc: "cfe-dev@cs.uiuc.edu Developers" <cfe-dev@cs.uiuc.edu>, "Richard Smith" <richard@metafoo.co.uk>
Sent: Thursday, July 16, 2015 10:26:13 AM
Subject: Re: [cfe-dev] C++11 and enhacned devirtualization

My preference for cases like PR16984 would be to change how clang
finds the vtable. Instead of doing a load from this, have clang
directly use the _ZTV variable when it is known at compile time.

Agreed.

Correction to my previous message: When I said, "PR16984 mentions..." below, I should have said, "PR13227 mentions...".

-Hal

The proposal of our solution will be published next week. Brief introduction:

“Basic idea is to mark vtable loads with !invariant , and also put assume() after calling constructor to show how vtable will look like. Function @llvm.invariant.barrier that breaks the invariant will be needed to cope with placement new and other cases where the vptr is legitimately permitted to change.”

We also want to make some improvements on generating vtables as avaliable_externally - for example when there are no virtual inline functions

Piotr

Nice, this should also avoid loading the vtable pointer multiple times in

struct C {
  virtual void foo();
};
void g();
void f(C *x) {
  x->foo();
  g();
  x->foo();
}

Cheers,
Rafael

From: "Piotr Padlewski" <prazek@google.com>
To: "David Blaikie" <dblaikie@gmail.com>
Cc: "Rafael Espíndola" <rafael.espindola@gmail.com>, "Hal Finkel" <hfinkel@anl.gov>, "Richard Smith"
<richard@metafoo.co.uk>, "cfe-dev@cs.uiuc.edu Developers" <cfe-dev@cs.uiuc.edu>
Sent: Thursday, July 16, 2015 12:02:47 PM
Subject: Re: [cfe-dev] C++11 and enhacned devirtualization

The proposal of our solution will be published next week. Brief
introduction:

"Basic idea is to mark vtable loads with !invariant , and also put
assume() after calling constructor to show how vtable will look
like.

Interesting, thanks. Just putting these after the constructor call would not, however, handle the case I outlined in my original e-mail.

Function @llvm.invariant.barrier that breaks the invariant
will be needed to cope with placement new and other cases where the
vptr is legitimately permitted to change."

We already have @llvm.invariant.start/@llvm.invariant.end, could those be used instead? (No need to address this now -- just address this in the proposal when you make it.)

Thanks,
Hal

From: "Rafael Espíndola" <rafael.espindola@gmail.com>
To: "Piotr Padlewski" <prazek@google.com>
Cc: "David Blaikie" <dblaikie@gmail.com>, "Hal Finkel" <hfinkel@anl.gov>, "Richard Smith" <richard@metafoo.co.uk>,
"cfe-dev@cs.uiuc.edu Developers" <cfe-dev@cs.uiuc.edu>
Sent: Thursday, July 16, 2015 12:14:29 PM
Subject: Re: [cfe-dev] C++11 and enhacned devirtualization

> The proposal of our solution will be published next week. Brief
> introduction:
>
> "Basic idea is to mark vtable loads with !invariant , and also put
> assume()
> after calling constructor to show how vtable will look like.
> Function
> @llvm.invariant.barrier that breaks the invariant will be needed to
> cope
> with placement new and other cases where the vptr is legitimately
> permitted
> to change."

Nice, this should also avoid loading the vtable pointer multiple
times in

struct C {
  virtual void foo();
};
void g();
void f(C *x) {
  x->foo();
  g();

Not if g() might placement-new something else on top of x. The problem is that if we need to make conservative assumptions about whether or not g() might contain a '@llvm.invariant.barrier', we must assume it might.

-Hal

It can, but it will be UB, so we don’t have to worry about it.

From: "Piotr Padlewski" <prazek@google.com>
To: "Hal Finkel" <hfinkel@anl.gov>
Cc: "Rafael Espíndola" <rafael.espindola@gmail.com>, "David Blaikie" <dblaikie@gmail.com>, "Richard Smith"
<richard@metafoo.co.uk>, "cfe-dev@cs.uiuc.edu Developers" <cfe-dev@cs.uiuc.edu>
Sent: Thursday, July 16, 2015 12:19:40 PM
Subject: Re: [cfe-dev] C++11 and enhacned devirtualization

It can, but it will be UB, so we don't have to worry about it.

Why is it UB? Or when it the placement new changing the vtable not UB?

-Hal

Because standard say, that when You put new object into previous object (f.e. by placement new) the lifetime of an object has ended, and using previous pointer to it leads to UB.

see c++11 standard, 3.8 Object lifetime

From: "Piotr Padlewski" <prazek@google.com>
To: "Hal Finkel" <hfinkel@anl.gov>
Cc: "Rafael Espíndola" <rafael.espindola@gmail.com>, "David Blaikie" <dblaikie@gmail.com>, "Richard Smith"
<richard@metafoo.co.uk>, "cfe-dev@cs.uiuc.edu Developers" <cfe-dev@cs.uiuc.edu>
Sent: Thursday, July 16, 2015 12:30:14 PM
Subject: Re: [cfe-dev] C++11 and enhacned devirtualization

Because standard say, that when You put new object into previous
object (f.e. by placement new) the lifetime of an object has ended,
and using previous pointer to it leads to UB.

see c++11 standard, 3.8 Object lifetime

Ah, right. Thanks!

-Hal

The problem with any sort of @llvm.assume-encoded information about memory contents is that C++ does actually allow you to replace objects in memory, up to and including stuff like:

{
  MyClass c;

  // Reuse the storage temporarily. UB to access the object through ‘c’ now.
  c.~MyClass();
  auto c2 = new (&c) MyOtherClass();

  // The storage has to contain a ‘MyClass’ when it goes out of scope.
  c2->~MyOtherClass();
  new (&c) MyClass();
}

The standard frontend devirtualization optimizations are permitted under a couple of different language rules, specifically that:
1. If you access an object through an l-value of a type, it has to dynamically be an object of that type (potentially a subobject).
2. Object replacement as above only “forwards” existing formal references under specific conditions, e.g. the dynamic type has to be the same, ‘const’ members have to have the same value, etc. Using an unforwarded reference (like the name of the local variable ‘c’ above) doesn’t formally refer to a valid object and thus has undefined behavior.

You can apply those rules much more broadly than the frontend does, of course; but those are the language tools you get.

John.

Right. Our current plan for modelling this is:

1) Change the meaning of the existing !invariant.load metadata (or add
another parallel metadata kind) so that it allows load-load forwarding
(even if the memory is not known to be unmodified between the loads) if:
  a) both loads have !invariant.load metadata with the same operand, and
  b) the pointer operands are the same SSA value (being must-alias is not
sufficient)
2) Add a new intrinsic "i8* @llvm.invariant.barrier(i8*)" that produces a
new pointer that is different for the purpose of !invariant.load. (Some
other optimizations are permitted to look through the barrier.)

In particular, "new (&c) MyOtherClass()" would be emitted as something like
this:

  %1 = call @operator new(size, %c)
  %2 = call @llvm.invariant.barrier(%1)
  call @MyOtherClass::MyOtherClass(%2)
  %vptr = load %2
  %known.vptr = icmp eq %vptr, @MyOtherClass::vptr, !invariant.load
!MyBaseClass.vptr
  call @llvm.assume(%known.vptr)

-- Richard

From: "Richard Smith" <richard@metafoo.co.uk>
To: "John McCall" <rjmccall@apple.com>
Cc: "Hal Finkel" <hfinkel@anl.gov>, "cfe-dev@cs.uiuc.edu Developers" <cfe-dev@cs.uiuc.edu>, "chandlerc"
<chandlerc@gmail.com>
Sent: Thursday, July 16, 2015 1:46:05 PM
Subject: Re: C++11 and enhacned devirtualization

>
> Hi everyone,
>
> C++11 added features that allow for certain parts of the class
> hierarchy to be closed, specifically the 'final' keyword and the
> semantics of anonymous namespaces, and I think we take advantage
> of these to enhance our ability to perform devirtualization. For
> example, given this situation:
>
> struct Base {
> virtual void foo() = 0;
> };
>
> void external();
> struct Final final : Base {
> void foo() {
> external();
> }
> };
>
> void dispatch(Base *B) {
> B->foo();
> }
>
> void opportunity(Final *F) {
> dispatch(F);
> }
>
> When we optimize this code, we do the expected thing and inline
> 'dispatch' into 'opportunity' but we don't devirtualize the call
> to foo(). The fact that we know what the vtable of F is at that
> callsite is not exploited. To a lesser extent, we can do similar
> things for final virtual methods, and derived classes in anonymous
> namespaces (because Clang could determine whether or not a class
> (or method) there is effectively final).
>
> One possibility might be to @llvm.assume to say something about
> what the vtable ptr of F might be/contain should it be needed
> later when we emit the initial IR for 'opportunity' (and then
> teach the optimizer to use that information), but I'm not at all
> sure that's the best solution. Thoughts?

The problem with any sort of @llvm.assume-encoded information about
memory contents is that C++ does actually allow you to replace
objects in memory, up to and including stuff like:

{
MyClass c;

// Reuse the storage temporarily. UB to access the object through ‘c’
now.
c.~MyClass();
auto c2 = new (&c) MyOtherClass();

// The storage has to contain a ‘MyClass’ when it goes out of scope.
c2->~MyOtherClass();
new (&c) MyClass();
}

The standard frontend devirtualization optimizations are permitted
under a couple of different language rules, specifically that:
1. If you access an object through an l-value of a type, it has to
dynamically be an object of that type (potentially a subobject).
2. Object replacement as above only “forwards” existing formal
references under specific conditions, e.g. the dynamic type has to
be the same, ‘const’ members have to have the same value, etc. Using
an unforwarded reference (like the name of the local variable ‘c’
above) doesn’t formally refer to a valid object and thus has
undefined behavior.

You can apply those rules much more broadly than the frontend does,
of course; but those are the language tools you get.

Right. Our current plan for modelling this is:

1) Change the meaning of the existing !invariant.load metadata (or
add another parallel metadata kind) so that it allows load-load
forwarding (even if the memory is not known to be unmodified between
the loads) if:
a) both loads have !invariant.load metadata with the same operand,
and
b) the pointer operands are the same SSA value (being must-alias is
not sufficient)

Why is being must-alias not sufficient? This seems scary.

-Hal

Hi everyone,

C++11 added features that allow for certain parts of the class hierarchy
to be closed, specifically the 'final' keyword and the semantics of
anonymous namespaces, and I think we take advantage of these to enhance our
ability to perform devirtualization. For example, given this situation:

struct Base {
  virtual void foo() = 0;
};

void external();
struct Final final : Base {
  void foo() {
    external();
  }
};

void dispatch(Base *B) {
  B->foo();
}

void opportunity(Final *F) {
  dispatch(F);
}

When we optimize this code, we do the expected thing and inline 'dispatch'
into 'opportunity' but we don't devirtualize the call to foo(). The fact
that we know what the vtable of F is at that callsite is not exploited. To
a lesser extent, we can do similar things for final virtual methods, and
derived classes in anonymous namespaces (because Clang could determine
whether or not a class (or method) there is effectively final).

Out of interest, is there data to suggest that this type of optimization
will happen often in practice?

> From: "Richard Smith" <richard@metafoo.co.uk>
> To: "John McCall" <rjmccall@apple.com>
> Cc: "Hal Finkel" <hfinkel@anl.gov>, "cfe-dev@cs.uiuc.edu Developers" <
cfe-dev@cs.uiuc.edu>, "chandlerc"
> <chandlerc@gmail.com>
> Sent: Thursday, July 16, 2015 1:46:05 PM
> Subject: Re: C++11 and enhacned devirtualization
>
>
>
>
>
>
> >
> > Hi everyone,
> >
> > C++11 added features that allow for certain parts of the class
> > hierarchy to be closed, specifically the 'final' keyword and the
> > semantics of anonymous namespaces, and I think we take advantage
> > of these to enhance our ability to perform devirtualization. For
> > example, given this situation:
> >
> > struct Base {
> > virtual void foo() = 0;
> > };
> >
> > void external();
> > struct Final final : Base {
> > void foo() {
> > external();
> > }
> > };
> >
> > void dispatch(Base *B) {
> > B->foo();
> > }
> >
> > void opportunity(Final *F) {
> > dispatch(F);
> > }
> >
> > When we optimize this code, we do the expected thing and inline
> > 'dispatch' into 'opportunity' but we don't devirtualize the call
> > to foo(). The fact that we know what the vtable of F is at that
> > callsite is not exploited. To a lesser extent, we can do similar
> > things for final virtual methods, and derived classes in anonymous
> > namespaces (because Clang could determine whether or not a class
> > (or method) there is effectively final).
> >
> > One possibility might be to @llvm.assume to say something about
> > what the vtable ptr of F might be/contain should it be needed
> > later when we emit the initial IR for 'opportunity' (and then
> > teach the optimizer to use that information), but I'm not at all
> > sure that's the best solution. Thoughts?
>
> The problem with any sort of @llvm.assume-encoded information about
> memory contents is that C++ does actually allow you to replace
> objects in memory, up to and including stuff like:
>
> {
> MyClass c;
>
> // Reuse the storage temporarily. UB to access the object through ‘c’
> now.
> c.~MyClass();
> auto c2 = new (&c) MyOtherClass();
>
> // The storage has to contain a ‘MyClass’ when it goes out of scope.
> c2->~MyOtherClass();
> new (&c) MyClass();
> }
>
> The standard frontend devirtualization optimizations are permitted
> under a couple of different language rules, specifically that:
> 1. If you access an object through an l-value of a type, it has to
> dynamically be an object of that type (potentially a subobject).
> 2. Object replacement as above only “forwards” existing formal
> references under specific conditions, e.g. the dynamic type has to
> be the same, ‘const’ members have to have the same value, etc. Using
> an unforwarded reference (like the name of the local variable ‘c’
> above) doesn’t formally refer to a valid object and thus has
> undefined behavior.
>
> You can apply those rules much more broadly than the frontend does,
> of course; but those are the language tools you get.
>
>
> Right. Our current plan for modelling this is:
>
>
> 1) Change the meaning of the existing !invariant.load metadata (or
> add another parallel metadata kind) so that it allows load-load
> forwarding (even if the memory is not known to be unmodified between
> the loads) if:
> a) both loads have !invariant.load metadata with the same operand,
> and
> b) the pointer operands are the same SSA value (being must-alias is
> not sufficient)

Why is being must-alias not sufficient? This seems scary.

Must-alias remains sufficient if the value is known to not have changed
between (but that's orthogonal to this metadata). The key property is that
given:

  %x = load %p, !invariant.load !a
  %q = call @llvm.invariant.barrier(%p)
  call @foo(%q)
  %y = load %q, !invariant.load !a

... we cannot deduce that %x and %y load the same value unless we can see
the definition of @foo (the value could have been overwritten by @foo).
However, it may still be reasonable to deduce that %p and %q are mustalias
(it's probably not particularly important to actually deduce that, though,
so we'd probably lose little if alias analysis can't look through the
intrinsic).

-Hal

invariant.load currently allows the load to be reordered pretty aggressively, so I think you need a new metadata.

Hmm. And all v-table loads have this invariant metadata?

I am concerned about mixing files with and without barriers.

John.