RFC: Emitting empty invariant group for vtable loads

Hi all,

I would like to propose a new way clang would decorate vtable loads in order to handle devirtualization better.

I’ve added llvm-dev also, because this can start a discussion about changing invariant.group to just invariant.

PDF version of this RFC can be found here:

https://drive.google.com/file/d/0B72TmzNsY6Z8ZmpOUnB5dDZfSFU/view?usp=sharing

Hi,
I would really like to hear some feedback about this.

Piotr

Hi Piotr,

I think makes sense. Modulo bitcasts, the invariant is identified by a particular pointer SSA value. Given that you can’t sensibly have two nonequivalent invariants associated with the same pointer SSA value simultaneously, there’s no need to also identify the invariant with a metadata string as well. When we need a new “identifier” for the pointed-to value, we get one using invariant.group.barrier.

-Hal

Hi Piotr,

I think makes sense. Modulo bitcasts, the invariant is identified by a
particular pointer SSA value. Given that you can't sensibly have two
nonequivalent invariants associated with the same pointer SSA value
simultaneously, there's no need to also identify the invariant with a
metadata string as well. When we need a new "identifier" for the pointed-to
value, we get one using invariant.group.barrier.

-Hal

What is your opinion about changing invariant.group to just invariant?

Piotr

You mean changing !invariant.group → !invariant and changing @llvm.invariant.group.barrier to @llvm.invariant.barrier? I don’t have a strong opinion, but I’m inclined to leave it as-is (saying “group” implies that there might be things outside the group, which is true in this case). -Hal

Hi Piotr,

I think makes sense. Modulo bitcasts, the invariant is identified by a
particular pointer SSA value. Given that you can't sensibly have two
nonequivalent invariants associated with the same pointer SSA value
simultaneously, there's no need to also identify the invariant with a
metadata string as well. When we need a new "identifier" for the pointed-to
value, we get one using invariant.group.barrier.

As I recall, the original motivation for the identifier was to support
cases where the "invariant" region's value changes and then changes back
(remember that invariant.group does not imply the storage doesn't change,
just that a particular set of loads and stores witness the same value):

struct A { void *vptr; /*...*/ };
struct B { void *vptr; /*...*/ };
union U { A a; B b; };

void f(U *u) {
  // #1, load A vptr
  load u->a.vptr, !invariant "A::vptr"
  // #2, change union member to B ...
  store u->b.vptr, !invariant "B::vptr"
  // #3, change union member back to A ...
  f(u); // performs: store u->a.vptr, !invariant "A::vptr"
  // #4, load A vptr again, can be forwarded from #1 but not from #2
  load u->a.vptr, !invariant "A::vptr"
}

However, I don't immediately see a way in which the C++ object model would
require us to track multiple distinct groups of loads and stores, so if it
isn't useful to be able to do that outside of C++ vptr / const member
invariant tracking, I think we can remove it.

Hi Piotr,

I think makes sense. Modulo bitcasts, the invariant is identified by a
particular pointer SSA value. Given that you can't sensibly have two
nonequivalent invariants associated with the same pointer SSA value
simultaneously, there's no need to also identify the invariant with a
metadata string as well. When we need a new "identifier" for the pointed-to
value, we get one using invariant.group.barrier.

As I recall, the original motivation for the identifier was to support
cases where the "invariant" region's value changes and then changes back
(remember that invariant.group does not imply the storage doesn't change,
just that a particular set of loads and stores witness the same value):

struct A { void *vptr; /*...*/ };
struct B { void *vptr; /*...*/ };
union U { A a; B b; };

void f(U *u) {
  // #1, load A vptr
  load u->a.vptr, !invariant "A::vptr"
  // #2, change union member to B ...
  store u->b.vptr, !invariant "B::vptr"
  // #3, change union member back to A ...
  f(u); // performs: store u->a.vptr, !invariant "A::vptr"
  // #4, load A vptr again, can be forwarded from #1 but not from #2
  load u->a.vptr, !invariant "A::vptr"
}

Excellent point, I forgot that one don't have to call placement new in

order to emplace different type.
I discussed it with Krzysztof and we belive the best way to solve it is to
introduce barriers before every use of union if it contains any polymorphic
class (recursively for each class in union). This might look very bad, but
assuming we will be able skip barrier for optimizations not relying on
!invariant.group, then it will not pessimize anything.

However, I don't immediately see a way in which the C++ object model would
require us to track multiple distinct groups of loads and stores, so if it
isn't useful to be able to do that outside of C++ vptr / const member
invariant tracking, I think we can remove it.

It should not make optimizations in LLVM any harder with group or without,
so I will postpone the removal of groups for some time.

In light of this, can we go back and reevaluate your list of proposed solutions? You pointed out two problems: 1. “This works well if the pointer type doesn’t change, but when it does, devirtualization might not happen” (e.g. a conversion to a base-class pointer/reference). 2. “The other problem is that when we combine 2 instructions with different invariant.group metadata, then we pick one of them, because for now we can have only single !invariant.group metadata.” Given what’s been said, I’m leaning toward favoring solution (b), “having sub invariant groups - like inheritance, so we could figure out that one group is subgroup of another” because 1) it seems to solve both problems and 2) it seems like we could get all of the non-trivial logic (for generation, comparison, and merging) from our TBAA implementation (we might even be able to directly reuse the same metadata). -Hal

Hi Piotr,

I think makes sense. Modulo bitcasts, the invariant is identified by a
particular pointer SSA value. Given that you can't sensibly have two
nonequivalent invariants associated with the same pointer SSA value
simultaneously, there's no need to also identify the invariant with a
metadata string as well. When we need a new "identifier" for the pointed-to
value, we get one using invariant.group.barrier.

As I recall, the original motivation for the identifier was to support
cases where the "invariant" region's value changes and then changes back
(remember that invariant.group does not imply the storage doesn't change,
just that a particular set of loads and stores witness the same value):

struct A { void *vptr; /*...*/ };
struct B { void *vptr; /*...*/ };
union U { A a; B b; };

void f(U *u) {
  // #1, load A vptr
  load u->a.vptr, !invariant "A::vptr"
  // #2, change union member to B ...
  store u->b.vptr, !invariant "B::vptr"
  // #3, change union member back to A ...
  f(u); // performs: store u->a.vptr, !invariant "A::vptr"
  // #4, load A vptr again, can be forwarded from #1 but not from #2
  load u->a.vptr, !invariant "A::vptr"
}

Excellent point, I forgot that one don't have to call placement new in

order to emplace different type.
I discussed it with Krzysztof and we belive the best way to solve it is to
introduce barriers before every use of union if it contains any polymorphic
class (recursively for each class in union). This might look very bad, but
assuming we will be able skip barrier for optimizations not relying on
!invariant.group, then it will not pessimize anything.

In light of this, can we go back and reevaluate your list of proposed
solutions? You pointed out two problems:
1. "This works well if the pointer type doesn’t change, but when it does,
devirtualization might not happen" (e.g. a conversion to a base-class
pointer/reference).
2. "The other problem is that when we combine 2 instructions with
different invariant.group metadata, then we pick one of them, because for
now we can have only single !invariant.group metadata."

Given what's been said, I'm leaning toward favoring solution (b), "having
sub invariant groups - like inheritance, so we could figure out that one
group is subgroup of another" because 1) it seems to solve both problems
and 2) it seems like we could get all of the non-trivial logic (for
generation, comparison, and merging) from our TBAA implementation (we might
even be able to directly reuse the same metadata).

-Hal

I was thinking about this, and as long as I think about it I am convincing
myself that different groups, or even sub groups that you have mentioned,
will have no use for const memory propagation. In the future we probably
would like to decorate const member loads and
stores with* !invariant.group* (or *!invariant* if groups will be removed
[if name will change :P]).

In order to decorate const members, we will need to insert barrier before
*every* use of union member.
So I was thinking about this example: Compiler Explorer but it
doesn't compile, but Krzysztof figured out this one:
Compiler Explorer (BTW it seems that clang is pretty bad at
cleaning out iostreams)

struct A {
    const int x;
};
union B {
    A a1;
    A a2;

    B() : a1{42} { }
    void switch2(int x) {
        a1.~A();
        new (&a2) A{x};
    }
};
int main() {
    B b;
    std::cout << b.a1.x << std::endl;
    b.switch2(50);
    std::cout << b.a2.x << std::endl;
}

As he checked in standard, it seems to be valid C++11 code by *9.5.4*.
So as you can see different groups won't gonna work in case of const
members inside union, like:

int main() {
    B b;
    std::cout << b.a1.x << std::endl;
    b.switch2(50);
    std::cout << b.a2.x << std::endl;
    b.switch2(51);
    std::cout << b.a2.x << std::end; // This is not 50, as we would
thought based on invariant.groups
}

I am not familiar with the TBAA implementation, but after digging more int
MSSA, I have other reason to not like the sub groups.
It might be pretty hard to handle them efficiently in many algorithms
including MSSA.

Hal, what are the things that you dislike about the emitting barriers
everywhere for unions?
It only seems to be very heavy, but I am pretty sure we can get to the
point where we will be able to
delete barriers when we can, and teach optimizations and analysis how to
skip them, so they will not pesimize any code.

*TL;DR *
different groups works for devirtualization, but it is seem to not be a
general way to go with "invariant" things.

Piotr

The original idea was that the metadata would indicate the class that introduced the vptr; that seems like it should work ok with multiple inheritance. Is there a problem with that approach? (We may still be able to remove it entirely, which could still be preferable.)

Hi all,

I would like to propose a new way clang would decorate vtable loads in
order to handle devirtualization better.

I've added *llvm-dev* also, because this can start a discussion about
changing invariant.group to just invariant.

PDF version of this RFC can be found here:

Google Drive: Sign-in
/view?usp=sharing

Background:

Initial old design:

http://lists.llvm.org/pipermail/cfe-dev/2015-July/044227.html

My talk from LLVM Dev Meeting

http://llvm.org/devmtg/2016-11/#talk6

The problem

Clang with -fstrict-vtable-pointers decorates vtable loads with metadata
corresponding to mangled pointer type name like:

void g(A& a){
   a.foo();
}

define void @_Z1gR1A(%struct.A* dereferenceable(8) %a) local_unnamed_addr
#0 {
entry:
%0 = bitcast %struct.A* %a to void (%struct.A*)***
%vtable = load void (%struct.A*)**, void (%struct.A*)*** %0,
!invariant.group !7
%1 = load void (%struct.A*)*, void (%struct.A*)** %vtable
tail call void %1(%struct.A* nonnull %a)
ret void
}

!7 = !{!"_ZTS1A"}

This works well if the pointer type doesn’t change, but when it does,
devirtualization might not happen like here:

struct A {
   A();
   virtual void foo();
};
struct B : A{
   B();
   virtual void foo();
};
void g(A& a){
   a.foo();
   a.foo();
}
void clobber(A&);
void f() {
     B b;
     clobber(b);
     g(b);
}

The other problem is that when we combine 2 instructions with different
invariant.group metadata, then we pick one of them, because for now we can
have only single !invariant.group metadata.

The solution

I had some initial ideas how it can be solved, like

   1.

   introducing multi invariant groups
   2.

   having sub invariant groups - like inheritance, so we could figure out
   that one group is subgroup of another
   3.

   decorating all loads with base pointer MD (doesn’t work with multiple
   inheritance)

The original idea was that the metadata would indicate the class that
introduced the vptr; that seems like it should work ok with multiple
inheritance. Is there a problem with that approach? (We may still be able
to remove it entirely, which could still be preferable.)

You are referring to the last point, right? So it is possible to decorate
with the base class type (the introducing vptr class, which will be only
one for every vfunction). But there is problem with it. Consider example:

struct A {
  virtual void foo();
};
struct B {
  virtual void bar();
};
struct C : A, B {
  virtual void foo();
  virtual void bar();
};
void test(C &c) {
  c.foo();

  c.bar();
}

Right now call to foo and bar will get the same invariant.group, so we will
end up with only one load (the *!invariant.loads* - doesn't play any role
here):

define void @_Z4testR1C(%struct.C* dereferenceable(16) %c) {entry:
  %0 = bitcast %struct.C* %c to void (%struct.C*)***
  %vtable = load void (%struct.C*)**, void (%struct.C*)*** %0,
*!invariant.group* !7
  %1 = load void (%struct.C*)*, void (%struct.C*)** %vtable, !invariant.load !8
  tail call void %1(%struct.C* nonnull %c)

  %vfn2 = getelementptr inbounds void (%struct.C*)*, void
(%struct.C*)** %vtable, i64 1
  %2 = load void (%struct.C*)*, void (%struct.C*)** %vfn2, !invariant.load !8
  tail call void %2(%struct.C* nonnull %c)
  ret void
}

!7 = !{!"_ZTS1C"}
!8 = !{}

If we would decorate vtable load from c.foo() with !invariant.group
!"typeA", and from c.bar() with !"typeB", then

the second one won't be removed:

define void @_Z4testR1C(%struct.C* dereferenceable(16) %c) {entry:
  %0 = bitcast %struct.C* %c to void (%struct.C*)***
  %vtable = load void (%struct.C*)**, void (%struct.C*)*** %0,
*!invariant.group* !0
  %1 = load void (%struct.C*)*, void (%struct.C*)** %vtable, !invariant.load !2
  tail call void %1(%struct.C* nonnull %c)

  %vtable1 = load void (%struct.C*)**, void (%struct.C*)*** %0,
*!invariant.group* !1
  %vfn2 = getelementptr inbounds void (%struct.C*)*, void
(%struct.C*)** %vtable1, i64 1
  %2 = load void (%struct.C*)*, void (%struct.C*)** %vfn2, !invariant.load !2
  tail call void %2(%struct.C* nonnull %c)
  ret void
}

!0 = !{!"_ZTS1A"}
!1 = !{!"_ZTS1B"}
!2 = {}

Piotr

I don’t see why. TBAA is handled just fine (including in MSSA); it is just a more-complicated pair-compatibility rule. I was thinking about Richard’s point about the changing-and-then-changing-back scenario, plus I wonder how these barriers affect optimizations. We don’t seem to look through invariant_group_barrier in stripPointerCasts, BasicAA, etc. because int_invariant_group_barrier does not have Returned<0> on the definition in include/llvm/IR/Intrinsics.td. We can fix that, but we need to be very careful not to remove the dependence on the barrier’s return value, and maybe that’s the right solution, but even so, it just makes me wonder if there’s a way without the barriers. Thanks again, Hal

Hi Piotr,

I think makes sense. Modulo bitcasts, the invariant is identified by a
particular pointer SSA value. Given that you can't sensibly have two
nonequivalent invariants associated with the same pointer SSA value
simultaneously, there's no need to also identify the invariant with a
metadata string as well. When we need a new "identifier" for the pointed-to
value, we get one using invariant.group.barrier.

As I recall, the original motivation for the identifier was to support
cases where the "invariant" region's value changes and then changes back
(remember that invariant.group does not imply the storage doesn't change,
just that a particular set of loads and stores witness the same value):

struct A { void *vptr; /*...*/ };
struct B { void *vptr; /*...*/ };
union U { A a; B b; };

void f(U *u) {
  // #1, load A vptr
  load u->a.vptr, !invariant "A::vptr"
  // #2, change union member to B ...
  store u->b.vptr, !invariant "B::vptr"
  // #3, change union member back to A ...
  f(u); // performs: store u->a.vptr, !invariant "A::vptr"
  // #4, load A vptr again, can be forwarded from #1 but not from #2
  load u->a.vptr, !invariant "A::vptr"
}

Excellent point, I forgot that one don't have to call placement new in

order to emplace different type.
I discussed it with Krzysztof and we belive the best way to solve it is
to introduce barriers before every use of union if it contains any
polymorphic class (recursively for each class in union). This might look
very bad, but assuming we will be able skip barrier for optimizations not
relying on !invariant.group, then it will not pessimize anything.

In light of this, can we go back and reevaluate your list of proposed
solutions? You pointed out two problems:
1. "This works well if the pointer type doesn’t change, but when it
does, devirtualization might not happen" (e.g. a conversion to a base-class
pointer/reference).
2. "The other problem is that when we combine 2 instructions with
different invariant.group metadata, then we pick one of them, because for
now we can have only single !invariant.group metadata."

Given what's been said, I'm leaning toward favoring solution (b), "having
sub invariant groups - like inheritance, so we could figure out that one
group is subgroup of another" because 1) it seems to solve both problems
and 2) it seems like we could get all of the non-trivial logic (for
generation, comparison, and merging) from our TBAA implementation (we might
even be able to directly reuse the same metadata).

-Hal

I was thinking about this, and as long as I think about it I am convincing
myself that different groups, or even sub groups that you have mentioned,
will have no use for const memory propagation. In the future we probably
would like to decorate const member loads and
stores with* !invariant.group* (or *!invariant* if groups will be removed
[if name will change :P]).

In order to decorate const members, we will need to insert barrier before
*every* use of union member.
So I was thinking about this example: Compiler Explorer but it
doesn't compile, but Krzysztof figured out this one:
Compiler Explorer (BTW it seems that clang is pretty bad at
cleaning out iostreams)

struct A {
    const int x;
};
union B {
    A a1;
    A a2;

    B() : a1{42} { }
    void switch2(int x) {
        a1.~A();
        new (&a2) A{x};
    }
};
int main() {
    B b;
    std::cout << b.a1.x << std::endl;
    b.switch2(50);
    std::cout << b.a2.x << std::endl;
}

As he checked in standard, it seems to be valid C++11 code by *9.5.4*.
So as you can see different groups won't gonna work in case of const
members inside union, like:

int main() {
    B b;
    std::cout << b.a1.x << std::endl;
    b.switch2(50);
    std::cout << b.a2.x << std::endl;
    b.switch2(51);
    std::cout << b.a2.x << std::end; // This is not 50, as we would thought based on invariant.groups
}

I am not familiar with the TBAA implementation, but after digging more int
MSSA, I have other reason to not like the sub groups.
It might be pretty hard to handle them efficiently in many algorithms
including MSSA.

I don't see why. TBAA is handled just fine (including in MSSA); it is just
a more-complicated pair-compatibility rule.

Hal, what are the things that you dislike about the emitting barriers
everywhere for unions?

I was thinking about Richard's point about the changing-and-then-changing-back
scenario, plus I wonder how these barriers affect optimizations. We don't
seem to look through invariant_group_barrier in stripPointerCasts, BasicAA,
etc. because int_invariant_group_barrier does not have Returned<0> on the
definition in include/llvm/IR/Intrinsics.td. We can fix that, but we need
to be very careful not to remove the dependence on the barrier's return
value, and maybe that's the right solution, but even so, it just makes me
wonder if there's a way without the barriers.

Thanks again,
Hal

So we need barriers anyway, and the different groups are not sufficient

for optimizations like const propagation of const member (like in
Krzysztof's example with unions).
Right now no optimization skips invariant barriers (no optimizations uses
the fact that barrier returns the same pointer), but many treat barrier as
function not modyfing any memory.
I am working on the patch for MSSA to use the same memory use as the
dominating load/store etc. With this it will skip the invariant.groups at
least for MSSA.
We will still need to handle it in many passes, but I am sure we will be
able to make it work so it won't pesimize anything. I don't see the
*-fstrict-vtable-pointes* being default option without it.

Piotr