Missed devirtualization opportunities

I took the output of clang, simplified it, and used it as a testbase.
Essentially, there is one class with one virtual function; I create an
instance and call the virtual method all in one function:

; The TestVirtual class vtbl
@classvtbl.TestVirtual = constant %classvtbltype.TestVirtual {
  ; Pointers to the virtual methods for the TestVirtual class
  ...
}
; ...

define i32 @main() nounwind {
; create the instance
%pinstance = alloca %class.TestVirtual

; %ppVtbl becomes a pointer to the instance's vtbl pointer.
%ppVtbl = getelementptr %class.TestVirtual* %pinstance, i64 0, i32 0

; Populate the instance's vtbl pointer. After this, the instance is
constructed.
store %classvtbltype.TestVirtual* @classvtbl.TestVirtual,
%classvtbltype.TestVirtual** %ppVtbl

; If this next call is commented out, the virtual method call is
devirtualized by -std-compile-opts
%puts = call i32 @puts(i8* getelementptr inbounds ([2 x i8]* @str, i32
0, i32 0))

; Call the virtual function.

; load the instance's vtbl pointer.
%pVtbl1 = load %classvtbltype.TestVirtual** %ppVtbl

; load the function pointer from the vtbl
%ppVfn1 = getelementptr %classvtbltype.TestVirtual* %pVtbl1, i64 0, i32 0
%pVfn1 = load void (%class.TestVirtual*)** %ppVfn1

; call the virtual method.
call void %pVfn1(class.TestVirtual* %pinstance)

; ...
}

(clang put in a bunch of bitcasts and stuck the vtbl itself into an
array; ripping all that out made no difference)

Tracing through MemoryDependenceAnalysis and BasicAA, it turns out
that the store into %ppVtbl (a pointer to the instance's vtbl pointer)
is clobbered by the call to @puts because PointerMayBeCaptured gets
called on it when the @puts call is encountered and returns true
because %pinstance is passed through %pVfn1 to make the virtual method
call, and the call to %pVfn1 (unlike the actual function that %pVfn1
must point to) does not declare the 'this' parameter nocapture.

The devirtualization could have happened anyway if one of the
following had happened:

1. PointerMayBeCaptured was able to determine that the possible
capture happened *after* the call to @puts. PointerMayBeCaptured does
not expose a parameter for the instruction we're trying to
getModRefInfo on (the @puts call), and even if it did, I don't know of
a fast way to determine in the general case whether one instruction
always executed before another (unless they're in the same block, and
even then you have to scan the block). But this would be a starting
point for a more complete escape analysis, which could also be used to
help eliminate gc allocations in many cases.

2. Some analysis existed that could tell us the complete set of
functions that %pVfn could point to, and we could then determine that
all possible targets declared that instance pointer nocapture, and
thus the instance is never captured. Adding this bit of analysis to
the CallGraph pass would also allow several other passes to handle
indirect calls in cases where the function pointer in question could
be guaranteed to point to one of a finite set of targets. However,
this would involve some overhead for the CallGraph construction.

3. The front-end, recognizing that scribbling on an instance's vtbl
pointer has undefined results, eliminated the loads of the vtbl
pointer and replaced them with @classvtbl.TestVirtual. This would
allow devirtualization within a function at least, but (I think) would
do less to allow analysis to spot devirtualization opportunities
across functions. (Although ArgPromotion could arrange to have the
vtbl pointer passed in separately, and then inlining and/or partial
specialization could be made to see that it's a pointer to a constant
and thus add in the IndirectCallBonus)

At this point I'm looking for suggestions and feedback. I think
implementing (1) and (2) would go a long way toward making several
other transformations safely more aggressive, but would involve
noticeable (unacceptable?) overhead. Does what I'm looking for
already exist? Have they already been considered and rejected due to
the overhead involved?

There are always going to be cases that can only be deduced with
language knowledge. We already do some frontend de-virtualization;
this may just be a missing case. Can you post your test?

John.

I compiled the attached c++ code to bitcode using clang++ -O3, then
ran it through opt -std-compile-opts. I got a reload of the vtbl
pointer after the printf call. The compiler is allowed to cache the
vtbl pointer as soon as the object is constructed or passed into a
function, right?

TestVirtual.cpp (366 Bytes)

TestVirtual.h (97 Bytes)

Yes. This is [basic.life]p7, which lists the conditions under which a pointer
or reference to an object are considered to automatically forward to a new
object that’s created at that location (as opposed to formally still pointing
at the destroyed object). One key constraint is that the objects have to be
of identical type, i.e. their pointers would have to match exactly. So this is
a fair optimization as long as we make sure that we keep pointers separate.

For example, consider the following test case:
struct A { virtual void foo(); };
struct B : A { virtual void foo(); };
void test() {
A *a = new A();
a->foo(); // 1
a->foo(); // 2: can assume object type is still A
a->~A();
A *b = new (a) B();
a->foo(); // 3: illegal use after end of lifetime
b->foo(); // 4: okay, can assume object type is still B
}
Here we happen to be able to prove that ‘a’ and ‘b’ have identical values,
but they nonetheless have different properties for this optimization because
‘a’ doesn’t meet the criteria for having been forwarded, whereas ‘b’ points to
the new object.

Right now, we don’t have any good machinery for doing this kind of
language-specific optimization in the backend. Doing it in the front-end
for your test case would require inter-statement analysis, which we’ve tried
to avoid in clang.

So I think the best bet for this optimization for now is to do it based on the
knowledge that printf doesn’t modify that particular argument.

John.

A better way for a front-end to declare that vtbl-ptr-hacking is not
expected and not supported is for it to emit llvm.invariant.start and
llvm.invariant.end calls for it. I tried putting that into my test
code as well, and found that support for it could use some work. In
particular, MemoryDependenceAnalysis seems to be the main consumer of
it, and it only recognizes it if the entire invariant region is
between a pointer store and a pointer load... it scans backward from
the load, looking for llvm.invariant.end and starts skipping
everything that precedes it until it finds a matching
llvm.invariant.begin. It seems to me that the generally expected way
for these markers to behave is to allow the region cover the entire
range of instructions during which the invariance holds, including any
loads of the memory, which MemoryDependenceAnalysis does *not* support
at present. At any rate, having the front-end add those markers for
the instance vtbl pointer shouldn't hurt anything, and should enable
devirtualization in a variety of situations, including some
cross-function scenarios.

In your example above, the llvm.lifetime.start and llvm.lifetime.end
would also be useful. llvm.lifetime.end would go after the destructor
call, and llvm.lifetime.start would go right before the constructor
calls. MemoryDependenceAnalysis also has code to look for those
markers, but I haven't tested it.

It looks like my best plan of attack given what parts of the system I
understand is to improve the handling of those markers. It looks to
me like those markers are exactly what clang needs to express vptr
value lifetime & invariance, allowing better back-end optimization and
avoiding inter-statement analysis.

Some of us were talking about this apropos your earlier post.
@llvm.invariant.start/end aren't appropriate, because the memory *isn't*
invariant; the user is totally allowed to destruct the object in-place and
create a new object there. The only thing the standard tells us is that
old references and pointers are only considered to be automatically
"forwarded" to the new object (i.e. aren't just invalid) if the types match.
So we're allowed to assume that a pointer or reference validly formed
to an object of dynamic type T will always refer to an object of that
dynamic type.

One possibility is to attach metadata to the object pointer somehow,
saying that it's to an object of a certain dynamic type; it would then
be simple for a language-specific optimization to fold vtable loads
from such pointers into the appropriate vtable pointer, which would
in turn be subject to additional folding. The chief obstacle here is
preventing other optimizations from merging two tagged pointers
in a way that leaves only one of the tags in place, which could break
semantics. To handle this, we could introduce an opaque readnone
no-op intrinsic (@llvm.bless?) after constructor calls (its return value
being the result of the new-expression) and before destructor calls
(its return value becoming the dtor's 'this' pointer). That, however,
has the disadvantage of interfering with memory optimizations
between the ctor, lifetime, and dtor; but maybe those optimizations
could be taught to look through blessings.

(kudos to djg and sabre for the discussion earlier)

John.

John McCall wrote:

A better way for a front-end to declare that vtbl-ptr-hacking is not
expected and not supported is for it to emit llvm.invariant.start and
llvm.invariant.end calls for it.

Some of us were talking about this apropos your earlier post.
@llvm.invariant.start/end aren't appropriate, because the memory *isn't*
invariant; the user is totally allowed to destruct the object in-place and
create a new object there.

Emit an invariant.end before the destructor and invariant.start before the constructor. Are there cases where you can't tell (in Clang) whether the callee is the destructor? For example, it is forbidden to take the address of a destructor in C++.

Nick

   The only thing the standard tells us is that

It's true that constructors and destructors can only be called directly. They can,
however, be called directly by a different function, which I may or may not inline.
I'm not certain how these intrinsics are supposed to pair up in that case, or what
happens to correctness if they don't.

John.

John McCall wrote:

John McCall wrote:

A better way for a front-end to declare that vtbl-ptr-hacking is not
expected and not supported is for it to emit llvm.invariant.start and
llvm.invariant.end calls for it.

Some of us were talking about this apropos your earlier post.
@llvm.invariant.start/end aren't appropriate, because the memory *isn't*
invariant; the user is totally allowed to destruct the object in-place and
create a new object there.

Emit an invariant.end before the destructor and invariant.start before the constructor.
Are there cases where you can't tell (in Clang) whether the callee is the destructor?
For example, it is forbidden to take the address of a destructor in C++.

It's true that constructors and destructors can only be called directly. They can,
however, be called directly by a different function, which I may or may not inline.
I'm not certain how these intrinsics are supposed to pair up in that case, or what
happens to correctness if they don't.

The design of invariant.start/end is slightly wrong in that the first parameter of the end is the start. As you point out, this forces invariants to be local. I think we could simply remove that argument from invariant.end and the return from invariant.start, and rely on alias analysis to determine whether the two must/may/couldn't refer to the same memory.

In general, the memory use markers were added specifically to address difficulties with devirtualization in the middle end or at LTO time. Feel free to adjust their semantics as needed; their implementation at this point is pretty minimal and they should be easy to work with.

Nick

So does that mean that we're allowed to assume that, given an "old"
pointer to that memory, the vtbl-ptr slot hasn't changed? Which
allows us to devirtualize:

Base* pT = GetMyObjectThatHappensToBeT();
// ...
// stuff that we can't tell what it does to our memory
// ...
pT->A();
// more mysterious stuff
pT->B();
// etc.

as long as we can tell that a given pointer is actually pT and not
another pointer copied from it or aliased to it?

For that matter, when you can find the *last* use of pT, you should be
able to put an llvm.immutable.end marker right after it, right? Or
have I forgotten something else in the standard? (entirely possible,
it's enormous!)

Do you do any processing of an IR Function after you've compiled it in clang?

The design of invariant.start/end is slightly wrong in that the first
parameter of the end is the start. As you point out, this forces invariants
to be local. I think we could simply remove that argument from invariant.end
and the return from invariant.start, and rely on alias analysis to determine
whether the two must/may/couldn't refer to the same memory.

In general, the memory use markers were added specifically to address
difficulties with devirtualization in the middle end or at LTO time. Feel
free to adjust their semantics as needed; their implementation at this point
is pretty minimal and they should be easy to work with.

Nick

If they end up being local-only, we could always have a pass that
walks the callgraph and propagates invariant regions into callers.

A better way for a front-end to declare that vtbl-ptr-hacking is not
expected and not supported is for it to emit llvm.invariant.start and
llvm.invariant.end calls for it.

Some of us were talking about this apropos your earlier post.
@llvm.invariant.start/end aren't appropriate, because the memory *isn't*
invariant; the user is totally allowed to destruct the object in-place and
create a new object there. The only thing the standard tells us is that
old references and pointers are only considered to be automatically
"forwarded" to the new object (i.e. aren't just invalid) if the types match.
So we're allowed to assume that a pointer or reference validly formed
to an object of dynamic type T will always refer to an object of that
dynamic type.

So does that mean that we're allowed to assume that, given an "old"
pointer to that memory, the vtbl-ptr slot hasn't changed? Which
allows us to devirtualize:

Yes, and also (POD) const member variables.

Base* pT = GetMyObjectThatHappensToBeT();
// ...
// stuff that we can't tell what it does to our memory
// ...
pT->A();
// more mysterious stuff
pT->B();
// etc.

as long as we can tell that a given pointer is actually pT and not
another pointer copied from it or aliased to it?

Right, which is another reason I'm skeptical of (the current design of)
the invariant intrinsics; I think using them will make the analysis
much harder. But they have other benefits.

John.

So does that mean that we're allowed to assume that, given an "old"
pointer to that memory, the vtbl-ptr slot hasn't changed? Which
allows us to devirtualize:

Base* pT = GetMyObjectThatHappensToBeT();
// ...
// stuff that we can't tell what it does to our memory
// ...
pT->A();
// more mysterious stuff
pT->B();
// etc.

as long as we can tell that a given pointer is actually pT and not
another pointer copied from it or aliased to it?

I should clarify my last response to say that copies of a pointer point
to the same object. When a new object is created at that storage, all
the old pointers may or may not forward; only the result of the
new-expression actually automatically points to the new object.
But we probably shouldn't rewrite invalid uses of stale pointers to
unreachable just because we can. :slight_smile:

For that matter, when you can find the *last* use of pT, you should be
able to put an llvm.immutable.end marker right after it, right? Or
have I forgotten something else in the standard? (entirely possible,
it's enormous!)

I'm not sure why we would. Any workable approach based on invariant
ranges will need to allow open ranges.

Do you do any processing of an IR Function after you've compiled it in clang?

Nothing this sophisticated.

John.

For that matter, when you can find the *last* use of pT, you should be
able to put an llvm.immutable.end marker right after it, right? Or
have I forgotten something else in the standard? (entirely possible,
it's enormous!)

I'm not sure why we would. Any workable approach based on invariant
ranges will need to allow open ranges.

Starting from the assumption that every use of pT is valid, it follows
that at every use of pT, pT->_vtblptr is invariant. After the last
use, we can no longer assume that pT->_vtblptr is invariant, so the
invariant region ends. That doesn't mean that pT->_vtbl is guaranteed
to change, just that we can no longer assume that it hasn't.

A better way for a front-end to declare that vtbl-ptr-hacking is not
expected and not supported is for it to emit llvm.invariant.start and
llvm.invariant.end calls for it.

Some of us were talking about this apropos your earlier post.
@llvm.invariant.start/end aren’t appropriate, because the memory isn’t
invariant; the user is totally allowed to destruct the object in-place and
create a new object there. The only thing the standard tells us is that
old references and pointers are only considered to be automatically
“forwarded” to the new object (i.e. aren’t just invalid) if the types match.
So we’re allowed to assume that a pointer or reference validly formed
to an object of dynamic type T will always refer to an object of that
dynamic type.

So does that mean that we’re allowed to assume that, given an “old”
pointer to that memory, the vtbl-ptr slot hasn’t changed?

You’re right, I hadn’t thought this through. The whole point of making them local is to say that “I’m sure these callees won’t modify that memory” regardless of what functions actually get called, even indirectly. We can’t know that they won’t modify the vptr in advance, so invariant doesn’t work here. Making it non-local just means that we would need to know the static call graph, which we don’t because we haven’t devirtualized yet so all the calls are indirect.

Nick

You're right, I hadn't thought this through. The whole point of making them
local is to say that "I'm sure these callees won't modify that memory"
regardless of what functions actually get called, even indirectly. We can't
know that they won't modify the vptr in advance, so invariant doesn't work
here. Making it non-local just means that we would need to know the static
call graph, which we don't because we haven't devirtualized yet so all the
calls are indirect.
Nick

So that means you're now saying that llvm.invariant.end ought to be
left the way it is, right?

I've been thinking about how to better take the invariant regions into
account, and it quickly became clear that it would be easier to handle
if we scanned forward instead of backwards. Any way you do it,
there'd be a lot more scanning in memdep than there is at present...
you can't tell just by looking at the instructions between a load and
a preceding potential clobber that those two instructions are actually
in an invariant region.

Which led me to wonder if perhaps instead of tracing individual
pointers on request with memdep, we should maybe create an
generalization of mem2reg to handle all pointers mentioned in a
function. A clobber outside of an invariant region would "unassign"
the location and all of its may-aliases. A store would "assign" the
location and all of its must-aliases and "unassign" all of its
may-aliases. A load would then use the SSA assignment if one is
unquestionably available, or remain in place as a load and then
"assign" the load result back to the pointer. No attempt would be
made to remove stores during this pass. AliasSetTracker would be
helpful as soon as I could get it to optionally keep all of its sets
"must alias" instead of downgrading them to "may alias" as soon as a
may-alias of an existing set showed up. Then I'd use a "must alias"
tracker and a "may alias" tracker so I could quickly answer "what are
all of this pointer's MustAlias's?" and "what are all of this
pointer's MayAlias's?".

After such a pass runs, the loads that now get fed to memdep are
already resolved as values whenever they can be and don't have to be
traced again or cached separately.

Kenneth Uildriks wrote:

You're right, I hadn't thought this through. The whole point of making them
local is to say that "I'm sure these callees won't modify that memory"
regardless of what functions actually get called, even indirectly. We can't
know that they won't modify the vptr in advance, so invariant doesn't work
here. Making it non-local just means that we would need to know the static
call graph, which we don't because we haven't devirtualized yet so all the
calls are indirect.
Nick

So that means you're now saying that llvm.invariant.end ought to be
left the way it is, right?

I have no idea how to make use of llvm.invariant to help devirtualization. If no use or other use case for them can be found, maybe they should be removed.

I've been thinking about how to better take the invariant regions into
account, and it quickly became clear that it would be easier to handle
if we scanned forward instead of backwards.

Right, there's been talk of a "reverse memdep". Owen?

   Any way you do it,

there'd be a lot more scanning in memdep than there is at present...
you can't tell just by looking at the instructions between a load and
a preceding potential clobber that those two instructions are actually
in an invariant region.

Which led me to wonder if perhaps instead of tracing individual
pointers on request with memdep, we should maybe create an
generalization of mem2reg to handle all pointers mentioned in a
function. A clobber outside of an invariant region would "unassign"
the location and all of its may-aliases. A store would "assign" the
location and all of its must-aliases and "unassign" all of its
may-aliases. A load would then use the SSA assignment if one is
unquestionably available, or remain in place as a load and then
"assign" the load result back to the pointer. No attempt would be
made to remove stores during this pass. AliasSetTracker would be
helpful as soon as I could get it to optionally keep all of its sets
"must alias" instead of downgrading them to "may alias" as soon as a
may-alias of an existing set showed up. Then I'd use a "must alias"
tracker and a "may alias" tracker so I could quickly answer "what are
all of this pointer's MustAlias's?" and "what are all of this
pointer's MayAlias's?".

After such a pass runs, the loads that now get fed to memdep are
already resolved as values whenever they can be and don't have to be
traced again or cached separately.

This is complex enough that I would need to think about it some more.

Nick

Hi Nick,

I have no idea how to make use of llvm.invariant to help
devirtualization. If no use or other use case for them can be found,
maybe they should be removed.

the dragonegg plugin uses llvm.invariant to help the code generators
optimize the code it produces for builtin_init_trampoline. It writes
some values into memory which is then constant forever after. Because
pointers to this memory are passed into all kinds of functions, the
optimizers don't understand that the values are constant without help,
thus the use of llvm.invariant. It doesn't work very well, but it is
still better than not doing it at all.

Ciao,

Duncan.

Apply invariant to the vtpr as long as the corresponding instance
pointer is in use within a function. With an invariant vtpr (and
better invariant support, and a front-end that uses it),
devirtualization happens more or less automatically with
-std-compile-opts.

Do the same wherever an instance pointer is passed. Mark its
corresponding vtpr invariant from the beginning of the entry block
until right after the last use of the instance pointer. Then a few
simple improvements to inlining & partial specialization will give you
some interprocedural devirtualization.

Anyway, you convinced me a few messages ago that invariant.end really
should be left the way it is. Especially since assuming invariance
hasn't ended when it has is unsafe... we *must* be able to match an
invariance start to its corresponding invariance end in all cases.