Dereferenceable load semantics & LICM

Hi all,
I have a question about dereferenceable metadata on load instruction. I have a patch (https://reviews.llvm.org/D31539) for LICM that hoists loads with !invariant.group.
The motivation example is devirtualization:

struct A {
virtual void foo();
};
int bar();
void indirect(A &a) {
while(bar())
a.foo();
}

With -O2 -fstrict-vtable-pointers we get:

define void @hoist(%struct.A* dereferenceable(8)) {
entry:
%call1 = tail call i32 @bar()
%tobool2 = icmp eq i32 %call1, 0
br i1 %tobool2, label %while.end, label %while.body.lr.ph

while.body.lr.ph: ; preds = %entry
%b = bitcast %struct.A* %0 to void (%struct.A*)***

br label %while.body

while.body: ; preds = %while.body.lr.ph, %while.body
%vtable = load void (%struct.A*), void (%struct.A*)* %b, align 8, !invariant.group !0
%1 = load void (%struct.A*), void (%struct.A)** %vtable, align 8, !invariant.load !0
tail call void %1(%struct.A* %0)
%call = tail call i32 @bar()
%tobool = icmp eq i32 %call, 0
br i1 %tobool, label %while.end.loopexit, label %while.body

while.end.loopexit: ; preds = %while.body
br label %while.end

while.end: ; preds = %while.end.loopexit, %entry
ret void
}

We know that the load of vptr and virtual function will not change in this loop, which is indicated by !invariant.group.
Hoisting invariant.group load is legal because %b is dereferenceable, but hoisting next load is illegal by LICM because it can’t prove that %vtable is dereferenceable.
But if I add dereferenceable metadata on vtable load like

%vtable = load void (%struct.A*), void (%struct.A*)* %b, align 8, !invariant.group !0, !dereferenceable !1

!1 = !{i64 8}

Then it doesn’t help either, because LICM drops !dereferencealbe metadata when hoisting (and adding it would be invalid, because we don’t know if it is also dereferenceable in hoisted block).

On the other hand, after performing my LICM of !invariant.group load, GVN hoists the second load and I am not sure why it is legal then.

Any ideas?

Piotr

Hi Piotr,

Hi all,
I have a question about dereferenceable metadata on load instruction. I
have a patch (https://reviews.llvm.org/D31539) for LICM that hoists loads
with !invariant.group.
The motivation example is devirtualization:
...
[snip]

On the other hand, after performing my LICM of !invariant.group load, GVN
hoists the second load and I am not sure why it is legal then.

I suspect what's going on is that we first canonicalize the loop to:

if (precondition) {
do {
vptr = load vtable;
fptr = *vptr;
...
} while (backedge_condition);
}

after which it is safe to transform the program to (modulo aliasing):

if (precondition) {
vptr = load vtable;
fptr = *vptr;
do {
...
} while (backedge_condition);
}

since the we moved a load from a ("strongly") postdominating location.
We know that once we were in the preheader we know we're definitely
going to execute the vptr and fptr loads, so they better be
dereferenceable. In other words, we're "exploiting undefined
behavior" here.

-- Sanjoy

Hi Piotr,

> Hi all,
> I have a question about dereferenceable metadata on load instruction. I
> have a patch (https://reviews.llvm.org/D31539) for LICM that hoists
loads
> with !invariant.group.
> The motivation example is devirtualization:
> ...
> [snip]
>
> On the other hand, after performing my LICM of !invariant.group load, GVN
> hoists the second load and I am not sure why it is legal then.

I suspect what's going on is that we first canonicalize the loop to:

if (precondition) {
  do {
    vptr = load vtable;
    fptr = *vptr;
    ...
  } while (backedge_condition);
}

after which it is safe to transform the program to (modulo aliasing):

if (precondition) {
  vptr = load vtable;
  fptr = *vptr;
  do {
    ...
  } while (backedge_condition);
}

since the we moved a load from a ("strongly") postdominating location.
We know that once we were in the preheader we know we're definitely
going to execute the vptr and fptr loads, so they better be
dereferenceable. In other words, we're "exploiting undefined
behavior" here.

Yes, this appears to be exactly the case - dominance tells us they must
execute at least once in both situations.

Thanks guys!
I have another small example:

struct A {
virtual void foo();
};

void indirect(A *a, int n) {

for (int i = 0 ; i < n; i++)
a->foo();
}

generates:

define void @_Z8indirectP1Ai(%struct.A* %a, i32 %n) local_unnamed_addr #0 {
entry:
%cmp3 = icmp sgt i32 %n, 0
br i1 %cmp3, label %for.body.lr.ph, label %for.cond.cleanup

for.body.lr.ph: ; preds = %entry
%0 = bitcast %struct.A* %a to void (%struct.A*)***
br label %for.body

for.cond.cleanup.loopexit: ; preds = %for.body
br label %for.cond.cleanup

for.cond.cleanup: ; preds = %for.cond.cleanup.loopexit, %entry
ret void

for.body: ; preds = %for.body, %for.body.lr.ph
%i.04 = phi i32 [ 0, %for.body.lr.ph ], [ %inc, %for.body ]
%vtable = load void (%struct.A*), void (%struct.A*)* %0, align 8, !tbaa !4, !invariant.group !7
%1 = load void (%struct.A*), void (%struct.A)** %vtable, align 8, !invariant.load !8
tail call void %1(%struct.A* %a)
%inc = add nuw nsw i32 %i.04, 1
%exitcond = icmp eq i32 %inc, %n
br i1 %exitcond, label %for.cond.cleanup.loopexit, label %for.body
}

So now a is not dereferenceable and I was counting that because we have loop preheader that only executes if the loop executes,
it will be hoisted. for invariant.group load the isGuaranteedToExecute returns with (!SafetyInfo->HeaderMayThrow).
The isGuaranteedToTransferExecutionToSuccessor called on the %vtable instruction returns true,
but few instructions later, the call of %1 may throw.
Do I understand it correctly, that it is legal to do the hoist because all of the instructions above %vtable does not throw?
Are there any plans to fix it in the future? The fix doesn’t seem hard to write and I can do it, but I am not sure if it won’t be too expensive.

Piotr

PS: please review the https://reviews.llvm.org/D31539

Hi Piotr,

[snip]
Do I understand it correctly, that it is legal to do the hoist because all
of the instructions above %vtable does not throw?

Yes, I think you're right. HeaderMayThrow is a conservative
approximation, and the conservativeness is biting us here.

Are there any plans to fix it in the future? The fix doesn't seem hard to

Not to my knowledge.

write and I can do it, but I am not sure if it won't be too expensive.

Maybe we can do it (relatively) cheaply:

- Remember the first throwing instruction in the header, instead of a
boolean, in LoopSafetyInfo

- In hoistRegion, remember if you've seen the first throwing
instruction yet

- Pass the above as a boolean parameter to isGuaranteedToExecute, and
instead of
if (Inst.getParent() == CurLoop->getHeader())
return !SafetyInfo->HeaderMayThrow;
do something like
if (Inst.getParent() == CurLoop->getHeader())
return IsBeforeThrowingInst;

-- Sanjoy

I have an ebb analysis around that can, in constant time, tell you there nearest dominating may throw that occurs before a given instruction, if any.

Here’s the simpler n-log-n processing time, log n lookup version if you wanted to throw it somewhere.
I believe it should work.

std::map<std::pair<unsigned int, unsigned int>, Instruction *> NearestDominatingMayThrow;
DenseMap<Instruction, std::pair<unsigned int, unsigned int>> InstrDFS;

// This is a variant of updateDFSNumbers.

void assignMayThrowDFS(){
unsigned int DFSNum = 0;

SmallVector<std::pair<const DomTreeNode *,
typename DomTreeNode::const_iterator>,
32> WorkStack;

auto *ThisRoot = DT->getRootNode();

if (!ThisRoot)
return;

WorkStack.push_back(std::make_pair(ThisRoot, ThisRoot->begin()));
ThisRoot->DFSNumIn = DFSNum++;

while (!WorkStack.empty()) {
const DomTreeNodeBase *Node = WorkStack.back().first;
typename DomTreeNodeBase::const_iterator ChildIt =
WorkStack.back().second;

// If we visited all of the children of this node, “recurse” back up the
// stack setting the DFOutNum.
if (ChildIt == Node->end()) {
for (auto *I : Node->getBlock()) {
auto &DFSPair = InstrDFS[I]
DFSPair.second = DFSNum++;
if (I->mayThrow())
NearestDominatingMayThrow[DFSPair] = I;
}
WorkStack.pop_back();
} else {
// Otherwise, recursively visit this child.
auto *Child = *ChildIt;
++WorkStack.back().second;

WorkStack.push_back(std::make_pair(Child, Child->begin()));
Child->DFSNumIn = DFSNum++;
for (auto *I : Child->getBlock())
InstrDFS[I].first = DFSNum++;

}
}

Now, for an instruction you want to find the nearest dominating may throw, look it up in instrdfs, and use lower_bound on the map. Note: if your current instruction throws, you need to start at the previous one (fixable, i’m just lazy).

I also haven’t hand verified whether std::pair will do the right thing, but what you want is the pair closest to your instruction where dfsnumin (.first) is < your pair’s dfsnumin, and dfsnumout (.second) is > than your pair’s dfsnumout.

(IE that it dominates you)

This instruction is guaranteed to dominate you, if it exists, and is guaranteed to be the closest thing that dominates you.

I was thinking about something very similar and it seems to be a good
approach to me because it has much lower complexity.

In the review Sanjoy correctly spoted that I am not discarding
invariant.group metadata when hoisting, which is incorrect in general.
This generates a big problem to devirtualization, because discarding
metadata when hoisting vtable load might be worse than not hoisting when
there might be another load/store with !invariant.group dominating it (e.g.
after inlining).

I don't see any way to model it right now in LLVM, but I see a one simple
solution:
add extra information to the metadata nodes indicating that this property
is non-local:

%0 = load... %p, !invariant.group !1, !dereferenceable !2
%1 = load ... %0, !invariant.load !0, !dereferenceable !2

!0 = !{!"GlobalProperty"}
!1 = !{!"MyType", !"GlobalProperty"}
!2 = !{i64 8, !"GlobalProperty}

With that we would strip only the metadata not containing this information.

For devirtualization it would make sense with invariant.load,
invariant.group and dereferenceable.

What do you think about this idea?

sense, so I won't start writing prototype that will be throwed to thrash

I don’t really see why you need a separate property…

It seems plausible that rather than having the hoisting logic handle invariant metadata generically (by stripping it when hoisting) it could instead be aware of invariant metadata and try to avoid hoisting the load.

But this will end up being yet another way in which enabling invariants obstructs normal optimizations to get devirtualization. I’m increasingly worried that the cost is going to outweigh the gain.

Hi Piotr,

> [snip]
> Do I understand it correctly, that it is legal to do the hoist because
all
> of the instructions above %vtable does not throw?

Yes, I think you're right. HeaderMayThrow is a conservative
approximation, and the conservativeness is biting us here.

> Are there any plans to fix it in the future? The fix doesn't seem hard
to

Not to my knowledge.

> write and I can do it, but I am not sure if it won't be too expensive.

Maybe we can do it (relatively) cheaply:

- Remember the first throwing instruction in the header, instead of a
   boolean, in LoopSafetyInfo

- In hoistRegion, remember if you've seen the first throwing
   instruction yet

- Pass the above as a boolean parameter to isGuaranteedToExecute, and
   instead of
     if (Inst.getParent() == CurLoop->getHeader())
       return !SafetyInfo->HeaderMayThrow;
   do something like
     if (Inst.getParent() == CurLoop->getHeader())
       return IsBeforeThrowingInst;

-- Sanjoy

I was thinking about something very similar and it seems to be a good
approach to me because it has much lower complexity.

In the review Sanjoy correctly spoted that I am not discarding
invariant.group metadata when hoisting, which is incorrect in general.
This generates a big problem to devirtualization, because discarding
metadata when hoisting vtable load might be worse than not hoisting when
there might be another load/store with !invariant.group dominating it
(e.g. after inlining).

I don't see any way to model it right now in LLVM, but I see a one simple
solution:
add extra information to the metadata nodes indicating that this
property is non-local:

%0 = load... %p, !invariant.group !1, !dereferenceable !2
%1 = load ... %0, !invariant.load !0, !dereferenceable !2

!0 = !{!"GlobalProperty"}
!1 = !{!"MyType", !"GlobalProperty"}
!2 = !{i64 8, !"GlobalProperty}

With that we would strip only the metadata not containing this
information.

For devirtualization it would make sense with invariant.load,
invariant.group and dereferenceable.

What do you think about this idea?

Do you have any comments? I would like to know if the idea does even make
sense, so I won't start writing prototype that will be throwed to thrash

I don't really see why you need a separate property....

It seems plausible that rather than having the hoisting logic handle
invariant metadata generically (by stripping it when hoisting) it could
instead be aware of invariant metadata and try to avoid hoisting the load.

So when we hoist based on the guarantee that it will be executed it seems

to be safe to keep the metadata. LICM always discard.
So we could only hoist invariant.group loads if they are guarantee to
execute.
Right now LICM will not hoist any invariant.group load, because it doesn't
know it is invariant.

It would be still nice to hoist it when we know that it is safe to load
from vptr and from vtable, but we can't do it if we will discard medata
(for this
case we need dereferenceable)

But this will end up being yet another way in which enabling invariants
obstructs normal optimizations to get devirtualization. I'm increasingly
worried that the cost is going to outweigh the gain.

What kind of cost you are reffering?
If we would add a way to say that specific metadata is a global property,
then I don't think the cost of maitanence (understanding idea, fixing
documentation
and fixing passes) would be higher than the gains, because I assume it
would be very low.
It does not seem to be that complicated to understand and use,
and to implement this we would only need function like
`dropAllUnknownLocalMetadata` that would replace `dropAllUnknownMedatata`
in most of the
places. We could even firstly fix LICM, and then fix other passes, because
the global properties would not guarantee that they would be not discarded.

When we add medata in the frontend then in most cases (probably all,
because it is not that easy to add metadata depending on the BB) it is a
non local property. Because of this it seems weird that we can only model
it as a local property.

Piotr

Hi Piotr,

I don't see any way to model it right now in LLVM, but I see a one simple
solution:
add extra information to the metadata nodes indicating that this property
is non-local:

%0 = load... %p, !invariant.group !1, !dereferenceable !2
%1 = load ... %0, !invariant.load !0, !dereferenceable !2

!0 = !{!"GlobalProperty"}
!1 = !{!"MyType", !"GlobalProperty"}
!2 = !{i64 8, !"GlobalProperty}

With that we would strip only the metadata not containing this
information.

For devirtualization it would make sense with invariant.load,
invariant.group and dereferenceable.

What do you think about this idea?

I'm sorry for being *this* annoying, but I'm not on board with this. :slight_smile:

I think this will run into the same dead-code-can-affect-behavior
issue discussed in https://reviews.llvm.org/D20116.

Specifically, if your program is:

if (false) {
  ptr = load i8*, i8** %ptrptr, !dereferenceable !{i64 8, !"GlobalProperty}
  // ptr is not actually dereferenceable, even the load above has UB
  // (since the metadata is "wrong"), but it is never executed so all is well.
  int val = *ptr;
}

then you could transform it to

ptr = load i8*, i8** %ptrptr, !dereferenceable !{i64 8, !"GlobalProperty}
// ptr is not actually dereferenceable
int val = *ptr;
if (false) {
}

and you'd have gone from a program with no UB to one with UB.

-- Sanjoy

I disagree, I find it different than the patch you mentioned. We don't have
any problems with code like this:

ptr = load i8*, i8** %ptrptr, !dereferenceable !{i64 8}
if (false) {

  // ptr is not actually dereferenceable, even the load above has UB
  // (since the metadata is "wrong"), but it is never executed so all is
well.
  int val = *ptr;
}

because we rely on the information provided by someone else that ptr is
dereferenceable, so we can transform it to

ptr = load i8*, i8** %ptrptr, !dereferenceable !{i64 8}
// ptr is not actually dereferenceable
int val = *ptr;
if (false) {
}

And this is exactly how we treat any other UB.
It is also important to mention that in order to perform optimization you
have mentioned we need to know that %ptrptr is also dereferenceable.
The vptr and vtable properties are non local, so I don't see how
hoisting vptr load could be dangerous:

void foo(A& a) {
  if (false)
    a.foo();
}

will be:

void foo(A* dereferenceable(8) %a) {
  if (false)
    %vptr = load %a, !dereferenceable !{VtableSize, "GlobalProperty"},
!invariant.group !0
    %vfunction = load %vptr, !invariant.load !0
    call %vfunction();
}

and after hoisting:

void foo(A* dereferenceable(8) %a) {
  %vptr = load %a, !dereferenceable !{VtableSize, "GlobalProperty"},
!invariant.group !{"GlobalProperty"}
  %vfunction = load %vptr, !invariant.load !{"GlobalProperty"}
  if (false)
    call %vfunction();
}

Hi Piotr,

I disagree, I find it different than the patch you mentioned. We don't have
any problems with code like this:

ptr = load i8*, i8** %ptrptr, !dereferenceable !{i64 8}
if (false) {

// ptr is not actually dereferenceable, even the load above has UB
// (since the metadata is "wrong"), but it is never executed so all is
well.
int val = *ptr;
}

I was not talking about code like that. The problem is code like this:

if (false) {
ptr = load i8*, i8** %ptrptr, !dereferenceable !{i64 8, !"GlobalProperty}
// ptr is not actually dereferenceable, even the load above has UB
// (since the metadata is "wrong"), but it is never executed so all is well.
int val = *ptr;
}

I did not mention this earlier, but I've assumed that %ptrptr itself
is dereferenceable, which means you can hoist the load of ptr. Since
because of !"GlobalProperty" you don't strip the !dereferenceable,
you'll also be able to hoist the load of val, which would segfault
because ptr was not dereferenceable.

That is, with the !"GlobalProperty" bit in the picture, it is possible
to make "if (false) { X }" introduce UB in a program for certain
values of X.

-- Sanjoy

My point is that this it is exactly the same way as normal !dereferenceable
introduces UB.

ptr = load i8*, i8** %ptrptr, !dereferenceable !{i64 8}
if (false) {
  int val = *ptr;
  }

If frontend says that something is dereferenceable, which is not actually
dereferenceable, then it is UB and everything can happen - like the
execution of dead instruction.
This is exactly the same with the global properties - we are giving a
guarantee that pointer it is dereferenceable even if we would hoist or sink
it, and if it is not true then it is UB.

I don't see why UB with normal dereferenceable is acceptable, but having
this property globally is not.

Hi Piotr,

Hi Sanjoy,
My point is that this it is exactly the same way as normal !dereferenceable
introduces UB.

ptr = load i8*, i8** %ptrptr, !dereferenceable !{i64 8}
if (false) {
int val = *ptr;
}

These are two different things.

In the above example, your original program executes a load that was
supposed to produce a dereferenceable value, but it did not. This
means the original program is undefined, so we can optimize it to do
whatever we want.

OTOH, look at this program:

void main() {
if (false) {
// ptrptr does not contain a dereferenceable pointer, but is
itself dereferenceable
ptr = load i8*, i8** %ptrptr, !dereferenceable !{i64 8}, !global
int val = *ptr;
}
}

What is the behavior of the above program? It better be well defined,
since it does nothing!

However, if we follow your rules, we can first xform it to

void main() {
ptr = load i8*, i8** %ptrptr, !dereferenceable !{i64 8}, !global
if (false) {
int val = *ptr;
}
}

and then to

void main() {
ptr = load i8*, i8** %ptrptr, !dereferenceable !{i64 8}, !global
int val = *ptr;
if (false) {
}
}

which introduces UB into a program that was well defined to begin
with.

If frontend says that something is dereferenceable, which is not actually
dereferenceable, then it is UB and everything can happen - like the
execution of dead instruction.
This is exactly the same with the global properties - we are giving a
guarantee that pointer it is dereferenceable even if we would hoist or sink
it, and if it is not true then it is UB.

But then you're saying dead code (code that would not actually execute
in the original program) can affect program behavior, and: "if (false)
X;" is not a no-op for some values of X. It is in this respect that
your proposal is similar to the speculatable proposal.

-- Sanjoy

I left a comment on https://reviews.llvm.org/D18738 earlier which I
feel is related to this discussion, so I'm reproducing it here:

I wonder though if what we want to express isn't some sort of
"type-based dereferencability annotations". For example the semantics
I care about are essentially, "if you know you have a defereferencable
pointer, you can go and dereference any other valid (managed) pointers
the pointed to data references (recursively)". This has to be true for
me, because the GC walks all pointers recursively that way. Of course
the problem with this is that the compiler doesn't know which part of
the referenced data are valid pointers for this purpose (and it can't
just be based on the struct type, because we do store pointers to
unmanaged data). So if we had a way to express to the compiler "here
are the pointers in this struct that you may assume dereferencable",
that would work very well for me.

+CC Artur

I left a comment on https://reviews.llvm.org/D18738 earlier which I
feel is related to this discussion, so I'm reproducing it here:

I wonder though if what we want to express isn't some sort of
"type-based dereferencability annotations". For example the semantics
I care about are essentially, "if you know you have a defereferencable
pointer, you can go and dereference any other valid (managed) pointers
the pointed to data references (recursively)". This has to be true for
me, because the GC walks all pointers recursively that way. Of course
the problem with this is that the compiler doesn't know which part of
the referenced data are valid pointers for this purpose (and it can't
just be based on the struct type, because we do store pointers to
unmanaged data). So if we had a way to express to the compiler "here
are the pointers in this struct that you may assume dereferencable",
that would work very well for me.

That is very close to what we (Azul) do internally (I think Artur
covered some of this on his EuroLLVM talk).

If we have:

if (x instanceof java.lang.String) {
if (arbitraryCondition) {
charv = x.value;
}
}

we know we can hoist the load of x.value outside the
arbitraryCondition check (modulo aliasing).

-- Sanjoy

But we already have that behavior with dereferenceable variables.

void foo(i8* dereferenceable(8) %ptr) {
  if (false) {
    ; what if %x is not actually dereferenceable?
    %x = load i8, i8* %ptr
  }
}
Now we can hoist the load, even that it would be never executed.
This is the same with global properties - it gives us guarantee that
property holds non locally and if it doesn't then it is UB
and things like this might happen.

I understand your concerns about the speculate function attribute, the
examples that you showed in the review are pretty disturbing,
but loads and stores seems to be much safere in that maner. Maybe you know
some examples like this, but with load and stores?
It just seems weird to me that we already depend on very similar mechanics,
but this one seems like a bad idea.

Best
Piotr

+CC Artur

I left a comment on https://reviews.llvm.org/D18738 earlier which I
feel is related to this discussion, so I'm reproducing it here:

I wonder though if what we want to express isn't some sort of
"type-based dereferencability annotations". For example the semantics
I care about are essentially, "if you know you have a defereferencable
pointer, you can go and dereference any other valid (managed) pointers
the pointed to data references (recursively)". This has to be true for
me, because the GC walks all pointers recursively that way. Of course
the problem with this is that the compiler doesn't know which part of
the referenced data are valid pointers for this purpose (and it can't
just be based on the struct type, because we do store pointers to
unmanaged data). So if we had a way to express to the compiler "here
are the pointers in this struct that you may assume dereferencable",
that would work very well for me.

That is very close to what we (Azul) do internally (I think Artur
covered some of this on his EuroLLVM talk).

If we have:

if (x instanceof java.lang.String) {
   if (arbitraryCondition) {
     charv = x.value;
   }
}

we know we can hoist the load of x.value outside the
arbitraryCondition check (modulo aliasing).

As mentioned in the talk and discussion afterwards, we're open to trying to upstream the logic we've developed around expressing java types over IR. We'd have to generalize it a bit, but if others would find it helpful, we'd be perfectly willing to share this code.