PartialAlias: different start addresses

Hi,

I going through the alias analysis documentation (http://llvm.org/docs/AliasAnalysis.html) and noticed the following in the definition of PartialAlias:
"
The PartialAlias response is used when the two memory objects are known to be overlapping in some way, but *do not start at the same address*.
"

Is it really required that the objects do no start at the same address? if that's the case the AA algorithm would need to prove that.
I'm asking this because:
1) This condition seems very strict and I don't think it's met in a few places I found by manual inspection
2) I couldn't find any client that uses this information in a meaningful way. Is there a use case for PartialAlias w/ strict distinct starting addresses?

Thanks,
Nuno

If I read the definition correctly, at least our Andersens' AA
implementation violates it.
see:

AliasResult CFLAndersAAResult::alias(const MemoryLocation &LocA,
                                     const MemoryLocation &LocB) {
  if (LocA.Ptr == LocB.Ptr)
    return LocA.Size == LocB.Size ? MustAlias : PartialAlias;

(i.e. the two objects are overlapping here *and* start at the same address).

Hi,

I going through the alias analysis documentation
(http://llvm.org/docs/AliasAnalysis.html) and noticed the following in the
definition of PartialAlias:
"
The PartialAlias response is used when the two memory objects are known to
be overlapping in some way, but *do not start at the same address*.
"

Is it really required that the objects do no start at the same address? if
that's the case the AA algorithm would need to prove that.
I'm asking this because:
1) This condition seems very strict and I don't think it's met in a few
places I found by manual inspection

If I read the definition correctly, at least our Andersens' AA
implementation violates it.
see:

AliasResult CFLAndersAAResult::alias(const MemoryLocation &LocA,
                                      const MemoryLocation &LocB) {
   if (LocA.Ptr == LocB.Ptr)
     return LocA.Size == LocB.Size ? MustAlias : PartialAlias;

(i.e. the two objects are overlapping here *and* start at the same address).

I've never noticed that part of the definition myself. I'm fairly certain that's not what we actually implement.

  -Hal

Yeah, we definitely don’t implement this, and i’m not sure why we would.
The best guess i can come up with is that it’s really attempting to say something like

1 = *A+4 (size 4)
2 = *A (size 8)

load from 2 PartialAlias 1 (not MustAlias)

Which is conservative but correct, and meant to ensure alias(1, 2) == alias(2, 1)

That would match with our definition of MustAlias, which would not allow 2 MustAlias 1 either.

There isn’t really a good set of consistent answers between most compilers when the sizes are different but memory addresses overlap.

IE it seems reasonable to say that a load of 2 MustAlias 1 but a load of 1 PartialAlias 2. It also seems reasonable to declare these both PartialAlias.

In any case, our current implementation, from what i remember (i didn’t trace every path), says:

if MemoryLocations are same size:
may if no idea
must if the same
partial if we can prove overlap

If MemoryLocations are different sizes:
may if no idea
partial if overlap
never must

Thank you all for your replies.
So here seems to be an agreement that the documentation for PartialAlias is incorrect.

Daniel: now you got me wondering about MustAlias. This is what the docs say:
"The MustAlias response may only be returned if the two memory objects are *guaranteed to always start at exactly the same location*"

This statement is regardless of the access sizes. For example, in SCEV AA:
// If they evaluate to the same expression, it's a MustAlias.
if (AS == BS)
  return MustAlias;

AS/BS are scev expressions for the pointers. So no check for the access size.

So, does must needs to check for access sizes? If so, SCEV AA is buggy and the documentation needs tweaking.

Thanks,
Nuno

Thank you all for your replies.
So here seems to be an agreement that the documentation for PartialAlias is incorrect.

Daniel: now you got me wondering about MustAlias. This is what the docs say:
"The MustAlias response may only be returned if the two memory objects are *guaranteed to always start at exactly the same location*"

This statement is regardless of the access sizes. For example, in SCEV AA:
// If they evaluate to the same expression, it's a MustAlias.
if (AS == BS)
return MustAlias;

AS/BS are scev expressions for the pointers. So no check for the access size.

So, does must needs to check for access sizes? If so, SCEV AA is buggy and the documentation needs tweaking.

I'm under the impression that there is code that depends on the size check, but I don't trust my recollection in this regard. SCEV AA is known to cause miscompiles, IIRC, maybe you just found out why :wink:

  -Hal

It's true that the CFL AAs have this code:
if (LocA.Ptr == LocB.Ptr)
  return LocA.Size == LocB.Size ? MustAlias : PartialAlias;

I grepped for clients of MustAlias:
~/llvm/lib/Transforms $ grep -Rl MustAlias .
./ObjCARC/ObjCARCOpts.cpp
./ObjCARC/ProvenanceAnalysis.cpp
./Scalar/DeadStoreElimination.cpp
./Scalar/GVN.cpp
./Scalar/LICM.cpp
./Scalar/LoopVersioningLICM.cpp
./Scalar/MemCpyOptimizer.cpp
./Scalar/MergedLoadStoreMotion.cpp
./Scalar/NewGVN.cpp
./Utils/VNCoercion.cpp

I glanced over all the uses in these files and I couldn't find any usage that requires sizes to match. Actually most clients check access sizes themselves. Most don't need equal sizes, just need one to be smaller than the other.

BTW, Basic AA doesn't check for size either:
if (isValueEqualInPotentialCycles(V1, V2))
  return MustAlias;

bool BasicAAResult::isValueEqualInPotentialCycles(const Value *V,
                                                  const Value *V2) {
  if (V != V2)
    return false;

  const Instruction *Inst = dyn_cast<Instruction>(V);
  if (!Inst)
    return true;
(...)
}

So if V1==V2, BasicAA will yield MustAlias.

I couldn't find any evidence that access sizes must be equal for 2 pointers to be MustAlias. Unless someone can prove that statement wrong, we have to conclude that CFL* AAs are being too conservative by yielding PartialMatch when sizes don't match.

Nuno

P.S.: This is not to say that SCEV AA hasn't bugs: I think I found one by inspection; I've asked a colleague to review and will report it soon. Not sure there's an easy fix for it, though.

Thank you all for your replies.
So here seems to be an agreement that the documentation for PartialAlias is incorrect.

Daniel: now you got me wondering about MustAlias. This is what the docs say:
"The MustAlias response may only be returned if the two memory objects are *guaranteed to always start at exactly the same location*"

This statement is regardless of the access sizes. For example, in SCEV AA:
// If they evaluate to the same expression, it's a MustAlias.
if (AS == BS)
return MustAlias;

AS/BS are scev expressions for the pointers. So no check for the access size.

So, does must needs to check for access sizes? If so, SCEV AA is buggy and the documentation needs tweaking.

I'm under the impression that there is code that depends on the size check, but I don't trust my recollection in this regard. SCEV AA is known to cause miscompiles, IIRC, maybe you just found out why :wink:

It's true that the CFL AAs have this code:
if (LocA.Ptr == LocB.Ptr)
return LocA.Size == LocB.Size ? MustAlias : PartialAlias;

I grepped for clients of MustAlias:
~/llvm/lib/Transforms $ grep -Rl MustAlias .
./ObjCARC/ObjCARCOpts.cpp
./ObjCARC/ProvenanceAnalysis.cpp
./Scalar/DeadStoreElimination.cpp
./Scalar/GVN.cpp
./Scalar/LICM.cpp
./Scalar/LoopVersioningLICM.cpp
./Scalar/MemCpyOptimizer.cpp
./Scalar/MergedLoadStoreMotion.cpp
./Scalar/NewGVN.cpp
./Utils/VNCoercion.cpp

I glanced over all the uses in these files and I couldn't find any usage that requires sizes to match. Actually most clients check access sizes themselves. Most don't need equal sizes, just need one to be smaller than the other.

BTW, Basic AA doesn't check for size either:
if (isValueEqualInPotentialCycles(V1, V2))
return MustAlias;

bool BasicAAResult::isValueEqualInPotentialCycles(const Value *V,
                                                 const Value *V2) {
if (V != V2)
   return false;

const Instruction *Inst = dyn_cast<Instruction>(V);
if (!Inst)
   return true;
(...)
}

So if V1==V2, BasicAA will yield MustAlias.

I couldn't find any evidence that access sizes must be equal for 2 pointers to be MustAlias. Unless someone can prove that statement wrong, we have to conclude that CFL* AAs are being too conservative by yielding PartialMatch when sizes don't match.

Thanks for looking at this. In that case, I see no reason for the restriction. Please do post patches to clean all of this up.

Nuno

P.S.: This is not to say that SCEV AA hasn't bugs: I think I found one by inspection; I've asked a colleague to review and will report it soon. Not sure there's an easy fix for it, though.

Good, thanks!

  -Hal

Does aliasing actually check both ways?
Otherwise, alias (A, B) will give different results than alias (B, A).

Sorry for the delay.
I'm not sure I understood what you wrote, sorry. What you wrote is true in general, but I don't see how MustAlias in particular is worse than the other AA results. And how not enforcing that access sizes are equal changes things with respect to commutativity either. Maybe I completely missed your point..

Nuno

Thank you all for your replies.
So here seems to be an agreement that the documentation for
PartialAlias is incorrect.

Daniel: now you got me wondering about MustAlias. This is what the
docs say:
“The MustAlias response may only be returned if the two memory
objects are guaranteed to always start at exactly the same location

This statement is regardless of the access sizes. For example, in
SCEV AA:
// If they evaluate to the same expression, it’s a MustAlias.
if (AS == BS)
return MustAlias;

AS/BS are scev expressions for the pointers. So no check for the
access size.

So, does must needs to check for access sizes? If so, SCEV AA is
buggy and the documentation needs tweaking.

I’m under the impression that there is code that depends on the size
check, but I don’t trust my recollection in this regard. SCEV AA is
known to cause miscompiles, IIRC, maybe you just found out why :wink:

It’s true that the CFL AAs have this code:
if (LocA.Ptr == LocB.Ptr)
return LocA.Size == LocB.Size ? MustAlias : PartialAlias;

I grepped for clients of MustAlias:
~/llvm/lib/Transforms $ grep -Rl MustAlias .
./ObjCARC/ObjCARCOpts.cpp
./ObjCARC/ProvenanceAnalysis.cpp
./Scalar/DeadStoreElimination.cpp
./Scalar/GVN.cpp
./Scalar/LICM.cpp
./Scalar/LoopVersioningLICM.cpp
./Scalar/MemCpyOptimizer.cpp
./Scalar/MergedLoadStoreMotion.cpp
./Scalar/NewGVN.cpp
./Utils/VNCoercion.cpp

I glanced over all the uses in these files and I couldn’t find any
usage that requires sizes to match. Actually most clients check
access sizes themselves. Most don’t need equal sizes, just need one to
be smaller than the other.

Does aliasing actually check both ways?
Otherwise, alias (A, B) will give different results than alias (B, A).

Sorry for the delay.
I’m not sure I understood what you wrote, sorry. What you wrote is true in
general, but I don’t see how MustAlias in particular is worse than the other
AA results.

Historically, in llvm, we have guaranteed that alias(a, b) ==alias(b, a)

If it does:

If start (a) == start (b)
If size(b) < size(a)
Return mustalias
Return may or partial

It will give different answers for alias(a, b) and alias (b, a)

Hence my question about whether it checked whether either was smaller, or just in one direction.

We should continue to do so. Thanks again, Hal

Meant to send this to the list:
Agreed, but we are getting it wrong somewhere, I just tried to bootstrap with an assert that this happened, and enabling a bunch of aa’s, and it asserts.
I’ll debug.

Wait, but the size check is done in the clients, not in AA.
AA only does (according to my understanding):

If start (a) == start (b)
     Return mustalias
Return may or partial

Nuno

That seems ... wrong to me, but i haven't thought very hard about it.

In particular, this would mean the documentation for partialalias *is*
correct :P.
IE it will only return partial if the starting addresses are not the same.

I actually would have expected something like:

if start(a) == start (b):
  if sizes equal:
    return mustalias
  else:
    return partialalias
if otherwise overlap
   return partial
return may

That said, i feel like both our existing code is trying to encode too much
info into these simple enum answers.

Maybe we should really be having something like:

enum aliaskind {
MustAlias, // Exactly the same start and size
PartialAliasSameStart,
PartialAlias,
Mayalias
}
enum overlapkind {
FullOverlap, // Either the accesses are the same size, or one fully covers
the other
PartialOverlap, // They partially cover each other
Unknown // No idea
}

std::pair<aliaskind, overlapkind> alias(A, B) {
if (start(A) == start(B))
  if sizes unknown:
     return {PartialAliasSameStart, Unknown}
  if sizes equal:
     return {MustAlias, FullOverlap}
  if one smaller than the other
     return {PartialAliasSameStart, FullOverlap}
etc

Then we don't have to muck around in the definitions of must/partial/may
too much, and the clients don't have to do too much when, for example, the
CSE/DSE like ones really want to know either "are these accessing exactly
the same memory location" *or* "can i eliminate this load/store in favor of
of an earlier/later one, maybe with some masking".

But maybe this doesn't need cleaning up.

Quoting Daniel Berlin <dberlin@dberlin.org>:

Thank you all for your replies.
So here seems to be an agreement that the documentation for
PartialAlias is incorrect.

Daniel: now you got me wondering about MustAlias. This is what the
docs say:
"The MustAlias response may only be returned if the two memory
objects are *guaranteed to always start at exactly the same
location*"

This statement is regardless of the access sizes. For example, in
SCEV AA:
// If they evaluate to the same expression, it's a MustAlias.
if (AS == BS)
return MustAlias;

AS/BS are scev expressions for the pointers. So no check for the
access size.

So, does must needs to check for access sizes? If so, SCEV AA is
buggy and the documentation needs tweaking.

I'm under the impression that there is code that depends on the size
check, but I don't trust my recollection in this regard. SCEV AA is
known to cause miscompiles, IIRC, maybe you just found out why :wink:

It's true that the CFL AAs have this code:
if (LocA.Ptr == LocB.Ptr)
return LocA.Size == LocB.Size ? MustAlias : PartialAlias;

I grepped for clients of MustAlias:
~/llvm/lib/Transforms $ grep -Rl MustAlias .
./ObjCARC/ObjCARCOpts.cpp
./ObjCARC/ProvenanceAnalysis.cpp
./Scalar/DeadStoreElimination.cpp
./Scalar/GVN.cpp
./Scalar/LICM.cpp
./Scalar/LoopVersioningLICM.cpp
./Scalar/MemCpyOptimizer.cpp
./Scalar/MergedLoadStoreMotion.cpp
./Scalar/NewGVN.cpp
./Utils/VNCoercion.cpp

I glanced over all the uses in these files and I couldn't find any
usage that requires sizes to match. Actually most clients check
access sizes themselves. Most don't need equal sizes, just need one to
be smaller than the other.

Does aliasing actually check both ways?
Otherwise, alias (A, B) will give different results than alias (B, A).

Sorry for the delay.
I'm not sure I understood what you wrote, sorry. What you wrote is
true in
general, but I don't see how MustAlias in particular is worse than the
other
AA results.

Historically, in llvm, we have guaranteed that alias(a, b) ==alias(b, a)

If it does:

If start (a) == start (b)
  If size(b) < size(a)
      Return mustalias
  Return may or partial

It will give different answers for alias(a, b) and alias (b, a)

Wait, but the size check is done in the clients, not in AA.
AA only does (according to my understanding):

If start (a) == start (b)
    Return mustalias
Return may or partial

That seems ... wrong to me, but i haven't thought very hard about it.

In particular, this would mean the documentation for partialalias *is*
correct :P.
IE it will only return partial if the starting addresses are not the same.

I actually would have expected something like:

if start(a) == start (b):
  if sizes equal:
    return mustalias
  else:
    return partialalias
if otherwise overlap
   return partial
return may

That said, i feel like both our existing code is trying to encode too much
info into these simple enum answers.

Maybe we should really be having something like:

enum aliaskind {
MustAlias, // Exactly the same start and size
PartialAliasSameStart,
PartialAlias,
Mayalias
}
enum overlapkind {
FullOverlap, // Either the accesses are the same size, or one fully covers
the other
PartialOverlap, // They partially cover each other
Unknown // No idea
}

std::pair<aliaskind, overlapkind> alias(A, B) {
if (start(A) == start(B))
  if sizes unknown:
     return {PartialAliasSameStart, Unknown}
  if sizes equal:
     return {MustAlias, FullOverlap}
  if one smaller than the other
     return {PartialAliasSameStart, FullOverlap}
etc

Then we don't have to muck around in the definitions of must/partial/may
too much, and the clients don't have to do too much when, for example, the
CSE/DSE like ones really want to know either "are these accessing exactly
the same memory location" *or* "can i eliminate this load/store in favor of
of an earlier/later one, maybe with some masking".

But maybe this doesn't need cleaning up.

You are the AA expert, not me :slight_smile: I was just trying to formalize LLVM's AA stuff and noticed that some things were not very consistent.
BTW, we need to take into account over-approximation of the analyses. So we cannot/shouldn't say that PartialAlias only happens when addresses are different because this imposes extra burden on the AA: it would have to prove that. I believe it's better if PartialAlias doesn't carry any information about the addresses. That way MustAlias implies PartialAlias.
Anyway, this depends on what clients actually need. I don't have enough experience in this area to make any judgement about that; I'll leave that with you :slight_smile:

Nuno

Quoting Daniel Berlin <dberlin@dberlin.org>:

Thank you all for your replies.
So here seems to be an agreement that the documentation for
PartialAlias is incorrect.

Daniel: now you got me wondering about MustAlias. This is what the
docs say:
"The MustAlias response may only be returned if the two memory
objects are *guaranteed to always start at exactly the same
location*"

This statement is regardless of the access sizes. For example, in
SCEV AA:
// If they evaluate to the same expression, it's a MustAlias.
if (AS == BS)
return MustAlias;

AS/BS are scev expressions for the pointers. So no check for the
access size.

So, does must needs to check for access sizes? If so, SCEV AA is
buggy and the documentation needs tweaking.

I'm under the impression that there is code that depends on the size
check, but I don't trust my recollection in this regard. SCEV AA is
known to cause miscompiles, IIRC, maybe you just found out why :wink:

It's true that the CFL AAs have this code:
if (LocA.Ptr == LocB.Ptr)
return LocA.Size == LocB.Size ? MustAlias : PartialAlias;

I grepped for clients of MustAlias:
~/llvm/lib/Transforms $ grep -Rl MustAlias .
./ObjCARC/ObjCARCOpts.cpp
./ObjCARC/ProvenanceAnalysis.cpp
./Scalar/DeadStoreElimination.cpp
./Scalar/GVN.cpp
./Scalar/LICM.cpp
./Scalar/LoopVersioningLICM.cpp
./Scalar/MemCpyOptimizer.cpp
./Scalar/MergedLoadStoreMotion.cpp
./Scalar/NewGVN.cpp
./Utils/VNCoercion.cpp

I glanced over all the uses in these files and I couldn't find any
usage that requires sizes to match. Actually most clients check
access sizes themselves. Most don't need equal sizes, just need one to
be smaller than the other.

Does aliasing actually check both ways?
Otherwise, alias (A, B) will give different results than alias (B, A).

Sorry for the delay.
I'm not sure I understood what you wrote, sorry. What you wrote is
true in
general, but I don't see how MustAlias in particular is worse than the
other
AA results.

Historically, in llvm, we have guaranteed that alias(a, b) ==alias(b, a)

If it does:

If start (a) == start (b)
  If size(b) < size(a)
      Return mustalias
  Return may or partial

It will give different answers for alias(a, b) and alias (b, a)

Wait, but the size check is done in the clients, not in AA.
AA only does (according to my understanding):

If start (a) == start (b)
    Return mustalias
Return may or partial

That seems ... wrong to me, but i haven't thought very hard about it.

In particular, this would mean the documentation for partialalias *is*
correct :P.
IE it will only return partial if the starting addresses are not the same.

I actually would have expected something like:

if start(a) == start (b):
  if sizes equal:
    return mustalias
  else:
    return partialalias
if otherwise overlap
   return partial
return may

That said, i feel like both our existing code is trying to encode too much
info into these simple enum answers.

Maybe we should really be having something like:

enum aliaskind {
MustAlias, // Exactly the same start and size
PartialAliasSameStart,
PartialAlias,
Mayalias
}
enum overlapkind {
FullOverlap, // Either the accesses are the same size, or one fully covers
the other
PartialOverlap, // They partially cover each other
Unknown // No idea
}

std::pair<aliaskind, overlapkind> alias(A, B) {
if (start(A) == start(B))
  if sizes unknown:
     return {PartialAliasSameStart, Unknown}
  if sizes equal:
     return {MustAlias, FullOverlap}
  if one smaller than the other
     return {PartialAliasSameStart, FullOverlap}
etc

Then we don't have to muck around in the definitions of must/partial/may
too much, and the clients don't have to do too much when, for example, the
CSE/DSE like ones really want to know either "are these accessing exactly
the same memory location" *or* "can i eliminate this load/store in favor of
of an earlier/later one, maybe with some masking".

But maybe this doesn't need cleaning up.

You are the AA expert, not me :slight_smile: I was just trying to formalize LLVM's AA stuff and noticed that some things were not very consistent.
BTW, we need to take into account over-approximation of the analyses. So we cannot/shouldn't say that PartialAlias only happens when addresses are different because this imposes extra burden on the AA: it would have to prove that.

Our general past position is that AA may return MustAlias and PartialAlias only when it can prove the preconditions. Otherwise, it should return MayAlias. We've certainly had bugs in the past in this area (i.e. where AA returns MustAlias or PartialAlias when it didn't properly prove the preconditions), but if clients are to be able to use these return values meaningfully, I think we need to keep it this way (i.e. if we don't require AA to prove the preconditions, what's the point?).

  -Hal