Alias analysis is figuring out the relationship between two pointer expressions, at some location in the program. At a given point in the program, do two expressions always refer to the same location? At a given point in the program, do two expressions never refer to the same location?
AliasAnalysis::alias() doesn't explicitly take a "point" in the program because we don't implement any alias analysis that cares about the precise point in the program; the answer applies to any point in the program dominated by the definitions of both values. This produces useful results because LLVM uses SSA form. Working from that definition, it's easy to see that the logic used in aliasSelect is valid, and that aliasPHI is returning the correct result for the given example. Maybe it's worth calling this out in the alias analysis documentation.
The flaw here seems to be in the way that DA is using AA. DependenceAnalysis's underlyingObjectsAlias is doing this:
// Check the original locations (minus size) for noalias, which can happen for
// tbaa, incompatible underlying object locations, etc.
MemoryLocation LocAS(LocA.Ptr, LocationSize::unknown(), LocA.AATags);
MemoryLocation LocBS(LocB.Ptr, LocationSize::unknown(), LocB.AATags);
if (AA->alias(LocAS, LocBS) == NoAlias)
// Check the underlying objects are the same
const Value *AObj = GetUnderlyingObject(LocA.Ptr, DL);
const Value *BObj = GetUnderlyingObject(LocB.Ptr, DL);
// If the underlying objects are the same, they must alias
if (AObj == BObj)
// We may have hit the recursion limit for underlying objects, or have
// underlying objects where we don't know they will alias.
if (!isIdentifiedObject(AObj) || !isIdentifiedObject(BObj))
// Otherwise we know the objects are different and both identified objects so
// must not alias.
and I recall suggesting this use of AA to pick up the metadata-based AA results. We shouldn't need both the AA check here and the underlying-object check also, as the AA check includes an underlying-object check (and not much else, aside from checks on metadata, when both sizes are unknown). However, as this example shows, the fact that the two SSA values in question refer to different, identified underlying objects, doesn't prevent them from having a cross-iteration dependence as the identify of those underlying objects can change across different iterations (and, thus, the underlying object of one variable in one iteration might be the underlying object used by another variable in another iteration).
I'm open to suggestions on how to best resolve this situation. I'm leaning toward suggesting that we have a metadata-only AA query, and use that here, instead of calling something that will invoke BasicAA's logic. My metadata-only AA query, I specifically mean a fundamental dependency-analysis query, likely implemented by the AA subsystem for code-reuse reasons, which answers the question: is there some reason to believe that any instances of these two values never take the same value along any dynamic path through the CFG (or something similar).
For information across loop iterations, the appropriate analysis is probably LoopAccessAnalysis.