Basic AliasAnalysis: Can GEPs with the same base but different constant indices into a struct alias?

Hi all,

This is covered by (struct-path aware) TBAA, but BasicAA disagrees.
See the attached testcase, where it prevents us from removing the
redundant load.
For arbitrary GEPs, we can't decide based on constant indices (because
of e.g., &A[0][1] and &A[1][0], with *A a one-element array). BasicAA
has some logic to "try to distinguish something like &A[i][1] against
&A[42][0]". For the testcase, it works for instance when the struct
has an even number of floats.

However, I think we can safely say GEPs with different constant
indices into a struct can't alias, at least when:
- the GEP is inbounds
- the access sizes are such that there is no overlap
- the struct index is the final one

That is, these can't alias:
    gep i1, j1, k1, 0
    gep i2, j2, k2, 1
If this is a struct pointer:
    gep i1, j1, k1

I couldn't come up with a counterexample that would prevent us from
doing this in BasicAA.
I was surprised it wasn't; surely I missed something?

Thanks!

-Ahmed

testcase.ll (795 Bytes)

Are you counting unions, or literally just structs?

Are you counting unions, or literally just structs?

Just structs.

-Ahmed

From: "Ahmed Bougacha" <ahmed.bougacha@gmail.com>
To: "LLVM Dev" <llvmdev@cs.uiuc.edu>
Cc: "Hal Finkel" <hfinkel@anl.gov>
Sent: Tuesday, January 20, 2015 2:27:43 PM
Subject: Basic AliasAnalysis: Can GEPs with the same base but different constant indices into a struct alias?

Hi all,

This is covered by (struct-path aware) TBAA, but BasicAA disagrees.
See the attached testcase, where it prevents us from removing the
redundant load.
For arbitrary GEPs, we can't decide based on constant indices
(because
of e.g., &A[0][1] and &A[1][0], with *A a one-element array).
BasicAA
has some logic to "try to distinguish something like &A[i][1] against
&A[42][0]". For the testcase, it works for instance when the struct
has an even number of floats.

However, I think we can safely say GEPs with different constant
indices into a struct can't alias, at least when:
- the GEP is inbounds
- the access sizes are such that there is no overlap
- the struct index is the final one

That is, these can't alias:
    gep i1, j1, k1, 0
    gep i2, j2, k2, 1
If this is a struct pointer:
    gep i1, j1, k1

I couldn't come up with a counterexample that would prevent us from
doing this in BasicAA.
I was surprised it wasn't; surely I missed something?

I don't think you're missing anything, and you're right. BasicAA is essentially a collection of heuristics, it is quite possible that no one has added this one yet.

-Hal

Hi all,

This is covered by (struct-path aware) TBAA, but BasicAA disagrees.
See the attached testcase, where it prevents us from removing the
redundant load.
For arbitrary GEPs, we can't decide based on constant indices (because
of e.g., &A[0][1] and &A[1][0], with *A a one-element array). BasicAA
has some logic to "try to distinguish something like &A[i][1] against
&A[42][0]". For the testcase, it works for instance when the struct
has an even number of floats.

However, I think we can safely say GEPs with different constant
indices into a struct can't alias, at least when:
- the GEP is inbounds

I'm afraid that this is irrelevant. The only property conferred by inbounds
is that none of the intermediate pointers fall outside the allocated
object. It has nothing to do with indexing past the bounds (in either
positive or negative directions) of an array element within a struct. See
http://llvm.org/docs/GetElementPtr.html#what-happens-if-an-array-index-is-out-of-bounds
and the immediately following entry.

- the access sizes are such that there is no overlap

I think this is what you have to prove, and once you do

- the struct index is the final one

I think this stops mattering.

That is, these can't alias:
    gep i1, j1, k1, 0
    gep i2, j2, k2, 1
If this is a struct pointer:
    gep i1, j1, k1

I don't think this holds in general... Consider the type which contains 4
i32 objects: { [1 x { i32, i32 }], [2 x { i32 }] }.

A) gep 0, 1, 1, 0 -> points at the 4th i32 object

and

B) gep 0, 0, 1, 1 -> points at the 4th i32 object

Unless I've missed the math somewhere, I think this works, and both gep 0,
1, 1 and gep 0, 0, 1 point at a struct.

In general, you can use an array element of the struct to do just about
anything. And I can wrap any struct in an another struct, prefix it by an
array, and then use that array to form GEPs that do crazy aliasing.

From: "Chandler Carruth" <chandlerc@google.com>
To: "Ahmed Bougacha" <ahmed.bougacha@gmail.com>
Cc: "LLVM Dev" <llvmdev@cs.uiuc.edu>
Sent: Saturday, January 24, 2015 7:44:50 PM
Subject: Re: [LLVMdev] Basic AliasAnalysis: Can GEPs with the same base but different constant indices into a struct
alias?

Hi all,

This is covered by (struct-path aware) TBAA, but BasicAA disagrees.
See the attached testcase, where it prevents us from removing the
redundant load.
For arbitrary GEPs, we can't decide based on constant indices
(because
of e.g., &A[0][1] and &A[1][0], with *A a one-element array). BasicAA
has some logic to "try to distinguish something like &A[i][1] against
&A[42][0]". For the testcase, it works for instance when the struct
has an even number of floats.

However, I think we can safely say GEPs with different constant
indices into a struct can't alias, at least when:
- the GEP is inbounds

I'm afraid that this is irrelevant. The only property conferred by
inbounds is that none of the intermediate pointers fall outside the
allocated object. It has nothing to do with indexing past the bounds
(in either positive or negative directions) of an array element
within a struct. See
http://llvm.org/docs/GetElementPtr.html#what-happens-if-an-array-index-is-out-of-bounds
and the immediately following entry.

- the access sizes are such that there is no overlap

I think this is what you have to prove, and once you do

- the struct index is the final one

I think this stops mattering.

That is, these can't alias:
gep i1, j1, k1, 0
gep i2, j2, k2, 1
If this is a struct pointer:
gep i1, j1, k1

I don't think this holds in general...

Indeed, I did not read this with sufficient care :wink: -- If you have:

gep i1, j1, k1, 0
gep i1, j1, k1, 1

If this is a struct pointer:

gep i1, j1, k1

then the first two should not alias. Allowing for arbitrary other indices, however, you can't reach the same conclusion (as Chandler pointed out).

-Hal

Ah yes, the structs are what make it messy.

How about the more useful constraint:
- the (identical) base must point to a (possibly multidimensional) array of
structs.

Then, the above holds, no? No matter which path you take, you must point
to a one of the contiguous structs, and if the index is different then the
pointers can't alias.

Anyway, the help is much appreciated, thanks Chandler & Hal!

-Ahmed

Ah yes, the structs are what make it messy.

How about the more useful constraint:
- the (identical) base must point to a (possibly multidimensional) array
of structs.

Then, the above holds, no? No matter which path you take, you must point
to a one of the contiguous structs, and if the index is different then the
pointers can't alias.

I don't really know what you're proposing. However, note that I can nest
arrays inside of structs and wrap structs in arrays, so I think I can
essentially almost always arrange to have arrays which I can overflow to
produce aliasing offsets.

What are you really trying to achieve here? Why is just modeling the math
not sufficient here?

Ah yes, the structs are what make it messy.

How about the more useful constraint:
- the (identical) base must point to a (possibly multidimensional) array
of structs.

Then, the above holds, no? No matter which path you take, you must point
to a one of the contiguous structs, and if the index is different then the
pointers can't alias.

I don't really know what you're proposing. However, note that I can nest
arrays inside of structs and wrap structs in arrays, so I think I can
essentially almost always arrange to have arrays which I can overflow to
produce aliasing offsets.

But that doesn't matter, no? If AA is looking at two GEPs indexing into
say [1 x {[0 x i32], i32}]:
- we can state that gep 0, 0, 0 and gep 0, 0, 1 don't alias
- even though gep 0, 0, 0, 1 and gep 0, 0, 1 can, since this is a different
query.

So if you have say [1 x {i32, i32}], you can safely say that gep 0, i, 0
and gep 0, j, 1 can't alias.

Or maybe this is my core misunderstanding of the problem?

What are you really trying to achieve here? Why is just modeling the math
not sufficient here?

See the originally attached testcase, where the load seems redundant, but
we can't prove the two GEPs don't alias.

-Ahmed

From: "Ahmed Bougacha" <ahmed.bougacha@gmail.com>
To: "Chandler Carruth" <chandlerc@google.com>
Cc: "LLVM Dev" <llvmdev@cs.uiuc.edu>, "Hal Finkel" <hfinkel@anl.gov>
Sent: Monday, February 2, 2015 1:25:40 PM
Subject: Re: [LLVMdev] Basic AliasAnalysis: Can GEPs with the same base but different constant indices into a struct
alias?

Ah yes, the structs are what make it messy.

How about the more useful constraint:

- the (identical) base must point to a (possibly multidimensional)
array of structs.

Then, the above holds, no? No matter which path you take, you must
point to a one of the contiguous structs, and if the index is
different then the pointers can't alias.

I don't really know what you're proposing. However, note that I can
nest arrays inside of structs and wrap structs in arrays, so I think
I can essentially almost always arrange to have arrays which I can
overflow to produce aliasing offsets.

But that doesn't matter, no? If AA is looking at two GEPs indexing
into say [1 x {[0 x i32], i32}]:
- we can state that gep 0, 0, 0 and gep 0, 0, 1 don't alias
- even though gep 0, 0, 0, 1 and gep 0, 0, 1 can, since this is a
different query.

So if you have say [1 x {i32, i32}], you can safely say that gep 0,
i, 0 and gep 0, j, 1 can't alias.

Or maybe this is my core misunderstanding of the problem?

What are you really trying to achieve here? Why is just modeling the
math not sufficient here?

See the originally attached testcase, where the load seems redundant,
but we can't prove the two GEPs don't alias.

Your original test case is essentially this:

%struct.RECT = type { float, float, float }
define float @basicaa_struct_geps(%struct.RECT* %rect, i64 %i, i64 %j, float %x, float %y) {
entry:
  %p_dx = getelementptr inbounds %struct.RECT* %rect, i64 %i, i32 0
  %p_dy = getelementptr inbounds %struct.RECT* %rect, i64 %j, i32 1
  ...
}

and the question is, can we conclude that %p_dx does not alias with %p_dy? The answer should be yes, but concluding this relies on more than the fact that the trailing indices differ for accesses to a compatible structure type. The lack of aliasing also involves proving the lack of partial aliasing of the base structure pointers (we know that (getelementptr inbounds %struct.RECT* %rect, i64 %i) might be equal to (getelementptr inbounds %struct.RECT* %rect, i64 %j), or disjoint from it, but won't partially overlap it). It also relies on the fact that there is no potential indexing overlap between elements of the structure type (there are no arrays, only simple member types).

So I think that we can enhance BasicAA to catch this case, but it won't be quite as general as you had originally proposed.

-Hal

So, I gave it a quick shot in http://reviews.llvm.org/D7453. Opinions
welcome, thanks again for the help!

-Ahmed