Alias analysis interface

Hi,

     In today meeting, Dan gave a very impressive talk about alias analysis.
I had a question about the example in his slide, something like:

   for (i = 0; i < N; i++) a[i] = a[i-1] + 1;

  For the sake of convenience, let A1, A2 be a[i] and a[i-1] respectively.
In Dan's design, Alias(A1, A2) would give "no alias", and in order to
prevent A2 from being mistakenly moved out of loop, the optimizer
needs to query dependence test as well.

   The problems is how optimizer combine the results from alias analysis
and dependence test?

   If I change the code little bit, into following, then combining the
querying dependence testing would prevent hosting *q.

   // points-to(p) = { a, b }
   // points-to(q) = { c, d }
   for (i = 0; i < N; i++) *p += *q + 1;

Thanks
Shuxin

Sorry the 1st example I gave it bit lame, it is changed to following:

    // a[] is local array, no addr taken. die right after the loop
    for (i = 0; i < N; i++) {
        a[i] = ... /* s1 */
        sum += a[i-1];
    }

  S1 could be easily deleted if alias analyizer say a[i] and a[i-1].

Hello,

I’m afraid I don’t understand your question. Could you restate your example and your question, and say what specifically you would like alias analysis to do?

Dan

what specifically you would like alias analysis to do?

Quite honestly, I don’t know what interface is convenient for optimizer. I once implemented a flow-sensitive alias-analysis as a hobby project,
which was able to disambiguate the subscripted variables like a[i] and a[i-1]. Unfortunately, the interface is pretty messy, that
is why I solicit your insight on this issue.

The problem is: in what situations, Alias(a[i], a[i-1]) is desirable to return “alias”, and in what situations,
it is desirable to return “no alias”.

Example 1:

what specifically you would like alias analysis to do?

Quite honestly, I don’t know what interface is convenient for optimizer. I once implemented a flow-sensitive alias-analysis as a hobby project,
which was able to disambiguate the subscripted variables like a[i] and a[i-1]. Unfortunately, the interface is pretty messy, that
is why I solicit your insight on this issue.

The problem is: in what situations, Alias(a[i], a[i-1]) is desirable to return “alias”, and in what situations,
it is desirable to return “no alias”.

LLVM defines “AliasAnalysis” to be an API which never looks across loop iterations, so “no alias” is correct here.

Example 1:

for (i = 0; i < N; i++) {
a[i] = … /* s1 */
sum += a[i-1];
}

In DCE, Alias(a[i], a[i-1]) is desirable to return “alias”. S1 would otherwise be mistakenly deleted.

LLVM’s dead store elimination pass isn’t smart enough to delete this kind of dead store, so its use of the AliasAnalysis API is fine. A more aggressive dead store elimination pass would need to use a higher-level API.

Example 2:

In following contrived example, Alias(a[i], a[i-1]) is desirable to return “no alias” in order to enable the propagation between S1 and S3.

for (i …) {
a[i] = /* S1 /
a[i-1] = /
S2 /
= a[i] /
S3*/
}

As you can see from the above two examples, both answers (“alias” and “not alias”) are correct, but in different context.

Is passing both mem access to Alias() sufficient, or should we also need to pass

  1. the common loop that tightly enclose two memory access, and
  2. boolean value indicate if the optimization is across loop or not.

We likely should, but it the the interface clunky to use. What is your insight into this problem?

One of LLVM’s insights is that simple optimizations which don’t look across loop boundaries, or which only do so in limited and careful ways, are “good enough” in many real-world situations. The AliasAnalysis API is designed to support this kind of use, and it benefits from this simplicity. The AliasAnalysis API can also serve as a base on which higher-level analyses can be built.

Dan

Problem 1:
At a point during program execution where both P and Q denote valid memory locations, can P and Q refer to the same memory location?

Problem 2:
During the execution time of the program, does the set of memory locations denoted by P contain any of the memory locations denoted by Q?

With P="a[i]" and Q="a[i-1]", the answer to problem 1 is "no", while the answer to problem 2 is "yes".

The answer to which one you want depends on the problem you're trying to solve.

-Krzysztof