"Best" alias analysis algorithm

Hello,
I'm playing with alias analysis, using the following program:

    %i = external global int ; <int*> [#uses=2]
    
    implementation ; Functions:
    
    int %_Z3bari(int %p) {
    entry:
      %tmp.0 = load int* %i ; <int> [#uses=1]
      %tmp.1 = setgt int %tmp.0, 10 ; <bool> [#uses=1]
      br bool %tmp.1, label %then, label %UnifiedReturnBlock
    
    then: ; preds = %entry
      %tmp.3 = add int %p, 2 ; <int> [#uses=1]
      ret int %tmp.3
    
    UnifiedReturnBlock: ; preds = %entry
      ret int 0
    }
    
    int %_Z3fooi(int %p) {
    entry:
      %tmp.0 = load int* %i ; <int> [#uses=1]
      %tmp.1 = setgt int %tmp.0, 15 ; <bool> [#uses=1]
      br bool %tmp.1, label %UnifiedReturnBlock, label %else
    
    else: ; preds = %entry
      %tmp.3 = call int %_Z3bari( int %p ) ; <int> [#uses=1]
      ret int %tmp.3
    
    UnifiedReturnBlock: ; preds = %entry
      ret int %p
    }
    
For each instruction, I call AliasAnalysis::getModRefInfo with the 'i' global
variable. I tried different alias analyses -- the default one, --aa-eval,
--anders-aa, -steens-aa and -globalsmodref-aa.

The 'i' variable is never modified in the program, however, all analyses
except for -globalsmodref-aa report that the

      %tmp.3 = call int %_Z3bari( int %p ) ; <int> [#uses=1]

instruction can modify 'i'. I'm somewhat surprised, because it looks like
-globalsmodref-aa is the simplest algorithm. So, why the order algorithms
report possible modification?

In fact, if I add

   %j = external global int ;

which is not used *anywhere*, most algorithms report that %j is modified by
the call. Am I missing something, or -globalsmodref-aa is the only alias
analysis that properly handles globals?

- Volodya

Things are actually worse. If I declare two global variables, and modify the
called function so that it changes just one variable, then alias analysis
(with --globalsmodref-aa) reports that the call instruction can write to both
global variables.

The C++ program is at
http://zigzag.lvk.cs.msu.su/~ghost/localize/a.cpp

The LLVM bytecode is at
http://zigzag.lvk.cs.msu.su/~ghost/localize/a.ll

And the output from my code is
http://zigzag.lvk.cs.msu.su/~ghost/localize/output.txt

The code itself is in
http://zigzag.lvk.cs.msu.su/~ghost/localize/code/

The GlobalsModRef::getModRefInfo has this logic:

    // If we are asking for mod/ref info of a direct call with a pointer to a
    // global we are tracking, return information if we have it.
    if (GlobalValue *GV = const_cast<GlobalValue*>(getUnderlyingObject(P)))
       if (GV->hasInternalLinkage())

So, no information is produced for external variables, the function calls
Aliasanalysis::getModRefInfo, which sees that called function may write to
memory, and returns true for all global variables.

Anything I can do about it? What I what is minimally accurate information
about all global variables a function may modify. "Modifies them all" is
clearly not even minimally accurate.

- Volodya

Ok, I've just modified all variables to have internal linkage. OTOH, in my
test case it's definitely clear that no external functions are called and so
the global variables, even though they have external linkage, cannot be
modified by any outside code.

- Volodya

The 'i' variable is never modified in the program, however, all analyses
except for -globalsmodref-aa report that the

     %tmp.3 = call int %_Z3bari( int %p ) ; <int> [#uses=1]

instruction can modify 'i'. I'm somewhat surprised, because it looks like
-globalsmodref-aa is the simplest algorithm. So, why the order algorithms
report possible modification?

That is because none of those algorithms are "context sensitive", and globals mod/ref is. Out of those, globalsmodref-aa is the only one that will get it.

In fact, if I add

  %j = external global int ;

which is not used *anywhere*, most algorithms report that %j is modified by
the call. Am I missing something, or -globalsmodref-aa is the only alias
analysis that properly handles globals?

This is just conservatism in the analysis and can probably be fixed.

You might also try -ds-aa, which is significantly more powerful than any of the rest.

-Chris

I will take a look at this and fix globals-modref. This is definitely over conservative! Thanks for pointing it out!

-Chris

Oh ok, now I remember. The deal here is that "globals-modref" is supposed to be a very simple mod/ref analysis that does no "pointer analysis" at all. In particular, it only tracks globals that it knows "never have their address taken". If a global has non-internal linkage, the compiler can't know that something outside of the program doesn't take its address and pass around its address.

While in this case, globals modref could handle this case, you really want something more powerful like -ds-aa.

-Chris

> The GlobalsModRef::getModRefInfo has this logic:
>
> // If we are asking for mod/ref info of a direct call with a pointer
> to a // global we are tracking, return information if we have it.
> if (GlobalValue *GV =
> const_cast<GlobalValue*>(getUnderlyingObject(P))) if
> (GV->hasInternalLinkage())
>
> So, no information is produced for external variables, the function calls
> Aliasanalysis::getModRefInfo, which sees that called function may write
> to memory, and returns true for all global variables.
>
> Anything I can do about it? What I what is minimally accurate information
> about all global variables a function may modify. "Modifies them all" is
> clearly not even minimally accurate.

Oh ok, now I remember. The deal here is that "globals-modref" is supposed
to be a very simple mod/ref analysis that does no "pointer analysis" at
all. In particular, it only tracks globals that it knows "never have
their address taken". If a global has non-internal linkage, the compiler
can't know that something outside of the program doesn't take its address
and pass around its address.

Even if something outside of the current module has the address of the global,
it cannot modify that global, because the control never leaves the current
module, no?

While in this case, globals modref could handle this case, you really want
something more powerful like -ds-aa.

I'm missed -ds-aa in the list of passes and now tried it. Seems to work fine
even when variables are declared external. Thanks!

- Volodya

In this case, yes, obviously. :wink: The problem is that the pass does not track this possibility. It is meant to be a very simple and quick pass that handles one thing well (non-address taken globals). For example, in this function:

extern int P;
int test(int *Q) { *Q = 1; return P; }

It can't decide whether 'test' could modify P. In practice, MANY trivial cases like this exist where globals mod/ref can't answer, so it does not even attempt to track external globals.

If you're interested in this, a more powerful algorithm like DSA should be used, or you could build a context-sensitive mod/ref follow-on pass that uses a non-context-sensitive alias analysis algorithm (like andersens or steensgaards) as its base.

-Chris