__main() function and AliasSet

In a code segment of my pass plugin, I try to gather AliasSets for all StoreInst, LoadInst and CallInst instructions in a function.
Some behaviors of the pass puzzled me.
Below is the *.ll of the test program which I run the pass on,
it was get with "llvm-gcc -Wl,--disable-opt" from a rather simple *.c program.

Oh, I appologize that I should not have asked about __main() ---- it appears in FAQ.
But the question remains that why call to __main() can alias stack location?
I think the memory location pointed by data_X pointers are not visible to __main().
In comparison, calls to printf() do not have similar effect.

Oh, I appologize that I should not have asked about __main() ---- it appears
in FAQ.
But the question remains that why call to __main() can alias stack location?
I think the memory location pointed by data_X pointers are not visible to
__main().
In comparison, calls to printf() do not have similar effect.

First, some background: -steens-aa and -anders-aa work reasonable well, but aren't production quality. In particular, they both assume that printf doesn't have side effects, when (in fact) printf can on certain GNU systems when the right format string is used. This is why they both think that printf has no side effects: they special case it.

In practice, aliasing is a bunch of heuristics, and you cannot ever be guaranteed to get an exact answer. As an example of this, both of these passes are "context insensitive". As such, they don't know anything about what the effect of a call is, so the call to __main (even though they could theoretically know) is treated quite conservatively.

There are a couple of different options you have here. The alias passes can be combined together, so something like this:

opt -globalsmodref-aa -steens-aa ...

should be able to tell that __main has no side effects. globalsmodref-aa is a production quality pass that does some simple context sensitive analysis (such as noticing functions with no side effects at all).

Another option is the -ds-aa pass. This pass is very powerful, but is also the farthest from production quality. That said, it does get almost all common things right, it just has some bugs in areas like variable length arrays etc.

-Chris

Hi Chris,

I took a haste look at the "Points-to Analysis in Almost Linear Time" by Steens , your PHD thesis
and SteensGaard.cpp in LLVM this afternoon.
So I think:
1. Actually the basic algorithm described originally by SteensGaard does not provide MOD/REF information for functions.
2. The context insensitive part of Data Structure Analysis (LocalAnalysis) can be deemed as
an enhancement of Steens' with field sensitive and MOD/REF information
3. The implementation of -steens-aa in LLVM is through Data Structure Analysis with MOD/REF logic for functions as below:
  i. inComplete nodes may MODREF by all function.
        ii. external function does not MOD/REF any complete node.
        iii if any function MOD/REF a node then all function considered as MOD/REF the node.

In other words, if I only use -steens-aa and the data_XXXs are all external global variables( and so inComplete ),
the call to printf will make the same effect, which I have tested it.

Am I right ? :slight_smile:

In other words, if I only use -steens-aa and the data_XXXs are all external global variables( and so inComplete ),

Sounds right!

the call to printf will make the same effect, which I have tested it.

Am I right ? :slight_smile:

If you've tested it then, yes you're right :). I haven't played with this stuff for a long time, but I was pretty sure that steens-aa and ds-aa had a special case for printf so that it thinks that printf does not mod/ref *anything*, including external globals.

My guess is that steens-aa gives up earlier because it's an external global and it's an external function call or something. You'd have to trace through the logic of the code to be sure.

-Chris

Unfortunately, I did not locate the lines in steens-aa for "printf" special case.
In ds-aa, I found the lines below:

Unfortunately, I did not locate the lines in steens-aa for "printf" special case.
In ds-aa, I found the lines below:

Right, steens-aa and ds-aa share code for "local analysis", they just stitch it together into an interprocedural analysis in different ways. The code below is used for steens-aa.

--------------------------------
if (F->getName() == "printf" || F->getName() == "fprintf" ||
                  F->getName() == "sprintf") {
         CallSite::arg_iterator AI = CS.arg_begin(), E = CS.arg_end();

...

         for (; AI != E; ++AI) {
           // printf reads all pointer arguments.
           if (isPointerType((*AI)->getType()))
             if (DSNode *N = getValueDest(**AI).getNode())
               N->setReadMarker();
         }
-----------------------------
So from my point of view, the ds-aa thinks "printf reads all pointer arguments",

Right.

but does not take into account the "%n" format string , which may cause "printf" MOD to a specific address.

Right. As I mentioned before, steens-aa is not "production quality". For example, it doesn't handle this. The right approach would change this code to:

1. Check to see if there is a constant format string. If not, handle it conservatively.
2. Scan the format string, matching up format specs to the arguments. Set the mod/ref bits for each thing as appropriate.

In addition to getting the mod bit right for %n, this would allow the ref bit to not be set for %p, for example.

As for steens-aa, I think the logic for mod/ref is very clear:

Right, this is the logic for querying the graph built by steens.

So steens-aa alone thinks "printf" ModRef *anything* that is inComplete(e.g. external globalVar) .

Makes sense.

-Chris

My guess is that steens-aa gives up earlier because it's an external
global and it's an external function call or something. You'd have to
trace through the logic of the code to be sure.

-Chris

Thank you very much for your detailed help.
You are definitely a good man. :slight_smile:
I feel so much to learn.

Happy to help!

-Chris

Oh, I appologize that I should not have asked about __main() ---- it appears
in FAQ.
But the question remains that why call to __main() can alias stack location?
I think the memory location pointed by data_X pointers are not visible to
__main().
In comparison, calls to printf() do not have similar effect.

First, some background: -steens-aa and -anders-aa work reasonable well,
but aren't production quality. In particular, they both assume that
printf doesn't have side effects, when (in fact) printf can on certain GNU
systems when the right format string is used. This is why they both think
that printf has no side effects: they special case it.

In practice, aliasing is a bunch of heuristics, and you cannot ever be
guaranteed to get an exact answer. As an example of this, both of these
passes are "context insensitive". As such, they don't know anything about
what the effect of a call is, so the call to __main (even though they
could theoretically know) is treated quite conservatively.

There are a couple of different options you have here. The alias passes
can be combined together, so something like this:

opt -globalsmodref-aa -steens-aa ...

should be able to tell that __main has no side effects. globalsmodref-aa
is a production quality pass that does some simple context sensitive
analysis (such as noticing functions with no side effects at all).

Another option is the -ds-aa pass. This pass is very powerful, but is
also the farthest from production quality. That said, it does get almost
all common things right, it just has some bugs in areas like variable
length arrays etc.

-Chris

In a code segment of my pass plugin, I try to gather AliasSets for all
StoreInst, LoadInst and CallInst instructions in a function.
Some behaviors of the pass puzzled me.
Below is the *.ll of the test program which I run the pass on,
it was get with "llvm-gcc -Wl,--disable-opt" from a rather simple *.c
program.

----------------------------------
; ModuleID = 'ptralias.bc'
target endian = little
target pointersize = 32
target triple = "i686-pc-linux-gnu"
deplibs = [ "c", "crtend" ]
%.str_1 = internal constant [25 x sbyte] c"ptra=0x ptrb=0x
ptrc=0x\0A\00" ; <[25 x sbyte]*> [#uses=1]
%ptr = weak global void ()* null ; <void ()**> [#uses=0]

implementation ; Functions:

declare int %printf(sbyte*, ...)

void %foo1() {
        ret void
}

void %foo2() {
        ret void
}

int %main(int %argc, sbyte** %argv) {
entry:
        %data_b = alloca int ; <int*> [#uses=2]
        %data_c = alloca int ; <int*> [#uses=1]
        %data_d = alloca int ; <int*> [#uses=3]
        %data_e = alloca int ; <int*> [#uses=2]
        %data_f = alloca int ; <int*> [#uses=2]
        call void %__main( )
        store int 2, int* %data_b
        store int 3, int* %data_c
        store int 4, int* %data_d
        store int 5, int* %data_e
        store int 6, int* %data_f
        switch int %argc, label %switchexit [
                 int 3, label %label.3
                 int 2, label %then.2
                 int 1, label %label.1
                 int 0, label %endif.2
        ]

label.1: ; preds = %entry
        br label %switchexit

label.3: ; preds = %entry
        br label %then.2

switchexit: ; preds = %label.1, %entry
        %ptr_b.0 = phi int* [ %data_d, %label.1 ], [ null, %entry ] ;
<int*> [#uses=1]
        br label %endif.2

then.2: ; preds = %label.3, %entry
        %ptr_a.1.0 = phi int* [ %data_f, %label.3 ], [ %data_e, %entry
] ; <int*> [#uses=1]
        store int 0, int* %ptr_a.1.0
        br label %then.3

endif.2: ; preds = %switchexit, %entry
        %ptr_b.0.1 = phi int* [ %ptr_b.0, %switchexit ], [ %data_b, %entry
] ; <int*> [#uses=2]
        %tmp.12 = seteq int* %ptr_b.0.1, null ; <bool> [#uses=1]
        br bool %tmp.12, label %then.4, label %then.3

then.3: ; preds = %endif.2, %then.2
        %ptr_b.0.2 = phi int* [ %data_d, %then.2 ], [ %ptr_b.0.1, %endif.2
] ; <int*> [#uses=1]
        store int 0, int* %ptr_b.0.2
        %tmp.1913 = call int (sbyte*, ...)* %printf( sbyte* getelementptr
([25 x sbyte]* %.str_1, int 0, int 0) ) ; <int> [#uses=0]
        ret int 0

then.4: ; preds = %endif.2
        %tmp.19 = call int (sbyte*, ...)* %printf( sbyte* getelementptr
([25 x sbyte]* %.str_1, int 0, int 0) ) ; <int> [#uses=0]
        ret int 0
}

void %__main() {
entry:
        ret void
}

----------------------------------
I think the right AliasSet information calculated for this program should
be

  Information for alias set0:
      pointer0=data_b
      pointer1=data_d
      pointer2=ptr_b.0.2
  Information for alias set1:
      pointer0=data_c
  Information for alias set2:
  Information for alias set3:
      pointer0=data_e
      pointer1=data_f
      pointer2=ptr_a.1.0
  Information for alias set4:
  Information for alias set5:

,where the empty AliasSets I think should be "Forwarded".

However, the result of the pass was:

  Information for alias set0:
    pointer0=data_b
    pointer1=data_d
    pointer2=data_e
    pointer3=data_f
    pointer4=ptr_a.1.0
    pointer5=ptr_b.0.2
  Information for alias set1:
    pointer0=data_c
After I deleted "call void %__main( )" in %main(), the pass get the right
answer.

So my question is:

1. What is the purpose for call to __main() ? __main() is just a empty
function. My .c program only contains main().
2. My explanation for this behavior is that the CallSite of "call void
%__main( )" alias some pointers in the program,
   however, __main() is obviously empty, why should the AliasAnalysis
think that it may/must Mod/Ref any stack location of main()?

btw: the AliasAnalysis pass I used is -steens-aa and it also the same with
-anders-aa.

--
Regards,
Nai

-Chris

-Chris

-Chris

-Chris

> Unfortunately, I did not locate the lines in steens-aa for "printf" special case.
> In ds-aa, I found the lines below:

Right, steens-aa and ds-aa share code for "local analysis", they just
stitch it together into an interprocedural analysis in different ways.
The code below is used for steens-aa.

> --------------------------------
> if (F->getName() == "printf" || F->getName() == "fprintf" ||
> F->getName() == "sprintf") {
> CallSite::arg_iterator AI = CS.arg_begin(), E = CS.arg_end();
...
> for (; AI != E; ++AI) {
> // printf reads all pointer arguments.
> if (isPointerType((*AI)->getType()))
> if (DSNode *N = getValueDest(**AI).getNode())
> N->setReadMarker();
> }
> -----------------------------
> So from my point of view, the ds-aa thinks "printf reads all pointer arguments",

Right.

> but does not take into account the "%n" format string , which may cause
> "printf" MOD to a specific address.

Right. As I mentioned before, steens-aa is not "production quality". For
example, it doesn't handle this. The right approach would change this
code to:

1. Check to see if there is a constant format string. If not, handle it
conservatively.
2. Scan the format string, matching up format specs to the arguments. Set
the mod/ref bits for each thing as appropriate.

In addition to getting the mod bit right for %n, this would allow the ref
bit to not be set for %p, for example.

Thanks for your advice. :slight_smile: