Understand alias-analysis results

Hello,

I am performing alias analysis toward the following simple code:

struct MyStruct {
int * f1;
int * f2;
};

void NOALIAS(void* p, void* q){
}

int main() {
struct MyStruct s[2];
int a,b;
s[0].f1 = &a;
s[1].f1 = &b;

NOALIAS(s[a].f1, s[b].f2);

return 0;
}

When I use the following command to generate .bc code and conduct alias analysis:

clang -c -emit-llvm t.c -O2

opt -basicaa -aa-eval -print-alias-sets -disable-output t.bc

I got the following outputs:

Alias sets for function ‘NOALIAS’:
Alias Set Tracker: 0 alias sets for 0 pointer values.

Alias sets for function ‘main’:
Alias Set Tracker: 0 alias sets for 0 pointer values.

===== Alias Analysis Evaluator Report =====
1 Total Alias Queries Performed
0 no alias responses (0.0%)
1 may alias responses (100.0%)
0 partial alias responses (0.0%)
0 must alias responses (0.0%)

I checked the generated .ll code, and it shows that within the main function and NOALIAS functions, there is only a “ret” statement, with no global or local variables used. Could anyone shed some lights on where the “1 may alias” come from? And is there a way that I can force the alias analysis algorithm to focus only the “main” function? Thank you very much.

Best,
Shuai

Hello,

I am performing alias analysis toward the following simple code:

[...]

I checked the generated .ll code, and it shows that within the main function and NOALIAS functions, there is only a "ret" statement, with no global or local variables used. Could anyone shed some lights on where the "1 may alias" come from? And is there a way that I can force the alias analysis algorithm to focus only the "main" function? Thank you very much.

Hi!

Here's more information after initializing the variables (assuming the intent in the source code was, e.g., to initialize `a` and `b` to `0` and the pointers `f1` and `f2` to `NULL`, using aggregate initialization for `s`):
- Clang [-> LLVM-IR]: https://llvm.godbolt.org/z/WT7V3E
- [LLVM-IR ->] opt: https://llvm.godbolt.org/z/Veswa4

Alias sets for function 'main': Alias Set Tracker: 1 alias sets for 2 pointer values.
AliasSet[0x55ec7f9a23e0, 3] may alias, Mod/Ref Pointers: (i8* %0, LocationSize::precise(4)), (i32* %a, LocationSize::precise(4))

Note that in the original source code `a`, `b` are uninitialized--consequently, attempting to access `s[a].f1` and `s[b].f2` is undefined behavior (as we're using automatic storage duration objects `a` and `b` while their values are indeterminate): https://taas.trust-in-soft.com/tsnippet/t/acff56c8

Cf. https://cigix.me/c17#6.7.9.p10 ("If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate.") & https://cigix.me/c17#J.2.p1
("The behavior is undefined in the following circumstances: [...] The value of an object with automatic storage duration is used while it is indeterminate").

As such, you can notice that most of the code is going to be optimized away between mem2reg and dead argument elimination:
https://llvm.godbolt.org/z/iEdKE_

(Similarly, even if `a` and `b` were initialized to `0`, we only wrote to `f1` for `s[0]` and `s[1]`, so accessing `s[b].f2` is again using an object while it is indeterminate and undefined behavior.)

*** IR Dump After Promote Memory to Register ***

; the following corresponds to loading `s[a].f1`
%3 = load i32, i32* %a, align 4, !tbaa !7
%idxprom = sext i32 %3 to i64
%arrayidx3 = getelementptr inbounds [2 x %struct.MyStruct], [2 x %struct.MyStruct]* %s, i64 0, i64 %idxprom
%f14 = getelementptr inbounds %struct.MyStruct, %struct.MyStruct* %arrayidx3, i32 0, i32 0
%4 = load i32*, i32** %f14, align 16, !tbaa !2
%5 = bitcast i32* %4 to i8*

; the following corresponds to loading `s[b].f2`
%6 = load i32, i32* %b, align 4, !tbaa !7
%idxprom5 = sext i32 %6 to i64
%arrayidx6 = getelementptr inbounds [2 x %struct.MyStruct], [2 x %struct.MyStruct]* %s, i64 0, i64 %idxprom5
%f2 = getelementptr inbounds %struct.MyStruct, %struct.MyStruct* %arrayidx6, i32 0, i32 1
%7 = load i32*, i32** %f2, align 8, !tbaa !9
%8 = bitcast i32* %7 to i8*
call void @NOALIAS(i8* %5, i8* %8)

*** IR Dump After Dead Argument Elimination ***
; note how the arguments have been rewritten to `undef` in the following:
call void @NOALIAS(i8* undef, i8* undef)

> And is there a way that I can force the alias analysis algorithm to focus only the "main" function?

One way is to make the definition of `NOALIAS` unavailable (as if external) by only providing the declaration (as in the above examples).

Best,
Matt

Hey Matt,

That’s awesome. Thank you very much for all the information and clarification! Just a few follow up questions. Could you kindly shed some lights on it? Thank you!

  1. I tried to tweak the code in the following way:

And i note that the outputs are:


Alias sets for function 'main':

Alias Set Tracker: 2 alias sets for 4 pointer values.

  __AliasSet[0x563faa6c6260, 5] may alias, Mod/Ref   Pointers: (i8* %0, LocationSize::precise(4)), (i32* %a, LocationSize::precise(4)), (i8* %1, LocationSize::precise(4)), (i32* %b, LocationSize::precise(4))__

   1 Unknown instructions:   call void @NOALIAS(i8* nonnull %0, i8* nonnull %1) #3

  **AliasSet[0x563faa6b45e0, 1] must alias, Mod/Ref    forwarding to 0x563faa6c6260**

===== Alias Analysis Evaluator Report =====

  6 Total Alias Queries Performed

  4 no alias responses (66.6%)

  **0 may alias responses (0.0%)**

  0 partial alias responses (0.0%)

  **2 must alias responses (33.3%)**

I am trying to interpret the outputs, so if I understand correctly, the output indicates that we have an alias set of 4 pointers which “potentially” point to the same memory region, correct? Then is there any more accurate analysis pass that I could use to somewhat infer that “there are two must alias sets, each set has two pointers”? Correct me if I was wrong here… Using my local opt (version 6.0), I tried to iterate all feasible alias analysis passes but the results are not changed.

Also, what is the “must alias, Mod/Ref forwarding to 0x563faa6c6260”?

And how to interpret that we have “2 must alias responses”? Where does it come from? And why do we have “0 may alias response”? I would expect to have at least “4 may alias responses” as well?

  1. I note that using the latest opt (version 11.0?) gives different outputs with my local opt (version 6.0). For opt (version 6.0), it reports: 2 alias sets for 2 pointer values.

More importantly, can I expect to get generally better alias analysis results when switching to version 11.0?

Thank you very much!

Best,
Shuai

And another case:

Is there any chance that we can smoothly infer that:

  • x and &c are “must alias”
  • x and b are “must alias”

I don’t know how to interpret the current results, in particular the following outputs:

AliasSet[0x5584ab7e5f30, 1] must alias, Mod/Ref Pointers: (i32** %x, LocationSize::precise(8))

AliasSet[0x5584ab7e6020, 1] must alias, Mod Pointers: (i32* %y, LocationSize::precise(4))

It means we have two “must alias” sets, each of which contain only one pointer? That seems quite confusing to me…

Best,
Shuai

Hi again!

Replying in chronological order:

    Hey Matt,

    That's awesome. Thank you very much for all the information and
    clarification! Just a few follow up questions. Could you kindly shed
    some lights on it? Thank you!

    1. I tried to tweak the code in the following way: [...]

    I am trying to interpret the outputs, so if I understand correctly,
    the output indicates that we have an alias set of 4 pointers which
    "potentially" point to the same memory region, correct? Then is
    there any more accurate analysis pass that I could use to somewhat
    infer that "there are two must alias sets, each set has two
    pointers"? Correct me if I was wrong here.. Using my local opt
    (version 6.0), I tried to iterate all feasible alias analysis passes
    but the results are not changed.

Seems correct, I don't think you'll get more precise results out of the basic-aa pass, note that it has limited context sensitivity:
https://llvm.org/docs/AliasAnalysis.html#the-basic-aa-pass

Compare the results for `test_simple`, `test_in_array`, and `test_same_underlying_object_different_indices`:
https://github.com/llvm/llvm-project/blob/release/10.x/llvm/test/Analysis/BasicAA/struct-geps.ll

    Also, what is the "must alias, Mod/Ref forwarding to 0x563faa6c6260"?

If alias sets have been merged, you'll get the attached node forwarding to the root node; note the comment for `getForwardedTarget` making a reference to union-find:
https://github.com/llvm/llvm-project/blob/release/10.x/llvm/include/llvm/Analysis/AliasSetTracker.h#L281
(with "merge" corresponding to the union-find collapsing for "union", https://en.wikipedia.org/wiki/Disjoint-set_data_structure#Union).

You can see how `AliasSet::mergeSetIn` (called, e.g., by `AliasSetTracker::mergeAliasSetsForPointer`) sets up forwarding for `AS.Forward`: https://github.com/llvm/llvm-project/blob/release/10.x/llvm/lib/Analysis/AliasSetTracker.cpp#L51, https://github.com/llvm/llvm-project/blob/release/10.x/llvm/lib/Analysis/AliasSetTracker.cpp#L301
FWIW, you can use a tracer or manually step through `opt` in a debugger to follow the function calls.

    And how to interpret that we have "2 must alias responses"? Where
    does it come from? And why do we have "0 may alias response"? I
    would expect to have at least "4 may alias responses" as well?

No, "MayAlias" and "MustAlias" are distinct elements in the lattice, cf.
https://llvm.org/docs/AliasAnalysis.html#must-may-or-no
There's a good explanation of the alias analysis queries and responses in the following talk (particularly the part starting with "AA Query" around the 22 min. mark):
“Pointers, Alias & ModRef Analyses” (2018 EuroLLVM Developers’ Meeting: A. Sbirlea & N. Lopes)
https://www.youtube.com/watch?v=r0XVS4Atl3U

When you return `AliasResult` from your analysis you choose one:
https://llvm.org/doxygen/namespacellvm.html#ae1738272abcf2ac638b97e7dc6360cfd

You can see a simple example here (`TARAAResult::alias`): https://blog.tartanllama.xyz/llvm-alias-analysis/

    2. I note that using the latest opt (version 11.0?) gives different
    outputs with my local opt (version 6.0). For opt (version 6.0), it
    reports: 2 alias sets for 2 pointer values.

    More importantly, can I expect to get generally better alias
    analysis results when switching to version 11.0?

I'd assume that generally it shouldn't get worse :slight_smile:

    Thank you very much!

> And another case:
>
> - Clang [-> LLVM-IR]: https://llvm.godbolt.org/z/SGeJZw
> - [LLVM-IR ->] opt: https://llvm.godbolt.org/z/dNi-k2
>
> Is there any chance that we can smoothly infer that:
> - x and &c are "must alias"
> - x and b are "must alias"
>
> I don't know how to interpret the current results, in particular the
> following outputs:
>
> AliasSet[0x5584ab7e5f30, 1] must alias, Mod/Ref Pointers: (i32** %x,
> LocationSize::precise(8))
> AliasSet[0x5584ab7e6020, 1] must alias, Mod Pointers: (i32* %y,
> LocationSize::precise(4))
>
> It means we have two "must alias" sets, each of which contain only one
> pointer? That seems quite confusing to me..

You can add -print-all-alias-modref-info for more detailed information: https://llvm.godbolt.org/z/9njGqx -- you'll notice "MustAlias: i32* %c, i8* %6".

Adding `-evaluate-aa-metadata` for `load` and `store` instructions, https://llvm.godbolt.org/z/YaW1Mb, you'll notice "MustAlias: %0 = load i32**, i32*** %a, align 8 <-> store i32** %b, i32*** %a, align 8"

However, from your results we can already note:

AliasSet[0x5584ab7e5d00, 5] may alias, Mod/Ref Pointers: (i32* %c, LocationSize::precise(4)), (i32** %b, LocationSize::precise(8)), (i32** %0, LocationSize::precise(8)), (i32* %2, LocationSize::precise(4))

Note how in the C source code pointer `b` points to int `c` (`b = &c;`) corresponding to the memory locations (same object in memory when loading from `c` or `*b`). However, pointers `x` and `y` are distinct objects in memory themselves. In general, remember not to confuse pointers with what they point to--here also distinct, since `x` points `b` but `y` points to `c` (I mention this specifically since the desired inference of "x and b are "must alias"" cannot be correct--these are not the same objects in memory).

Best,
Matt

Hello!

Thank you very much! Yes, that makes a lot of sense to me. However, just want to point out two things that are still unclear:

  1. The output contains a alias set of only one element, for instance:
    “must alias, Mod Pointers: (i32* %y, LocationSize::precise(4))”

This one really confused me. I would expect to have an alias set of at least two elements (e.g., two pointers point to the same memory region), otherwise for the above case, $y$ is aliased to whom? I searched all outputs that are related to %y from https://llvm.godbolt.org/z/9njGqx but all I can find is “NoAlias”. Overall, to understand why an alias set can have only one pointer looks quite strange to me…

  1. For the following code chunks:

b = &c;
x = *a;
int y = *x;
MUSTALIAS(x,&c);
MUSTALIAS(x,b);

I don’t get it why (quote from your previous email) “the desired inference of “x and b are “must alias”” cannot be correct–these are not the same objects in memory”. x and b are both pointing to c, right? Am I missing anything here?

Sorry for the potential trouble this may have caused… And thank you in advance!

Best,
Shuai

Hi!

Hello!

Thank you very much! Yes, that makes a lot of sense to me. However, just want to point out two things that are still unclear:

1. The output contains a alias set of only one element, for instance:
"must alias, Mod Pointers: (i32* %y, LocationSize::precise(4))"

This one really confused me. I would expect to have an alias set of at least *two* elements (e.g., two pointers point to the same memory region), otherwise for the above case, $y$ is aliased to whom? I searched all outputs that are related to %y from https://llvm.godbolt.org/z/9njGqx but all I can find is "NoAlias". Overall, to understand why an alias set can have only one pointer looks quite strange to me..

This seems correct: Note that `int y` is a distinct object in memory, not a pointer, and shares no alias sets with anything (we happen to assign the int _value_ we obtain from pointer `x` by dereferencing it `*x`, but that bears no relevance to aliasing here). Perhaps this can help illustrate the scenario (assuming the URL doesn't get mangled):
http://www.pythontutor.com/cpp.html#code=void%20MUSTALIAS(void%20p,%20void%20q%29%20%7B%7D%0A%0Aint%20main%28%29%7B%0A%0A%20%20int%20**a,%20b,%20x%20,c%3B%0A%20%20c%20%3D%2010%3B%0A%20%20a%20%3D%20%26b%3B%0A%20%20b%20%3D%20%26c%3B%0A%20%20x%20%3D%20a%3B%0A%20%20int%20y%20%3D%20x%3B%0A%20%20MUSTALIAS%28x,%26c%29%3B%0A%20%20MUSTALIAS%28x,b%29%3B%0A%20%20return%200%3B%0A%7D&curInstr=12&mode=display&origin=opt-frontend.js&py=cpp&rawInputLstJSON=%5B%5D

Note (in step 13 of 13) how `y` does not alias (it is just an `int` itself) anything (agreeing with NoAlias results you're getting).

2. For the following code chunks:

b = &c;
x = *a;
int y = *x;
MUSTALIAS(x,&c);
MUSTALIAS(x,b);

I don't get it why (quote from your previous email) "the desired inference of "x and b are "must alias"" cannot be correct--these are not the same objects in memory". x and b are both pointing to c, right? Am I missing anything here?

Correct, both `x` and `b` point to `c` (the pointers themselves are distinct, but after dereferencing what they point to is the same location).
Consider the slice on the (first call) to `MUSTALIAS(x, &c);`: https://llvm.godbolt.org/z/YaW1Mb

%a = alloca i32**, align 8
%b = alloca i32*, align 8
%x = alloca i32*, align 8
%c = alloca i32, align 4
. . .
%0 = load i32**, i32*** %a, align 8
%1 = load i32*, i32** %0, align 8
. . .
%4 = load i32*, i32** %x, align 8
%5 = bitcast i32* %4 to i8*
%6 = bitcast i32* %c to i8*
; MUSTALIAS(x, &c);
call void @MUSTALIAS(i8* %5, i8* %6)

Notice the following results:
NoAlias: i32* %4, i32** %x
MayAlias: i32* %4, i32* %c
MustAlias: i32* %4, i8* %5
MayAlias: i32* %4, i8* %6
. . .
MayAlias: i32** %b, i8* %5
NoAlias: i32** %x, i8* %5
MayAlias: i32* %c, i8* %5
MayAlias: i8* %5, i8* %6
MustAlias: i32* %c, i8* %6
. . .
MustAlias: %4 = load i32*, i32** %x, align 8 <-> store i32* %1, i32** %x, align 8

The location in memory of a pointer object `b` may alias location of the first argument `%5` but not the (other memory object) pointer `x`; however, `i32* %c` aliases second argument `i8* %6` (the same memory object) as well as may alias `i8* %5` (which we've obtained from loading the address from pointer `x`; the difference here being between the memory locations storing the pointers themselves, say, on the stack--not aliased, and the memory objects the addresses stored in the pointers refer to, i.e., i32** vs. i32* or i8*).

Best,
Matt

Dear Matt,

Hi!

Hello!

Thank you very much! Yes, that makes a lot of sense to me. However, just
want to point out two things that are still unclear:

  1. The output contains a alias set of only one element, for instance:
    “must alias, Mod Pointers: (i32* %y, LocationSize::precise(4))”

This one really confused me. I would expect to have an alias set of at
least two elements (e.g., two pointers point to the same memory
region), otherwise for the above case, $y$ is aliased to whom? I
searched all outputs that are related to %y from
https://llvm.godbolt.org/z/9njGqx but all I can find is “NoAlias”.
Overall, to understand why an alias set can have only one pointer looks
quite strange to me…

This seems correct: Note that int y is a distinct object in memory,
not a pointer, and shares no alias sets with anything (we happen to
assign the int value we obtain from pointer x by dereferencing it
*x, but that bears no relevance to aliasing here). Perhaps this can
help illustrate the scenario (assuming the URL doesn’t get mangled):
http://www.pythontutor.com/cpp.html#code=void%20MUSTALIAS%28void%20p,%20void%20q%29%20%7B%7D%0A%0Aint%20main%28%29%7B%0A%0A%20%20int%20**a,%20b,%20x%20,c%3B%0A%20%20c%20%3D%2010%3B%0A%20%20a%20%3D%20%26b%3B%0A%20%20b%20%3D%20%26c%3B%0A%20%20x%20%3D%20a%3B%0A%20%20int%20y%20%3D%20x%3B%0A%20%20MUSTALIAS%28x,%26c%29%3B%0A%20%20MUSTALIAS%28x,b%29%3B%0A%20%20return%200%3B%0A%7D&curInstr=12&mode=display&origin=opt-frontend.js&py=cpp&rawInputLstJSON=%5B%5D

Note (in step 13 of 13) how y does not alias (it is just an int
itself) anything (agreeing with NoAlias results you’re getting).

Yes, I think I fully understand that ‘y’ is not a pointer. However, again, what confuses me and seems incorrect is the output of LLVM alias analysis:

must alias, Mod Pointers: (i32* %y, LocationSize::precise(4))”

Isn’t it literally indicating that “i32* y”, denoting a pointer, is must alias with some other pointers?

If so, then why does this set only have one pointer instead of at least two? If not (which makes more sense), then why is “i32* %y” reported and annotated as “must alias, mod”? Both ways do not make sense to me.

Best,
Shuai

This seems analogous to the following results:
- `allmust`: https://github.com/llvm/llvm-project/blob/release/10.x/llvm/test/Analysis/AliasSet/saturation.ll#L4
- `test1`: https://github.com/llvm/llvm-project/blob/release/10.x/llvm/test/Analysis/AliasSet/intrinsics.ll#L3

Note how we obtain four (singleton) alias sets for `allmust`, each containing `i32* %a`, `i32* %b`, `i32* %c`, and `i32* %d`, respectively.
This is similar to how we obtain the (singleton) alias set with `i32* y` (all singleton sets, i.e., single element sets, denoting the reflexivity of aliasing relationship; here: `i32* y` "must alias" itself).

Suppose that right after `int y = *x;` we add a statement `x = &y;` to the C source code: https://llvm.godbolt.org/z/PP6b75
We then obtain the corresponding analysis from opt: https://llvm.godbolt.org/z/WEMzvs

   AliasSet[0x55e1b009c5a0, 6] may alias, Mod/Ref Pointers: (i32* %c, LocationSize::precise(4)), (i32** %b, LocationSize::precise(8)), (i32** %0, LocationSize::precise(8)), (i32* %2, LocationSize::precise(4)), (i32* %y, LocationSize::precise(4))

whereas previously (i.e., before we added `int y = *x;`) we had

     AliasSet[0x55633b7fc150, 5] may alias, Mod/Ref Pointers: (i32* %c, LocationSize::precise(4)), (i32** %b, LocationSize::precise(8)), (i32** %0, LocationSize::precise(8)), (i32* %2, LocationSize::precise(4))

The way I think about it:

- (i32* %c, LocationSize::precise(4)) - LLVM IR level pointer (i32* %c) to runtime stack slot containing the value (integer) of the C source level `int c` (32-bit, or 4 bytes); this is also the LLVM IR level pointer to `i32` returned by `alloca`,

- (i32** %b, LocationSize::precise(8)) - LLVM IR level pointer (i32** %b) to pointer to runtime stack slot containing the value (address) of C source level `int *b` (64-bit, or 8 bytes); this is also the LLVM IR pointer to `i32*` returned by `alloca`.

The reason `(i32* %c, LocationSize::precise(4))` and `(i32** %b, LocationSize::precise(8))` appear together in both alias sets is because of the following LLVM IR statement:

   store i32* %c, i32** %b, align 8

After adding the aforementioned C source level statement `x = &y;`, we can observe that `(i32* %y, LocationSize::precise(4))` got merged into the (latter) alias set, too.

Does this seem reasonable?

Best,
Matt