Alias sets for function ‘NOALIAS’:
Alias Set Tracker: 0 alias sets for 0 pointer values.
Alias sets for function ‘main’:
Alias Set Tracker: 0 alias sets for 0 pointer values.
===== Alias Analysis Evaluator Report =====
1 Total Alias Queries Performed
0 no alias responses (0.0%)
1 may alias responses (100.0%)
0 partial alias responses (0.0%)
0 must alias responses (0.0%)
I checked the generated .ll code, and it shows that within the main function and NOALIAS functions, there is only a “ret” statement, with no global or local variables used. Could anyone shed some lights on where the “1 may alias” come from? And is there a way that I can force the alias analysis algorithm to focus only the “main” function? Thank you very much.
I am performing alias analysis toward the following simple code:
[...]
I checked the generated .ll code, and it shows that within the main function and NOALIAS functions, there is only a "ret" statement, with no global or local variables used. Could anyone shed some lights on where the "1 may alias" come from? And is there a way that I can force the alias analysis algorithm to focus only the "main" function? Thank you very much.
Hi!
Here's more information after initializing the variables (assuming the intent in the source code was, e.g., to initialize `a` and `b` to `0` and the pointers `f1` and `f2` to `NULL`, using aggregate initialization for `s`):
- Clang [-> LLVM-IR]: Compiler Explorer
- [LLVM-IR ->] opt: Compiler Explorer
Alias sets for function 'main': Alias Set Tracker: 1 alias sets for 2 pointer values.
AliasSet[0x55ec7f9a23e0, 3] may alias, Mod/Ref Pointers: (i8* %0, LocationSize::precise(4)), (i32* %a, LocationSize::precise(4))
Note that in the original source code `a`, `b` are uninitialized--consequently, attempting to access `s[a].f1` and `s[b].f2` is undefined behavior (as we're using automatic storage duration objects `a` and `b` while their values are indeterminate): T∙Snippet - TrustInSoft Analyzer
Cf. ISO/IEC 9899:2017 ("If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate.") & ISO/IEC 9899:2017
("The behavior is undefined in the following circumstances: [...] The value of an object with automatic storage duration is used while it is indeterminate").
As such, you can notice that most of the code is going to be optimized away between mem2reg and dead argument elimination:
(Similarly, even if `a` and `b` were initialized to `0`, we only wrote to `f1` for `s[0]` and `s[1]`, so accessing `s[b].f2` is again using an object while it is indeterminate and undefined behavior.)
*** IR Dump After Promote Memory to Register ***
; the following corresponds to loading `s[a].f1`
%3 = load i32, i32* %a, align 4, !tbaa !7
%idxprom = sext i32 %3 to i64
%arrayidx3 = getelementptr inbounds [2 x %struct.MyStruct], [2 x %struct.MyStruct]* %s, i64 0, i64 %idxprom
%f14 = getelementptr inbounds %struct.MyStruct, %struct.MyStruct* %arrayidx3, i32 0, i32 0
%4 = load i32*, i32** %f14, align 16, !tbaa !2
%5 = bitcast i32* %4 to i8*
*** IR Dump After Dead Argument Elimination ***
; note how the arguments have been rewritten to `undef` in the following:
call void @NOALIAS(i8* undef, i8* undef)
> And is there a way that I can force the alias analysis algorithm to focus only the "main" function?
One way is to make the definition of `NOALIAS` unavailable (as if external) by only providing the declaration (as in the above examples).
That’s awesome. Thank you very much for all the information and clarification! Just a few follow up questions. Could you kindly shed some lights on it? Thank you!
Alias sets for function 'main':
Alias Set Tracker: 2 alias sets for 4 pointer values.
__AliasSet[0x563faa6c6260, 5] may alias, Mod/Ref Pointers: (i8* %0, LocationSize::precise(4)), (i32* %a, LocationSize::precise(4)), (i8* %1, LocationSize::precise(4)), (i32* %b, LocationSize::precise(4))__
1 Unknown instructions: call void @NOALIAS(i8* nonnull %0, i8* nonnull %1) #3
**AliasSet[0x563faa6b45e0, 1] must alias, Mod/Ref forwarding to 0x563faa6c6260**
===== Alias Analysis Evaluator Report =====
6 Total Alias Queries Performed
4 no alias responses (66.6%)
**0 may alias responses (0.0%)**
0 partial alias responses (0.0%)
**2 must alias responses (33.3%)**
I am trying to interpret the outputs, so if I understand correctly, the output indicates that we have an alias set of 4 pointers which “potentially” point to the same memory region, correct? Then is there any more accurate analysis pass that I could use to somewhat infer that “there are two must alias sets, each set has two pointers”? Correct me if I was wrong here… Using my local opt (version 6.0), I tried to iterate all feasible alias analysis passes but the results are not changed.
Also, what is the “must alias, Mod/Ref forwarding to 0x563faa6c6260”?
And how to interpret that we have “2 must alias responses”? Where does it come from? And why do we have “0 may alias response”? I would expect to have at least “4 may alias responses” as well?
I note that using the latest opt (version 11.0?) gives different outputs with my local opt (version 6.0). For opt (version 6.0), it reports: 2 alias sets for 2 pointer values.
More importantly, can I expect to get generally better alias analysis results when switching to version 11.0?
That's awesome. Thank you very much for all the information and
clarification! Just a few follow up questions. Could you kindly shed
some lights on it? Thank you!
1. I tried to tweak the code in the following way: [...]
I am trying to interpret the outputs, so if I understand correctly,
the output indicates that we have an alias set of 4 pointers which
"potentially" point to the same memory region, correct? Then is
there any more accurate analysis pass that I could use to somewhat
infer that "there are two must alias sets, each set has two
pointers"? Correct me if I was wrong here.. Using my local opt
(version 6.0), I tried to iterate all feasible alias analysis passes
but the results are not changed.
Compare the results for `test_simple`, `test_in_array`, and `test_same_underlying_object_different_indices`:
Also, what is the "must alias, Mod/Ref forwarding to 0x563faa6c6260"?
If alias sets have been merged, you'll get the attached node forwarding to the root node; note the comment for `getForwardedTarget` making a reference to union-find:
And how to interpret that we have "2 must alias responses"? Where
does it come from? And why do we have "0 may alias response"? I
would expect to have at least "4 may alias responses" as well?
No, "MayAlias" and "MustAlias" are distinct elements in the lattice, cf. https://llvm.org/docs/AliasAnalysis.html#must-may-or-no
There's a good explanation of the alias analysis queries and responses in the following talk (particularly the part starting with "AA Query" around the 22 min. mark):
“Pointers, Alias & ModRef Analyses” (2018 EuroLLVM Developers’ Meeting: A. Sbirlea & N. Lopes)
2. I note that using the latest opt (version 11.0?) gives different
outputs with my local opt (version 6.0). For opt (version 6.0), it
reports: 2 alias sets for 2 pointer values.
More importantly, can I expect to get generally better alias
analysis results when switching to version 11.0?
I'd assume that generally it shouldn't get worse
Thank you very much!
> And another case:
>
> - Clang [-> LLVM-IR]: Compiler Explorer
> - [LLVM-IR ->] opt: Compiler Explorer
>
> Is there any chance that we can smoothly infer that:
> - x and &c are "must alias"
> - x and b are "must alias"
>
> I don't know how to interpret the current results, in particular the
> following outputs:
>
> AliasSet[0x5584ab7e5f30, 1] must alias, Mod/Ref Pointers: (i32** %x,
> LocationSize::precise(8))
> AliasSet[0x5584ab7e6020, 1] must alias, Mod Pointers: (i32* %y,
> LocationSize::precise(4))
>
> It means we have two "must alias" sets, each of which contain only one
> pointer? That seems quite confusing to me..
You can add -print-all-alias-modref-info for more detailed information: Compiler Explorer -- you'll notice "MustAlias: i32* %c, i8* %6".
Adding `-evaluate-aa-metadata` for `load` and `store` instructions, Compiler Explorer, you'll notice "MustAlias: %0 = load i32**, i32*** %a, align 8 <-> store i32** %b, i32*** %a, align 8"
Note how in the C source code pointer `b` points to int `c` (`b = &c;`) corresponding to the memory locations (same object in memory when loading from `c` or `*b`). However, pointers `x` and `y` are distinct objects in memory themselves. In general, remember not to confuse pointers with what they point to--here also distinct, since `x` points `b` but `y` points to `c` (I mention this specifically since the desired inference of "x and b are "must alias"" cannot be correct--these are not the same objects in memory).
Thank you very much! Yes, that makes a lot of sense to me. However, just want to point out two things that are still unclear:
The output contains a alias set of only one element, for instance:
“must alias, Mod Pointers: (i32* %y, LocationSize::precise(4))”
This one really confused me. I would expect to have an alias set of at least two elements (e.g., two pointers point to the same memory region), otherwise for the above case, $y$ is aliased to whom? I searched all outputs that are related to %y from https://llvm.godbolt.org/z/9njGqx but all I can find is “NoAlias”. Overall, to understand why an alias set can have only one pointer looks quite strange to me…
For the following code chunks:
b = &c;
x = *a;
int y = *x;
MUSTALIAS(x,&c);
MUSTALIAS(x,b);
I don’t get it why (quote from your previous email) “the desired inference of “x and b are “must alias”” cannot be correct–these are not the same objects in memory”. x and b are both pointing to c, right? Am I missing anything here?
Sorry for the potential trouble this may have caused… And thank you in advance!
Thank you very much! Yes, that makes a lot of sense to me. However, just want to point out two things that are still unclear:
1. The output contains a alias set of only one element, for instance:
"must alias, Mod Pointers: (i32* %y, LocationSize::precise(4))"
This one really confused me. I would expect to have an alias set of at least *two* elements (e.g., two pointers point to the same memory region), otherwise for the above case, $y$ is aliased to whom? I searched all outputs that are related to %y from Compiler Explorer but all I can find is "NoAlias". Overall, to understand why an alias set can have only one pointer looks quite strange to me..
Note (in step 13 of 13) how `y` does not alias (it is just an `int` itself) anything (agreeing with NoAlias results you're getting).
2. For the following code chunks:
b = &c;
x = *a;
int y = *x;
MUSTALIAS(x,&c);
MUSTALIAS(x,b);
I don't get it why (quote from your previous email) "the desired inference of "x and b are "must alias"" cannot be correct--these are not the same objects in memory". x and b are both pointing to c, right? Am I missing anything here?
Correct, both `x` and `b` point to `c` (the pointers themselves are distinct, but after dereferencing what they point to is the same location).
Consider the slice on the (first call) to `MUSTALIAS(x, &c);`: Compiler Explorer
The location in memory of a pointer object `b` may alias location of the first argument `%5` but not the (other memory object) pointer `x`; however, `i32* %c` aliases second argument `i8* %6` (the same memory object) as well as may alias `i8* %5` (which we've obtained from loading the address from pointer `x`; the difference here being between the memory locations storing the pointers themselves, say, on the stack--not aliased, and the memory objects the addresses stored in the pointers refer to, i.e., i32** vs. i32* or i8*).
Thank you very much! Yes, that makes a lot of sense to me. However, just
want to point out two things that are still unclear:
The output contains a alias set of only one element, for instance:
“must alias, Mod Pointers: (i32* %y, LocationSize::precise(4))”
This one really confused me. I would expect to have an alias set of at
least two elements (e.g., two pointers point to the same memory
region), otherwise for the above case, $y$ is aliased to whom? I
searched all outputs that are related to %y from https://llvm.godbolt.org/z/9njGqx but all I can find is “NoAlias”.
Overall, to understand why an alias set can have only one pointer looks
quite strange to me…
Note (in step 13 of 13) how y does not alias (it is just an int
itself) anything (agreeing with NoAlias results you’re getting).
Yes, I think I fully understand that ‘y’ is not a pointer. However, again, what confuses me and seems incorrect is the output of LLVM alias analysis:
“must alias, Mod Pointers: (i32* %y, LocationSize::precise(4))”
Isn’t it literally indicating that “i32* y”, denoting a pointer, is must alias with some other pointers?
If so, then why does this set only have one pointer instead of at least two? If not (which makes more sense), then why is “i32* %y” reported and annotated as “must alias, mod”? Both ways do not make sense to me.
Note how we obtain four (singleton) alias sets for `allmust`, each containing `i32* %a`, `i32* %b`, `i32* %c`, and `i32* %d`, respectively.
This is similar to how we obtain the (singleton) alias set with `i32* y` (all singleton sets, i.e., single element sets, denoting the reflexivity of aliasing relationship; here: `i32* y` "must alias" itself).
Suppose that right after `int y = *x;` we add a statement `x = &y;` to the C source code: Compiler Explorer
We then obtain the corresponding analysis from opt: Compiler Explorer
- (i32* %c, LocationSize::precise(4)) - LLVM IR level pointer (i32* %c) to runtime stack slot containing the value (integer) of the C source level `int c` (32-bit, or 4 bytes); this is also the LLVM IR level pointer to `i32` returned by `alloca`,
- (i32** %b, LocationSize::precise(8)) - LLVM IR level pointer (i32** %b) to pointer to runtime stack slot containing the value (address) of C source level `int *b` (64-bit, or 8 bytes); this is also the LLVM IR pointer to `i32*` returned by `alloca`.
The reason `(i32* %c, LocationSize::precise(4))` and `(i32** %b, LocationSize::precise(8))` appear together in both alias sets is because of the following LLVM IR statement:
store i32* %c, i32** %b, align 8
After adding the aforementioned C source level statement `x = &y;`, we can observe that `(i32* %y, LocationSize::precise(4))` got merged into the (latter) alias set, too.