Daniel,
Indulge me one more mail and I will go away … (or I’ll try to
yes, it does, but this is somewhat irrelevant to C.
The pointers will not be of type struct.a , they will be of type
int ([100])) or int * or whatever
But using the type system, aliasing, and language rules you can say which types might alias to other types:
A char*, void* , int*, and an int (*) pointer (and others) might alias with int (*structS::A)[100] in C (strict aliasing).
foo(struct a f, int i, int y) {
f.A[i] =
= f.B[y]
}
I believe, you could emit more expressive tbaa tags than we currently do. See below. And this may improve things sufficiently to be worth it. By just looking at LLVM IR you cannot assume no-alias in the example above, but your language might allow assuming so. And without changing the semantics of LLVM IR, or doing some global range analysis (which you might not be able to do), or inserting runtime checks, I don’t see any way to conclude no-alias in the above example besides some TBAA.
Backend developers like to do this
An int pointer load/store would have a TBAA tag of ‘int *’ which in my TBAA DAG aliases with structS:A/B so we would give a safe answer (albeit possibly more conservative than necessary).
= (int)ptr // = load ptr, !tbaa !“int *”
I am not sure what you mean by this. I believe you can give a correct answer using TBAA DAGs (see farther below). However, I do understand that you will get a better / less conservative answer in many cases by doing pointer analysis.
int (*ptr)[100] = S.A;
int *ptr2 = ptr;
*ptr2 ; // points-to analysis can tell me that ptr2 points-to S.A but not S.B while TBAA would tell me they both may-alias ptr2
Note, that our current chained BasicAA and TBAA approach would also catch simple case like the one above without doing full scale global points-to analysis.
Yes, indeed I forgot int ()[]. I admit my DAG was not complete you would also have to have int (), void*, char* - and I am sure I forgot some more (structB::int*, structB::char*, stuctA::structB::int*, etc) - pointing to structS::A.
Say you have
*&S.A[0] = val; // The front-end knows you have a structS::A access.
int *ptr2 = ptr; // The front knows that you have a int() access.
The front-end would generate
store sa, val, !tbaa “int (structS::A)[]"
ptr2 = load %ptr, !tbaa "int()”
and if we use a more complete TBAA type DAG we could conclude that they may-alias because the TBAA DAG tells us:
However, for the access below we could conclude that they don’t alias (because structS::A cannot not point to structS::B and vice versa).
foo(struct a f, int i, int y) {
f.A[i] = ; // store ptr2, !tbaa !“structS::A”
= f.B[y]; // load ptr, !tbaa !“structS::B”
}
More examples:
struct N{
int a[9];
struct A {
int f[9];
int g[9];
int *ptr;
} ;
}
in this case an access using a &N.a, &N.f pointer would not alias (point to each other in the TBAA DAG) but a N.ptr integer pointer would point to either in the DAG.
Okay.
When I hear fields I think types. When I think types, I assume that it is assumed (I made this mistake the first time I looked at BasicAA) that you can treat the type as a guarantee about certain properties including things like that you are not allowed to walk beyond array bounds which would imply that you can treat an access over type of structs/arrays like a path in a tree (I wondered why we don’t do this the first time I looked at BasicAA - until I dug deeper).
My first (wrong) intuition was the set of possible accessible locations in the following two geps is disjunct
= getelementptr inbounds %struct.anon* @Foo, i32 0, i32 0, …
= getelementptr inbounds %struct.anon* @Foo, i32 0, i32 1, …
Hence, my initial warning about “understanding independent objects inside a structure”.
I am seeing TBAA as a way to give me similar guarantees on top of the existing semantics of the IR.
From what I read (it’s been quite a while) t’s actually not valid in
C, but valid in LLVM IR.
At least, as written.§6.5.2.1, paragraph 2:
A postfix expression followed by an expression in square brackets
is a subscripted designation of an element of an array object. The
definition of the subscript operator is that E1[E2] is identical to
(*((E1)+(E2))). Because of the conversion rules that apply to the
binary + operator, if E1 is an array object (equivalently, a pointer
to the initial element of an array object) and E2 is an integer,
E1[E2] designates the E2-th element of E1 (counting from zero).So it’s going to be a pointer plus an address.
§6.5.6, paragraph 8:
When an expression that has integer type is added to or subtracted
from a pointer, the result has the type of the pointer operand. If the
pointer operand points to an element of an array object, and the array
is large enough, the result points to an element offset from the
original element such that the difference of the subscripts of the
resulting and original array elements equals the integer expression.
In other words, if the expression P points to the i-th element of an
array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N
(where N has the value n) point to, respectively, the i+n-th and
i-n-th elements of the array object, provided they exist. Moreover, if
the expression P points to the last element of an array object, the
expression (P)+1 points one past the last element of the array object,
and if the expression Q points one past the last element of an array
object, the expression (Q)-1 points to the last element of the array
object. If both the pointer operand and the result point to elements
of the same array object, or one past the last element of the array
object, the evaluation shall not produce an overflow; otherwise, the
behavior is undefined. If the result points one past the last element
of the array object, it shall not be used as the operand of a unary *
operator that is evaluated.you fall into the “otherwise, the behavior is undefined”.
It would be explicitly valid to convert it to char *, and read it up
to and through B, but not using the array access operator.
Yes, I vaguely remember reading something like this just was not 100% sure anymore. Thanks for looking it up.
IE char * foo = &
<walk through A, get to B>
Thanks,Arnold