I have a question about when we should apply these pointer aliasing
rules. Do the rules tell us when a load/store is safe?
"Any memory access must be done through a pointer value associated
with an address range of the memory access, otherwise the behavior is
undefined."
I don't think the pointer aliasing rules indicate when a memory access is safe. Rather, they set down what the compiler can consider to be defined and undefined behavior. They lay down the law for which optimizations are considered correct and which are not.
So this means the conversion discussed here is still safe in terms of
memory safety, but its meaning after conversion could be weird. Am I
correct?
I am not sure what you mean. However, if you're asking whether casting a pointer to an integer and then casting the integer back to a pointer is correct, I believe the answer is yes. We certainly treat it that way in SAFECode, although in the current implementation it can weaken the safety guarantees. Our points-to analysis, DSA, doesn't track pointers through integers, so it can't always guarantee that it knows everything about the memory objects feeding into an inttoptr cast; SAFECode therefore uses more lenient checks on pointer values coming from such casts.
That is, incidentally, one of the reasons why we'd like to do Arushi's transformation. It will make DSA less conservative and SAFECode more stringent.
That brings me to another question. The based-on relation has this rule:
"A pointer value formed by an inttoptr is based on all pointer values
that contribute (directly or indirectly) to the computation of the
pointer's value."
Suppose an int value 'i' is computed from many int variables that
are converted from pointers (p1, p2, ..., pn) by ptrtoint. If we then
inttoptr i to a pointer p, how should I decide which pointer value p
is based on?
If each p_j is converted by ptrtoint to i_j, and the computation for i
is i = i_0 + i_1 + ... + i_n, does it mean we can take any p_j as the
base pointer and the other int variables as its offset? Say we take
p_2 as the base pointer; does the p formed from i then point to
p_2 + (i_0 + i_1 + i_3 + ... + i_n)
?
So, in your example, if you do:
i1 = ptrtoint p1;
i2 = ptrtoint p2;
...
in = ptrtoint pn;
i = i1 + i2 + ... + in;
p = inttoptr i;
..., then p can point to any of the memory objects that p1, p2, ..., pn point to. The reasoning is that the integer add instruction obscures which integer is the base pointer and which is the index, so the aliasing rules conservatively assume that either operand is the base pointer.
So in the transformation example, the result is different when we take
%196 or %193 as a base pointer.
Yes, which is why the transform that Arushi suggested is not legal unless you can prove that %196 can't be a pointer to a memory object.
For alias analysis, we may say that p can point to any memory that one
of the p_j points to. But if we consider memory safety, should we say
that p is safe to access if p is not out of bounds, no matter which p_j
is taken as a base pointer?
That is how I would interpret memory safety: p is safe if it is within the bounds of any of the p_j memory objects.
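A minimal sketch of that interpretation, written in Python rather than LLVM IR. The names MemObject and access_is_safe are hypothetical and are not part of SAFECode or DSA; addresses are modeled as plain integers, and each candidate object stands for the memory one of the p_j points to.

```python
from typing import Iterable, NamedTuple


class MemObject(NamedTuple):
    """Hypothetical model of a memory object that some p_j points to."""
    base: int  # start address of the object
    size: int  # size of the object in bytes


def access_is_safe(addr: int, width: int, candidates: Iterable[MemObject]) -> bool:
    """Return True if the access [addr, addr + width) lies entirely
    inside at least one of the candidate objects."""
    return any(
        obj.base <= addr and addr + width <= obj.base + obj.size
        for obj in candidates
    )


# Two example objects; the addresses are made up for illustration.
objects = [MemObject(base=0x1000, size=16), MemObject(base=0x2000, size=8)]

inside_first = access_is_safe(0x1008, 4, objects)    # fits in the first object
straddles_end = access_is_safe(0x100E, 4, objects)   # runs past the first object
inside_second = access_is_safe(0x2004, 4, objects)   # fits in the second object
```

Under this reading the check passes as soon as one candidate object contains the whole access, which is why a larger based-on set makes the check more lenient.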
Could anyone explain this rule more precisely? For example, how can
we find "all pointer values that contribute (directly or
indirectly)"?
I think this can be conservatively done using simple data-flow analysis. The only tricky part is when a pointer travels through memory (i.e., it is stored into memory by a store instruction and loaded later by a load instruction). An enhanced version of DSA which tracks pointers through integers could handle this.
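A rough sketch of such an analysis, in Python over a toy instruction list rather than real LLVM IR. The instruction encoding, the based_on function, and the value names are all assumptions made for illustration; this is not how DSA is implemented. Memory is handled in the most conservative way possible: any loaded value is assumed to be based on every pointer that was ever stored.

```python
def based_on(instructions, root_pointers):
    """Map each value name to the set of root pointers it may be based on.

    instructions: list of (dest, op, operands) tuples, where op is one of
    "ptrtoint", "add", "inttoptr", "store", or "load". A store has
    dest=None and operands=(value, address).
    """
    bases = {p: {p} for p in root_pointers}
    escaped = set()  # roots whose derived values were stored to memory
    changed = True
    while changed:  # iterate to a fixed point so instruction order does not matter
        changed = False
        for dest, op, operands in instructions:
            if op == "store":
                stored = bases.get(operands[0], set())
                if not stored <= escaped:
                    escaped |= stored
                    changed = True
                continue
            if op == "load":
                # Conservative: the loaded value may carry any stored base.
                new = set(escaped)
            else:
                # ptrtoint, add, inttoptr: union of the operands' bases.
                new = set().union(*(bases.get(o, set()) for o in operands))
            current = bases.setdefault(dest, set())
            if not new <= current:
                current |= new
                changed = True
    return bases


# The integer-add example: p ends up based on both p1 and p2.
program = [
    ("i1", "ptrtoint", ("p1",)),
    ("i2", "ptrtoint", ("p2",)),
    ("i", "add", ("i1", "i2")),
    ("p", "inttoptr", ("i",)),
]
p_bases = based_on(program, ["p1", "p2"])["p"]
```

Treating every load as possibly returning any stored pointer is sound but imprecise; tracking which memory objects a store can reach is the part where a points-to analysis would be needed.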
This would be helpful to understand
http://llvm.org/docs/GetElementPtr.html#ptrdiff
http://llvm.org/docs/GetElementPtr.html#null
which suggest that we can do some 'wild' pointer arithmetic by
inttoptr and ptrtoint.
For example, given a pointer p, can we safely do the following?
i = ptrtoint p;
j = i + null;
q = inttoptr j;
v = load q;
That's a weird one (aside: you need to cast NULL to int first before using it in the add). Since NULL doesn't point to a valid memory range, it may be that you can technically consider q to just point to p. However, I'm not sure about that; maybe q is technically aliased with null and can point to some offset of NULL.
However, in practice, even if the aliasing rules say that q can point to p or some offset of NULL, I would say that q points to just p since you know (for most implementations) that NULL is equivalent to zero.
-- John T.