That’s a very long story… let me try to summarize why you can’t blindly fold “inttoptr(ptrtoint(x)) → x” (the fold is correct in some cases, but not in general).
- Integers carry no provenance information and can be interchanged at will.
This means the following transformation is always correct when x and y are integers:
if (x == y)
  f(x);
=>
if (x == y)
  f(y);
- There are many pointers whose addresses are equal. For example:
char p[n];
char q[m];
char r[3];
We may have that (int)(p+n) == (int)q == (int)(r-m).
Even if we focus just on inbounds pointers (e.g. because we augmented inttoptr with an inbounds tag), we can still have two pointers with the same address: p+n and q.
- Pointers have provenance. You can’t use p+n to change memory of q.
p[n] = 42; // UB; this rule is there to make life easier for alias analysis
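To make the second point concrete, here is a minimal C sketch, not anything LLVM itself contains. The helper names same_address and end_meets_start are made up for illustration. It compares pointers only through their integer addresses, which is exactly the view in which p+n and q can become indistinguishable. Whether two separately declared arrays really end up adjacent is up to the compiler, so the helper just reports what it observes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 when both pointers convert to the same
   integer address. Integers carry no provenance, so this looks at the
   address only and says nothing about which object each pointer is
   allowed to access. */
int same_address(const void *a, const void *b) {
    return (uintptr_t)a == (uintptr_t)b;
}

/* Hypothetical helper: returns 1 when the one-past-the-end pointer of
   the n-byte array p has the same address as the start of q. */
int end_meets_start(const char *p, size_t n, const char *q) {
    assert(p != NULL && q != NULL);  /* both must be real pointers */
    return same_address(p + n, q);
}
```

Calling end_meets_start(p, n, q) on two adjacent arrays returns 1, even though p+n must not be used to access q.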
Putting the three pieces together: the compiler may replace the ptrtoint of a dereferenceable pointer with another integer that has the same value. If you then blindly fold the ptrtoint/inttoptr chain, you get a pointer with the wrong provenance. Something like:
int x = (int)(p + n);
int y = (int)q;
if (x == y)
  *(char*)y = 3;
=> (GVN)
int x = (int)(p + n);
int y = (int)q;
if (x == y)
  *(char*)x = 3;
=> (invalid fold of inttoptr/ptrtoint chain)
int x = (int)(p + n);
int y = (int)q;
if (x == y)
  *(p + n) = 3;
=> (access OOB is UB)
int x = (int)(p + n);
int y = (int)q;
if (x == y)
  UB;
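For reference, the starting program of that chain could be written in real C roughly as below. This is only a sketch under my own assumptions: the function name store_if_equal is made up, and uintptr_t stands in for the "int" of the pseudocode. The store goes through a pointer rebuilt from y, so in the source program it is meant to hit q. Replacing y with x is harmless at the integer level. The problem only appears once the cast of x is folded back into p + n.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: stores 3 through the integer y, but only when x
   and y hold the same address. Returns 1 if the store happened and 0
   otherwise. */
int store_if_equal(uintptr_t x, uintptr_t y) {
    assert(y != 0);  /* y is expected to hold a real address */
    if (x == y) {
        /* The pointer is rebuilt from y. If y came from q, the source
           program intends this store to write to q. */
        *(char *)y = 3;
        return 1;
    }
    return 0;
}
```

A caller would pass (uintptr_t)(p + n) as x and (uintptr_t)q as y. Whether the two compare equal depends on how the compiler lays out p and q.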
I have a few slides on LLVM’s AA (alias analysis) that may help: https://web.ist.utl.pt/nuno.lopes/pres/pointers-eurollvm18.pptx
Nuno