inttoptr and noalias returns

That’s a very long story… let me try to summarize why you can’t do “inttoptr(ptrtoint(x)) → x” blindly (it’s correct in some cases).

  1. Integers carry no provenance information and can be interchanged at will.

This means that this transformation is always correct:

if (x == y)

f(x);

=>

if (x == y)

f(y);

  1. There are many pointers whose addresses are equal. For example:

char p[n];

char q[m];

char r[3];

We may have that (int)(p+n) == (int)q == (int)(r-m).

Even if we focus just on inbounds pointers (because we e.g. augmented inttoptr to have an inbounds tag), we can still have 2 pointers with the same address: p+n & q.

  1. Pointers have provenance. You can’t use p+n to change memory of q.

p[n] = 42; // UB, to make the life of the alias analysis easier

If we put the three pieces together, we get that it’s possible for the compiler to swap a ptrtoint of a dereferenceable pointer with something else and then if you blindly fold the ptrtoint/inttoptr chain, you get a wrong pointer. Something like:

int x = p + n;

int y = q;

if (x == y)

(char)y = 3;

=> (GVN)

int x = p + n;

int y = q;

if (x == y)

(char)x = 3;

=> (invalid fold of inttoptr/ptrtoin chain)

int x = p + n;

int y = q;

if (x == y)

*(p+n) = 3;

=> (access OOB is UB)

int x = p + n;

int y = q;

if (x == y)

UB;

I’ve a few slides on LLVM’s AA that may help: https://web.ist.utl.pt/nuno.lopes/pres/pointers-eurollvm18.pptx

Nuno

Thank you, that’s super helpful.

Do we have plans/proposals for how to avoid this? I gather it involves stopping optimization from blindly folding inttoptr(ptrtoint(x))->x, and you’ve mentioned making sure we avoid introducing inttoptr+ptrtoint unnecessarily, is the plan just those things? You also mentioned augmenting inttoptr w/ inbounds and that the folding is “correct in some cases”, does that mean we have plans (or at least a desire) to formulate refined rules for when the folding is possible that will allow more optimization? The slide deck discusses separating the notions of logical pointers vs physical pointers, is that something that anybody is working on changing the code to model?

For context, I’m working on a front-end for a language that I can’t change whose type system doesn’t really distinguish between native pointers and pointer-sized integers, so I can only do so much to avoid creating ptrtoint/inttoptr in the first place. But there are some constructs we can recognize as allocations, I’m hoping to be able to iteratively re-type arithmetic trees rooted at those as pointers/geps.

Thanks,

-Joseph