Hi Jeroen,
My interpretation (well, not just mine; we did have discussions about this in our group) wrt restrict handling is that the use of decrypt/encrypt triggers undefined behavior.
Yes, that is exactly what I am pushing back against. I cannot see a reading of the standard where this is UB. I also don't think it is the intention of the standard to make this UB. Note that the line I showed could be very far away from the 'restrict' annotation. Basically, if this is UB, then a 'restrict' pointer cannot be passed to other functions unless we know for certain that they do not do ptr-to-int casts.
Sure, this might be a liberal reading of that sentence wrt restrict.
And that is how it is done today in the full restrict patches, but of course,
that does not mean that this is where we need to settle on when including the
functionality. It is good to have the reviews that steer us to a solution that
is more broadly applicable.
Fair enough. The standard is certainly not as unambiguous as one would hope.
Having suffered from an endless stream of 'noalias' bugs on the Rust side, I am very excited that this part of LLVM is being overhauled. 
I was hoping at some point to delve into those restrict patches and try to understand them from a PL/semantics perspective, but so far I haven't had the time -- and it's also a large patchset, much of which naturally is about the implementation (which I can't really follow) and not about the high-level description of the LLVM IR spec that makes the new analyses correct.
When/if I find some time -- what would be a good starting point to try to understand the concepts of those patches without having to understand the C++ details?
Now that we are going over the different pieces of the implementation and see how we can use them in a broader context, the situation is different: instead of just tracking the 'restrict/noalias' provenance, we now want to use that part of the infrastructure to track provenance in general. Because of that, it also makes sense to reconsider what 'policy' we want to use. In that context, mapping an 'int2ptr' to an 'add_provenance(int2ptr(%Decrypt), null)', indicating that it can point to anything, makes sense, but is still orthogonal to the infrastructure.
That is not sufficient though. You also need to know that the provenance of the 'restrict'ed pointer can now be acquired by other pointers created literally anywhere via int2ptr. *That* is what makes this so tricky, I think.
int foo(int *restrict x) {
    *x = 0;
    unk1();
    assert(*x == 0); // can be optimized to 'true'
    unk2((uintptr_t)x);
    assert(*x == 0); // can *not* be optimized to 'true'
}
Also, for restrict, escape analysis must be done, so this case can be handled as well.
Sure, smarter analyses can handle the easy cases, but I was asking about what part of the spec of these operations forces the analysis to work like that. Defeating the analysis is not that hard, so here's another example:
static int foo(int *restrict x, uintptr_t y) {
    *x = 0;
    unk1();
    assert(*x == 0); // can be optimized to 'true'
    uintptr_t addr = (uintptr_t)x;
    if (addr == y)
        unk2(addr);
    assert(*x == 0); // can *not* be optimized to 'true'
}
Now we do GVN integer replacement:
static int foo(int *restrict x, uintptr_t y) {
    *x = 0;
    unk1();
    assert(*x == 0); // can be optimized to 'true'
    uintptr_t addr = (uintptr_t)x;
    if (addr == y)
        unk2(y);
    assert(*x == 0); // can *not* be optimized to 'true'
}
Now let us assume there is exactly one call site of this function (and foo is static so we know it can't be called from elsewhere, or maybe we are doing LTO), which looks like
    foo(ptr, (uintptr_t)ptr);
This means we know that the "if" in "foo" will always evaluate to true, so we have
static int foo(int *restrict x, uintptr_t y) {
    *x = 0;
    unk1();
    assert(*x == 0); // can be optimized to 'true'
    uintptr_t addr = (uintptr_t)x;
    unk2(y);
    assert(*x == 0); // can *not* be optimized to 'true'
}
Now we can (seemingly) optimize away the "addr" variable entirely -- but at that point, there is no clue left for escape analysis to know that "unk2" might legally mutate "x".
That's why I am saying that with 'restrict', we have to treat ptr-to-int casts as side-effecting, and cannot optimize them away even if their result is unused.
They *always* have an "escape" effect, no matter what happens with their result.
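To make the data flow concrete, here is a minimal sketch of the end state of that transformation chain, with the dead "addr" variable already removed. The bodies of unk1 and unk2 and the call_site wrapper are hypothetical stand-ins of mine, and I deliberately left out 'restrict' so that the behaviour of the sketch itself is not in dispute. It only shows that the integer "y" is enough for "unk2" to reach "*x", even though nothing in the body of "foo" casts "x" to an integer any more.

```c
#include <assert.h>  /* only needed for the check in the usage note */
#include <stdint.h>

/* Hypothetical stand-in for the opaque unk1(): does nothing. */
static void unk1(void) {}

/* Hypothetical stand-in for the opaque unk2(): converts the integer
   back into a pointer and writes through it. */
static void unk2(uintptr_t addr) {
    int *p = (int *)(void *)addr;
    *p = 1;
}

/* Same shape as the last version of foo, minus 'restrict' (an
   assumption made to keep this sketch well-defined) and with the
   asserts replaced by a return value. Nothing in this body casts x
   to an integer. */
static int foo(int *x, uintptr_t y) {
    *x = 0;
    unk1();
    unk2(y);      /* y carries the address of *x */
    return *x;    /* observes the write made inside unk2 */
}

/* The single call site: both arguments derive from the same object. */
int call_site(void) {
    int v = 42;
    return foo(&v, (uintptr_t)(void *)&v);
}
```

With these definitions, assert(call_site() == 1) holds: the write inside "unk2" is visible through "x". An analysis that looks only at the body of "foo" sees no ptr-to-int cast of "x" at all, which is exactly the information that was lost when the unused cast was deleted.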
Kind regards,
Ralf