[RFC] Introducing a byte type to LLVM

Hi David,

    Integers demonstrably do not carry provenance; see
    "Pointers Are Complicated II, or: We need better language specs"
    <https://www.ralfj.de/blog/2020/12/14/provenance.html> for a detailed
    explanation of why.
    As a consequence of this, ptr-int-ptr roundtrips are lossy: some of the
    original provenance information is lost. This means that optimizing away
    such roundtrips is incorrect, and indeed doing so leads to miscompilations
    (https://bugs.llvm.org/show_bug.cgi?id=34548, "InstCombine cannot blindly
    assume that inttoptr(ptrtoint x) -> x").

    The key difference between int and byte is that ptr-byte-ptr roundtrips are
    *lossless*: all the provenance is preserved. This means some extra
    optimizations (such as removing these roundtrips -- which implicitly happens
    when a redundant-store-after-load is removed), but also some lost
    optimizations (most notably, "x == y" does not mean x and y are equal in all
    respects; their provenance might still differ, so it is incorrect for GVN to
    replace one by the other).

    It's a classic tradeoff: we can *either* have lossless roundtrips

I think an important part of explaining the motivation for "byte" would be an explanation/demonstration of what the cost of losing "lossless roundtrips" would be.

I am not entirely sure where you are going with this question. Currently LLVM assumes *both* lossless ptr-int-ptr roundtrips *and* it does GVN based on "x == y" (on integers). This is simply inconsistent and demonstrably leads to miscompilations. One of them needs to be lost.
Consensus in LLVM seems to be that one would rather lose lossless roundtrips than GVN based on "==". I am not an expert in these trade-offs among optimizations. All I can do is cast some light on where the edge of the design space for a correct set of optimizations lies. I will leave it to others to decide where in that design space they'd rather be.
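
For concreteness, here is a condensed C variant of the miscompilation scenario from the blog post and bug 34548 (a sketch; the exact sequence of optimizations varies between compiler versions):

#include <stdio.h>
#include <stdint.h>

int main(void) {
  int x = 0, y = 0;
  uintptr_t xp = (uintptr_t)(&x + 1);  // one-past-the-end of x
  uintptr_t yp = (uintptr_t)&y;
  if (xp == yp) {
    // GVN may use "xp == yp" to replace yp by xp below; folding the
    // resulting inttoptr(ptrtoint(&x + 1)) back to &x + 1 then turns this
    // into a store through a one-past-the-end pointer, which the compiler
    // may treat as never aliasing y -- and print 0 instead of 15.
    *(int *)yp = 15;
    printf("%d\n", y);
  }
  return 0;
}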

Kind regards,
Ralf

Hi,
Sorry for my late reply, and thank you for sharing great summaries & ideas. I’ll leave my thoughts below.

Okay, so let me try to restate and summarize all this. I’ve CC’ed
a bunch of people back into this part of the thread.

C is moving towards a provenance model; you can find the details in
this committee TR that Joshua Cranmer linked:
http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2676.pdf

This TR is very clearly a work in progress and contains many
digressions and several possible sets of rules with different
implications. I will try to briefly summarize.

Now, that rule as I’ve stated it would be really bad. Allowing a
lucky guess to resolve to absolutely anything would almost
completely block the optimizer from optimizing memory. For example,
if a local variable came into scope, and then we called a function
that returned a different pointer, we’d have to conservatively
assume that that pointer might alias the local, even if the address
of the local was never even taken, much less escaped:

  int x = 0;
  int *p = guess_address_of_x();
  *p = 15;
  printf("%d\n", x); // provably 0?

So the currently favored proposal adds a really important caveat:
this blessing of provenance only works when a pointer with the
correct provenance has been “exposed”. There are several ways to
expose a pointer, including I/O, but the most important is casting
it to an integer.

This is a valid point. If one wants to formally show the correctness of this kind of memory optimization, this problem should be tackled.
I think n2676’s ‘Allocation-address nondeterminism’ (p. 27) paragraph addresses this issue.
The underlying idea is that the address of an allocated object is assumed to be non-deterministically chosen, causing any guessed accesses to raise undefined behavior in at least one execution.
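
To make the guessed access concrete, a minimal sketch (the literal address is of course hypothetical): in every execution where the nondeterministically chosen address of x differs from the guess, the store is UB, which is what lets the optimizer disregard it.

#include <stdio.h>
#include <stdint.h>

int main(void) {
  int x = 0;
  int *p = (int *)(uintptr_t)0x1234;  // a "lucky guess" at x's address
  *p = 15;  // UB in every execution where x was not placed at 0x1234
  printf("%d\n", x);  // so folding this to 0 remains justifiable
  return 0;
}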

Again, there’s no requirement of a data dependence between the
exposure and the int-to-pointer cast. If the program casts a
pointer to an integer, and an independently-produced integer
that happens to be the same value is later cast to a pointer,
and the storage hasn’t been reallocated in the meantime, the
resulting pointer will have the right provenance for the memory
and will be valid to use. This implies that pointer-to-int casts
(and other exposures) are semantically significant events in the
program. They don’t have side effects in the normal sense, but
they must be treated by the compiler just like things that do have
side effects: e.g. unless I’m missing something in the TR,
eliminating a completely unused pointer-to-int cast may make
later code UB.
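
A hedged C sketch of that last point (guess_int is a hypothetical stand-in for any computation that happens to produce x's address):

#include <stdint.h>

extern intptr_t guess_int(void);  // hypothetical: happens to equal (intptr_t)&x

void demo(void) {
  int x = 0;
  (void)(intptr_t)&x;        // completely unused pointer-to-int cast, but it
                             // is an exposure of x under the TR's rules
  intptr_t g = guess_int();  // independently produced, same value
  *(int *)g = 15;            // defined only because x was exposed above;
                             // delete the "unused" cast and this store
                             // becomes UB
}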

And in fact, it turns out that this is crucially important for
optimization. If the optimizer wants to allow arbitrary
replacement of integers based on conditional equality, like
in GVN, then replacement totally breaks direct data dependence,
and you can easily be left with no remaining uses of a pointer-to-int
cast when the original code would have had a data dependence. So
you cannot reason about provenance through int-to-pointer casts:
the pointer can alias any storage whose provenance has potentially
been exposed, and the optimizer must be conservative about optimizing
memory that has potentially been exposed.

+1; for this reason the casting semantics cannot be directly used for LLVM’s ptrtoint/inttoptr.

Everything I’ve been talking about so far is a C-level concept:
an int-to-pointer cast is e.g. (float*) myInt, not inttoptr
in LLVM IR. But I think people have an expectation of what these
things mean in LLVM IR, and I haven’t seen it written out explicitly,
so let’s do that now.

The first assumption here is that int-to-pointer and pointer-to-int
casts in C will translate to inttoptr and ptrtoint operations
in IR. Now, this is already problematic, because those operations
do not currently have the semantics they need to have to make the
proposed optimization model sound. In particular:

  • ptrtoint does not have side-effects and can be dead-stripped
    when unused, which as discussed above is not okay.

  • ptrtoint on a constant is folded to a constant expression,
    not an instruction, which is therefore no longer anchored in the
    code and does not reliably record that the global may have escaped.
    (Unused constant expressions do not really exist, and we absolutely
    cannot allow them to affect the semantics of the IR.)

    Of course, this is only significant for globals that don’t already
    have to be treated conservatively because they might have other
    uses. That is, it only really matters for globals with, say,
    internal or private linkage.

  • inttoptr can be reordered with other instructions, which is
    not allowed because different points in a function may have
    different sets of storage with escaped provenance.

  • inttoptr(ptrtoint) can be peepholed; ignoring the dead-stripping
    aspects of removing the inttoptr, this also potentially
    introduces UB because the original inttoptr “launders” the
    provenance of the pointer to the current provenance of the
    storage, whereas the original pointer may have stale provenance.

All of these concerns are valid.

(I’m not sure whether this is a good place to introduce this, but) we actually have semantics for pointer castings tailored to LLVM (link <https://sf.snu.ac.kr/publications/llvmtwin.pdf>).
In this proposal, ptrtoint does not have an escaping side effect; ptrtoint and inttoptr are scalar operations.
inttoptr simply returns a pointer which can access any object.
Combined with the address nondeterminism described above, objects that have not escaped can effectively be treated as untouched by other memory accesses.

Also, the following optimizations can be explained:

  • The aliasing property of ‘gep inbounds p’ is supported: dereferencing ‘gep inbounds p, 1’ must raise UB if either p or p+1 isn’t in bounds of p’s object (provenance)
  • ‘gep (inttoptr i), idx’ is equivalent to ‘i + idx’ (same value and same level of undefinedness; see the sketch after this list)
  • gep and gep inbounds are scalar operations (can be freely reordered w.r.t ptrtoint/inttoptr/lifetime/free/…)
  • gep’s natural properties are supported: stripping off inbounds tag, ‘gep (gep (inttoptr i), i1), i2’ → ‘gep (inttoptr i), i1+i2’
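
To illustrate the second bullet, a C-level sketch (byte offsets; the casts correspond to inttoptr/ptrtoint in IR, and the function names are only for illustration):

#include <stdint.h>

char *variant_a(uintptr_t i, uintptr_t idx) {
  return (char *)i + idx;    // gep (inttoptr i), idx
}

char *variant_b(uintptr_t i, uintptr_t idx) {
  return (char *)(i + idx);  // inttoptr (i + idx)
}
// Under the proposed semantics both compute the same address and carry the
// same wildcard provenance, so they are the same value with the same level
// of undefinedness.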

There are a few transformations that become hard to explain, but perhaps the biggest one is ‘inttoptr(ptrtoint p)’ → p.

I don’t find either side of this argument very convincing.

First, the compiler already has to be very conservative about
memory. If a pointer is stored into memory, we generally have
to treat it as having fully escaped: unless we can prove that the
memory isn’t loaded by other code, we have to assume that it is,
and that the loading code will do arbitrary other stuff. That
could include doing a pointer-to-int cast and sharing the pointer
back to us as an integer. Therefore, in the general case, our
ability to optimize when a pointer has escaped into memory is at
least as bad as if it had escaped via an int-to-pointer cast. If
we can nonetheless optimize, because we can reason about some of
the uses together or prove that there aren’t any other uses,
then okay, maybe we see that there’s an int<->pointer conversion.
But translating this to ptrtoint/inttoptr should be, at
worst, overly conservative; it’s not unsound, for reasons I’m
about to get into.

Second, adding casts through an integer type is always valid.
Doing so might block the optimizer, but it doesn’t break semantics.
If I have a program that does e.g. *px = 15, and I change it to
do *(int*)(intptr_t)px = 15, my program has become well-defined
in strictly more situations: in any case, there must be valid
storage at px for this not to be UB, but previously px might
have had the wrong provenance, and now it never does as long as
the provenance for that storage has previously escaped.

I agree. Transforming ‘p’ into ‘inttoptr(ptrtoint p)’ should not make the program undefined, and it can be used to address these kinds of correctness problems.

If we find that we’re generating too many unnecessary casts
through integer types and it’s really blocking the optimizer too
much, then we should consider the best solutions to those problems
as they arise. It may be the case that we need a better solution
for type conversions introduced through manual memcpy-like code
so that we maintain the original provenance instead of adding
explicit int<->pointer conversions that launder provenance.
I don’t know that byte types are the best solution to that, but
we can consider them.

But this whole conversation about byte types seems to be coming
at it the wrong way around. We need to be centering the first
set of problems around int<->pointer casts.

John.

As the first step, I made a patch to LangRef for differentiation of int and ptr: https://reviews.llvm.org/D104013 . It is currently under review.

Maybe we can pursue two tracks:
(1) gradually disabling the ‘inttoptr(ptrtoint p) → p’ folding.
  • For example, to fix https://bugs.llvm.org/show_bug.cgi?id=34548, disabling it when p’s underlying object is an alloca would be enough (I guess).
(2) inspecting how byte types can help revive optimizations.
  • For example, without byte types, loop idiom recognition on memcpy-like loops would have to be removed; its performance impact should be checked.

Thanks,
Juneyoung

Hello Andrew,

We can’t retroactively weaken inttoptr semantics relative to what frontends rely on -- it should continue to have the semantics of a pointer cast in C, which need to be handled more conservatively in LLVM.

I think it depends. If necessary, ptr/int cast semantics need to be weakened, as was done for signed overflow. Signed overflow is undefined behavior in C, but it yields poison in LLVM, because the C (immediate-UB) semantics would prohibit free code motion of add nsw.
I believe its poison definition is successful and can still support useful optimizations.
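
A sketch of why poison, rather than immediate UB, is what enables free code motion; hoisting the addition above the guard is fine when overflow merely yields poison, because the possibly-poison value is never used when the guard fails:

int f(int a, int b, int guard) {
  int r = 0;
  if (guard)
    r = a + b;  // lowered to 'add nsw' in IR: overflow yields poison
  return r;
}
// The optimizer may hoist:  tmp = a + b;  if (guard) r = tmp;
// When !guard and a + b overflows, tmp is poison but unused -- harmless.
// If signed overflow were immediate UB in IR, this hoist would be unsound.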

Thanks,
Juneyoung

Hi,

    Now, that rule as I’ve stated it would be really bad. Allowing a
    lucky guess to resolve to absolutely anything would almost
    completely block the optimizer from optimizing memory. For example,
    if a local variable came into scope, and then we called a function
    that returned a different pointer, we’d have to conservatively
    assume that that pointer might alias the local, even if the address
    of the local was never even taken, much less escaped:

    int x = 0;
    int *p = guess_address_of_x();
    *p = 15;
    printf("%d\n", x); // provably 0?

    So the currently favored proposal adds a really important caveat:
    this blessing of provenance only works when a pointer with the
    correct provenance has been “exposed”. There are several ways to
    expose a pointer, including I/O, but the most important is casting
    it to an integer.

This is a valid point. If one wants to formally show the correctness of this kind of memory optimization, this problem should be tackled.
I think n2676's 'Allocation-address nondeterminism' (p. 27) paragraph addresses this issue.
The underlying idea is that the address of an allocated object is assumed to be non-deterministically chosen, causing any guessed accesses to raise undefined behavior in at least one execution.

I am confused -- that optimization is allowed without any reasoning about non-determinism, because the address of x has never been exposed, right?

There are some optimizations that still require reasoning about non-determinism, namely cases where the address *has* been exposed. This was recently discussed on a mailing list hosted on lists.cam.ac.uk, and at least one WG14 member expressed the opinion that in this case, the optimization is actually not allowed.
FWIW, I personally prefer a model that always uses non-determinism to justify such optimizations, avoiding the need for any kind of "exposed" flag (such as the model in the paper Juneyoung referenced -- unsurprisingly so, since I am a coauthor of that paper ;).

However, I should also add that the trade-offs in language design are different for a surface language such as C and for an optimized IR such as LLVM's.

    Again, there’s no requirement of a data dependence between the
    exposure and the int-to-pointer cast. If the program casts a
    pointer to an integer, and an independently-produced integer
    that happens to be the same value is later cast to a pointer,
    and the storage hasn’t been reallocated in the meantime, the
    resulting pointer will have the right provenance for the memory
    and will be valid to use. This implies that pointer-to-int casts
    (and other exposures) are semantically significant events in the
    program. They don’t have side effects in the normal sense, but
    they must be treated by the compiler just like things that do have
    side effects: e.g. unless I’m missing something in the TR,
    eliminating a completely unused pointer-to-int cast may make
    later code UB.

    And in fact, it turns out that this is crucially important for
    optimization. If the optimizer wants to allow arbitrary
    replacement of integers based on conditional equality, like
    in GVN, then replacement totally breaks direct data dependence,
    and you can easily be left with no remaining uses of a pointer-to-int
    cast when the original code would have had a data dependence. So
    you cannot reason about provenance through int-to-pointer casts:
    the pointer can alias any storage whose provenance has potentially
    been exposed, and the optimizer must be conservative about optimizing
    memory that has potentially been exposed.

+1; for this reason the casting semantics cannot be directly used for LLVM's ptrtoint/inttoptr.

However, note that GVN integer replacement is a problem even without the "exposed" flag, as demonstrated by the long-standing bugs https://bugs.llvm.org/show_bug.cgi?id=34548 ("InstCombine cannot blindly assume that inttoptr(ptrtoint x) -> x") and https://bugs.llvm.org/show_bug.cgi?id=35229.

Kind regards,
Ralf

Hi,
Sorry for my late reply, and thank you for sharing great summaries & ideas.
I’ll leave my thoughts below.

Now, that rule as I’ve stated it would be really bad. Allowing a
lucky guess to resolve to absolutely anything would almost
completely block the optimizer from optimizing memory. For example,
if a local variable came into scope, and then we called a function
that returned a different pointer, we’d have to conservatively
assume that that pointer might alias the local, even if the address
of the local was never even taken, much less escaped:

int x = 0;
int *p = guess_address_of_x();
*p = 15;
printf("%d\n", x); // provably 0?

So the currently favored proposal adds a really important caveat:
this blessing of provenance only works when a pointer with the
correct provenance has been “exposed”. There are several ways to
expose a pointer, including I/O, but the most important is casting
it to an integer.

This is a valid point. If one wants to formally show the correctness of
this kind of memory optimization, this problem should be tackled.
I think n2676’s ‘Allocation-address nondeterminism’ (p. 27) paragraph
addresses this issue.
The underlying idea is that the address of an allocated object is assumed
to be non-deterministically chosen, causing any guessed accesses to raise
undefined behavior in at least one execution.

Ah, that’s an interesting idea. I must have missed that section; I’m
afraid I only skimmed N2676 looking for the key points, because it’s
quite a long document.

To be clear, under the PNVI-ae family of semantics, that rule would
not be necessary to optimize x in my example because int-to-pointer
casts are not allowed to recreate provenance for x because it has
not been exposed. That rule does theoretically allow optimization of
some related examples where the address of x has been exposed,
because it lets the compiler try to reason about what happens after
exposure; it’s no longer true that exposure implies guessing is okay.

Unfortunately, though, I think this non-determinism still doesn’t allow LLVM
to be anywhere near as naive about pointer-to-int casts as it is today.
The rule is intended to allow the compiler to start doing use-analysis
of exposures; let’s assume that this analysis doesn’t see any
un-analyzable uses, since of course it would need to conservatively
treat them as escapes. But if we can optimize uses of integers as if
they didn’t carry pointer data — say, in a function that takes integer
parameters — and then we can apply those optimized uses to integers
that concretely result from pointer-to-int casts — say, by inlining
that function into one of its callers — can’t we end up with a use
pattern for one or more of those pointer-to-int casts that no longer
reflects the fact that it’s been exposed? It seems to me that either
(1) we cannot do those optimizations on opaque integers or (2) we
need to record that we did them in a way that, if it turns out that
they were created by a pointer-to-int cast, forces other code to
treat that pointer as opaquely exposed.

Everything I’ve been talking about so far is a C-level concept:
an int-to-pointer cast is e.g. (float*) myInt, not inttoptr
in LLVM IR. But I think people have an expectation of what these
things mean in LLVM IR, and I haven’t seen it written out explicitly,
so let’s do that now.

The first assumption here is that int-to-pointer and pointer-to-int
casts in C will translate to inttoptr and ptrtoint operations
in IR. Now, this is already problematic, because those operations
do not currently have the semantics they need to have to make the
proposed optimization model sound. In particular:

ptrtoint does not have side-effects and can be dead-stripped
when unused, which as discussed above is not okay.

ptrtoint on a constant is folded to a constant expression,
not an instruction, which is therefore no longer anchored in the
code and does not reliably record that the global may have escaped.
(Unused constant expressions do not really exist, and we absolutely
cannot allow them to affect the semantics of the IR.)

Of course, this is only significant for globals that don’t already
have to be treated conservatively because they might have other
uses. That is, it only really matters for globals with, say,
internal or private linkage.

inttoptr can be reordered with other instructions, which is
not allowed because different points in a function may have
different sets of storage with escaped provenance.

inttoptr(ptrtoint) can be peepholed; ignoring the dead-stripping
aspects of removing the inttoptr, this also potentially
introduces UB because the original inttoptr “launders” the
provenance of the pointer to the current provenance of the
storage, whereas the original pointer may have stale provenance.

All of these concerns are valid.

(I’m not sure whether this is a good place to introduce this, but) we
actually have semantics for pointer castings tailored to LLVM (link
<https://sf.snu.ac.kr/publications/llvmtwin.pdf>).
In this proposal, ptrtoint does not have an escaping side effect; ptrtoint
and inttoptr are scalar operations.
inttoptr simply returns a pointer which can access any object.

Skimming your paper, I can see how this works except that I don’t
see any way not to treat ptrtoint as an escape. And really I think
you’re already partially acknowledging that, because that’s the only
real sense of saying that inttoptr(ptrtoint p) can’t be reduced to
p. If those are really just scalar operations that don’t expose
p in ways that might be disconnected from the uses of the inttoptr
then that reduction ought to be safe.

John.

actually have semantics for pointer castings tailored to LLVM (link
<https://sf.snu.ac.kr/publications/llvmtwin.pdf>).
In this proposal, ptrtoint does not have an escaping side effect; ptrtoint
and inttoptr are scalar operations.
inttoptr simply returns a pointer which can access any object.

Skimming your paper, I can see how this works except that I don’t
see any way not to treat ptrtoint as an escape. And really I think
you’re already partially acknowledging that, because that’s the only
real sense of saying that inttoptr(ptrtoint p) can’t be reduced to
p. If those are really just scalar operations that don’t expose
p in ways that might be disconnected from the uses of the inttoptr
then that reduction ought to be safe.

I haven’t read the paper yet, but I think the argument is: Assume p was obtained via an out-of-bounds GEP that happens to point at valid memory that belongs to a different object than p’s provenance. In that case, dereferencing inttoptr(ptrtoint(p)) is defined behavior (“inttoptr simply returns a pointer which can access any object”) while dereferencing p is UB.

Cheers,
Nicolai

Hi John,

Unfortunately, though, I think this non-determinism still doesn’t allow LLVM
to be anywhere near as naive about pointer-to-int casts as it is today.

Definitely. There are limits to how naive one can be; beyond those limits, miscompilations lurk. <https://www.ralfj.de/blog/2020/12/14/provenance.html> explains this by showing such a miscompilation arising from three naive optimizations being chained together.

The rule is intended to allow the compiler to start doing use-analysis
of exposures; let’s assume that this analysis doesn’t see any
un-analyzable uses, since of course it would need to conservatively
treat them as escapes. But if we can optimize uses of integers as if
they didn’t carry pointer data — say, in a function that takes integer
parameters — and then we can apply those optimized uses to integers
that concretely result from pointer-to-int casts — say, by inlining
that function into one of its callers — can’t we end up with a use
pattern for one or more of those pointer-to-int casts that no longer
reflects the fact that it’s been exposed? It seems to me that either
(1) we cannot do those optimizations on opaque integers or (2) we
need to record that we did them in a way that, if it turns out that
they were created by a pointer-to-int cast, forces other code to
treat that pointer as opaquely exposed.

There is a third option: don't optimize away ptr-int-ptr roundtrips. Then you can still do all the same optimizations on integers that LLVM does today, completely naively -- the integer world remains "sane". Only the pointer world has to be "strange".
(You can also not do things like GVN replacement of *pointer-typed* values, but for values of integer types this remains unproblematic.)

I don't think it makes sense for LLVM to adopt an explicit "exposed" flag in its semantics. Reasoning based on non-determinism works fine, and has the advantage of keeping ptr-to-int casts a pure, side-effect-free operation. This is the model we explored in <https://people.mpi-sws.org/~jung/twinsem/twinsem.pdf>, and we were able to show quite a few of LLVM's standard optimizations correct formally. Some changes are still needed as you noted, but those changes will be required anyway even if LLVM were to adopt PNVI-ae:
- No removal of ptr-int-ptr roundtrips. (https://bugs.llvm.org/show_bug.cgi?id=34548)
- No GVN replacement of pointer-typed values. (https://bugs.llvm.org/show_bug.cgi?id=35229)

    (I'm not sure whether this is a good place to introduce this, but) we
    actually have semantics for pointer castings tailored to LLVM (link
    <https://sf.snu.ac.kr/publications/llvmtwin.pdf>).
    In this proposal, ptrtoint does not have an escaping side effect; ptrtoint
    and inttoptr are scalar operations.
    inttoptr simply returns a pointer which can access any object.

Skimming your paper, I can see how this works *except* that I don’t
see any way not to treat ptrtoint as an escape. And really I think
you’re already partially acknowledging that, because that’s the only
real sense of saying that inttoptr(ptrtoint p) can’t be reduced to
p. If those are really just scalar operations that don’t expose
p in ways that might be disconnected from the uses of the inttoptr
then that reduction ought to be safe.

They are indeed just scalar operations, but the reduction is not safe.
The reason is that pointer-typed variables have values of the form "(addr, provenance)". There is essentially an 'invisible' component in each pointer value that tracks some additional information -- the "provenance" of the pointer. Casting a ptr to an int removes that provenance. Casting an int to a ptr picks a "default" provenance. So the overall effect of inttoptr(ptrtoint p) is to turn "(addr, provenance)" into "(addr, DEFAULT_PROVENANCE)".
Clearly that is *not* a NOP, and hence performing the reduction actually changes the result of this operation. Before the reduction, the resulting pointer had DEFAULT_PROVENANCE; after the reduction, it maintains the original provenance of "p". This can introduce UB into previously UB-free programs.
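
For readers who prefer code, a minimal C-style sketch of this value representation (the type and field names are hypothetical, chosen only for illustration):

#include <stdint.h>

typedef int ProvId;                      // identifies one allocation
#define DEFAULT_PROVENANCE ((ProvId)-1)  // wildcard/default provenance

typedef struct {
  uintptr_t addr;  // the numeric address
  ProvId prov;     // the 'invisible' provenance component
} AMPointer;

uintptr_t am_ptrtoint(AMPointer p) {  // casting to int drops the provenance
  return p.addr;
}

AMPointer am_inttoptr(uintptr_t i) {  // casting to ptr picks the default
  return (AMPointer){ .addr = i, .prov = DEFAULT_PROVENANCE };
}

// am_inttoptr(am_ptrtoint(p)) yields (p.addr, DEFAULT_PROVENANCE): a
// different Abstract Machine value from p whenever p.prov is specific,
// which is why the roundtrip is not a NOP.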

Kind regards,
Ralf

Hi,
Sorry for my late reply, and thank you for sharing great summaries & ideas.
I’ll leave my thoughts below.

Now, that rule as I’ve stated it would be really bad. Allowing a
lucky guess to resolve to absolutely anything would almost
completely block the optimizer from optimizing memory. For example,
if a local variable came into scope, and then we called a function
that returned a different pointer, we’d have to conservatively
assume that that pointer might alias the local, even if the address
of the local was never even taken, much less escaped:

int x = 0;
int *p = guess_address_of_x();
*p = 15;
printf("%d\n", x); // provably 0?

So the currently favored proposal adds a really important caveat:
this blessing of provenance only works when a pointer with the
correct provenance has been “exposed”. There are several ways to
expose a pointer, including I/O, but the most important is casting
it to an integer.

This is a valid point. If one wants to formally show the correctness of
this kind of memory optimization, this problem should be tackled.
I think n2676’s ‘Allocation-address nondeterminism’ (p. 27) paragraph
addresses this issue.
The underlying idea is that the address of an allocated object is assumed
to be non-deterministically chosen, causing any guessed accesses to raise
undefined behavior in at least one execution.

Ah, that’s an interesting idea. I must have missed that section; I’m
afraid I only skimmed N2676 looking for the key points, because it’s
quite a long document.

To be clear, under the PNVI-ae family of semantics, that rule would
not be necessary to optimize x in my example because int-to-pointer
casts are not allowed to recreate provenance for x because it has
not been exposed. That rule does theoretically allow optimization of
some related examples where the address of x has been exposed,
because it lets the compiler try to reason about what happens after
exposure; it’s no longer true that exposure implies guessing is okay.

Unfortunately, though, I think this non-determinism still doesn’t allow LLVM
to be anywhere near as naive about pointer-to-int casts as it is today.
The rule is intended to allow the compiler to start doing use-analysis
of exposures; let’s assume that this analysis doesn’t see any
un-analyzable uses, since of course it would need to conservatively
treat them as escapes. But if we can optimize uses of integers as if
they didn’t carry pointer data — say, in a function that takes integer
parameters — and then we can apply those optimized uses to integers
that concretely result from pointer-to-int casts — say, by inlining
that function into one of its callers — can’t we end up with a use
pattern for one or more of those pointer-to-int casts that no longer
reflects the fact that it’s been exposed? It seems to me that either
(1) we cannot do those optimizations on opaque integers or (2) we
need to record that we did them in a way that, if it turns out that
they were created by a pointer-to-int cast, forces other code to
treat that pointer as opaquely exposed.

IIUC the corresponding example would be:
int p;
intptr_t i = (intptr_t)&p;
f(i);
// Initially f(i) exposes p by storing i into somewhere, but optimization changed f to not store i anymore (e.g. i was replaced with a constant)

In this case, p was already exposed by ‘(intptr_t)&p’ I guess; the body of f(i) won’t affect the exposedness of p.

Everything I’ve been talking about so far is a C-level concept:
an int-to-pointer cast is e.g. (float*) myInt, not inttoptr
in LLVM IR. But I think people have an expectation of what these
things mean in LLVM IR, and I haven’t seen it written out explicitly,
so let’s do that now.

The first assumption here is that int-to-pointer and pointer-to-int
casts in C will translate to inttoptr and ptrtoint operations
in IR. Now, this is already problematic, because those operations
do not currently have the semantics they need to have to make the
proposed optimization model sound. In particular:

ptrtoint does not have side-effects and can be dead-stripped
when unused, which as discussed above is not okay.

ptrtoint on a constant is folded to a constant expression,
not an instruction, which is therefore no longer anchored in the
code and does not reliably record that the global may have escaped.
(Unused constant expressions do not really exist, and we absolutely
cannot allow them to affect the semantics of the IR.)

Of course, this is only significant for globals that don’t already
have to be treated conservatively because they might have other
uses. That is, it only really matters for globals with, say,
internal or private linkage.

inttoptr can be reordered with other instructions, which is
not allowed because different points in a function may have
different sets of storage with escaped provenance.

inttoptr(ptrtoint) can be peepholed; ignoring the dead-stripping
aspects of removing the inttoptr, this also potentially
introduces UB because the original inttoptr “launders” the
provenance of the pointer to the current provenance of the
storage, whereas the original pointer may have stale provenance.

All of these concerns are valid.

(I’m not sure whether this is a good place to introduce this, but) we
actually have semantics for pointer castings tailored to LLVM (link
<https://sf.snu.ac.kr/publications/llvmtwin.pdf>).
In this proposal, ptrtoint does not have an escaping side effect; ptrtoint
and inttoptr are scalar operations.
inttoptr simply returns a pointer which can access any object.

Skimming your paper, I can see how this works except that I don’t
see any way not to treat ptrtoint as an escape. And really I think
you’re already partially acknowledging that, because that’s the only
real sense of saying that inttoptr(ptrtoint p) can’t be reduced to
p. If those are really just scalar operations that don’t expose
p in ways that might be disconnected from the uses of the inttoptr
then that reduction ought to be safe.

John.

It again relies on nondeterminism of the address of an object.

For example:
p = call malloc(4) ; p’s address is assumed to be nondeterministically chosen
i = ptrtoint p
; i is never used
ret void

Since i is never used, this program cannot safely access p via address guessing + inttoptr.
This allows (more precise) escape analysis to conclude that a malloc whose only ptrtoint is unused never escapes.
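
A C-level sketch of the optimization this licenses (a hedged illustration, not a description of an existing LLVM pass):

#include <stdlib.h>
#include <stdint.h>

int g(void) {
  int *p = malloc(sizeof(int));
  if (!p) return 0;
  uintptr_t i = (uintptr_t)p;  // ptrtoint whose result is never used
  (void)i;
  *p = 42;
  int r = *p;  // forwardable to 42: no other code can have guessed p's
               // nondeterministically chosen address through i
  free(p);
  return r;    // so the function is foldable to 'return 42'
}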

Juneyoung

Hi John,

Unfortunately, though, I think this non-determinism still doesn’t allow LLVM
to be anywhere near as naive about pointer-to-int casts as it is today.

Definitely. There are limits to how naive one can be; beyond those limits, miscompilations lurk. <https://www.ralfj.de/blog/2020/12/14/provenance.html> explains this by showing such a miscompilation arising from three naive optimizations being chained together.

The rule is intended to allow the compiler to start doing use-analysis
of exposures; let’s assume that this analysis doesn’t see any
un-analyzable uses, since of course it would need to conservatively
treat them as escapes. But if we can optimize uses of integers as if
they didn’t carry pointer data — say, in a function that takes integer
parameters — and then we can apply those optimized uses to integers
that concretely result from pointer-to-int casts — say, by inlining
that function into one of its callers — can’t we end up with a use
pattern for one or more of those pointer-to-int casts that no longer
reflects the fact that it’s been exposed? It seems to me that either
(1) we cannot do those optimizations on opaque integers or (2) we
need to record that we did them in a way that, if it turns out that
they were created by a pointer-to-int cast, forces other code to
treat that pointer as opaquely exposed.

There is a third option: don't optimize away ptr-int-ptr roundtrips. Then you can still do all the same optimizations on integers that LLVM does today, completely naively -- the integer world remains "sane". Only the pointer world has to be "strange".
(You can also not do things like GVN replacement of *pointer-typed* values, but for values of integer types this remains unproblematic.)

Do we have any idea how large of an effect this might be? If we disable GVN for all pointer-typed values? And is it really all GVN, or just cases where you unify the equivalence classes based on some dominating comparison operation? We should be careful here, perhaps, because LLVM's GVN does a lot of plain-old CSE, store-to-load forwarding, etc. and we should say specifically what would need to be disabled and in what contexts.

-Hal

Hi Hal,

The rule is intended to allow the compiler to start doing use-analysis
of exposures; let’s assume that this analysis doesn’t see any
un-analyzable uses, since of course it would need to conservatively
treat them as escapes. But if we can optimize uses of integers as if
they didn’t carry pointer data — say, in a function that takes integer
parameters — and then we can apply those optimized uses to integers
that concretely result from pointer-to-int casts — say, by inlining
that function into one of its callers — can’t we end up with a use
pattern for one or more of those pointer-to-int casts that no longer
reflects the fact that it’s been exposed? It seems to me that either
(1) we cannot do those optimizations on opaque integers or (2) we
need to record that we did them in a way that, if it turns out that
they were created by a pointer-to-int cast, forces other code to
treat that pointer as opaquely exposed.

There is a third option: don't optimize away ptr-int-ptr roundtrips. Then you can still do all the same optimizations on integers that LLVM does today, completely naively -- the integer world remains "sane". Only the pointer world has to be "strange".
(You can also not do things like GVN replacement of *pointer-typed* values, but for values of integer types this remains unproblematic.)

Do we have any idea how large of an effect this might be? If we disable GVN for all pointer-typed values? And is it really all GVN, or just cases where you unify the equivalence classes based on some dominating comparison operation? We should be careful here, perhaps, because LLVM's GVN does a lot of plain-old CSE, store-to-load forwarding, etc. and we should say specifically what would need to be disabled and in what contexts.

What I mean is specifically GVN "exploiting" equality tests (icmp) of pointer type. That doesn't work (https://bugs.llvm.org/show_bug.cgi?id=35229 "weaponizes" it to create a miscompilation, albeit in artificial code). I think this optimization is pretty much unsalvageable, except if you somehow know that the two pointers have *identical* provenance -- the moment pointers have any kind of provenance, just because icmp says they are equal does not mean we can treat them as equivalent for GVN, since icmp cannot know if the provenance of the two pointers is the same or not.

I assume that is what you mean by "unify the equivalence classes based on some dominating comparison operation" -- at least that sounds about right. :)
(I am probably using some incorrect/sloppy terminology here, and I apologize for that. My expertise is more in the area of formal language semantics than compiler construction.)

GVN can still treat e.g. "GEPi p 4" as equivalent to another "GEPi p 4" -- pure operations with identical inputs produce identical outputs, even at pointer type, and GEP/GEPi remain pure under this proposal. The issue is specifically the interaction of GVN with icmp, not GVN alone.
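
A sketch of the distinction, written in C for readability (use() is a hypothetical consumer):

void use(int *);

void example(int *p, int *r) {
  int *q1 = &p[4];  // 'gep inbounds p, 4' in IR
  int *q2 = &p[4];  // the same pure operation on the same input
  use(q1);
  use(q2);          // CSE may replace q2 by q1: provenance is identical
                    // by construction

  if (r == q1) {
    use(r);         // but GVN must NOT replace r by q1 here: equal
                    // addresses do not imply equal provenance
  }
}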

Kind regards,
Ralf

Hi,
Sorry for my late reply, and thank you for sharing great summaries & ideas.
I’ll leave my thoughts below.

Now, that rule as I’ve stated it would be really bad. Allowing a
lucky guess to resolve to absolutely anything would almost
completely block the optimizer from optimizing memory. For example,
if a local variable came into scope, and then we called a function
that returned a different pointer, we’d have to conservatively
assume that that pointer might alias the local, even if the address
of the local was never even taken, much less escaped:

int x = 0;
int *p = guess_address_of_x();
*p = 15;
printf("%d\n", x); // provably 0?

So the currently favored proposal adds a really important caveat:
this blessing of provenance only works when a pointer with the
correct provenance has been “exposed”. There are several ways to
expose a pointer, including I/O, but the most important is casting
it to an integer.

This is a valid point. If one wants to formally show the correctness of
this kind of memory optimization, this problem should be tackled.
I think n2676’s ‘Allocation-address nondeterminism’ (p. 27) paragraph
addresses this issue.
The underlying idea is that the address of an allocated object is assumed
to be non-deterministically chosen, causing any guessed accesses to raise
undefined behavior in at least one execution.

Ah, that’s an interesting idea. I must have missed that section; I’m
afraid I only skimmed N2676 looking for the key points, because it’s
quite a long document.

To be clear, under the PNVI-ae family of semantics, that rule would
not be necessary to optimize x in my example because int-to-pointer
casts are not allowed to recreate provenance for x because it has
not been exposed. That rule does theoretically allow optimization of
some related examples where the address of x has been exposed,
because it lets the compiler try to reason about what happens after
exposure; it’s no longer true that exposure implies guessing is okay.

Unfortunately, though, I think this non-determinism still doesn’t allow LLVM
to be anywhere near as naive about pointer-to-int casts as it is today.
The rule is intended to allow the compiler to start doing use-analysis
of exposures; let’s assume that this analysis doesn’t see any
un-analyzable uses, since of course it would need to conservatively
treat them as escapes. But if we can optimize uses of integers as if
they didn’t carry pointer data — say, in a function that takes integer
parameters — and then we can apply those optimized uses to integers
that concretely result from pointer-to-int casts — say, by inlining
that function into one of its callers — can’t we end up with a use
pattern for one or more of those pointer-to-int casts that no longer
reflects the fact that it’s been exposed? It seems to me that either
(1) we cannot do those optimizations on opaque integers or (2) we
need to record that we did them in a way that, if it turns out that
they were created by a pointer-to-int casts, forces other code to
treat that pointer as opaquely exposed.

IIUC the corresponding example would be:
int p;
intptr_t i = (intptr_t)&p;
f(i);
// Initially f(i) exposes p by storing i into somewhere, but optimization
changed f to not store i anymore (e.g. i was replaced with a constant)

In this case, p was already exposed by ‘(intptr_t)&p’ I guess; the body of
f(i) won’t affect the exposedness of p.

Right, formally I think that’s true.

All of these concerns are valid.

(I’m not sure whether this is a good place to introduce this, but) we
actually have semantics for pointer castings tailored to LLVM (link
<https://sf.snu.ac.kr/publications/llvmtwin.pdf>).
In this proposal, ptrtoint does not have an escaping side effect; ptrtoint
and inttoptr are scalar operations.
inttoptr simply returns a pointer which can access any object.

Skimming your paper, I can see how this works except that I don’t
see any way not to treat ptrtoint as an escape. And really I think
you’re already partially acknowledging that, because that’s the only
real sense of saying that inttoptr(ptrtoint p) can’t be reduced to
p. If those are really just scalar operations that don’t expose
p in ways that might be disconnected from the uses of the inttoptr
then that reduction ought to be safe.

John.

It again relies on nondeterminism of the address of an object.

For example:
p = call malloc(4) ; p’s address is assumed to be nondeterministically chosen
i = ptrtoint p
; i is never used
ret void

Since i is never used, this program cannot safely access p via address guessing + inttoptr.
This allows (more precise) escape analysis to conclude that a malloc whose only ptrtoint is unused never escapes.

I think this is too local of an analysis.

Your proposal generally relies on certain optimizations not applying
to pointers because they mess up provenance as represented in
transitive use-dependencies. If those optimizations can be applied
to integers, you can lose use-dependencies in exactly the same way as
you can with pointers. Not doing the inttoptr(ptrtoint p) -> p
reduction doesn’t matter at all, because in either case that value
has a use-dependency on p, whereas the actual problem is that there
is no longer a use-dependency on some other value.

For example, you have compellingly argued that it’s problematic to
do the reduction a == b ? a : b to b for pointer types. Suppose
I instead do this optimization on integers where a = ptrtoint A.
The result is now simply b. If I inttoptr that result and access
the memory, there will be no record that that access may validly
be to A. It does not help that the access may be represented
as inttoptr (ptrtoint B) for some B rather than just directly
to B, because there is no use-dependence on A. All there is
is an apparently unrelated and unused ptrtoint A.

Obviously we can avoid doing this optimization locally when we
see that the inputs result from ptrtoint, but that’s no more
than best-effort: we can do this optimization in a function which
we later inline in a caller that performs all the ptrtoint and
inttoptr casts.

John.

Hi,

Your proposal generally relies on certain optimizations not applying
to pointers because they mess up provenance as represented in
transitive use-dependencies. If those optimizations can be applied
to integers, you can lose use-dependencies in exactly the same way as
you can with pointers. Not doing the inttoptr(ptrtoint p) -> p
reduction doesn’t matter at all, because in either case that value
has a use-dependency on p, whereas the actual problem is that there
is no longer a use-dependency on some other value.

Note that "provenance" as we use it in this discussion is an *explicit operational artifact* -- it exists as a concrete piece of state in the Abstract Machine. That is very different from something that might just be used internally in some kind of analysis.

There is no problem with "resetting" that provenance on a "inttoptr", and basically forgetting about where the int comes from. Note that this is a statement about an operation in the Abstract Machine, not merely a statement about some analysis: this is not "forgetting" as in "safely overapproximating the real set of possible behaviors", it is "forgetting" by *forcing* the provenance to be some kind of wildcard/default provenance. All analyses then have to correctly account for that.

For example, you have compellingly argued that it’s problematic to
do the reduction a == b ? a : b to b for pointer types. Suppose
I instead do this optimization on integers where a = ptrtoint A.
The result is now simply b. If I inttoptr that result and access
the memory, there will be no record that that access may validly
be to A. It does not help that the access may be represented
as inttoptr (ptrtoint B) for some B rather than just directly
to B, because there is no use-dependence on A. All there is
is an apparently unrelated and unused ptrtoint A.

So that would be "ptrtoint A == ptrtoint B ? ptrtoint A : ptrtoint B" being replaced by "ptrtoint B"? I don't see any problem with that. Do you have a concrete example?

Obviously we can avoid doing this optimization locally when we
see that the inputs result from ptrtoint, but that’s no more
than best-effort: we can do this optimization in a function which
we later inline in a caller that performs all the ptrtoint and
inttoptr casts.

I agree "avoiding something locally" is way too fragile to be considered a solution. I do not think it is needed, though.

Kind regards,
Ralf

Your proposal generally relies on certain optimizations not applying
to pointers because they mess up provenance as represented in
transitive use-dependencies. If those optimizations can be applied
to integers, you can lose use-dependencies in exactly the same way as
you can with pointers. Not doing the inttoptr(ptrtoint p) → p
reduction doesn’t matter at all, because in either case that value
has a use-dependency on p, whereas the actual problem is that there
is no longer a use-dependency on some other value.

Note that “provenance” as we use it in this discussion is an explicit operational artifact – it exists as a concrete piece of state in the Abstract Machine. That is very different from something that might just be used internally in some kind of analysis.

There is no problem with “resetting” that provenance on a “inttoptr”, and basically forgetting about where the int comes from. Note that this is a statement about an operation in the Abstract Machine, not merely a statement about some analysis: this is not “forgetting” as in “safely overapproximating the real set of possible behaviors”, it is “forgetting” by forcing the provenance to be some kind of wildcard/default provenance. All analyses then have to correctly account for that.

But it’s not a truly wildcard provenance. At the very least, it’s
restricted to the set of provenances that have been exposed, and
my understanding is that Juneyoung’s non-determinism rule attempts
to readmit a use-dependency analysis and turn ptrtoint back into
a simple scalar operation.

For example, you have compellingly argued that it’s problematic to
do the reduction a == b ? a : b to b for pointer types. Suppose
I instead do this optimization on integers where a = ptrtoint A.
The result is now simply b. If I inttoptr that result and access
the memory, there will be no record that that access may validly
be to A. It does not help that the access may be represented
as inttoptr (ptrtoint B) for some B rather than just directly
to B, because there is no use-dependence on A. All there is
is an apparently unrelated and unused ptrtoint A.

So that would be “ptrtoint A == ptrtoint B ? ptrtoint A : ptrtoint B” being replaced by “ptrtoint B”? I don’t see any problem with that. Do you have a concrete example?

I think you can take any example where pointer-type restrictions
would be necessary to protect against miscompilation and turn it
into an example here by just inserting inttoptr and ptrtoint
appropriately. Quick example:

int A = 0x1;
int B = 0x2;
long a = (long) (A+1);
long b = (long) B;
long result = (a == b ? a : b);
if (a == b)
  *(((int*) result) - 1) = 0x4;
else
  *((int*) result) = 0x8;
printf("%d %d\n", A, B);

I submit that this program has unspecified but not undefined behavior,
with printing “1 8” and “4 2” being the only two valid outcomes.
But I think an optimizer which changes the fifth line to
long result = b; without leaving any trace behind could easily
compile this to print “1 2” because there would be nothing to
prevent the initialization of A from being forwarded to the
final load.

You can prevent this by noting that the provenance of A has been
exposed and allowing the inttoptr of result to alias A, but
that seems inconsistent with treating ptrtoint as a simple scalar
operation and allowing a use-analysis of a ptrtoint to restrict
which inttoptr casts are allowed to recreate provenance for the
ptrtoint operand.

If you want to keep treating ptrtoint as a scalar operation and
doing use-analyses on it, I think the most palatable option is to
recognize whenever you’re cutting a use-dependency and
conservatively record in IR that the original value has now been
exposed. So if you start with this:

  %eq = icmp eq i32 %a, %b
  %result = select i1 %eq, i32 %a, i32 %b

You have to transform it like this:

  %result = %b
  call void @llvm.expose.i32(i32 %a)

You should be able to remove these exposure events in a lot of
situations, but conservatively they’ll have to be treated as
escapes.

Most optimizations never cut use-dependencies on opaque values
like this and so won’t be affected.

John.

Hi John,

    Note that "provenance" as we use it in this discussion is an *explicit
    operational artifact* -- it exists as a concrete piece of state in the
    Abstract Machine. That is very different from something that might just be
    used internally in some kind of analysis.

    There is no problem with "resetting" that provenance on a "inttoptr", and
    basically forgetting about where the int comes from. Note that this is a
    statement about an operation in the Abstract Machine, not merely a statement
    about some analysis: this is not "forgetting" as in "safely
    overapproximating the real set of possible behaviors", it is "forgetting" by
    *forcing* the provenance to be some kind of wildcard/default provenance. All
    analyses then have to correctly account for that.

But it’s not a truly wildcard provenance. At the very least, it’s
restricted to the set of provenances that have been exposed, and
my understanding is that Juneyoung’s non-determinism rule attempts
to readmit a use-dependency analysis and turn ptrtoint back into
a simple scalar operation.

So, at least in the paper Juneyoung and me and a few other people wrote (the one referenced a couple times in this thread already: https://people.mpi-sws.org/~jung/twinsem/twinsem.pdf), it is a wildcard provenance.
"exposed" is not a thing in the operational semantics, it is a theorem we can prove.

        For example, you have compellingly argued that it’s problematic to
        do the reduction a == b ? a : b to b for pointer types. Suppose
        I instead do this optimization on integers where a = ptrtoint A.
        The result is now simply b. If I inttoptr that result and access
        the memory, there will be no record that that access may validly
        be to A. It does not help that the access may be represented
        as inttoptr (ptrtoint B) for some B rather than just directly
        to B, because there is no use-dependence on A. All there is
        is an apparently unrelated and unused ptrtoint A.

    So that would be "ptrtoint A == ptrtoint B ? ptrtoint A : ptrtoint B" being
    replaced by "ptrtoint B"? I don't see any problem with that. Do you have a
    concrete example?

I think you can take any example where pointer-type restrictions
would be necessary to protect against miscompilation and turn it
into an example here by just inserting inttoptr and ptrtoint
appropriately. Quick example:

int A = 0x1;
int B = 0x2;
long a = (long) (A+1);
long b = (long) B;
long result = (a == b ? a : b);
if (a == b)
  *(((int*) result) - 1) = 0x4;
else
  *((int*) result) = 0x8;
printf("%d %d\n", A, B);

(Sorry about the bad quote, not sure what Thunderbird does here.)
I assume you mean "&A" and "&B" in lines 3 and 4?

I submit that this program has unspecified but not undefined behavior,
with printing “1 8” and “4 2” being the only two valid outcomes.

Okay, I can follow that.

But I think an optimizer which changes the fifth line to
long result = b; without leaving any trace behind

so then we are at

int A = 0x1;
int B = 0x2;
long a = (long) (&A+1);
long b = (long) &B;
long result = b;
if (a == b)
   *(((int*) result) - 1) = 0x4;
else
   *((int*) result) = 0x8;
printf("%d %d\n", A, B);

could easily
compile this to print “1 2” because there would be nothing to
prevent the initialization of A from being forwarded to the
final load.

I don't think that's right, since "a" is still used in the "if", so a bit of information about the address is leaked and you can't assume the address was not guessed. The furthest you can go is

int A = 0x1;
int B = 0x2;
long a = (long) (&A+1);
long b = (long) &B;
if (a == b)
   *(((int*) (long) &B) - 1) = 0x4;
else
   *((int*) (long) &B) = 0x8;
printf("%d %d\n", A, B);

No need for an explicit 'exposed', I think.

That said, things get more tricky once you want to account for C 'restrict' / LLVM 'noalias', and then it might be necessary to have explicit 'exposed'-like operations. I haven't seen a properly worked-out model for this, but the candidate models I saw would need something like this. So maybe it'd not be a bad idea to have such an operation anyway... I just don't think it should have any effect for the kind of programs we have been discussing here so far.

Kind regards,
Ralf

Note that “provenance” as we use it in this discussion is an explicit
operational artifact
– it exists as a concrete piece of state in the
Abstract Machine. That is very different from something that might just be
used internally in some kind of analysis.

There is no problem with “resetting” that provenance on a “inttoptr”, and
basically forgetting about where the int comes from. Note that this is a
statement about an operation in the Abstract Machine, not merely a statement
about some analysis: this is not “forgetting” as in “safely
overapproximating the real set of possible behaviors”, it is “forgetting” by
forcing the provenance to be some kind of wildcard/default provenance. All
analyses then have to correctly account for that.

But it’s not a truly wildcard provenance. At the very least, it’s
restricted to the set of provenances that have been exposed, and
my understanding is that Juneyoung’s non-determinism rule attempts
to readmit a use-dependency analysis and turn ptrtoint back into
a simple scalar operation.

So, at least in the paper Juneyoung and me and a few other people wrote (the one referenced a couple times in this thread already: https://people.mpi-sws.org/~jung/twinsem/twinsem.pdf), it is a wildcard provenance.
“exposed” is not a thing in the operational semantics, it is a theorem we can prove.

Okay, fine, I won’t say “exposure”. :) You intend to still be able to
prove that the reconstructed provenance cannot be that of certain
memory locations based on a comprehensive use analysis of those
locations.

For example, you have compellingly argued that it’s problematic to
do the reduction "a == b ? a : b" to "b" for pointer types. Suppose
I instead do this optimization on integers where "a = ptrtoint A".
The result is now simply "b". If I inttoptr that result and access
the memory, there will be no record that that access may validly
be to A. It does not help that the access may be represented
as "inttoptr (ptrtoint B)" for some B rather than just directly
to B, because there is no use-dependence on A. All there is
is an apparently unrelated and unused "ptrtoint A".

So that would be “ptrtoint A == ptrtoint B ? ptrtoint A : ptrtoint B” being
replaced by “ptrtoint B”? I don’t see any problem with that. Do you have a
concrete example?

I think you can take any example where pointer-type restrictions
would be necessary to protect against miscompilation and turn it
into an example here by just inserting inttoptr and ptrtoint
appropriately. Quick example:

int A = 0x1;
int B = 0x2;
long a = (long) (A+1);
long b = (long) B;
long result = (a == b ? a : b);
if (a == b)
   *(((int*) result) - 1) = 0x4;
else
   *((int*) result) = 0x8;
printf("%d %d\n", A, B);

(Sorry about the bad quote, not sure what Thunderbird does here.)
I assume you mean “&A” and “&B” in lines 3 and 4?

Yes, sorry.

I submit that this program has unspecified but not undefined behavior,
with printing “1 8” and “4 2” being the only two valid outcomes.

Okay, I can follow that.

But I think an optimizer which changes the fifth line to

"long result = b;" without leaving any trace behind

so then we are at

int A = 0x1;
int B = 0x2;
long a = (long) (&A+1);
long b = (long) &B;
long result = b;
if (a == b)
   *(((int*) result) - 1) = 0x4;
else
   *((int*) result) = 0x8;
printf("%d %d\n", A, B);

could easily
compile this to print “1 2” because there would be nothing to
prevent the initialization of A from being forwarded to the
final load.

I don’t think that’s right, since “a” is still used in the “if”, so a bit of information about the address is leaked and you can’t assume the address was not guessed.

Okay, so you have a baseline expectation that pointer comparisons will be treated as escapes. The reason I added a second comparison in my example is just because otherwise, in the a != b case, the access is out of bounds. So I guess you’re suggesting that it’s impossible to construct an example that avoids that problem without ultimately doing something uneliminable that would count as an escape?

That said, things get more tricky once you want to account for C ‘restrict’ / LLVM ‘noalias’, and then it might be necessary to have explicit ‘exposed’-like operations. I haven’t seen a properly worked-out model for this, but the candidate models I saw would need something like this. So maybe it’d not be a bad idea to have such an operation anyway… I just don’t think it should have any effect for the kind of programs we have been discussing here so far.

Honestly, part of why I like this approach is that I think the basic idea — recognizing when we’re doing a dependency-breaking value replacement and using a different replacement sequence — also moves us towards a solution for consume dependencies.

John.

I’ve read this paper now, and it makes good sense to me as something to adopt in LLVM.

I do have one question about a point that doesn’t seem sufficiently justified, though. In the semantics of the paper, store-pointer-then-load-as-integer results in poison. This seems to be the root cause for being forced to introduce a “byte” type for correctness, but it is only really justified by an optimization that eliminates a store that writes back a previously loaded value. That optimization doesn’t seem all that important (but feel free to point out why it is…), while introducing a “byte” type is a massive change. On the face of it, that doesn’t seem like a good trade-off to me.

Has the alternative of allowing type punning through memory at the cost of removing that optimization been studied sufficiently elsewhere?

Cheers,
Nicolai

About the impact of disabling GVN for pointers - for SPEC CPU and llvm-test-suite the slowdown wasn’t significant (avg. <0.1%),
but I’m concerned that these are only a small sample of the programs written in C/C++, and my experiment was done on only one architecture.
Certainly, for programs that use many pointers, disabling GVN is likely to be problematic.

If we have an operation similar to launder, the optimization can almost be salvaged.
Let’s call it ‘q = wildcard_provenance(p)’; it means that q can be used to access an object that is at (intptr_t)p but not necessarily p’s object.

(In our memory model, wildcard_provenance(p) is equivalent to ‘inttoptr(ptrtoint p)’).

Replacing p with ‘wildcard_provenance(p)’ is correct because it makes the program more defined.

For example, load p raises UB if p is freed, but load wildcard_provenance(p) is still well-defined if a new live allocation happens to be placed exactly at (intptr_t)p.
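
As a concrete C illustration of that point (a hypothetical sketch, relying on the equivalence above that wildcard_provenance(p) behaves like the round-trip (int *)(uintptr_t)p):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    int *p = malloc(sizeof(int));
    uintptr_t addr = (uintptr_t)p;
    free(p);                       /* p's object is now dead */
    int *q = malloc(sizeof(int));  /* may happen to reuse the address */
    *q = 42;
    if ((uintptr_t)q == addr) {
        /* A load through p would be UB (freed object), but a load
           through wildcard_provenance(p), spelled (int *)addr here,
           is well-defined: a live object occupies that address. */
        printf("%d\n", *(int *)addr);
    }
    free(q);
    return 0;
}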

When lowered to MachineIR, wildcard_provenance(p) is simply a register copy.

Here is the list of valid optimizations (a short C sketch of the GVN rewrite follows the list):

  1. GVN
    if (p == q) {
      use(p);
      use(q);
    }
    =>
    if (p == q) {
      use(p);
      use(wildcard_provenance(p));
    }

  2. Removing the round-trip
    dereference(p);
    dereference(wildcard_provenance(p));
    =>
    dereference(p);
    dereference(p);

  3. Store-to-load forwarding
    store v, wildcard_provenance(p);
    w = load p;
    =>
    store v, wildcard_provenance(p);
    w = v;

  4. Must-alias queries
    must-alias(p, wildcard_provenance(p))? // answer: true
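
For instance, the GVN rewrite in rule 1 could look as follows in C-like form. This is an illustrative sketch only: the function names are made up, and wildcard_provenance(p) is spelled via the round-trip equivalence above, (int *)(uintptr_t)p.

#include <stdint.h>

void use(int *p) { (void)p; }

void gvn_rewrite(int *p, int *q) {
    if (p == q) {
        use(p);
        use((int *)(uintptr_t)p);   /* was: use(q) */
    }
}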

I don’t have a performance number for this yet because the operation did not exist when we designed the memory model.

Juneyoung

I don’t think it makes sense for LLVM to adopt an explicit “exposed” flag in its
semantics. Reasoning based on non-determinism works fine, and has the advantage
of keeping ptr-to-int casts a pure, side-effect-free operation. This is the
model we explored in <https://people.mpi-sws.org/~jung/twinsem/twinsem.pdf>, and
we were able to show quite a few of LLVM’s standard optimizations correct
formally. Some changes are still needed as you noted, but those changes will be
required anyway even if LLVM were to adopt PNVI-ae.

I’ve read this paper now, and it makes good sense to me as something to adopt in LLVM.

I do have one question about a point that doesn’t seem sufficiently justified, though. In the semantics of the paper, store-pointer-then-load-as-integer results in poison. This seems to be the root cause for being forced to introduce a “byte” type for correctness, but it is only really justified by an optimization that eliminates a store that writes back a previously loaded value. That optimization doesn’t seem all that important (but feel free to point out why it is…), while introducing a “byte” type is a massive change. On the face of it, that doesn’t seem like a good trade-off to me.

Has the alternative of allowing type punning through memory at the cost of removing that optimization been studied sufficiently elsewhere?

The transformation is analogous to removing memcpy-like code with the same dst and src.
Such code might not be written by humans frequently, but I believe C++'s template instantiation or optimizations like inlining can expose such a case.
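
As an assumed illustration (not taken from real code) of how such a self-copy can appear once a generic helper is inlined:

#include <string.h>

/* Generic byte-wise copy helper, e.g. what a templated copy routine
   might boil down to. */
static void copy_slot(void **dst, void **src) {
    memcpy(dst, src, sizeof(void *));
}

void canonicalize(void **slot) {
    copy_slot(slot, slot);   /* after inlining: memcpy(slot, slot, n) */
    /* Removing this no-op copy is the "store back a previously
       loaded value" elimination in question: it is only justified if
       a byte-wise load+store round-trip preserves provenance. */
}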

Juneyoung

Honestly, part of why I like this approach is that I think the basic idea — recognizing when we’re doing a dependency-breaking value replacement and using a different replacement sequence — also moves us towards a solution for consume dependencies.

(Slightly off topic, but) I’m interested in knowing what the consume-dependency problem is; do you have any suggested thread for this?

Hi Nicolai,

    I've read this paper now, and it makes good sense to me as something to
    adopt in LLVM.

:)

    I do have one question about a point that doesn't seem sufficiently
    justified, though. In the semantics of the paper,
    store-pointer-then-load-as-integer results in poison. This seems to be the
    root cause for being forced to introduce a "byte" type for correctness, but
    it is only really justified by an optimization that eliminates a store that
    writes back a previously loaded value. That optimization doesn't seem all
    that important (but feel free to point out why it is...), while introducing
    a "byte" type is a massive change. On the face of it, that doesn't seem like
    a good trade-off to me.

    Has the alternative of allowing type punning through memory at the cost of
    removing that optimization been studied sufficiently elsewhere?

    The transformation is analogous to removing memcpy-like code with the same dst and src.
    Such code might not be written by humans frequently, but I believe C++'s template instantiation or optimizations like inlining can expose such a case.

To add to what Juneyoung said:
I don't think that experiment has been done. From what I can see, the alternative you propose leads to an internally consistent model -- one "just" has to account for the fact that a "load i64" might do some transformation on the data to actually obtain an integer result (namely, it might do a ptrtoint).

However, I am a bit worried about what happens when we eventually add proper support for 'restrict'/'noalias': the only models I know for that actually make 'ptrtoint' have side-effects on the memory state (similar to setting the 'exposed' flag in the C provenance TS). I can't (currently) demonstrate that this is *required*, but I also don't know an alternative. So if this remains the case, and if we say "load i64" performs a ptrtoint when needed, then we could not do dead load elimination any more, as that would remove the ptrtoint side-effect.
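
To illustrate the worry (a hedged sketch under the assumptions above; the names are made up):

#include <stdint.h>

void g(void **slot) {
    /* If "load i64" performs an implicit ptrtoint, and ptrtoint has
       an expose-like side-effect in a 'restrict' model, then this
       type-punning load of the pointer's bytes is not removable even
       though its value is unused: */
    uintptr_t dead = *(uintptr_t *)slot;
    (void)dead;
    /* Eliminating the load would also eliminate the exposure,
       potentially changing which integer-derived pointers may later
       access the pointee. */
}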

There is also the somewhat conceptual concern that LLVM ought to have a type that can losslessly hold all kinds of data that exist in LLVM. Currently, that is not the case -- 'iN' cannot hold data with provenance.

Kind regards,
Ralf