pointer provenance introduction to load/store instructions

Hi all,

as some of you already know, the full restrict patches[0] introduce an optional 'ptr_provenance' operand
to LoadInst/StoreInst. In the context of full restrict support, this operand is used to separate the
computation of the pointer operand from the possible restrict provenance of that operand. This was necessary
so that we could keep all the optimizations working on those pointers, without being blocked by the noalias intrinsics.
At the same time we avoided the risk to lose the noalias provenance of the instruction.

As that separation can also be useful in other situations, there was a request in the past LLVM Alias Analysis Tech calls[1],
to split out the pointer provenance changes and propose them as a next step for inclusion.

Fast forward to today. A lot more discussions about pointer provenance and related problems are ongoing.

Maybe the provided set of patches[2] and the experience with tracking the provenance in the full restrict patches can
be a first step to help solving some of those problems ?

Any feedback is welcome, also during tomorrows AA TechCall[1]

Greetings,

Jeroen Dobbelaere

[0] ⚙ D68484 [PATCH 01/27] [noalias] LangRef: noalias intrinsics and ptr_provenance documentation. [PATCH 01/27] [noalias] LangRef: noalias intrinsics and ptr_provenance documentation.
[1] LLVM AA Technical Call - Notes - Google Docs LLVM AA TechCall
[2] ⚙ D104268 [ptr_provenance] Introduce optional ptr_provenance operand to load/store [ptr_provenance] Introduce optional ptr_provenance operand to load/store

As far as I understand, your goal is to declare what's the set of objects a pointer may alias with at a memory dereference operation.
For example, you may want to say that the following load can only dereference objects pointed-to by %p and %q:
%v = load i8, * %r

If %r alias with some object other than %p/%q, the load triggers UB. This allows you to do better AA.

This is useful when you have the restrict keyword in a function argument and you inline that function. LLVM right now has no way to restrict aliasing per scope or operation, just per function.
(this story has been seen by every other attribute..)

The goal sounds useful. Though it would be nice to see some performance numbers as this is a complex feature and we need to understand if it's worth it.

Nevertheless, I would not claim this feature to be related with the discussion around the byte type. There are only 2 solutions that avoid introducing the byte type: 1) disable integer optimizations that are wrong for pointers, so that integers can carry provenance information. This has been tried by another compiler, CompCert, and didn't go very well.
Solution 2) is to lose provenance with integers and weaken AA substantially. While your proposal could help recovering precision a bit, it would never recover much, as it's a very syntactic approach. The compiler needs to find the memory dereference operations and associate them with the right provenance shrinking operations syntactically. Doesn't sound very promising, and it would only act as a cache, essentially. It's not that the front-end could give you much information. And we can prove this approach it's theoretical inferior to the byte type, as some escaping scenarios are not possible to recover from unless we disable type punning through memory (i.e., make the round-trip of store/load a value a NOP).
Of course, the practicality of the by type is yet to be proven, though work is undergoing.

Regardless, I'm happy to review your proposal if useful.

Nuno

Hi Nuno,

thanks for the interest and feedback !

From: Nuno Lopes
Subject: RE: [llvm-dev] pointer provenance introduction to load/store instructions

As far as I understand, your goal is to declare what's the set of objects a
pointer may alias with at a memory dereference operation.
For example, you may want to say that the following load can only dereference
objects pointed-to by %p and %q:
%v = load i8, * %r

If %r alias with some object other than %p/%q, the load triggers UB. This
allows you to do better AA.

Yes, this should make it possible to optimize something like:

int foo(int* a, int *b) {
  if ((uintptr_t)a +4) == (uintptr_t) b) {
    return b[0];
  } else {
    return a[1];
  }
}

to something like (pseudo code, assuming 32bit pointers):
  %a.gep = getelemenptr %a, 1
  %c = cmp %a.gep, %b ; This will not result in any code
  %prov = select %c, %b, %a ; This will also not result in any oce
  %result = load i32, i32* %a.gep, ptr_provenance i32* %prov
  ret i32 %result

Note: in the Full Restrict patches, I tried to be as specific as possible.
Later optimiziations might reveal the %c is always true or always false, which
can help with refining the ptr_provenance info. This does not prohibit the
addition of helper intrinsics that can take a more generic approach like:
  %prov = llvm.provenance.join %a, %b

This is useful when you have the restrict keyword in a function argument and
you inline that function. LLVM right now has no way to restrict aliasing per
scope or operation, just per function.
(this story has been seen by every other attribute..)

The goal sounds useful. Though it would be nice to see some performance
numbers as this is a complex feature and we need to understand if it's worth
it.

In what kind of performance numbers are you interested ?
- The optional ptr_provenance argument on its own has some impact on compile time and memory usage.
See ⚙ D68488 [PATCH 03/27] [noalias] [IR] Introduce ptr_provenance for LoadInst/StoreInst and ⚙ D68488 [PATCH 03/27] [noalias] [IR] Introduce ptr_provenance for LoadInst/StoreInst for a discussion
on some variants and their compile time impact.
Also see: LLVM Compile-Time Tracker

At this moment, we are using 'variant 1':
  variant 1 : always reserve place for the ptr_provenance; Move arguments around when adding/removing ptr_provenance

    extra space overhead for load/store instructions;
    NO impact on other instructions;
    slightly more overhead when accessing operands of load/store instructions;
    medium overhead when adding/removing ptr_provenance

- The effect of supporting full restrict on compile time ? For compilation time impact there has been a measurement in September:
https://llvm-compile-time-tracker.com/index.php?branch=dobbelaj-snps/perf/full_restrict-20200918

of course, this does not take into account the effect on code that is using local restrict and restrict pointes in aggregates.
(which, for our customers, is extremely important). The full restrict support does trigger on functions returning an aggregate
that are inlined, as those are modelled with a noalias pointer argument.

Nevertheless, I would not claim this feature to be related with the discussion
around the byte type. There are only 2 solutions that avoid introducing the
byte type: 1) disable integer optimizations that are wrong for pointers, so
that integers can carry provenance information. This has been tried by another
compiler, CompCert, and didn't go very well.
Solution 2) is to lose provenance with integers and weaken AA substantially.
While your proposal could help recovering precision a bit, it would never
recover much, as it's a very syntactic approach. The compiler needs to find
the memory dereference operations and associate them with the right provenance
shrinking operations syntactically. Doesn't sound very promising, and it would
only act as a cache, essentially. It's not that the front-end could give you
much information. And we can prove this approach it's theoretical inferior to
the byte type, as some escaping scenarios are not possible to recover from
unless we disable type punning through memory (i.e., make the round-trip of
store/load a value a NOP).
Of course, the practicality of the by type is yet to be proven, though work is
undergoing.

This is true. In my view, that discussion is more or less orthogonal to what the Full Restrict
patches add. For Full Restrict we do need to track the (noalias) provenance (this is needed for
the 'based-on' rule). For that a number of helpers were introduced:
- llvm.noalias : adds 'restrict/noalias' information to a pointer
- llvm.provenance.noalias : adds 'restrict/noalias' information to a pointer (ptr_provenance path)
- llvm.noalias.arg.guard : combines a computational path with a ptr_provenance path:
-- Only Load and Store have an explicit ptr_provenance argument
-- Other places where the provenance must be tracked (when storing the pointer, when passing it to a function, when returning it),
  the result of the 'llvm.noalias.arg.guard' is used, as that tracks both sides.
- llvm.noalias.copy.guard : annotated that a pointer points to a memory block containing restrict pointers.
-- This allows SROA to identify that a restrict pointer is copied when splitting up load/store of aggregates or
   replacing memcpy.

So, in the assumption that a memcpy and aggregate load/store propagates provenance, this allows us to keep track of that provenance.

Note: I am not claiming that this setup is perfect, but it has helped us providing full restrict support to our customers.

As far as I understand, your goal is to declare what's the set of
objects a pointer may alias with at a memory dereference operation.
For example, you may want to say that the following load can only
dereference objects pointed-to by %p and %q:
%v = load i8, * %r

If %r alias with some object other than %p/%q, the load triggers UB.
This allows you to do better AA.

Yes, this should make it possible to optimize something like:

int foo(int* a, int *b) {
  if ((uintptr_t)a +4) == (uintptr_t) b) {
    return b[0];
  } else {
    return a[1];
  }
}

to something like (pseudo code, assuming 32bit pointers):
  %a.gep = getelemenptr %a, 1
  %c = cmp %a.gep, %b ; This will not result in any code
  %prov = select %c, %b, %a ; This will also not result in any oce
  %result = load i32, i32* %a.gep, ptr_provenance i32* %prov
  ret i32 %result

This approach is the reverse of what I was thinking. Instead of restricting provenance, you are adding provenance. This is a more dangerous approach, as then provenance information can never be deleted, as it's required for correctness. The other way around uses provenance information to aid optimization, but it's not required for correctness, thus can be dropped.

So the main caveat of the proposal is that every single optimization touching memory operations needs to learn how to preserve & handle this new provenance information. Maybe all the changes will be down just to AA & a few utility functions, but still, every creation, copy, etc of memory operations needs to be audited.
In general, it's good practice to add new features to the IR such that they can be ignored by existing code that doesn't know about them.

This is useful when you have the restrict keyword in a function
argument and you inline that function. LLVM right now has no way to
restrict aliasing per scope or operation, just per function.
(this story has been seen by every other attribute..)

The goal sounds useful. Though it would be nice to see some
performance numbers as this is a complex feature and we need to

> understand if it's worth it.

In what kind of performance numbers are you interested ?

I think the first question is around benefits: Are there benchmarks we care about that benefit from this patch? Are there regressions? Even though the extra code is not materialized in assembly, it still exists and may interact with the inliner heuristics, for example.

This is true. In my view, that discussion is more or less orthogonal to what the Full Restrict patches add. For Full Restrict we do need to track the (noalias) provenance (this is needed for the 'based-on' rule). For that a number of helpers were introduced:
- llvm.noalias : adds 'restrict/noalias' information to a pointer
- llvm.provenance.noalias : adds 'restrict/noalias' information to a pointer (ptr_provenance path)
- llvm.noalias.arg.guard : combines a computational path with a ptr_provenance path:
-- Only Load and Store have an explicit ptr_provenance argument
-- Other places where the provenance must be tracked (when storing the pointer, when passing it to a function, when returning it),
  the result of the 'llvm.noalias.arg.guard' is used, as that tracks both sides.
- llvm.noalias.copy.guard : annotated that a pointer points to a memory block containing restrict pointers.
-- This allows SROA to identify that a restrict pointer is copied when splitting up load/store of aggregates or
   replacing memcpy.

So, in the assumption that a memcpy and aggregate load/store propagates provenance, this allows us to keep track of that provenance.

Thanks for this quick summary! I need to think more about this explicit provenance tracking and how far can we stretch it. This stuff is not trivial :slight_smile:

Nuno

Hi Nuno,

>> As far as I understand, your goal is to declare what's the set of
>> objects a pointer may alias with at a memory dereference operation.
>> For example, you may want to say that the following load can only
>> dereference objects pointed-to by %p and %q:
>> %v = load i8, * %r
>>
>> If %r alias with some object other than %p/%q, the load triggers UB.
>> This allows you to do better AA.
>
> Yes, this should make it possible to optimize something like:
>
> int foo(int* a, int *b) {
> if ((uintptr_t)a +4) == (uintptr_t) b) {
> return b[0];
> } else {
> return a[1];
> }
> }
>
> to something like (pseudo code, assuming 32bit pointers):
> %a.gep = getelemenptr %a, 1
> %c = cmp %a.gep, %b ; This will not
result in any code
> %prov = select %c, %b, %a ; This will also not
result in any oce
> %result = load i32, i32* %a.gep, ptr_provenance i32* %prov
> ret i32 %result

This approach is the reverse of what I was thinking. Instead of restricting
provenance, you are adding provenance. This is a more dangerous approach, as

I am not sure I understand this. This optimized load instruction has a provenance
that is related to %a and %b. Otherwise, this optimization would not be valid.

For: (note, also true for the variant with explicit control flow)
  %c = cmp %a.gep, %b
  %v1 = load i32, i32* %a.gep
  %v2 = load i32, i32* %b
  %result = select %c, %v2, %v1

'a optimization pass' that is aware of the provenance could generate the code
with merged loads, still tracking the correct provenance.

In the proposal it is such that a %prov == null, would indicate that the load can
have any provenance, so that can be considered as the safe fallback case.

then provenance information can never be deleted, as it's required for
correctness. The other way around uses provenance information to aid
optimization, but it's not required for correctness, thus can be dropped.

Hmm. I think I start seeing the difference.

The first question is what an ordinary load means:
  %L1 = load i32, i32* %p
  %L2 = load i32, i32* %p, ptr_provenance* %p
  %L3 = load i32, i32* %p, ptr_provenance* null

In the current proposal, %L1 and %L2 are equivalent. Aka, no explicit ptr_provenance
means to use the provenance of the provided pointer. %L3 is the way to indicate that
the load's provenance is broader than that of the pointer operand.

You probably had in mind that %L1 is equivalent to %L3, and %L2 needs to be
written explicitly to add a known provenance ?

As far as I am concerned, both interpretation are valid. I choose the first as the
most convenience (and less work), as, we already are doing provenance investigations
today on the pointer operand, so no change of behavior there. And the front end's
do not have to be teached about the provenance. (And keep in mind, my initial focus
was the full restrict implementation).

So the main caveat of the proposal is that every single optimization touching
memory operations needs to learn how to preserve & handle this new provenance
information. Maybe all the changes will be down just to AA & a few utility
functions, but still, every creation, copy, etc of memory operations needs to
be audited.
In general, it's good practice to add new features to the IR such that they
can be ignored by existing code that doesn't know about them.

These rules-of-thumb were the driver for the approach with Full Restrict. The
clone would not automatically track the noalias provenance, but drop it in a
safe and compatible way. For those optimizations where it made sense, knowledge
about the provenance and noalias metadata were added. The main thing that can be
problematic is the iterating over 'uses' that assume a load/store can only
be seen once.

>> This is useful when you have the restrict keyword in a function
>> argument and you inline that function. LLVM right now has no way to
>> restrict aliasing per scope or operation, just per function.
>> (this story has been seen by every other attribute..)
>>
>> The goal sounds useful. Though it would be nice to see some
>> performance numbers as this is a complex feature and we need to
> > understand if it's worth it.
>
> In what kind of performance numbers are you interested ?

I think the first question is around benefits: Are there benchmarks we care
about that benefit from this patch? Are there regressions? Even though the
extra code is not materialized in assembly, it still exists and may interact
with the inliner heuristics, for example.

For now, just having this infrastructure around will have a small cost of some extra
runtime memory for load/store instructions. Once the infrastructure is available,
this opens up a number of new things:
- Once filled in, tracking of provenance can be done more accurately and also faster.
(faster, as most of the computations can be skipped)
- IMHO, it's a mandatory step for adding full restrict support.
- During last AA TechCall, other developers already seemed to have some ideas on how
  to make use of it.

It would be nice if you can make it to the AA Tech Call today :wink:

Thanks for the feedback !

Jeroen

Hi all,

as some of you already know, the full restrict patches[0] introduce an optional 'ptr_provenance' operand
to LoadInst/StoreInst. In the context of full restrict support, this operand is used to separate the
computation of the pointer operand from the possible restrict provenance of that operand. This was necessary
so that we could keep all the optimizations working on those pointers, without being blocked by the noalias intrinsics.
At the same time we avoided the risk to lose the noalias provenance of the instruction.

As that separation can also be useful in other situations, there was a request in the past LLVM Alias Analysis Tech calls[1],
to split out the pointer provenance changes and propose them as a next step for inclusion.

Fast forward to today. A lot more discussions about pointer provenance and related problems are ongoing.

Maybe the provided set of patches[2] and the experience with tracking the provenance in the full restrict patches can
be a first step to help solving some of those problems ?

Any feedback is welcome, also during tomorrows AA TechCall[1]

I may not be able to make the call (the call starts as WG14 meetings
are ending), but I wanted to ask whether discussions of pointer
provenance are tracking the draft TS from WG14 on the subject
(http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2676.pdf)? This is a
work item from the C committee's Memory Object Model study group, and
any feedback we can provide to the SG on the TS would be greatly
appreciated as the committee's expectation is to someday roll this TS
into the C standard (with adjustments from whatever implementation
feedback is generated).

Thanks!

~Aaron

Hi Aaron,

at the moment we are (were ?) not exactly tracking the N2676 proposal, but we are keeping it mind.

Wrt to the ptr_provenance support, my current understanding is that this infrastructure will
make it possible to resolve some optimization issue, together with allowing us to improve
the alias analysis infrastructure and supporting full restrict. At the same time, the
infrastructure itself is orthogonal to the 'policy' that we want to use in llvm/clang.
(aka, what happens to the provenance for a int2ptr, etc.)

This also means that, as far I as see it at this moment, we should be able to adhere to/implement the
proposal.

Given that, it is a decent large piece of text and I am trying to iterate over it in order to grasp
everything that is said.

My understanding is that N2686 proposes the PNVI-ae-udi version as object model for C.

Something that is not clear to me (but I might have missed it in the large body of text), is what I
means for a storage instance to be exposed: based on p27, the 'pointer_from_integer_1ie.c', can I
conclude that just converting a 'pointer to a storage instance' to a uintptr_t is enough to
'expose' that instance, even if the value is never used ? (Not sure if that would lead to a useful
program)

And another observation: at this moment, I am not exactly sure how we would be able to 'proof' in a
convenient way at compile time that, like in the example 'pointer_from_int_disambiguation_3.c' on p30,
*r=11 would be associated with y (and conclude that r=r-1; *r=12 triggers UB). My current impression
is that any 'int2ptr' cast (at llvm ir level) would have to result in a pointer, with a
provenance = the union of all exposed objects. But I don't see yet how to reduce that set to the
one or two actual possibilities.

Thanks,

Jeroen Dobbelaere

Hi Aaron,

at the moment we are (were ?) not exactly tracking the N2676 proposal, but we are keeping it mind.

Fantastic, I really appreciate that!

Wrt to the ptr_provenance support, my current understanding is that this infrastructure will
make it possible to resolve some optimization issue, together with allowing us to improve
the alias analysis infrastructure and supporting full restrict. At the same time, the
infrastructure itself is orthogonal to the 'policy' that we want to use in llvm/clang.
(aka, what happens to the provenance for a int2ptr, etc.)

This also means that, as far I as see it at this moment, we should be able to adhere to/implement the
proposal.

That's great to hear!

Given that, it is a decent large piece of text and I am trying to iterate over it in order to grasp
everything that is said.

Sure -- to help with this, I've CCed Peter Sewell who is the chair of
the Memory Object Model study group and an author of the TS. Peter may
not be able to respond immediately (we're in WG14 meetings this week),
but hopefully he can chime in or CC someone who can, especially if I'm
misstating something.

My understanding is that N2686 proposes the PNVI-ae-udi version as object model for C.

Correct.

Something that is not clear to me (but I might have missed it in the large body of text), is what I
means for a storage instance to be exposed: based on p27, the 'pointer_from_integer_1ie.c', can I
conclude that just converting a 'pointer to a storage instance' to a uintptr_t is enough to
'expose' that instance, even if the value is never used ? (Not sure if that would lead to a useful
program)

That's a good question that I don't feel super confident in my answer
to, but I believe that converting the pointer to a storage instance to
uintptr_t does expose that instance without requiring further analysis
as to whether the value is actually used. But again, I could be Very
Wrong here. Hopefully Peter can chime in!

And another observation: at this moment, I am not exactly sure how we would be able to 'proof' in a
convenient way at compile time that, like in the example 'pointer_from_int_disambiguation_3.c' on p30,
*r=11 would be associated with y (and conclude that r=r-1; *r=12 triggers UB). My current impression
is that any 'int2ptr' cast (at llvm ir level) would have to result in a pointer, with a
provenance = the union of all exposed objects. But I don't see yet how to reduce that set to the
one or two actual possibilities.

That's good feedback! I think you're correct that int2ptr would need
to track the union of all exposed objects. I assume this is
potentially really expensive because that union could be quite large?

Thanks!

~Aaron