Upcoming Changes/Additions to Scoped-NoAlias metadata

Hi everyone,

As many of you might know, LLVM now has scoped noalias metadata (LLVM Language Reference Manual — LLVM 16.0.0git documentation) -- it allows us to preserve noalias function argument attributes when inlining, in addition to allowing frontends to add otherwise non-deducible aliasing properties to memory accesses. This currently works well, but needs a change and an intrinsic, as I'll explain below.

First, the problem: Currently, ScopedNoAliasAA.cpp is a little bit too much like TypeBasedAliasAnalysis.cpp in that when the metadata is used to return an aliasing result, the Size passed in the AliasAnalysis::Location object is ignored. This is not a problem if the Size is equal to (or less than) the size of the access with which the metadata was originally associated, but if the Size is larger than that associated with the original access, the result might not be correct. Fixing this is not hard for regular accesses (we can also store the size of the original access), although is harder for arbitrary function calls.

Now you might think that constructing an AliasAnalysis::Location object with a size larger than the original is uncommon, and in some sense it is, but this is exactly what the loop vectorizer does when partitioning memory accesses in a loop into potential aliasing sets. The loop vectorizer partitions a loop's memory accesses by taking the access's natural Location object, and setting the size to UnknownSize (the largest representable size) and querying using that infinite-size Location. The logic is that if two access don't alias with both sizes set to infinity, then they must always come from disjoint sets of underlying objects, and because the vectorizer only considers access pointers that are linear recurrences, implies that the accesses don't alias both within and across loop iterations.

So how does this work now? Consider this function:

void foo(float * restrict a, float * restrict b) {
  for (int i = 0; i < 1600; ++i)
    a[i] = b[i] + 1;
}
(note that restrict at the C level becomes a noalias function argument attribute at the IR level)

With the scoped noalias metadata, the aliasing information from the 'a' and 'b' function arguments is preserved such that 'a[i]' and 'b[i]' are tagged as not aliasing. There is nothing wrong with this, and in fact, when the loop vectorizer uses its infinite-sized queries on 'a[i]' and 'b[i]', and that returns NoAlias, that result is also not wrong. It is not wrong, however, because the noalias function argument attributes, logically associated with the function entry block, dominate the loop ('a[i]' is based on 'a' for all 'i', and similarly for 'b[i]').

Now here's the problematic case:

void inner(float * restrict a, float * restrict b) {
  *a = *b + 1;
}
void foo(float * a, float * b) {
  for (int i = 0; i < 1600; ++i)
    inner(a+i, b+i);
}

The problem here is that the noalias function argument attributes on 'inner', when inlined into the loop body, do not dominate the loop; they apply to pointers within each loop iteration, not across loop iterations. But because the generated noalias metadata is essentially the same as in the previous case, and as noted ScopedNoAliasAA.cpp ignores the Location's Size, the loop vectorizer's infinite-sized alias queries return NoAlias as in the previous case. This is a bug. Making ScopedNoAliasAA.cpp check the size of the original access (as it should) would fix this problem, but it will also prevent ScopedNoAliasAA.cpp from returning NoAlias for the first case were we'd like it to do so.

So to summarize, scoped-noalias metadata does not preserve enough dominance information to truly capture the full semantics of the noalias function argument attributes.

After discussing this with Chandler offline last week, here's the proposed solution: instead of having both !alias.scope and !noalias metadata, we'll have only !alias.scope metadata and an intrinsic: i8* @llvm.noalias(i8* ptr, !metadata !?) where the metadata argument corresponds to a list of !alias.scopes. The idea being that the pointer returned by this intrinsic, and all pointers derived from it, are assumed not to alias with memory accesses tagged with any of the associated !alias.scope metadata entries. This intrinsic needs to carry control dependencies (it cannot be hoisted out of a loop, for example) -- in this sense it is very much like @llvm.assume. And like @llvm.assume, we'll need to add logic to various passes to ignore it as appropriate so that it does not block optimizations unnecessarily. I was hoping this avoid this part of the design space, but I don't see any way around it -- only some non-hoistable instruction can model a control dependence.

With this in place, it will be possible to build a loop-aware AA interface capable of answering questions like: Does location A in some loop iteration alias with location B in any iteration (which is really what the vectorizer wants to know). This interface can check whether the @llvm.noalias dominates the loop while answering this query.

On the bright side, the metadata will be simpler (or at least less verbose) than the current design.

If anyone has suggestions on an alternative design, please share! :slight_smile: Regardless, the current design needs to change: Just fixing it to check the size of the location will make it non-useful for the loop vectorizer and otherwise cripple it for mod-ref queries against arbitrary function calls.

Thanks again,
Hal

After discussing this with Chandler offline last week, here's the proposed
solution: instead of having both !alias.scope and !noalias metadata, we'll
have only !alias.scope metadata and an intrinsic: i8* @llvm.noalias(i8*
ptr, !metadata !?) where the metadata argument corresponds to a list of
!alias.scopes. The idea being that the pointer returned by this intrinsic,
and all pointers derived from it, are assumed not to alias with memory
accesses tagged with any of the associated !alias.scope metadata entries.

Could you give examples? I don't quite follow this. I had thought that the
analysis would be slightly different. My expectation is that each noalias
argument pointer would turn into a @llvm.noalias call, all of them sharing
a single metadata "scope". And then any loads or stores through a pointer
derived from a @llvm.noalias call would never alias loads and stores
derived through a different @llvm.noalias call which had the same scope
metadata.

This intrinsic needs to carry control dependencies (it cannot be hoisted
out of a loop, for example) -- in this sense it is very much like
@llvm.assume. And like @llvm.assume, we'll need to add logic to various
passes to ignore it as appropriate so that it does not block optimizations
unnecessarily.

We should do something to make this simpler. I think we should have an
intrinsic inst base class that assume, lifetime, and other intrinsics which
do not represent actual code in the final program derive from so that we
don't have to update these lists all over the place. If we need 2 tiers to
model assume & noalias as distinct from the lifetime or other intrinsics,
fine. We should have high-level categories that can be tested and updated.

From: "Chandler Carruth" <chandlerc@google.com>
To: "Hal Finkel" <hfinkel@anl.gov>
Cc: "LLVM Developers Mailing List" <llvmdev@cs.uiuc.edu>
Sent: Thursday, November 13, 2014 7:02:58 PM
Subject: Re: [LLVMdev] Upcoming Changes/Additions to Scoped-NoAlias metadata

After discussing this with Chandler offline last week, here's the
proposed solution: instead of having both !alias.scope and !noalias
metadata, we'll have only !alias.scope metadata and an intrinsic:
i8* @llvm.noalias(i8* ptr, !metadata !?) where the metadata argument
corresponds to a list of !alias.scopes. The idea being that the
pointer returned by this intrinsic, and all pointers derived from
it, are assumed not to alias with memory accesses tagged with any of
the associated !alias.scope metadata entries.

Could you give examples? I don't quite follow this. I had thought
that the analysis would be slightly different. My expectation is
that each noalias argument pointer would turn into a @llvm.noalias
call, all of them sharing a single metadata "scope". And then any
loads or stores through a pointer derived from a @llvm.noalias call
would never alias loads and stores derived through a different
@llvm.noalias call which had the same scope metadata.

Sorry, I did not state this very well. Yes, that's correct, with the exception that "derived through a different

@llvm.noalias call which had the same scope metadata" should also be supplemented with being derived from other identified objects (just like with a noalias argument attribute).

This intrinsic needs to carry control dependencies (it cannot be
hoisted out of a loop, for example) -- in this sense it is very much
like @llvm.assume. And like @llvm.assume, we'll need to add logic to
various passes to ignore it as appropriate so that it does not block
optimizations unnecessarily.

We should do something to make this simpler. I think we should have
an intrinsic inst base class that assume, lifetime, and other
intrinsics which do not represent actual code in the final program
derive from so that we don't have to update these lists all over the
place. If we need 2 tiers to model assume & noalias as distinct from
the lifetime or other intrinsics, fine. We should have high-level
categories that can be tested and updated.

Agreed. There is come commonality here that can be exploited for at least some of the passes.

-Hal

Hal, a couple of questions:

  1. Do you preserve alias to stores+loads derived through a different @llvm.noalias call which has different scope metadata? Couldn’t that be treated similarly to S+Ls derived from other identified objects?

  2. Can you describe the case for explicitly preventing the intrinsic being hoisted out of a loop? Since the intrinsic generates a new pointer, wouldn’t the interesting cases be handled by avoiding commoning of the intrinsic?

From: "Raul Silvera" <rsilvera@google.com>
To: "Hal Finkel" <hfinkel@anl.gov>
Cc: "Chandler Carruth" <chandlerc@google.com>, "LLVM Developers Mailing List" <llvmdev@cs.uiuc.edu>
Sent: Friday, November 14, 2014 10:34:39 AM
Subject: Re: [LLVMdev] Upcoming Changes/Additions to Scoped-NoAlias metadata

Hal, a couple of questions:

1. Do you preserve alias to stores+loads derived through a different
@llvm.noalias call which has different scope metadata? Couldn't that
be treated similarly to S+Ls derived from other identified objects?

From a different scope? No. Here's why:

  int i = ...;
  T a, b;
  T * y1 = ..., y2 = ...
  {
    T * restrict x = y1;
    a = x[i];
  }
  {
    T * restrict x = y2;
    b = x[i];
  }

which becomes:

  int i = ...;
  T a, b;
  T * y1 = ..., y2 = ...
  T * x1 = @llvm.noalias(y1, !scope1);
  a = x1[i]; // alias.scope !scope1
  T * x2 = @llvm.noalias(y2, !scope2);
  b = x2[i]; // alias.scope !scope2

but can we assume that 'x1[i]' does not alias with 'x2[i]'? No.

Now one possible design here is to solve this problem by mutual dominance, but to do that, we'd need to make code motion around the @llvm.noalias calls highly restricted. I'd like to allow for as much data motion as possible.

2. Can you describe the case for explicitly preventing the intrinsic
being hoisted out of a loop? Since the intrinsic generates a new
pointer, wouldn't the interesting cases be handled by avoiding
commoning of the intrinsic?

No, here's why. Take the following example:
  T * y = ...;
  for (int i = 0; i < 1600; ++i) {
    T * restrict x = y;
    T t = x[i];
    t += 1;
    x[i] = t;
  }

which becomes something like:

  T * y = ...;
  for (int i = 0; i < 1600; ++i) {
    T * x = @llvm.noalias(y, !scope1)
    T t = x[i]; // alias.scope !scope1
    t += 1;
    x[i] = t; // alias.scope !scope1
  }

now we need to make sure that the call to @llvm.noalias is not hoisted out of the loop. In order to do that, we tag it as writing (at least to y). The problem is that, as noted in the original e-mail, when looking across loop iterations (as the loop vectorizer must do), it is important to be able to test whether or not the call to @llvm.noalias dominates the loop in question. This example is over-simplified (because there is only one pointer used in the loop), but imagine there are several such pointers used. Does that make sense?

Thanks again,
Hal

Agreed. Specific to this point, I’ve seen a number of cases in discussion recently where a notion of a intrinsic which is control dependent on surrounding control flow, but does not read or write memory would be useful. It really feels like that’s what the current implementation of llvm.assume has become and likely what this proposal would require. Maybe it’s time to introduce a notion of an arbitrary non-memory control dependence? Cases that come to mind: - invariant.* - lifetime.* - float point state changes (this last one might need a stronger restriction than the others) Philip

From: "Philip Reames" <listmail@philipreames.com>
To: "Chandler Carruth" <chandlerc@google.com>, "Hal Finkel" <hfinkel@anl.gov>
Cc: "LLVM Developers Mailing List" <llvmdev@cs.uiuc.edu>
Sent: Friday, November 14, 2014 11:57:54 AM
Subject: Re: [LLVMdev] Upcoming Changes/Additions to Scoped-NoAlias metadata

This intrinsic needs to carry control dependencies (it cannot be
hoisted out of a loop, for example) -- in this sense it is very much
like @llvm.assume. And like @llvm.assume, we'll need to add logic to
various passes to ignore it as appropriate so that it does not block
optimizations unnecessarily.
We should do something to make this simpler. I think we should have
an intrinsic inst base class that assume, lifetime, and other
intrinsics which do not represent actual code in the final program
derive from so that we don't have to update these lists all over the
place. If we need 2 tiers to model assume & noalias as distinct from
the lifetime or other intrinsics, fine. We should have high-level
categories that can be tested and updated. Agreed.

Specific to this point, I've seen a number of cases in discussion
recently where a notion of a intrinsic which is control dependent on
surrounding control flow, but does not read or write memory would be
useful. It really feels like that's what the current implementation
of llvm.assume has become and likely what this proposal would
require. Maybe it's time to introduce a notion of an arbitrary
non-memory control dependence?

Cases that come to mind:
- invariant.*
- lifetime.*

But these two do have a memory control dependence, right? You can't reorder a store to x and a lifetime.end(x).

-Hal

The specific case you gave is actually fine, but your point is true. I'd gotten myself confused here.

If you have a store immediate before a lifetime.end, the store is dead and can be assumed to not be visible in the program. As a result, I don't see why it's illegal to reorder them.

*addr = x;
lifetime.end(addr)

Thanks, Hal… Let me stay on the first question for now.

If we take your example and remove the first restrict:
T A;
T * y1 = …, y2 = …
{
T * x = &A;
a = x[i];
}
{
T * restrict x = y2;
b = x[i];
}

Will we assert that *x1 and *x2 do not alias each other? If so, why is it OK here and not if the first one is restrict? I believe from a language perspective you just need the bottom restrict to make the guarantee.

+1

The design proposal seems reasonable; I have a couple of comments on implementation.

After discussing this with Chandler offline last week, here's the proposed solution: instead of having both !alias.scope and !noalias metadata, we'll have only !alias.scope metadata and an intrinsic: i8* @llvm.noalias(i8* ptr, !metadata !?) where the metadata argument corresponds to a list of !alias.scopes. The idea being that the pointer returned by this intrinsic, and all pointers derived from it, are assumed not to alias with memory accesses tagged with any of the associated !alias.scope metadata entries. This intrinsic needs to carry control dependencies (it cannot be hoisted out of a loop, for example) -- in this sense it is very much like @llvm.assume. And like @llvm.assume, we'll need to add logic to various passes to ignore it as appropriate so that it does not block optimizations unnecessarily. I was hoping this avoid this part of the design space, but I don't see any way around it -- only some non-hoistable instruction can model a control dependence.

There have been several related ideas being talked about recently. If possible, it'd be nice to arrive at something fairly general.

Andy Trick and I were talking about the possibility of a more general control dependent metadata holder at the dev meeting. The basic idea is that you'd have an intrinsic something like "void llvm.tag_metadata(any_ty %value, !metadata !?)" The semantics would be that the given metadata applies to the given value at the specific location. By combing this with existing forms of metadata, this converts each from being a property of a value to being a property of a value at a particular location. Implementation wise, it would be extremely similiar to the existing llvm.assume intrinsic.

Your current proposal uses the data dependence off the intrinsic, whereas I was thinking using something closer to the assume mechanism. Andy had previously put forth an idea (in the 'Optimization hints for "constant" loads' thread) for a similar intrinsic to create a new value with a data dependence tied to a function with control dependence. (To make scoping !invariant possible.) I can see appeal in both schemes, but it seems like most folks are leaning towards the data dependent model.

Do you think it makes sense to roll this all into one family of intrinsics? Or do you see something in your proposed use which wouldn't work for other types of metadata?

Other example use cases:
- !invariant loads mixed with initialization of the same location
- !nonnull and !range facts recorded by a language frontend

p.s. For the sake of completeness, Andy and I were also talking about a version of this idea for function attributes as well. This would give us the ability to say things like "if this value is non-null, it is dereferenceable to size X".

Philip

From: "Raul Silvera" <rsilvera@google.com>
To: "Hal Finkel" <hfinkel@anl.gov>
Cc: "Chandler Carruth" <chandlerc@google.com>, "LLVM Developers Mailing List" <llvmdev@cs.uiuc.edu>
Sent: Friday, November 14, 2014 12:15:51 PM
Subject: Re: [LLVMdev] Upcoming Changes/Additions to Scoped-NoAlias metadata

Thanks, Hal... Let me stay on the first question for now.

If we take your example and remove the first restrict:
T A;
T * y1 = ..., y2 = ...
{
T * x = &A;
a = x[i];
}
{
T * restrict x = y2;
b = x[i];
}

Will we assert that *x1 and *x2 do not alias each other? If so, why
is it OK here and not if the first one is restrict?

You don't have x1 and x2 in your example, assuming you mean:

  int i = 0;
  T A;
  T * y2 = ...
  {
    T * x1 = &A;
    a = x1[i];
  }
  {
    T * restrict x2 = y2;
    b = x2[i];
  }

then the answer is no, the fact that x2 is restrict qualified does not allow us to conclude that &x2[i] and &x1[i] don't alias.

I believe from a
language perspective you just need the bottom restrict to make the
guarantee.

Now assuming that the '...' is not &A, the BasicAA will catch this, but that's another matter. You'd need to have the restrict in a parent block here.

-Hal

From: "Philip Reames" <listmail@philipreames.com>
To: "Hal Finkel" <hfinkel@anl.gov>, "LLVM Developers Mailing List" <llvmdev@cs.uiuc.edu>
Sent: Friday, November 14, 2014 12:27:58 PM
Subject: Re: [LLVMdev] Upcoming Changes/Additions to Scoped-NoAlias metadata

+1

The design proposal seems reasonable; I have a couple of comments on
implementation.

> After discussing this with Chandler offline last week, here's the
> proposed solution: instead of having both !alias.scope and
> !noalias metadata, we'll have only !alias.scope metadata and an
> intrinsic: i8* @llvm.noalias(i8* ptr, !metadata !?) where the
> metadata argument corresponds to a list of !alias.scopes. The idea
> being that the pointer returned by this intrinsic, and all
> pointers derived from it, are assumed not to alias with memory
> accesses tagged with any of the associated !alias.scope metadata
> entries. This intrinsic needs to carry control dependencies (it
> cannot be hoisted out of a loop, for example) -- in this sense it
> is very much like @llvm.assume. And like @llvm.assume, we'll need
> to add logic to various passes to ignore it as appropriate so that
> it does not block optimizations unnecessarily. I was hoping this
> avoid this part of the design space, but I don't see any way
> around it -- only some non-hoistable instruction can model a
> control dependence.
There have been several related ideas being talked about recently. If
possible, it'd be nice to arrive at something fairly general.

Andy Trick and I were talking about the possibility of a more general
control dependent metadata holder at the dev meeting.

I know :wink: -- Andy and I talked about it too.

The basic idea
is
that you'd have an intrinsic something like "void
llvm.tag_metadata(any_ty %value, !metadata !?)" The semantics would
be
that the given metadata applies to the given value at the specific
location. By combing this with existing forms of metadata, this
converts each from being a property of a value to being a property of
a
value at a particular location. Implementation wise, it would be
extremely similiar to the existing llvm.assume intrinsic.

I like this idea, I don't think it works for this case (unfortunately). Explained below...

Your current proposal uses the data dependence off the intrinsic,
whereas I was thinking using something closer to the assume
mechanism.
Andy had previously put forth an idea (in the 'Optimization hints for
"constant" loads' thread) for a similar intrinsic to create a new
value
with a data dependence tied to a function with control dependence.
(To
make scoping !invariant possible.) I can see appeal in both schemes,
but it seems like most folks are leaning towards the data dependent
model.

The data dependence is easier to find, but harder on the optimizer (because the intrinsic appears opaque to anything not taught to look through it). For @llvm.assume, using a data dependence really was not possible (the assumption could involve many values, which to return? and if all of them, looking through all of the insert/extractvalue instructions would have been painful). There are two issues here that I see with using a control dependence alone. First, it might be too easy to lose. For example, let's say we have:

  foo(T * x) {
    T * restrict y = x - 1;
    y[0] = 0;
    y[1] = 1;
    ...
  }

if we take the control dependence model, and do something like this:

  foo(T * x) {
    T * y = x - 1;
    @llvm.noalias(x - 1, !scope1);
    y[0] = 0;
    y[1] = 1;
    ...
  }

but the optimizer will likely make this into:

  foo(T * x) {
    T * y = x - 1;
    @llvm.noalias(x - 1, !scope1);
    x[-1] = 0;
    x = 1;
    ...
  }

and, effectively, the noalias information will be lost for the second access. That's why I picked a data dependence here. Now you might think that we could get around this by canonicalizing @llvm.noalias(x - 1, !scope1) into @llvm.noalias(x, !scope1), but that's not really legal either. Actually, it's worse than that, it is really important to partition the users of the @llvm.noalias from other uses of the pointer. Here's my canonical example:

void foo(T * restrict x, T * restrict y) {
  for (int i = 0; i < 1600; ++i)
    x[2*i+1] = y[2*i] + 1;
}

now imagine inlining a call of foo(q, q). This is allowed because the accesses based on x and those based on y are disjoint, but this cannot be modeled safely with the control-dependence and a single scope alone.

[Coincidentally, this is the second time today I've used this example; see http://llvm.org/bugs/show_bug.cgi?id=21556 for the other place, which you might also find interesting].

Do you think it makes sense to roll this all into one family of
intrinsics? Or do you see something in your proposed use which
wouldn't
work for other types of metadata?

I do want this, as a separate matter, but I don't think it necessarily works here.

Other example use cases:
- !invariant loads mixed with initialization of the same location
- !nonnull and !range facts recorded by a language frontend

p.s. For the sake of completeness, Andy and I were also talking about
a
version of this idea for function attributes as well. This would
give
us the ability to say things like "if this value is non-null, it is
dereferenceable to size X".

Yep. I certainly care about dereferenceability too :wink:

Thanks again,
Hal

You don't have x1 and x2 in your example, assuming you mean:

  int i = 0;
  T A;
  T * y2 = ...
  {
    T * x1 = &A;
    a = x1[i];
  }
  {
    T * restrict x2 = y2;
    b = x2[i];
  }

then the answer is no, the fact that x2 is restrict qualified does not
allow us to conclude that &x2[i] and &x1[i] don't alias.

​It should, no? by virtue of x2 being restrict you know that *x2 doesn't
alias A, and *x1 is A.​

This example is flawed anyway ​because it only has loads, so the aliasing
isn't that interesting. Something more realistic:

  T
​A, B

  T * x1 =
​.... // either &A or &B
  T * y2 =
​.... // ​
​maybe &A
​ ​
{
    T * restrict x2 = y2;

​*​
x
​1
​ = ...

​*​
x2
​ = ...

  }

​In this case you'll be able to tell *x1 doesn't alias​ *x2, right? How
about if you add restrict to x1?

From: "Raul Silvera" <rsilvera@google.com>
To: "Hal Finkel" <hfinkel@anl.gov>
Cc: "Chandler Carruth" <chandlerc@google.com>, "LLVM Developers Mailing List" <llvmdev@cs.uiuc.edu>
Sent: Friday, November 14, 2014 5:31:24 PM
Subject: Re: [LLVMdev] Upcoming Changes/Additions to Scoped-NoAlias metadata

You don't have x1 and x2 in your example, assuming you mean:

int i = 0;
T A;
T * y2 = ...
{
T * x1 = &A;
a = x1[i];
}
{
T * restrict x2 = y2;
b = x2[i];
}

then the answer is no, the fact that x2 is restrict qualified does
not allow us to conclude that &x2[i] and &x1[i] don't alias.

It should, no? by virtue of x2 being restrict you know that *x2
doesn't alias A, and *x1 is A.

No, it doesn't. The fact that x2 is restrict does not mean that it does not alias with any other potential accesses from variables live in its block. It only means it does not alias with other accesses with that occur in the block where x2 is live. There is no access to A or x1 in that block, so we can say nothing about it.

This example is flawed anyway ​because it only has loads,

Yes, but I'm ignoring that for the time being.

so the
aliasing isn't that interesting. Something more realistic:

  T A, B;
  T * x1 = .... // either &A or &B
  T * y2 =​ .... // ​​maybe &A
  {
    T * restrict x2 = y2;
​ *​x1 = ...
​ *​x2 = ...
  }

In this case you'll be able to tell *x1 doesn't alias​ *x2, right?

In this case, yes, we can conclude that x1 and x2 don't alias (because *x1 and *x2 cannot both legally refer to the same object).

How about if you add restrict to x1?

The conclusion is the same, but if you add restrict to x1, you don't need it on x2. x2 is definitely not based on x1, so if x1 is restrict, then we know that x1 and x2 don't alias.

Thanks again,
Hal

You don’t have x1 and x2 in your example, assuming you mean:

int i = 0;
T A;
T * y2 = …
{
T * x1 = &A;
a = x1[i];
}
{
T * restrict x2 = y2;
b = x2[i];
}

It should, no? by virtue of x2 being restrict you know that *x2
doesn’t alias A, and *x1 is A.

No, it doesn’t. The fact that x2 is restrict does not mean that it does not alias with any other potential accesses from variables live in its block. It only means it does not alias with other accesses with that occur in the block where x2 is live. There is no access to A or x1 in that block, so we can say nothing about it.

It does. You can assume x2 is not aliased to A and still get well-defined semantics, precisely because A is not referenced in the scope of x2. That refinement would only get you into trouble if A is referenced in the scope of x2, which would trigger UB.

Going further, logically the intrinsic should return a pointer to a new object, disjoint from all other live objects. It is not aliased to A, and is well defined even if it contains &A because A is not referenced in the scope. This does require dataflow barriers on entrance/exits to the scope, but those can be made no worse than the original code.

Aliasing x2 to A is not only unnecessary, but also pessimistic because in general you do not have access to the dynamic scope of the restricted pointer.

T A, B;
T * x1 = … // either &A or &B
T * y2 =​ … // ​​maybe &A
{
T * restrict x2 = y2;
​ *​x1 = …
​ *​x2 = …
}

In this case you’ll be able to tell *x1 doesn’t alias​ *x2, right?

In this case, yes, we can conclude that x1 and x2 don’t alias (because *x1 and *x2 cannot both legally refer to the same object).

How about if you add restrict to x1?

The conclusion is the same, but if you add restrict to x1, you don’t need it on x2. x2 is definitely not based on x1, so if x1 is restrict, then we know that x1 and x2 don’t alias.

Agreed. So will your approach be able to catch both cases? It seemed to me it wouldn’t be able to catch the second one because it would have a different scope, but probably I’m missing something.

Thanks for your patience,

From: "Raul Silvera" <rsilvera@google.com>
To: "Hal Finkel" <hfinkel@anl.gov>
Cc: "Chandler Carruth" <chandlerc@google.com>, "LLVM Developers Mailing List" <llvmdev@cs.uiuc.edu>
Sent: Monday, November 17, 2014 11:26:57 AM
Subject: Re: [LLVMdev] Upcoming Changes/Additions to Scoped-NoAlias metadata

> You don't have x1 and x2 in your example, assuming you mean:
>
> int i = 0;
> T A;
> T * y2 = ...
> {
> T * x1 = &A;
> a = x1[i];
> }
> {
> T * restrict x2 = y2;
> b = x2[i];
> }
>
> It should, no? by virtue of x2 being restrict you know that *x2
> doesn't alias A, and *x1 is A.

No, it doesn't. The fact that x2 is restrict does not mean that it
does not alias with any other potential accesses from variables live
in its block. It only means it does not alias with other accesses
with that occur in the block where x2 is live. There is no access to
A or x1 in that block, so we can say nothing about it.

It does. You can assume x2 is not aliased to A and still get
well-defined semantics, precisely because A is not referenced in the
scope of x2. That refinement would only get you into trouble if A is
referenced in the scope of x2, which would trigger UB.

I don't understand exactly what you're saying here. You can do that at the source level where you still have the original blocks. The problem is that, at the IR level, these blocks don't remain separate basic blocks, and the distinction then matters.

Going further, logically the intrinsic should return a pointer to a
new object, disjoint from all other live objects. It is not aliased
to A, and is well defined even if it contains &A because A is not
referenced in the scope.

This is essentially what is done, but only for accesses in the scope (or some sub-scope). I don't think the semantics allow for what you're suggesting. The specific language from 6.7.3.1p4 says:

[from C]
During each execution of B, let L be any lvalue that has &L based on P. If L is used to
access the value of the object X that it designates, ...,
then the following requirements apply: ... Every other lvalue
used to access the value of X shall also have its address based on P.
[end from C]

Where B is defined in 6.7.3.1p2 to be, essentially, the block in which the relevant declaration appears. And we can really only extrapolate from that to the other access in that block, and not to the containing block.

This does require dataflow barriers on

entrance/exits to the scope, but those can be made no worse than the
original code.

These don't turn into general scheduling barriers anyway. They'll be tagged as writing to memory, yes, but like with @llvm.assume, they'll get special treatment in BasicAA and a few other places so they don't hurt code motion too badly.

Aliasing x2 to A is not only unnecessary, but also pessimistic

It is pessimistic, but only in the sense that the restrict qualifier does not say anything about it.

because in general you do not have access to the dynamic scope of
the restricted pointer.

T A, B;
T * x1 = .... // either &A or &B
T * y2 =​ .... // ​​maybe &A
{
T * restrict x2 = y2;
*​x1 = ...
*​x2 = ...
}

>
> In this case you'll be able to tell *x1 doesn't alias​ *x2, right?

In this case, yes, we can conclude that x1 and x2 don't alias
(because *x1 and *x2 cannot both legally refer to the same object).

> How about if you add restrict to x1?

The conclusion is the same, but if you add restrict to x1, you don't
need it on x2. x2 is definitely not based on x1, so if x1 is
restrict, then we know that x1 and x2 don't alias.

Agreed. So will your approach be able to catch both cases? It seemed
to me it wouldn't be able to catch the second one because it would
have a different scope, but probably I'm missing something.

Yes, it will catch it. Just as in the current metadata design, the scope of each access is really a list of scopes. The accesses in the inner blocks get tagged with both the inner and the outer scopes, so they pick up the restrict from the outer scope.

Thanks for your patience,

Not a problem; I appreciate the feedback!

-Hal

You don’t have x1 and x2 in your example, assuming you mean:

int i = 0;
T A;
T * y2 = …
{
T * x1 = &A;
a = x1[i];
}
{
T * restrict x2 = y2;
b = x2[i];
}

It should, no? by virtue of x2 being restrict you know that *x2
doesn’t alias A, and *x1 is A.

No, it doesn’t. The fact that x2 is restrict does not mean that it
does not alias with any other potential accesses from variables live
in its block. It only means it does not alias with other accesses
with that occur in the block where x2 is live. There is no access to
A or x1 in that block, so we can say nothing about it.

It does. You can assume x2 is not aliased to A and still get
well-defined semantics, precisely because A is not referenced in the
scope of x2. That refinement would only get you into trouble if A is
referenced in the scope of x2, which would trigger UB.

I don’t understand exactly what you’re saying here. You can do that at the source level where you still have the original blocks. The problem is that, at the IR level, these blocks don’t remain separate basic blocks, and the distinction then matters.

Agreed. My point is that if you preservethe
​block boundaries​
you

can use

​better
a
​​
liasing for the restricted pointers. You are already preserving the block entry
​by introducing the
intrinsic
​;
the block exits could be similarly preserved.

Going further, logically the intrinsic should return a pointer to a
new object, disjoint from all other live objects. It is not aliased
to A, and is well defined even if it contains &A because A is not
referenced in the scope.

This is essentially what is done, but only for accesses in the scope (or some sub-scope). I don’t think the semantics allow for what you’re suggesting. The specific language from 6.7.3.1p4 says:

[from C]
During each execution of B, let L be any lvalue that has &L based on P. If L is used to
access the value of the object X that it designates, …,
then the following requirements apply: … Every other lvalue
used to access the value of X shall also have its address based on P.
[end from C]

Where B is defined in 6.7.3.1p2 to be, essentially, the block in which the relevant declaration appears. And we can really only extrapolate from that to the other access in that block, and not to the containing block.

Inside that block
​ (the lifetime of P)
, it is safe to assume that X is​disjoint from an
arbitrary live object​
A. It if was​
n’t

, either:

  • A is independently referenced inside the block, so there is UB and all bets are off.
  • A is not independently referenced inside the block,
    ​so
    there are no pairs of accesses to incorrectly reorder as all accesses to A in

    the block are done through P. You just need to delimit the block with dataflow barriers​,​
    summariz
    ​ing
    the effect of the block at entry/exit.

This is similar to the way dummy args are implemented on Fortran compilers, extended to arbitrary scopes.

From: "Raul Silvera" <rsilvera@google.com>
To: "Hal Finkel" <hfinkel@anl.gov>
Cc: "Chandler Carruth" <chandlerc@google.com>, "LLVM Developers Mailing List" <llvmdev@cs.uiuc.edu>
Sent: Tuesday, November 18, 2014 11:23:25 AM
Subject: Re: [LLVMdev] Upcoming Changes/Additions to Scoped-NoAlias metadata

> > You don't have x1 and x2 in your example, assuming you mean:
> >
> > int i = 0;
> > T A;
> > T * y2 = ...
> > {
> > T * x1 = &A;
> > a = x1[i];
> > }
> > {
> > T * restrict x2 = y2;
> > b = x2[i];
> > }
> >
> > It should, no? by virtue of x2 being restrict you know that *x2
> > doesn't alias A, and *x1 is A.
>
> No, it doesn't. The fact that x2 is restrict does not mean that it
> does not alias with any other potential accesses from variables
> live
> in its block. It only means it does not alias with other accesses
> with that occur in the block where x2 is live. There is no access
> to
> A or x1 in that block, so we can say nothing about it.
>
>
>
> It does. You can assume x2 is not aliased to A and still get
> well-defined semantics, precisely because A is not referenced in
> the
> scope of x2. That refinement would only get you into trouble if A
> is
> referenced in the scope of x2, which would trigger UB.

I don't understand exactly what you're saying here. You can do that
at the source level where you still have the original blocks. The
problem is that, at the IR level, these blocks don't remain separate
basic blocks, and the distinction then matters.

Agreed. My point is that if you preserve the
block boundaries yo u

can use
better a
liasing for the restricted pointers. You are already preserving the
blo ck entry
by introducing the intrinsic
; the block exits could be similarly preserved.

I preserve them only weakly... I don't want full barriers; in fact, I plan to add InstCombine code to combine calls to @llvm.noalias (it will also take a list of scopes, not just one, so this is possible). The goal is to have as few barriers as possible.

> Going further, logically the intrinsic should return a pointer to a
> new object, disjoint from all other live objects. It is not aliased
> to A, and is well defined even if it contains &A because A is not
> referenced in the scope.

This is essentially what is done, but only for accesses in the scope
(or some sub-scope). I don't think the semantics allow for what
you're suggesting. The specific language from 6.7.3.1p4 says:

[from C]
During each execution of B, let L be any lvalue that has &L based on
P. If L is used to
access the value of the object X that it designates, ...,
then the following requirements apply: ... Every other lvalue
used to access the value of X shall also have its address based on P.
[end from C]

Where B is defined in 6.7.3.1p2 to be, essentially, the block in
which the relevant declaration appears. And we can really only
extrapolate from that to the other access in that block, and not to
the containing block.

Inside that block
(the lifetime of P) , it is safe to assume that X is
disjoint from an arbitrary live object
A. It if was

n't
, either:
- A is independently referenced inside the block, so there is UB and
all bets are off.
- A is not independently referenced inside the blo ck,
so t here are no pairs of accesses to incorrectly reorder as all
accesses to A in
the block are done through P. You just need to delimit the block
with dataflow barriers
, summar iz
ing the effect of the block at entry/exit.

Okay, I think I agree with you assuming that we put in entry/exit barriers to preserve the block boundaries. I'd specifically like to avoid that, however.

This is similar to the way dummy args are implemented on Fortran
compilers, extended to arbitrary scopes.

Interesting.

Thanks again,
Hal

I preserve them only weakly... I don't want full barriers; in fact, I plan
to add InstCombine code to combine calls to @llvm.noalias (it will also
take a list of scopes, not just one, so this is possible). The goal is to
have as few barriers as possible.

Good.

> Going further, logically the intrinsic should return a pointer to a
> > new object, disjoint from all other live objects. It is not aliased
> > to A, and is well defined even if it contains &A because A is not
> > referenced in the scope.
>
> This is essentially what is done, but only for accesses in the scope
> (or some sub-scope). I don't think the semantics allow for what
> you're suggesting. The specific language from 6.7.3.1p4 says:
>
> [from C]
> During each execution of B, let L be any lvalue that has &L based on
> P. If L is used to
> access the value of the object X that it designates, ...,
> then the following requirements apply: ... Every other lvalue
> used to access the value of X shall also have its address based on P.
> [end from C]
>
> Where B is defined in 6.7.3.1p2 to be, essentially, the block in
> which the relevant declaration appears. And we can really only
> extrapolate from that to the other access in that block, and not to
> the containing block.
>
>
> Inside that block
> (the lifetime of P) , it is safe to assume that X is
> disjoint from an arbitrary live object
> A. It if was
>
> n't
> , either:
> - A is independently referenced inside the block, so there is UB and
> all bets are off.
> - A is not independently referenced inside the blo ck,
> so t here are no pairs of accesses to incorrectly reorder as all
> accesses to A in
> the block are done through P. You just need to delimit the block
> with dataflow barriers
> , summar iz
> ing the effect of the block at entry/exit.

Okay, I think I agree with you assuming that we put in entry/exit barriers
to preserve the block boundaries. I'd specifically like to avoid that,
however.

I'm not proposing full code motion barriers, only punctual dataflow
use/defs to signal entry/exit to the scope.

Logically, entering the scope transfers the pointed data into a new unique
block of memory, and puts its address on the restrict pointer. Exiting the
scope transfers it back. Of course you do not want to actually allocate a
new object and move the data, but you can use these semantics to define the
scope entry/exit intrinsics. Their contribution to dataflow is only limited
to the content of the address used to initialize the restricted pointer.
These would be lighter than the proposed intrinsic as they would not have
specialized control-flow ​restrictions.

This approach makes the restrict attribute effective against all live
variables without having to examine the extent of the scope to collect all
references, which is in general impractical. It also removes the need for
scope metadata, as there would be no need to name the scopes.

Anyway, this is just a general alternate design, since you were asking for
one. I'm sure still would take some time/effort to map it onto the LLVM
framework.

Regards,

From: "Raul Silvera" <rsilvera@google.com>
To: "Hal Finkel" <hfinkel@anl.gov>
Cc: "Chandler Carruth" <chandlerc@google.com>, "LLVM Developers Mailing List" <llvmdev@cs.uiuc.edu>
Sent: Tuesday, November 18, 2014 9:09:40 PM
Subject: Re: [LLVMdev] Upcoming Changes/Additions to Scoped-NoAlias metadata

I preserve them only weakly... I don't want full barriers; in fact, I
plan to add InstCombine code to combine calls to @llvm.noalias (it
will also take a list of scopes, not just one, so this is possible).
The goal is to have as few barriers as possible.

Good.

> > Going further, logically the intrinsic should return a pointer to
> > a
> > new object, disjoint from all other live objects. It is not
> > aliased
> > to A, and is well defined even if it contains &A because A is not
> > referenced in the scope.
>
> This is essentially what is done, but only for accesses in the
> scope
> (or some sub-scope). I don't think the semantics allow for what
> you're suggesting. The specific language from 6.7.3.1p4 says:
>
> [from C]
> During each execution of B, let L be any lvalue that has &L based
> on
> P. If L is used to
> access the value of the object X that it designates, ...,
> then the following requirements apply: ... Every other lvalue
> used to access the value of X shall also have its address based on
> P.
> [end from C]
>
> Where B is defined in 6.7.3.1p2 to be, essentially, the block in
> which the relevant declaration appears. And we can really only
> extrapolate from that to the other access in that block, and not to
> the containing block.
>
>
> Inside that block
> (the lifetime of P) , it is safe to assume that X is
> disjoint from an arbitrary live object
> A. It if was
>
> n't
> , either:
> - A is independently referenced inside the block, so there is UB
> and
> all bets are off.
> - A is not independently referenced inside the blo ck,
> so t here are no pairs of accesses to incorrectly reorder as all
> accesses to A in
> the block are done through P. You just need to delimit the block
> with dataflow barriers
> , summar iz
> ing the effect of the block at entry/exit.

Okay, I think I agree with you assuming that we put in entry/exit
barriers to preserve the block boundaries. I'd specifically like to
avoid that, however.

I'm not proposing full code motion barriers, only punctual dataflow
use/defs to signal entry/exit to the scope.

Logically, entering the scope transfers the pointed data into a new
unique block of memory, and puts its address on the restrict
pointer. Exiting the scope transfers it back. Of course you do not
want to actually allocate a new object and move the data, but you
can use these semantics to define the scope entry/exit intrinsics.
Their contribution to dataflow is only limited to the content of the
address used to initialize the restricted pointer. These would be
lighter than the proposed intrinsic as they would not have
specialized control-flow ​restrictions.

Thanks for explaining, I now understand what you're proposing.

This approach makes the restrict attribute effective against all live
variables without having to examine the extent of the scope to
collect all references, which is in general impractical.

I think you've misunderstood this. For restrict-qualified local variables, every memory access within the containing block (which is everything in the function for function argument restrict-qualified pointers) get tagged with the scope. This is trivial to determine.

It also
removes the need for scope metadata, as there would be no need to
name the scopes.

Indeed.

Anyway, this is just a general alternate design, since you were
asking for one.

Yes, and thank you for doing so.

I'm sure still would take some time/effort to map it
onto the LLVM framework.

That does not seem too difficult, the question is really just whether or not it gives us what we need...

So in this scheme, we'd have the following:

void foo(T * restrict a, T * restrict b) {
  *a = *b;
}

T * x = ..., *y = ..., *z = ..., *w = ...;
foo(x, y);
foo(z, w);

become:

T * x = ..., *y = ..., *z = ..., *w = ...;

T * a1 = @llvm.noalias.start(x); // model: reads from x (with a general write control dep).
T * b1 = @llvm.noalias.start(y);
*a1 = *b1;
@llvm.noalias.end(a1, x); // model: reads from a1, writes to x.
@llvm.noalias.end(b1, y);

T * a2 = @llvm.noalias.start(z);
T * b2 = @llvm.noalias.start(w);
*a2 = *b2;
@llvm.noalias.end(a2, z);
@llvm.noalias.end(b2, w);

This does indeed seem generally equivalent to the original proposal in the sense that the original proposal has an implicit ending barrier at the last relevant derived access, and here we have explicit ending barriers. The advantage is the lack of metadata (and associated implementation complexity). The disadvantage is that we have additional barriers to manage, and these are write barriers on the underlying pointers. It is not clear to me this would make too much difference, so long as we aggressively hoisted the ending barriers to just after the last use based on their 'noalias' operands.

So this is relatively appealing, and I think would not be a bad way to model C99 restrict (extending the scheme to handle mutually-ambiguous restrict-qualified pointers from aggregates seems straightforward). It does not, however, cover cases where the region of guaranteed disjointness (for lack of a better term) is not continuous. This will come up when implementing a scheme such as that in the current C++ alias-set proposal (N4150). To construct a quick example, imagine that our implementation of std::vector is annotated such that (assuming the standard allocator) each std::vector object's internal storage has a distinct alias set, and we have:

  std::vector<T> x, y;
  ...
  T * q = &x[0];
  for (int i = 0; i < 1600; ++i) {
    x[i] = y[i];
    *q += x[i];
  }

so here we know that the memory accesses inside the operator from x and y don't alias, but the alias-set attribute does not tell us about the relationship between those accesses and the *q. The point of dominance, however, needs to associated with the declaration of x and y (specifically, we want to preserve the dominance over the loop). A start/end barrier scheme localized around the inlined operator functions would not do that, and placing start/end barriers around the entire live region of x and y would not be correct. I can, however, represent this using the metadata scheme.

Thanks again,
Hal