C99 restrict

Has there been any discussion of supporting the ‘restrict’ C99 keyword and C++ extension to boost alias analysis? My impression is that this would require modification of the LLVM IR. I couldn’t find any discussion hits using the usual suspects for searches…

Also, isn’t this going to be critical to supporting FORTRAN well?

So far, there hasn't been a discussion. IMO, the most important form is for formal arguments. That could easily be added thorough the use of an attribute on the parameter.

-Chris

I assume the idea here is to avoid actually attributing the type (as was avoided with signed/unsigned integers by differentiating ops).

What about an approach not unlike how debugging information is handled? That is have an llvm intrinsic that encodes the known alias free range for a pointer.

So far, there hasn't been a discussion. IMO, the most important form is
for formal arguments. That could easily be added thorough the use of an
attribute on the parameter.

I assume the idea here is to avoid actually attributing the type (as was avoided with signed/unsigned integers by differentiating ops).

Yep, exactly.

What about an approach not unlike how debugging information is handled? That is have an llvm intrinsic that encodes the known alias free range for a pointer.

That is another great possibility. One issue is that I haven't had time to study the implications of C99 restrict, so I don't feel qualified to propose a concrete solution yet. Ideally, the mechanism added to LLVM would be able to handle restrict as well as fortran argument parameters (even if the fortran functions are inlined!), etc. IOW, we don't want to have an feature that just implements C99 restrict, we want a more general mechanism which C99 restrict can be implemented in terms of. It seems like an intrinsic would be a good way to do that.

-Chris

Certainly language independence is a requirement.

I think your point about inlined functions is also valid for restrict parameters to functions that have been inlined in C/C++. The aliasing guarantees are limited to within that function's scope For an intrinsics approach this would require intrinsics which estrict/unrestrict the pointer bounding it's life within the function where it is declared restrict. The other approach would be to add attributes that mark all uses of the pointers as explicitly alias free, which would be much more invasive.

On a related topic, I think that similar types of attributes will be needed if LLVM is to support a concurrent memory model. Perhaps these could all be rolled into the same package.

Ideally I think it would be nice to be able to start off using intrinsics to communicate this kind of information, though it may end up as changes to instruction down the line, but I'm not enough of an expert to be able to say whether this is a feasible strategy.

For representing scoping information, a relatively non-invasive
approach is to introduce a special "copy" operation, which in LLVM
might look like
  %a = copy %b
This operation has to be somewhat special in that can't be folded away
in most cases, but it can otherwise be pretty straight-forward to
optimizers. The idea is to allow %a to be used in whatever alias
intrinsics are available without tainting %b. An inliner would generate
these special instructions for each of the arguments, and then all
references to the arguments within the inlined body would be made
through these copies.

An alias-free attribute wouldn't be usable in general because pointers
are almost never completely alias-free. For example in C:

  int * restrict p = ...;
  p = 0;
  p[y] = 1;
  int * q = p + z;
  *q = 2;
  *p = 3;
  int * r = foo(p);
  *r = 4;

Translated to LLVM:

  %t0 = getelementptr %p, %x
  store 0, %t0
  %t1 = getelementptr %p, %y
  store 1, %t1
  %q = getelementptr %p, %z
  store 2, %q
  store 3, %p
  %r = call @foo(%p)
  store 4, %r

C's restrict keyword allows pointers to be "based-on" restrict-qualified
pointers, such as q in this example. The definition of "based-on" does
not require the relationship to be obvious though. r in this example may
or may not be "based-on" p, so an optimizer must make very conservative
assumptions unless it can get more information about foo.

Beyond C, some form of "based-on" relationship would also be needed for
LLVM IR, at least so that it could cope with things like %t0 and %t1
aliasing each other and %p in this example.

Dan

What about an approach not unlike how debugging information is
handled? That
is have an llvm intrinsic that encodes the known alias free range
for a
pointer.

That is another great possibility. One issue is that I haven’t had
time
to study the implications of C99 restrict, so I don’t feel
qualified to
propose a concrete solution yet. Ideally, the mechanism added to LLVM
would be able to handle restrict as well as fortran argument
parameters
(even if the fortran functions are inlined!), etc. IOW, we don’t
want to
have an feature that just implements C99 restrict, we want a more
general
mechanism which C99 restrict can be implemented in terms of. It seems
like an intrinsic would be a good way to do that.

Certainly language independence is a requirement.

I think your point about inlined functions is also valid for restrict
parameters to functions that have been inlined in C/C++. The aliasing
guarantees are limited to within that function’s scope For an
intrinsics approach this would require intrinsics which estrict/
unrestrict the pointer bounding it’s life within the function where
it is declared restrict. The other approach would be to add
attributes that mark all uses of the pointers as explicitly alias
free, which would be much more invasive.

For representing scoping information, a relatively non-invasive
approach is to introduce a special “copy” operation, which in LLVM
might look like
%a = copy %b
This operation has to be somewhat special in that can’t be folded away
in most cases, but it can otherwise be pretty straight-forward to
optimizers. The idea is to allow %a to be used in whatever alias
intrinsics are available without tainting %b. An inliner would generate
these special instructions for each of the arguments, and then all
references to the arguments within the inlined body would be made
through these copies.

An alias-free attribute wouldn’t be usable in general because pointers
are almost never completely alias-free. For example in C:

int * restrict p = …;
p = 0;
p[y] = 1;
int * q = p + z;
*q = 2;
*p = 3;
int * r = foo(p);
*r = 4;

Translated to LLVM:

%t0 = getelementptr %p, %x
store 0, %t0
%t1 = getelementptr %p, %y
store 1, %t1
%q = getelementptr %p, %z
store 2, %q
store 3, %p
%r = call @foo(%p)
store 4, %r

C’s restrict keyword allows pointers to be “based-on” restrict-qualified
pointers, such as q in this example. The definition of “based-on” does
not require the relationship to be obvious though. r in this example may
or may not be “based-on” p, so an optimizer must make very conservative
assumptions unless it can get more information about foo.

My understanding of ‘restrict’ is that the compiler isn’t required to check that the semantics of restrict aren’t violated. I would assume that FORTRAN arguments are the same way, since determining whether or not the semantics of are being violated is just as hard as normal alias analysis.

It may even be incorrect for the compiler to enforce the semantics of restrict pointers (with an error), even when the compilers alias analysis determines that there is a may-alias relationship between to pointers. Take the following example, which the compiler is now free to unroll and reschedule:

struct element { int a, b; };

void swap(element * arr, int size)
{
int * __restrict a = &arr[0].a;
int * __restrict b = &arr[0].b;
for (int i=0; i!=size; ++i)
{
int tmp = a[i];
a[i] = b[i];
b[i] = tmp;
a += 2;
b += 2;
}
}

Christopher Lamb wrote:-

It may even be incorrect for the compiler to enforce the semantics of
restrict pointers (with an error), even when the compilers alias
analysis determines that there is a may-alias relationship between to
pointers.

An error is an example of undefined behaviour.

Neil.

mechanism which C99 restrict can be implemented in terms of. It seems
like an intrinsic would be a good way to do that.

Certainly language independence is a requirement.

I think your point about inlined functions is also valid for restrict
parameters to functions that have been inlined in C/C++. The aliasing

Yep.

Ideally I think it would be nice to be able to start off using
intrinsics to communicate this kind of information, though it may end
up as changes to instruction down the line, but I'm not enough of an
expert to be able to say whether this is a feasible strategy.

Yes, that is a preferable approach. Intrinsics are easy to add, and can do almost anything that instructions can. They are a great way to get experience with an approach.

-Chris

>>
>>
>>
>>>>What about an approach not unlike how debugging information is
>>>>handled? That
>>>>is have an llvm intrinsic that encodes the known alias free range
>>>>for a
>>>>pointer.
>>>
>>>That is another great possibility. One issue is that I haven't had
>>>time
>>>to study the implications of C99 restrict, so I don't feel
>>>qualified to
>>>propose a concrete solution yet. Ideally, the mechanism added to
>>>LLVM
>>>would be able to handle restrict as well as fortran argument
>>>parameters
>>>(even if the fortran functions are inlined!), etc. IOW, we don't
>>>want to
>>>have an feature that just implements C99 restrict, we want a more
>>>general
>>>mechanism which C99 restrict can be implemented in terms of. It
>>>seems
>>>like an intrinsic would be a good way to do that.
>>
>>Certainly language independence is a requirement.
>>
>>I think your point about inlined functions is also valid for restrict
>>parameters to functions that have been inlined in C/C++. The aliasing
>>guarantees are limited to within that function's scope For an
>>intrinsics approach this would require intrinsics which estrict/
>>unrestrict the pointer bounding it's life within the function where
>>it is declared restrict. The other approach would be to add
>>attributes that mark all uses of the pointers as explicitly alias
>>free, which would be much more invasive.
>
>For representing scoping information, a relatively non-invasive
>approach is to introduce a special "copy" operation, which in LLVM
>might look like
> %a = copy %b
>This operation has to be somewhat special in that can't be folded away
>in most cases, but it can otherwise be pretty straight-forward to
>optimizers. The idea is to allow %a to be used in whatever alias
>intrinsics are available without tainting %b. An inliner would
>generate
>these special instructions for each of the arguments, and then all
>references to the arguments within the inlined body would be made
>through these copies.

BTW, I misspoke here; this doesn't address the scoping problem.

>
>An alias-free attribute wouldn't be usable in general because pointers
>are almost never completely alias-free. For example in C:

From my understanding the 'restrict' qualifier is a directive to the
compiler that the pointer or reference should be treated as alias
free within the scope of the function, regardless. For the below code
alias(p, q) == No, and alias(p, r) == No, even though q is "based-on"
p and r may be "based-on" p. This is both the power and danger of
'restrict', as it is up to the programmer to use it correctly.

The definition of restrict is more involved than that. A restrict-qualified
pointer may alias a pointer "based-on" it. And if the "based-on" relationships
aren't known, then the answer to the associated alias queries must be Maybe.

In the test I posted below, the program is correct if x, y, and z are all
the same, so the stores cannot be reordered.

> int * restrict p = ...;
> p = 0;
> p[y] = 1;
> int * q = p + z;
> *q = 2;
> *p = 3;
> int * r = foo(p);
> *r = 4;
>
>Translated to LLVM:
>
> %t0 = getelementptr %p, %x
> store 0, %t0
> %t1 = getelementptr %p, %y
> store 1, %t1
> %q = getelementptr %p, %z
> store 2, %q
> store 3, %p
> %r = call @foo(%p)
> store 4, %r
>
>C's restrict keyword allows pointers to be "based-on" restrict-
>qualified
>pointers, such as q in this example. The definition of "based-on" does
>not require the relationship to be obvious though. r in this
>example may
>or may not be "based-on" p, so an optimizer must make very
>conservative
>assumptions unless it can get more information about foo.

My understanding of 'restrict' is that the compiler isn't required to
check that the semantics of restrict aren't violated. I would assume
that FORTRAN arguments are the same way, since determining whether or
not the semantics of are being violated is just as hard as normal
alias analysis.

It may even be incorrect for the compiler to enforce the semantics of
restrict pointers (with an error), even when the compilers alias
analysis determines that there is a may-alias relationship between to
pointers. Take the following example, which the compiler is now free
to unroll and reschedule:

struct element { int a, b; };

void swap(element * arr, int size)
{
  int * __restrict a = &arr[0].a;
  int * __restrict b = &arr[0].b;
  for (int i=0; i!=size; ++i)
  {
    int tmp = a[i];
    a[i] = b[i];
    b[i] = tmp;
    a += 2;
    b += 2;
  }
}

Yes; this test case is relatively simple to analyze.

The point I was trying to make is that the ‘restrict’ qualifier stipulates the result of an alias query between to such qualified pointers. So even if the alias analysis returns that these two pointers may alias, it would not be correct to flag an error. This link has an example which may be useful http://www.lysator.liu.se/c/restrict.html#overlapping-objects.

From my understanding the ‘restrict’ qualifier is a directive to the compiler that the pointer or reference should be treated as alias free within the scope of the function, regardless. For the below code alias(p, q) == No, and alias(p, r) == No, even though q is “based-on” p and r may be “based-on” p. This is both the power and danger of ‘restrict’, as it is up to the programmer to use it correctly.

My understand here was incorrect. Restrict and non-restrict pointers may alias each other, so alias(p, q) == May and alias(p, r) == May. It is only the case that two restrict pointers do not alias each other.

For representing scoping information, a relatively non-invasive
approach is to introduce a special "copy" operation, which in LLVM
might look like
  %a = copy %b
This operation has to be somewhat special in that can't be folded away
in most cases, but it can otherwise be pretty straight-forward to
optimizers. The idea is to allow %a to be used in whatever alias
intrinsics are available without tainting %b. An inliner would generate
these special instructions for each of the arguments, and then all
references to the arguments within the inlined body would be made
through these copies.

I think the goal should be to capture the most important information. The
c99 definition of restrict has some nasty and complex implementation issues. It may not be reasonable or interesting to capture every corner case.

A more interesting question is how to represent TBAA information in LLVM :).

Here is a sketch of a concrete proposal to get us started:

1. Add a new argument attribute "noalias", which can be applied to pointer
    formals and pointer function results.
2. Add a new intrinsic "i8* @llvm.noalias(i8*)

The semantics of 'noalias' are that they "define a new object". Any pointer derived from it ("it" being the result of a no-alias call result, a formal parameter, or the result of llvm.noalias) is considered to not alias:

A. any other objects (e.g. global vars, allocas, functions, etc)
B. any other "noalias" objects
C. any other formal argument parameters or call results.

For these purposes, "pointers derived" means getelementptr and bitcast.

Can C99 restrict be mapped onto these semantics? Is "C" really safe?

With these semantics, it is clear that 'calloc' could be declared to have a result pointer that is noalias for example, but I'm not sure if it would be safe to map restrict onto it.

An alias-free attribute wouldn't be usable in general because pointers
are almost never completely alias-free. For example in C:

From my understanding the 'restrict' qualifier is a directive to the compiler that the pointer or reference should be treated as alias free within the scope of the function, regardless. For the below code alias(p, q) == No, and alias(p, r) == No, even though q is "based-on" p and r may be "based-on" p. This is both the power and danger of 'restrict', as it is up to the programmer to use it correctly.

C99 Restrict is very tricky and subtle. IIRC, it is defined in terms of accesses and objects, not in terms of anything intuitive or natural :slight_smile:

the following example, which the compiler is now free to unroll and reschedule:

struct element { int a, b; };

void swap(element * arr, int size)
{
int * __restrict a = &arr[0].a;
int * __restrict b = &arr[0].b;
for (int i=0; i!=size; ++i)
{
  int tmp = a[i];
  a[i] = b[i];
  b[i] = tmp;
  a += 2;
  b += 2;
  }
}

This example is also undefined because of the pointer arithmetic you are using.

-Chris

For representing scoping information, a relatively non-invasive
approach is to introduce a special "copy" operation, which in LLVM
might look like
  %a = copy %b
This operation has to be somewhat special in that can't be folded away
in most cases, but it can otherwise be pretty straight-forward to
optimizers. The idea is to allow %a to be used in whatever alias
intrinsics are available without tainting %b. An inliner would generate
these special instructions for each of the arguments, and then all
references to the arguments within the inlined body would be made
through these copies.

I think the goal should be to capture the most important information. The
c99 definition of restrict has some nasty and complex implementation
issues. It may not be reasonable or interesting to capture every corner
case.

A more interesting question is how to represent TBAA information in LLVM
:).

Here is a sketch of a concrete proposal to get us started:

1. Add a new argument attribute "noalias", which can be applied to pointer
    formals and pointer function results.
2. Add a new intrinsic "i8* @llvm.noalias(i8*)

In the C99 spec it would appear that declaring a pointer function result as restrict has no effect, as it depends purely on whether the pointer being assigned to is declared restrict or not. It's pointed out here that this also applies to using restrict on the pointer type of casts, the restrict qualifier makes no difference in the cast, only to the type that's being assigned to.

In this case I'd think that pointer function results would have the intrinsic applied to them if they were assigned to a restrict pointer. Other than clearing up the pointer function results this seems like a good concise proposal.

The semantics of 'noalias' are that they "define a new object". Any
pointer derived from it ("it" being the result of a no-alias call result,
a formal parameter, or the result of llvm.noalias) is considered to not
alias:

A. any other objects (e.g. global vars, allocas, functions, etc)
B. any other "noalias" objects
C. any other formal argument parameters or call results.

For these purposes, "pointers derived" means getelementptr and bitcast.

Can C99 restrict be mapped onto these semantics?

I just read through the C99 formal definition of restrict (http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf page 110) with an eye to this proposal, and I believe that C99 restrict can be mapped onto these semantics. Primarily because the requirement is for access though a restrict pointer to be equivalent to accesses to a copy of the "object" that the pointer points to.

  Is "C" really safe?

I'm not sure what you mean by safe, but I think it's consistent with C99 restrict.

With these semantics, it is clear that 'calloc' could be declared to have
a result pointer that is noalias for example, but I'm not sure if it would
be safe to map restrict onto it.

Declaring calloc() to return a restrict pointer would have no semantic difference than returning a normal pointer in C99. It would need to be assigned to a variable declared as a restrict pointer to make a difference.

the following example, which the compiler is now free to unroll and
reschedule:

struct element { int a, b; };

void swap(element * arr, int size)
{
int * __restrict a = &arr[0].a;
int * __restrict b = &arr[0].b;
for (int i=0; i!=size; ++i)
{
  int tmp = a[i];
  a[i] = b[i];
  b[i] = tmp;
  a += 2;
  b += 2;
  }
}

This example is also undefined because of the pointer arithmetic you are
using.

Yes. Yes it is. I think I need more sleep. =) I was trying to cook up something like Example 2 (p. 111) from the C99 spec.