Pointer aliasing

Hi LLVMers,
I would like to ask a question regarding aliasing. Suppose I have the
following program:

double f(double** p )
{
   double a,b,c;
   double * x = &a;
   double * y = &b;
   double * z = &c;

   *x = 1;
   *y = *x + 2;
   *z = *x + 3;

   return *x+*y+*z;
}

LLVM can tell that the three pointers do not alias each other so can
perform the constant folding at compile time.

define double @f(double** nocapture %p) nounwind uwtable readnone {
  ret double 8.000000e+00
}

Now consider the function below. I know (in my particluar case) that
the pointers in the p array do not alias each other. I tried to
communicate this information to llvm/clang via the __restrict__
qualifier but it does not seem to have an effect. Can you please
suggest what is wrong. How can I achieve what I want?

double f(double** p )
{
   double *__restrict__ x = p[0];
   double *__restrict__ y = p[1];
   double *__restrict__ z = p[2];

   *x = 1;
   *y = *x + 2;
   *z = *x + 3;

   return *x+*y+*z;
}

define double @f(double** nocapture %p) nounwind uwtable {
  %1 = load double** %p, align 8, !tbaa !0
  %2 = getelementptr inbounds double** %p, i64 1
  %3 = load double** %2, align 8, !tbaa !0
  %4 = getelementptr inbounds double** %p, i64 2
  %5 = load double** %4, align 8, !tbaa !0
  store double 1.000000e+00, double* %1, align 8, !tbaa !3
  store double 3.000000e+00, double* %3, align 8, !tbaa !3
  %6 = load double* %1, align 8, !tbaa !3
  %7 = fadd double %6, 3.000000e+00
  store double %7, double* %5, align 8, !tbaa !3
  %8 = load double* %1, align 8, !tbaa !3
  %9 = load double* %3, align 8, !tbaa !3
  %10 = fadd double %8, %9
  %11 = fadd double %10, %7
  ret double %11
}

!0 = metadata !{metadata !"any pointer", metadata !1}
!1 = metadata !{metadata !"omnipotent char", metadata !2}
!2 = metadata !{metadata !"Simple C/C++ TBAA", null}
!3 = metadata !{metadata !"double", metadata !1}

Thank you for any help.

Brent

Hi Brent,

Looking at your code I can see at least one reason why some of the store operations remain in the output since you are (through x, y, and z) writing in memory which exists outside of your function (p).

Constant propagation also seems to work in the first few lines, *y = *x +1 (%3) is stored directly.

The strange thing to me is that the same doesn't happen for *z = *x + 2. Here *x is loaded again and the addition is still performed...

From this point on, constant propagation seems to stop working completely. Looking at the IR I would have expected something like the following:

define double @f(double** nocapture %p) nounwind uwtable {
    %1 = load double** %p, align 8, !tbaa !0
    %2 = getelementptr inbounds double** %p, i64 1
    %3 = load double** %2, align 8, !tbaa !0
    %4 = getelementptr inbounds double** %p, i64 2
    %5 = load double** %4, align 8, !tbaa !0
    store double 1.000000e+00, double* %1, align 8, !tbaa !3
    store double 3.000000e+00, double* %3, align 8, !tbaa !3
    store double 4.000000e+00, double* %5, align 8, !tbaa !3
    ret double 8.000000e+00
}

!0 = metadata !{metadata !"any pointer", metadata !1}
!1 = metadata !{metadata !"omnipotent char", metadata !2}
!2 = metadata !{metadata !"Simple C/C++ TBAA", null}
!3 = metadata !{metadata !"double", metadata !1}

I am afraid I can't really help you in telling what went wrong here and caused the missing optimizations, but I agree with you that the result in not what I would have expected.

Cheers,
Roel

Hi Roel,
the code you list below is precisely what I expect to get (of course
the stores must happen but the constant folding should happen as
well).

It all looks very strange. LLVM is behaving as if the __restrict__
keyword was not used at all. Even more strange is the fact that for
this function:

double f(double *__restrict__ x, double *__restrict__ y, double *__restrict__ z)
{
  *x = 1.0;
  *y = *x + 2;
  *z = *x + 3;

  return *x + *y + *z;
}

everything works as expected:

define double @_Z1fPdS_S_(double* noalias nocapture %x, double*
noalias nocapture %y, double* noalias nocapture %z) nounwind uwtable {
  store double 1.000000e+00, double* %x, align 8, !tbaa !0
  store double 3.000000e+00, double* %y, align 8, !tbaa !0
  store double 4.000000e+00, double* %z, align 8, !tbaa !0
  ret double 8.000000e+00
}

!0 = metadata !{metadata !"double", metadata !1}
!1 = metadata !{metadata !"omnipotent char", metadata !2}
!2 = metadata !{metadata !"Simple C/C++ TBAA", null}

So I can't figure out what the mechanism is for telling llvm that two
pointers that are local variables (and not input arguments) to a
function do not alias each other. I hope one of the developers can
shed some light on this.

Thank you in advance for any help,
Brent

Hi Brent,

I think this is a problem in the easy-cse transform. In this transform load operations can be replaced by their subexpression, in this case the propagated constant, based on the value of the 'CurrentGeneration' of memory writes. This implies that any store operation invalidates the knowledge about previously stored subexpressions.

In general, this is a safe assumption but in this case it is missing quite some optimization potential.

The effect of this can be shown by moving the line %6 one up, to before the previous store operation. This doesn't change the program behaviour but does influence the optimization.

More info on this is in lib/Transforms/Scalar/EarlyCSE.cpp (line 415 is a good start)

I don't have time to improve this at this moment so I'll leave that to you (or anyone else that feels inspired).

Cheers,
Roel

Can you explain please why it works for this version of the function:

double f(double *__restrict__ x, double *__restrict__ y, double
*__restrict__ z);

What is different here? There are stores here as well.

Brent

I have no clue, I didn't have time to look into that example yet.

How does the IR (before optimization) differ from the other version?

Roel

I think the problem here is that the IR doesn't have any way to attach restrict information to loads/stores/pointers.

It works on arguments because they can be given the 'noalias' attribute, and then the alias analyzer must understand what that means.

Pete

Peter Cooper wrote:

I think the problem here is that the IR doesn't have any way to attach restrict information to loads/stores/pointers.

I think we do now, actually. Now that the loads and stores have TBAA metadata, I think the restrict attribute can go there. It needs to be attached to every use of a restrict pointer, but that's similar to how TBAA already works.

It works on arguments because they can be given the 'noalias' attribute, and then the alias analyzer must understand what that means.

Yep, I concur.

Nick

Peter Cooper wrote:

I think the problem here is that the IR doesn't have any way to attach restrict information to loads/stores/pointers.

I think we do now, actually. Now that the loads and stores have TBAA metadata, I think the restrict attribute can go there. It needs to be attached to every use of a restrict pointer, but that's similar to how TBAA already works.

Yeah, metadata would work between 2 restrict pointers, but i'm not sure if its safe to use between restrict and non-restrict pointers. For example,

float* __restrict p = …
float* q = p;
*p = ...
*q = …

The 2 stores would be given different metadata which would make them not alias, but i'm not sure if thats what the spec says.

LLVM ignores restrict everywhere except function parameters. This is a
compromise aimed at a sweet spot in the balance of compiler complexity
vs. optimization opportunity.

- Many analysis and optimization techniques naturally apply to whole
   functions. When restrict appears on a local variable inside a
   function, its special aliasing property applies to only a subset of
   the function. It's awkward to teach such code to understand and
   respect local scope boundaries, in general.

- Function boundaries are often the boundaries of analysis.
   Interprocedural analysis can be expensive and complex, so many
   optimization passes are limited to thinking about one function
   at a time. And even interprocedural analysis passes are
   bounded by shared library boundaries. While local variables can
   often be analyzed automatically (as in your first example),
   function paramters are often incoming mystery values, so they
   are where restrict is most often interesting.

This compromise does mean that some opportunities are lost (as in
your second example), but from clang's perspective these cases are
rare.

Dan

Thank you for your reply. The compromise you describe below, is it a
compromise in the LLVM back end or in clang? I run into this while
building a compiler for a small DSL language for which I generate
functions that receive a context from which they extract a bunch of
pointers to doubles from which inputs are passed to the function (I
just used C/clang in my examples to illustrate the problem).

Unless I can mark these extracted pointers as non-aliasing, performance
suffers a lot. It's not just CSE that suffers -- LLVM does not do
copy propagation either so we keep loading the same values from memory
over and over again. See the example here:

double f(double a, double * x, double *y, double * z)
{
  *x = a;
  *y = a+1;
  *z = *x + 3;

  return *x + *y + *z;
}

define double @f(double %a, double* nocapture %x, double* nocapture
%y, double* nocapture %z) nounwind uwtable {
  store double %a, double* %x, align 8, !tbaa !0
  %1 = fadd double %a, 1.000000e+00
  store double %1, double* %y, align 8, !tbaa !0
  %2 = load double* %x, align 8, !tbaa !0
  %3 = fadd double %2, 3.000000e+00
  store double %3, double* %z, align 8, !tbaa !0
  %4 = load double* %x, align 8, !tbaa !0
  %5 = load double* %y, align 8, !tbaa !0
  %6 = fadd double %4, %5
  %7 = fadd double %6, %3
  ret double %7
}

!0 = metadata !{metadata !"double", metadata !1}
!1 = metadata !{metadata !"omnipotent char", metadata !2}
!2 = metadata !{metadata !"Simple C/C++ TBAA", null}

(I used arguments here without __restrict__ which has the same effect as
loading my pointers from context as locals). As you can see we keep
loading the value of x from memory, even though we just stored a local
into it. Given that I am generating LLVM IR directly (via the C++
interface) can you suggest someway I could pass the noalias attribute
onto the locals?

One work around is of course to generate two functions as follows:

double f1( struct ctx* ctx )
{
   return f2(ctx->a, ctx->x, ctx->y, ctx->z);
}

double f2( double a, double *__restrict__ x, double *__restrict__ y,
double *__restrict__ z)
{
    *x = a;
    *y = a+1;
    *z = *x + 3;

    return *x + *y + *z;
}

but if at all possible I would like to avoid such acrobatics.

Thank you in advance for any help.
Brent

Hi Brent,

Unless I can mark these extracted pointers as non-aliasing, performance
suffers a lot. It's not just CSE that suffers -- LLVM does not do
copy propagation either so we keep loading the same values from memory
over and over again. See the example here:

double f(double a, double * x, double *y, double * z)
{
   *x = a;
   *y = a+1;
   *z = *x + 3;

   return *x + *y + *z;
}

here you are obliged to reload the values since some of the pointers might
be equal. For example the store to *y will change the value of *x if x and
y are the same pointer.

Ciao, Duncan.

Hi Duncan, you have misunderstood me I think. Of course in this case
you have to reload the doubles since as you say the pointers may be
aliasing each other. I just wrote it this way to show the kind of
code one gets if one cannot specify that pointers do now alias -- not
adding the __restrict__ to the function arguments gives the same code
as in my example with the local variables (with or without the
restrict).

If you could you answer the question in my email I would be grateful.
Given that I am generating IR directly, is it possible for me to mark
locals pointers as non-aliasing? When you said LLVM cannot do it, did
you mean clang cannot or the backend cannot? Is the 2 function
solution I outlined in my previous email my only option?

Thanks,
Brent

Hi Brent,

Hi Duncan, you have misunderstood me I think.

that's quite possible - I haven't been following this thread.

   Of course in this case

you have to reload the doubles since as you say the pointers may be
aliasing each other. I just wrote it this way to show the kind of
code one gets if one cannot specify that pointers do now alias -- not
adding the __restrict__ to the function arguments gives the same code
as in my example with the local variables (with or without the
restrict).

If you could you answer the question in my email I would be grateful.
Given that I am generating IR directly, is it possible for me to mark
locals pointers as non-aliasing?

I don't know.

   When you said LLVM cannot do it, did

you mean clang cannot or the backend cannot?

I didn't say this, maybe you are confusing with someone else.

   Is the 2 function

solution I outlined in my previous email my only option?

I don't know. Sorry not to be more helpful!

Ciao, Duncan.

For restrict-style alias information, the limitation is in LLVM.

Since you're not using clang, you might be able to use a custom TBAA
type tree instead. TBAA works differently from restrict; it applies
to loads and stores, rather than to pointers. But if you can frame
your aliasing property as a type-oriented property, saying for
example that each array element is a pointer to a distinct type, it
may suffice.

Dan

Hi Dan and Others ,

I’m newbie to LLVM and Clang ,But has experience on compiler optimization and VM .
Everyone talking about the LLVM in my organisation so thought of peeking into it and where this discussion is stalled me …

so i tried to simulate the problem ,which is discussed here .
vi sample.c

double f(double** p )
{
double a,b,c;
double * x = &a;
double * y = &b;
double * z = &c;

*x = 1;
*y = *x + 2;
*z = *x + 3;

return *x+*y+*z;
}

double ff(double** p )
{
double a,b,c;
double * x = &a;
double * y = &b;
double * z = &c;

*x = 1;
*y = *x + 2;
*z = *x + 3;

return *x+*y+*z;
}

compiled the sample.c .i.e clang sample.c -S -O3 -emit-llvm
cat sample.s

target datalayout = “e-p:32:32:32-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-f80:32:32-n8:16:32-S32”
target triple = “i386-pc-cygwin”

define double @f(double** nocapture %p) nounwind readnone {
ret double 8.000000e+00
}

define double @ff(double** nocapture %p) nounwind readnone {
ret double 8.000000e+00
}

Boom …BasicAA or TBAA is doing what it suppose to do :slight_smile: for restrict qualifier too…

Any lights on this ???

FYI , $ lli --version
Low Level Virtual Machine (http://llvm.org/):
llvm version 3.0
Optimized build.
Built Jan 24 2012 (05:48:10).
Host: i386-pc-cygwin
Host CPU: penryn

Thanks
~Umesh

Hi Umesh, when the compiler can see that the storage locations do not overlap
there is no problem, for example when storing to variables located on the stack
like in your example. The tricky case is when the compiler can't deduce from
what it can see that the storage locations don't alias, so needs to be helped
for example by using a restrict qualifier.

Ciao, Duncan.