Comparing stack addresses and function args (Was: [llvm] r174131 - Add a comment explaining an unavailable optimization)

Apologies if this has come up before. I couldn't find any previous
discussion, but I did find this commit.

I was wondering why LLVM cannot optimize this code (which GCC does optimize):

  int f(int *p) { int x; return p == &x; }

it would seem that this must always return 0. (This occurs as a
self-assignment check in the code I was looking at; I was hoping we
could fold that check away.)

I'd be interested to hear what those with a stronger understanding of
the standard than myself think about this, and also if there is any
example of something that could break because of this optimization. If
not, I'd like us to optimize it :slight_smile:

Author: djg
Date: Thu Jan 31 18:49:06 2013
New Revision: 174131

URL: http://llvm.org/viewvc/llvm-project?rev=174131&view=rev
Log:
Add a comment explaining an unavailable optimization.

Modified:
    llvm/trunk/lib/Analysis/InstructionSimplify.cpp

Modified: llvm/trunk/lib/Analysis/InstructionSimplify.cpp
URL: http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/Analysis/InstructionSimplify.cpp?rev=174131&r1=174130&r2=174131&view=diff

--- llvm/trunk/lib/Analysis/InstructionSimplify.cpp (original)
+++ llvm/trunk/lib/Analysis/InstructionSimplify.cpp Thu Jan 31 18:49:06 2013
@@ -1688,6 +1688,34 @@ static Value *ExtractEquivalentCondition
   return 0;
}

+// A significant optimization not implemented here is assuming that alloca
+// addresses are not equal to incoming argument values. They don't *alias*,
+// as we say, but that doesn't mean they aren't equal, so we take a
+// conservative approach.
+//
+// This is inspired in part by C++11 5.10p1:
+// "Two pointers of the same type compare equal if and only if they are both
+// null, both point to the same function, or both represent the same
+// address."
+//
+// This is pretty permissive.

Indeed :-/

+// It's also partly due to C11 6.5.9p6:
+// "Two pointers compare equal if and only if both are null pointers, both are
+// pointers to the same object (including a pointer to an object and a
+// subobject at its beginning) or function, both are pointers to one past the
+// last element of the same array object, or one is a pointer to one past the
+// end of one array object and the other is a pointer to the start of a
+// different array object that happens to immediately follow the ï¬ rst array
+// object in the address space.)
+//
+// C11's version is more restrictive, however there's no reason why an argument
+// couldn't be a one-past-the-end value for a stack object in the caller and be
+// equal to the beginning of a stack object in the callee.

This is interesting.

For the one-past-the-end pointer to point into the callee, the stack
would have to be growing upwards. So this won't happen on X86. Can we
turn this optimization on for downward-growing-stack targets?

Second, if the stack grows upward, and the function argument does
point into the callee stack frame, "p" and "&x" could have the same
contents. So per the "represent the same address" part above, they
should compare equal? But they're noalias? Are we allowed to write
through p? It wasn't a pointer to a valid object when we made the
call, but it became valid in the callee? This is all terrifying.

I suppose one could store the value of &x though, and then use it
again later, i.e.:

  int *global;
  int f(int *p) {
    int x;
    global = &x;
    return p == &x;
  }
  int g() {
    f(0);
    return f(global);
  }

Is g() guaranteed to return 1 here? Maybe we could claim it's
implementation dependent? GCC does not seem fold p==&x to 0 here. I
suppose we could make sure to check whether &x escapes the function?

- Hans

Apologies if this has come up before. I couldn't find any previous
discussion, but I did find this commit.

I was wondering why LLVM cannot optimize this code (which GCC does optimize):

  int f(int *p) { int x; return p == &x; }

it would seem that this must always return 0. (This occurs as a
self-assignment check in the code I was looking at; I was hoping we
could fold that check away.)

This is different than a self-assignment check, is it not?

blah& operator=(const blah &b) {
  if (&b == this) {}
  // ...
}

(Because it gets the pointer from the parameter and compares against a
"local" pointer?)

I just want to make sure that you're not suggesting we should optimize
away self-assignment checks in the general case. I agree that the code
you provided should optimize away, based on my understanding of the C
and C++ standards (which may be faulty, and that's why I've CCed in
Richard).

~Aaron

I was wondering why LLVM cannot optimize this code (which GCC does optimize):

  int f(int *p) { int x; return p == &x; }

it would seem that this must always return 0. (This occurs as a
self-assignment check in the code I was looking at; I was hoping we
could fold that check away.)

This is different than a self-assignment check, is it not?

blah& operator=(const blah &b) {
  if (&b == this) {}
  // ...
}

(Because it gets the pointer from the parameter and compares against a
"local" pointer?)

I just want to make sure that you're not suggesting we should optimize
away self-assignment checks in the general case.

Right, I'm not suggesting that :slight_smile:

The code I looked at went something like this:

  struct S {
    S& operator=(const S& other) {
      if (&other != this)
        val = other.val;
      return *this;
    }
    void foo();
    int val;
  };
  void S::foo() {
    S tmp;
    tmp.val = 42;
    *this = tmp; // operator= gets inlined; we should know(?) that &tmp != this
  }

This is of course a silly example, but with GCC we get:

  movl $42, (%rdi)
  ret

whereas Clang generates:

  movl $42, -8(%rsp)
  leaq -8(%rsp), %rax
  cmpq %rdi, %rax
  je .LBB0_2
  movl $42, (%rdi)
  .LBB0_2:
  retq

which made me sad.

I was wondering why LLVM cannot optimize this code (which GCC does optimize):

  int f(int *p) { int x; return p == &x; }

it would seem that this must always return 0. (This occurs as a
self-assignment check in the code I was looking at; I was hoping we
could fold that check away.)

This is different than a self-assignment check, is it not?

blah& operator=(const blah &b) {
  if (&b == this) {}
  // ...
}

(Because it gets the pointer from the parameter and compares against a
"local" pointer?)

I just want to make sure that you're not suggesting we should optimize
away self-assignment checks in the general case.

Right, I'm not suggesting that :slight_smile:

The code I looked at went something like this:

  struct S {
    S& operator=(const S& other) {
      if (&other != this)
        val = other.val;
      return *this;
    }
    void foo();
    int val;
  };
  void S::foo() {
    S tmp;
    tmp.val = 42;
    *this = tmp; // operator= gets inlined; we should know(?) that &tmp != this
  }

This is of course a silly example, but with GCC we get:

  movl $42, (%rdi)
  ret

whereas Clang generates:

  movl $42, -8(%rsp)
  leaq -8(%rsp), %rax
  cmpq %rdi, %rax
  je .LBB0_2
  movl $42, (%rdi)
  .LBB0_2:
  retq

which made me sad.

Ah, yes, this makes perfect sense to me. Thank you for the explanation!

~Aaron

I threw together a patch which implements this (attached.) If we decide that this is actually a legal transform, I'm happy to post this for review.

In addition to the version proposed here, I also implemented a case where a trivially escaped alloca's address is not equal to any other value. I believe both are valid, but we should confirm.

Philip

AllocaEscape.diff (1.89 KB)

I tried your patch on a Clang build to see if it would fire. It
reduced the size of a bootstrap with 8500 bytes. Not huge, but it
seems like a nice improvement. And maybe it could be made more
powerful: not just checking if the address is a param or alloca, but
an address based on such values.

FWIW, consider the following:

$ cat t.c

extern “C”
int printf(const char *, …);

struct S {
int x;
};

void f(S a) {
S b;
printf(“%p %p\n”, &a, &b);
}

int main() {
S s;
f(s);
}

$ clang -O2 t.cpp -o t && ./t
0x7fff5e75a8e8 0x7fff5e75a8e8

Perhaps Dan was concerned about zero sized objects?

FWIW, consider the following:
$ cat t.c
extern "C"
int printf(const char *, ...);

struct S {
    int x;
};

void f(S a) {
    S b;
    printf("%p %p\n", &a, &b);
}

int main() {
    S s;
    f(s);
}

$ clang -O2 t.cpp -o t && ./t
0x7fff5e75a8e8 0x7fff5e75a8e8

Perhaps Dan was concerned about zero sized objects?

This case is a red herring, we CSE'd the stack allocations for 'a' and 's'
which left us with 'call i32 (i8*, ...)* @printf(i8* getelementptr inbounds
([7 x i8]* @.str, i64 0, i64 0), %struct.S* %b.i, %struct.S* %b.i)'

I tried your patch on a Clang build to see if it would fire. It
reduced the size of a bootstrap with 8500 bytes. Not huge, but it
seems like a nice improvement. And maybe it could be made more
powerful: not just checking if the address is a param or alloca, but
an address based on such values.

Yeah, I realized after posting that this should be integrated with the existing noalias handling in the pointer icmp routine. However, let's defer that until we decide if this is actually correct to implement. :slight_smile:

I believe the background for this comment is just general caution resulting from

  • vague “both represent the same address.” wording in a supported language standard, and

  • the surprising situation in which pointer equality semantics are almost entirely distinct from pointer aliasing semantics

In general, ICmp folding shouldn’t consult alias analysis or do its own alias-analysis-like things.

Dan

From: "Philip Reames via llvm-dev" <llvm-dev@lists.llvm.org>
To: "Hans Wennborg" <hans@chromium.org>
Cc: "llvm-dev" <llvm-dev@lists.llvm.org>, "Dan Gohman" <dan.gohman@gmail.com>
Sent: Friday, September 25, 2015 3:26:01 PM
Subject: Re: [llvm-dev] Comparing stack addresses and function args (Was: [llvm] r174131 - Add a comment explaining
an unavailable optimization)

> I tried your patch on a Clang build to see if it would fire. It
> reduced the size of a bootstrap with 8500 bytes. Not huge, but it
> seems like a nice improvement. And maybe it could be made more
> powerful: not just checking if the address is a param or alloca,
> but
> an address based on such values.
Yeah, I realized after posting that this should be integrated with
the
existing noalias handling in the pointer icmp routine. However,
let's
defer that until we decide if this is actually correct to implement.
:slight_smile:

Agreed (in your patch, you'd need to call stripPointerCasts() and GetUnderlyingObjects anyway). This should go in computePointerICmp in lib/Analysis/InstructionSimplify.cpp near where IsAllocDisjoint is defined. The optimization seems valid to be, in that we generally prohibit 'getting lucky' with address comparisons; if you couldn't know the address, then guessing doesn't count.

-Hal

From: "Philip Reames via llvm-dev" <llvm-dev@lists.llvm.org>
To: "Hans Wennborg" <hans@chromium.org>
Cc: "llvm-dev" <llvm-dev@lists.llvm.org>, "Dan Gohman" <dan.gohman@gmail.com>
Sent: Friday, September 25, 2015 3:26:01 PM
Subject: Re: [llvm-dev] Comparing stack addresses and function args (Was: [llvm] r174131 - Add a comment explaining
an unavailable optimization)

> I tried your patch on a Clang build to see if it would fire. It
> reduced the size of a bootstrap with 8500 bytes. Not huge, but it
> seems like a nice improvement. And maybe it could be made more
> powerful: not just checking if the address is a param or alloca,
> but
> an address based on such values.
Yeah, I realized after posting that this should be integrated with
the
existing noalias handling in the pointer icmp routine. However,
let's
defer that until we decide if this is actually correct to implement.
:slight_smile:

Agreed (in your patch, you'd need to call stripPointerCasts() and GetUnderlyingObjects anyway). This should go in computePointerICmp in lib/Analysis/InstructionSimplify.cpp near where IsAllocDisjoint is defined. The optimization seems valid to be, in that we generally prohibit 'getting lucky' with address comparisons; if you couldn't know the address, then guessing doesn't count.

I do think we'll need to be a little bit careful here, though. I think
we can optimize this by acting as-if no one could know the address of
a stack variable, but then we need to be able to do so consistently.
For example, in this code

  int f(int *p) {
    int x;
    bool a = (p == &x);
    bool b = equals(p, &x);
  }

a and b needs to end up with the same value.

From: "Philip Reames via llvm-dev" <llvm-dev@lists.llvm.org>
To: "Hans Wennborg" <hans@chromium.org>
Cc: "llvm-dev" <llvm-dev@lists.llvm.org>, "Dan Gohman" <dan.gohman@gmail.com>
Sent: Friday, September 25, 2015 3:26:01 PM
Subject: Re: [llvm-dev] Comparing stack addresses and function args (Was: [llvm] r174131 - Add a comment explaining
an unavailable optimization)

> I tried your patch on a Clang build to see if it would fire. It
> reduced the size of a bootstrap with 8500 bytes. Not huge, but it
> seems like a nice improvement. And maybe it could be made more
> powerful: not just checking if the address is a param or alloca,
> but
> an address based on such values.
Yeah, I realized after posting that this should be integrated with
the
existing noalias handling in the pointer icmp routine. However,
let's
defer that until we decide if this is actually correct to implement.
:slight_smile:

Agreed (in your patch, you'd need to call stripPointerCasts() and GetUnderlyingObjects anyway). This should go in computePointerICmp in lib/Analysis/InstructionSimplify.cpp near where IsAllocDisjoint is defined. The optimization seems valid to be, in that we generally prohibit 'getting lucky' with address comparisons; if you couldn't know the address, then guessing doesn't count.

I do think we'll need to be a little bit careful here, though. I think
we can optimize this by acting as-if no one could know the address of
a stack variable, but then we need to be able to do so consistently.
For example, in this code

  int f(int *p) {
    int x;
    bool a = (p == &x);
    bool b = equals(p, &x);
  }

a and b needs to end up with the same value.

I've sent out a patch for this: ⚙ D13358 InstCombine: Fold comparisons between unguessable allocas and other pointers