Semantics of an Inbounds GetElementPtr

Hi - I've got a question about what optimizations the "inbounds"
keyword of "getelementptr" allows you to use. In the code below, %five
is loaded from an inbounds offset of either a null pointer or %mem:

target datalayout = "e-i64:64-f80:128-n8:16:32:64-S128"

define i8 @func(i8* %mem) {
  %test = icmp eq i8* %mem, null
  br i1 %test, label %done, label %mem.is.valid
mem.is.valid:
  %1 = getelementptr inbounds i8, i8* %mem, i64 4
  store i8 5, i8* %1, align 4
  br label %done
done:
  %after.phi = phi i8* [ %mem, %mem.is.valid ], [ null, %0 ]
  %2 = getelementptr inbounds i8, i8* %after.phi, i64 4
  %five = load i8, i8* %2, align 4
  ret i8 %five
}

According to the documentation, "the result value of the getelementptr
is a poison value if the base pointer is not an in bounds address of
an allocated object", so does this mean it's valid to optimise the
function to:

define i8 @func(i8* %mem) {
  %test = icmp eq i8* %mem, null
  br i1 %test, label %done, label %mem.is.valid
mem.is.valid:
  ret i8 5
done:
  ret i8 undef
}

Or even this:

define i8 @func(i8* %mem) {
  ret i8 5
}

...? This is a reduced example of something I saw while running "opt"
on a test case that missed a null check. Thanks -

Nick

No, neither is semantics-preserving, because neither version retains
the store through '%mem'.
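
For instance, here is a hypothetical caller (not from the original
report) that would observe the difference. With the original @func it
returns 5; with either proposed version it returns whatever happened
to be at %buf+4 before the call:

define i8 @caller(i8* %buf) {
  ; assume %buf points at a valid allocation of at least 8 bytes
  %r = call i8 @func(i8* %buf)
  %p = getelementptr inbounds i8, i8* %buf, i64 4
  %v = load i8, i8* %p, align 4
  ret i8 %v
}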

Let's start by hoisting both sides of the branch into the entry block.
This will leave you with:
define i8 @func(i8* %mem) {
  %test = icmp eq i8* %mem, null
  %after.phi = select i1 %test, i8* null, i8* %mem
  %1 = getelementptr inbounds i8, i8* %mem, i64 4
  store i8 5, i8* %1, align 4
  %2 = getelementptr inbounds i8, i8* %after.phi, i64 4
  %five = load i8, i8* %2, align 4
  ret i8 %five
}

The SSA value '%after.phi' can be trivially simplified to '%mem' (when
'%test' is true, '%mem' is null, so both arms of the select are equal
to '%mem'); this leaves us with:
define i8 @func(i8* %mem) {
  %1 = getelementptr inbounds i8, i8* %mem, i64 4
  store i8 5, i8* %1, align 4
  %2 = getelementptr inbounds i8, i8* %mem, i64 4
  %five = load i8, i8* %2, align 4
  ret i8 %five
}

The SSA values '%1' and '%2' are equivalent, so the stored value can be forwarded to the load; this leaves us with:
define i8 @func(i8* %mem) {
  %1 = getelementptr inbounds i8, i8* %mem, i64 4
  store i8 5, i8* %1, align 4
  ret i8 5
}

I do not believe further simplification is possible.

One final point that I forgot to mention: the transformations that I
performed did not rely on the presence of 'inbounds'.
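
For example, dropping the keyword from the two GEPs gives an input
that reduces through exactly the same steps (a sketch; @func_noinbounds
is just an illustrative name):

define i8 @func_noinbounds(i8* %mem) {
  %test = icmp eq i8* %mem, null
  br i1 %test, label %done, label %mem.is.valid
mem.is.valid:
  %1 = getelementptr i8, i8* %mem, i64 4
  store i8 5, i8* %1, align 4
  br label %done
done:
  %after.phi = phi i8* [ %mem, %mem.is.valid ], [ null, %0 ]
  %2 = getelementptr i8, i8* %after.phi, i64 4
  %five = load i8, i8* %2, align 4
  ret i8 %five
}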

Thanks - that makes sense. It's interesting that at -O3 the optimizer
can't reduce the below though - I'll dig into it a bit and see if I
can make a patch that fixes it:

target datalayout = "e-i64:64-f80:128-n8:16:32:64-S128"

%struct.my_s = type { i32, i32, [0 x i8*] }

; Function Attrs: noreturn
declare void @__assert_rtn()

define void @func(i8* %mem) {
  %1 = icmp eq i8* %mem, null
  br i1 %1, label %check.zero, label %stash.zero

stash.zero:
  %2 = bitcast i8* %mem to %struct.my_s*
  %3 = getelementptr inbounds i8, i8* %mem, i64 4
  %4 = bitcast i8* %3 to i32*
  store i32 0, i32* %4, align 4
  br label %check.zero

check.zero:
  %.0.i = phi %struct.my_s* [ %2, %stash.zero ], [ null, %0 ]
  %5 = getelementptr inbounds %struct.my_s, %struct.my_s* %.0.i, i64 0, i32 1
  %6 = load i32, i32* %5, align 4
  %7 = icmp eq i32 %6, 0
  br i1 %7, label %success, label %check.first.array.element

check.first.array.element:
  %8 = getelementptr inbounds %struct.my_s, %struct.my_s* %.0.i, i64 0, i32 2, i64 0
  %9 = load i8*, i8** %8, align 1
  %10 = icmp eq i8* %9, null
  br i1 %10, label %success, label %abort

abort:
  tail call void @__assert_rtn()
  unreachable

success:
  ret void
}

Thanks -

Nick

I'm unsure what you expect to happen below. It's not quite the same testcase.

GVN will PRE the loads, so you end up with one load.
But I can't see how you expect it to determine anything else.

Can you walk me through the below testcase and explain what you expect
to happen?

If so, I can probably make it happen for you :)

It's not quite the same testcase.

Yes - it's an extension of the first test case, which I'd expect to be
optimised out in the same way as my earlier example (i.e., store a
value, read it back and branch on it). If you leave out the
"check.first.array.element" block (redirecting the branch that jumped
to it to the "abort" label instead), like this:

define void @func2(i8* %mem) {
  %1 = icmp eq i8* %mem, null
  br i1 %1, label %check.zero, label %stash.zero

stash.zero:
  %2 = bitcast i8* %mem to %struct.my_s*
  %3 = getelementptr inbounds i8, i8* %mem, i64 4
  %4 = bitcast i8* %3 to i32*
  store i32 0, i32* %4, align 4
  br label %check.zero

check.zero:
  %.0.i = phi %struct.my_s* [ %2, %stash.zero ], [ null, %0 ]
  %5 = getelementptr inbounds %struct.my_s, %struct.my_s* %.0.i, i64 0, i32 1
  %6 = load i32, i32* %5, align 4
  %7 = icmp eq i32 %6, 0
  br i1 %7, label %success, label %abort

abort:
  tail call void @__assert_rtn()
  unreachable

success:
  ret void
}

Then opt -O3 does optimize it down to:

; Function Attrs: nounwind
define void @func2(i8* nocapture %mem) #0 {
stash.zero:
  %0 = getelementptr inbounds i8, i8* %mem, i64 4
  %1 = bitcast i8* %0 to i32*
  store i32 0, i32* %1, align 4
  ret void
}

...so something about the "check.first.array.element" block confuses
whatever analysis opt used to determine that %6 was zero in func2.
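
By analogy, this is what I'd hope -O3 could produce for the full @func
as well (a sketch of the expected result, not actual opt output):

define void @func(i8* nocapture %mem) {
stash.zero:
  %0 = getelementptr inbounds i8, i8* %mem, i64 4
  %1 = bitcast i8* %0 to i32*
  store i32 0, i32* %1, align 4
  ret void
}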

Can you walk me through the below testcase and explain what you expect
to happen?

Definitely:

%struct.my_s = type { i32, i32, [0 x i8*] }

We only read and write the second i32, although a code branch that is
never taken will read the first element of the variable-length array.
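
For reference, under the given datalayout the fields sit at byte
offsets 0, 4 and 8, so the two GEP forms used below name the same i32.
A sketch of the equivalence (%a, %b and %p are illustrative names; %p
stands for %mem cast to %struct.my_s*):

  %a = getelementptr inbounds i8, i8* %mem, i64 4
  %b = getelementptr inbounds %struct.my_s, %struct.my_s* %p, i64 0, i32 1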

; Function Attrs: noreturn
declare void @__assert_rtn()

basically any noreturn function

define void @func(i8* %mem) {
  %1 = icmp eq i8* %mem, null
  br i1 %1, label %check.zero, label %stash.zero

Checks the input pointer to see if it's null - the C code this was
originally derived from didn't check the return value of malloc.

stash.zero:
  %2 = bitcast i8* %mem to %struct.my_s*
  %3 = getelementptr inbounds i8, i8* %mem, i64 4

get a pointer to byte offset 4, i.e. the second i32 member of
the struct

  %4 = bitcast i8* %3 to i32*
  store i32 0, i32* %4, align 4

and put a zero in it - n.b. this branch is only taken when %mem is not null

  br label %check.zero

check.zero:
  %.0.i = phi %struct.my_s* [ %2, %stash.zero ], [ null, %0 ]
  %5 = getelementptr inbounds %struct.my_s, %struct.my_s* %.0.i, i64 0, i32 1

get a pointer to the second element of the struct a different way;
because the control flow from both exits of block %0 ends up here, the
base pointer is in fact always %mem, but at this point we no longer
know whether it's null or not

  %6 = load i32, i32* %5, align 4

the C code loads the value from the second i32 of the struct,
regardless of whether the pointer's null or not. Opt can therefore
correctly assume %5 can't be null.

  %7 = icmp eq i32 %6, 0

compare the value of the second element to zero; we stored zero here
(%4) in the previous block (as we can't have taken the null path from
%0 and still got this far). Opt sometimes seems to deduce this (i.e.
%7 == i1 1), as in the func2 example above

  br i1 %7, label %success, label %check.first.array.element

...so we'll always go to %success. However, replacing this branch with
an unconditional jump - example func2 above - does trigger the
optimisation.

check.first.array.element:
  %8 = getelementptr inbounds %struct.my_s, %struct.my_s* %.0.i, i64 0, i32 2, i64 0
  %9 = load i8*, i8** %8, align 1
  %10 = icmp eq i8* %9, null
  br i1 %10, label %success, label %abort

I don't think it should matter what this block does

abort:
  tail call void @__assert_rtn()
  unreachable

success:
  ret void
}

I hope you can do something with this. Thanks -

Nick

check.zero:
  %.0.i = phi %struct.my_s* [ %2, %stash.zero ], [ null, %0 ]
  %5 = getelementptr inbounds %struct.my_s, %struct.my_s* %.0.i, i64 0, i32 1

get a pointer to the second element of the struct a different way;
because the control flow from both exits of block %0 ends up here, the
base pointer is in fact always %mem, but at this point we no longer
know whether it's null or not

Sure

  %6 = load i32, i32* %5, align 4

the C code loads the value from the second i32 of the struct,
regardless of whether the pointer's null or not. Opt can therefore
correctly assume %5 can't be null.

GVN, at least, does not in fact determine that %5 is not null.

This is because it only propagates equalities from edge conditions,
not from inbounds conditions.

This is the cause of your next failure (at least insofar as GVN's
ability to optimize it goes).
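
For example, this is the kind of edge equality it does exploit (a
minimal sketch): on the true edge %p is known equal to null, so the
use in %then can be rewritten.

define i8* @edge_cond(i8* %p) {
  %c = icmp eq i8* %p, null
  br i1 %c, label %then, label %else
then:
  ret i8* %p   ; propagateEquality rewrites this to ret i8* null
else:
  ret i8* %p
}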

  %7 = icmp eq i32 %6, 0

compare the value of the second element to zero; we stored zero here
(%4) in the previous block (as we can't have taken the null path from
%0 and still got this far).

It does not understand that the null path is unreachable.

Opt sometimes seems to deduce this (i.e. %7 == i1 1), as in the func2
example above

If it knew the GEPs were the same, and knew the pointer was not null,
it would be able to determine the value of this comparison.

So, the fix depends on how aggressive you want to be.

First, propagateEquality in GVN really is not meant to understand
non-edge conditions. You could teach it that any equality check
against null is false after an inbounds GEP:

i.e.

  // If the GEP is inbounds, and this is address space 0, we know the pointer is
  // not null.
  if (GetElementPtrInst *GEP = dyn_cast<GetElementPtrInst>(I))
    if (GEP->isInBounds() && GEP->getAddressSpace() == 0) {
      // Need a propagateInequality that replaces any equality checks against
      // null with false
    }

However, you want to go further here:
what you really want to do in this case is teach GVN that any inbounds
GEP that post-dominates a comparison of its pointer against null
forces that comparison to come up non-null.

I.e., in the above, you would follow through all of
getPointerOperand()->uses(), and if they are icmps against null and
post-dominated by the inbounds GEP's block, replace them with false :)

You need to use PHITransAddr to translate the pointers backwards, to
get that %2 is %mem. In this case, it will work, and is easy, because
the predecessor is the thing you post-dominate. The path walking to
get a correct translation is tricky in some cases.

i.e.

  // If the GEP is inbounds, and this is address space 0, we know the
  // pointer is not null.
  if (GetElementPtrInst *GEP = dyn_cast<GetElementPtrInst>(I))
    if (GEP->isInBounds() && GEP->getAddressSpace() == 0) {
      for (auto &Use : GEP->getPointerOperand()->uses()) {
        PHITransAddr AddressTranslator(...);
        if (!AddressTranslator.PHITranslateValue(GEP->getPointerOperand(),
                                                 predecessor))
          for (auto *User : AddressTranslator.getAddr()->users()) {
            // If this is an icmp eq against null, declare victory and
            // replace the result with false.
          }
      }
    }

This is easier in NewGVN (and not N^2 like the above).

You also could make a pass that is identical to SCCP, but walks in the
reverse direction (i.e. operands instead of uses), and walks up the
post-dom tree.

Propagate inbounds assumptions upwards.

That would be O(n)

(because by the definition of post-domination, every path from the
icmp to the exit block must go through the inbounds GEP, which would
result in the GEP having an undef value. If it has an undef value, we
can say it's non-null anyway :P)

Thanks - that was very interesting. I've had a go at implementing it
here - http://reviews.llvm.org/D9634 - as an improvement to the SCCP
pass. It seems to work (see the test cases), but I had to make a few
compromises (making PtrUseVisitor look through phi nodes, only working
on CmpInst.isEquality() instructions, and making the SCCP pass
dependent on the PostDominatorTree pass). I think it's O(n);
PtrUseVisitor scans each Use at most once, and PostDominatorTree seems
linear in the number of predecessor blocks. It was easier to implement
by starting at a ptr-vs-null-checking CmpInst and analysing the code
forwards rather than by starting at an inbounds GEP instruction and
analysing the code backwards. What do you think?

Thanks -

Nick

+Philip
Looks pretty good. As I just said on the review, it would be really
nice to know how often this triggers.

Philip may be interested too, if they don't already have a null check
removal pass of their own :)

I think this got settled on the review, but inferring anything about reachability from just a gep is wrong. You can (and we do) infer that a *dereference* of a null pointer implies the path is unreachable, but merely doing address computation with a null pointer is entirely legitimate and common.
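
For example, this classic offsetof-style computation is perfectly
well-defined even though the base is null (a sketch, reusing the
%struct.my_s type from earlier; note there is no inbounds and no load):

define i64 @offset_of_second_field() {
  %p = getelementptr %struct.my_s, %struct.my_s* null, i64 0, i32 1
  %o = ptrtoint i32* %p to i64
  ret i64 %o   ; always 4 under the given datalayout
}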

Going back to the original example:
define i8 @func(i8* %mem) {
   %test = icmp eq i8* %mem, null
   br i1 %test, label %done, label %mem.is.valid
mem.is.valid:
   %1 = getelementptr inbounds i8, i8* %mem, i64 4
   store i8 5, i8* %1, align 4
   br label %done
done:
   %after.phi = phi i8* [ %mem, %mem.is.valid ], [ null, %0 ]
   %2 = getelementptr inbounds i8, i8* %after.phi, i64 4
   %five = load i8, i8* %2, align 4
   ret i8 %five
}

From "%five = load i8, i8* %2, align 4", we can infer that the edge entry->done is never taken. Doing so would result in the unconditional dereference of a null pointer. This would give us:

define i8 @func(i8* %mem) {
   br label %mem.is.valid
mem.is.valid:
   %1 = getelementptr inbounds i8, i8* %mem, i64 4
   store i8 5, i8* %1, align 4
   br label %done
done:
   %2 = getelementptr inbounds i8, i8* %mem, i64 4
   %five = load i8, i8* %2, align 4
   ret i8 %five
}

After store-load forwarding and combining basic blocks, we'd have:
define i8 @func(i8* %mem) {
   %1 = getelementptr inbounds i8, i8* %mem, i64 4
   store i8 5, i8* %1, align 4
   ret i8 5
}

That's as far as we can optimize that.

Well, not quite. We can infer that the argument must be non-null, a fact we could propagate to callers and use to allow simplifications there as well. (I don't believe we currently do that, but we could.)
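
Something like this (a sketch of the annotation, not something we
currently infer):

define i8 @func(i8* nonnull %mem) {
  %1 = getelementptr inbounds i8, i8* %mem, i64 4
  store i8 5, i8* %1, align 4
  ret i8 5
}

Callers could then, for example, fold their own null checks on the
pointer after the call.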

P.S. I haven't really been following this thread. If I've missed something by jumping in late, feel free to correct me.

Philip