Can simplifycfg kill llvm.lifetime intrinsics?

Hi!

I’m working on ASan option that uses llvm.lifetime intrinsics to detect use-after-scope bugs. In short, the idea is to
insert calls into ASan runtime that would mark the memory as “addressable” or “unaddressable”.
I see the following problem with the following “trivial” basic block:

for.body.lr.ph.i: ; preds = %for.body.i310
call void @llvm.lifetime.start(i64 24, i8* %174)
call void @llvm.lifetime.start(i64 4, i8* %175)
call void @llvm.lifetime.start(i64 24, i8* %176)
br label %for.body.i318

r134182 by Rafael explicitly allows simplifycfg pass to merge this block into its successor, and drop “side-effect free” lifetime.start
intrinsics. However, llvm.lifetime.end intrinsics for the same memory are preserved, which is not only weird, but triggers ASan false positives:

  1. function goes into for-loop with local variable declared in it
  2. llvm.lifetime.end() at the end of the loop allows ASan to mark this memory as unaddressable
  3. at the next loop iteration access to this memory will be reported as error.

Shouldn’t simplifycfg somehow preserve / move lifetime intrinsics in its transformations?

This looks like a bug in simplifycfg. We should preserve lifetime intrinsics due to the reasons I described.
The code in //lib/Transforms/Utils/Local.cpp:

if (Succ->getSinglePredecessor()) {
// BB is the only predecessor of Succ, so Succ will end up with exactly
// the same predecessors BB had.

// Copy over any phi, debug or lifetime instruction.
BB->getTerminator()->eraseFromParent();
Succ->getInstList().splice(Succ->getFirstNonPHI(), BB->getInstList());
}

does this only when successor of basic block being removed has a single predecessor.
This is not the case even for simple test in /test/Transforms/SimplifyCFG/lifetime.ll.

That said, I think for now we should just apply the patch attached to this mail. Please take a look.

zdiff.lifetime-simplifycfg (1.2 KB)

This looks like a bug in simplifycfg. We should preserve lifetime intrinsics
due to the reasons I described.
The code in //lib/Transforms/Utils/Local.cpp:

  if (Succ->getSinglePredecessor()) {
    // BB is the only predecessor of Succ, so Succ will end up with exactly
    // the same predecessors BB had.

    // Copy over any phi, debug or lifetime instruction.
    BB->getTerminator()->eraseFromParent();
    Succ->getInstList().splice(Succ->getFirstNonPHI(), BB->getInstList());
  }

does this only when successor of basic block being removed has a single
predecessor.
This is not the case even for simple test in
/test/Transforms/SimplifyCFG/lifetime.ll.

That said, I think for now we should just apply the patch attached to this
mail. Please take a look.

Sorry I missed this email the first time.

From http://llvm.org/docs/LangRef.html it looks to be valid to delete

only one of the functions. The semantics being

* only llvm.lifetime.start: Any loads before the call can be replaced
with undef. Since there is no end, all stores must be kept.
* only llvm.lifetime.end: Any stores after this can be deleted. Since
there is no start, all loads must be kept.

I guess with asan the "replace with undef/ remove store" become
"diagnose those loads/stores".

I am pretty sure we should not avoid optimizing because there is a
llvm.lifetime.* in a BB. If we decide that it is invalid to delete one
but not the other, we should change SimplifyCFG to delete both, not
avoid merging the BBs.

Nick, is the above what you had in mind when you added
llvm.lifetime.*? We should clarify the language ref even if it is
valid to drop only one of the calls.

Cheers,
Rafael

> This looks like a bug in simplifycfg. We should preserve lifetime
intrinsics
> due to the reasons I described.
> The code in //lib/Transforms/Utils/Local.cpp:
>
> if (Succ->getSinglePredecessor()) {
> // BB is the only predecessor of Succ, so Succ will end up with
exactly
> // the same predecessors BB had.
>
> // Copy over any phi, debug or lifetime instruction.
> BB->getTerminator()->eraseFromParent();
> Succ->getInstList().splice(Succ->getFirstNonPHI(),
BB->getInstList());
> }
>
> does this only when successor of basic block being removed has a single
> predecessor.
> This is not the case even for simple test in
> /test/Transforms/SimplifyCFG/lifetime.ll.
>
> That said, I think for now we should just apply the patch attached to
this
> mail. Please take a look.

Sorry I missed this email the first time.

From http://llvm.org/docs/LangRef.html it looks to be valid to delete
only one of the functions. The semantics being

* only llvm.lifetime.start: Any loads before the call can be replaced
with undef. Since there is no end, all stores must be kept.
* only llvm.lifetime.end: Any stores after this can be deleted. Since
there is no start, all loads must be kept.

Ok, suppose you have the following code:
BB1:
  llvm.lifetime.start(%a)
  store to %a
  llvm.lifetime.end(%a)
  br %BB2

BB2:
  <some code>
  br %BB1

If you remove the first "llvm.lifetime.start", then when you enter
%BB1 for the second time, your "store to %a" can be considered invalid,
as you've already called llvm.lifetime.end for this variable.

Ok, suppose you have the following code:
BB1:
  llvm.lifetime.start(%a)
  store to %a
  llvm.lifetime.end(%a)
  br %BB2

BB2:
  <some code>
  br %BB1

If you remove the first "llvm.lifetime.start", then when you enter
%BB1 for the second time, your "store to %a" can be considered invalid,
as you've already called llvm.lifetime.end for this variable.

Oh, I was reading "precedes/following" as having static (dominance)
meaning. That is, in the above example you could not delete the store
since it is not true that
llvm.lifetime.end dominates it.

Nick, is this what you had in mind? If not, then we must delete a
matching llvm.lifetime.end, but it is not clear how we define
"matching". In the following code some executions will hit the first
llvm.lifetime.end and others will hit the second one.

BB0:
...
llvm.lifetime.start(%a)
...
br i1 %foo, label %BB1, label %BB2

BB1:
...
llvm.lifetime.end(%a)
...
br label %bar

BB2:
...
llvm.lifetime.end(%a)
...
br label %zed

Cheers,
Rafael

The llvm.lifetime.end(%a) in your example is invalid, according to the lang ref:

"This intrinsic indicates that after this point in the code, the value of the memory pointed to by ptr is dead. This means that it is known to never be used and has an undefined value. Any stores into the memory object following this intrinsic may be removed as dead."

If you can re-enter the block with a still-defined value of %a, then it's not "known to never be used".

-Krzysztof

Correction: it's not invalid, but if the value that %a points to is indeed never used after the store (as the lifetime.end intrinsic would indicate), then the store was dead from the beginning. Deleting the lifetime.start(%a) wouldn't have any ill effect in that case.

I'm not sure if that's what the author intended though.

-Krzysztof

Also, it's not clear to me what "the value of the memory pointed to by %a" means if %a comes from a phi node. I guess my interpretation may be wrong, but I don't feel comfortable with any interpretation that I can think of at the moment.

-Krzysztof

Ok, suppose you have the following code:
BB1:
   llvm.lifetime.start(%a)
   store to %a
   llvm.lifetime.end(%a)
   br %BB2

BB2:
   <some code>
   br %BB1

If you remove the first "llvm.lifetime.start", then when you enter
%BB1 for the second time, your "store to %a" can be considered invalid,
as you've already called llvm.lifetime.end for this variable.

Oh, I was reading "precedes/following" as having static (dominance)
meaning. That is, in the above example you could not delete the store
since it is not true that
llvm.lifetime.end dominates it.

Nick, is this what you had in mind? If not, then we must delete a
matching llvm.lifetime.end, but it is not clear how we define
"matching". In the following code some executions will hit the first
llvm.lifetime.end and others will hit the second one.

Yes, I meant at runtime.

BB0:
...
llvm.lifetime.start(%a)
...
br i1 %foo, label %BB1, label %BB2

BB1:
...
llvm.lifetime.end(%a)
...
br label %bar

BB2:
...
llvm.lifetime.end(%a)
...
br label %zed

I don't like this because I fear proliferation of all of these intrinsics everywhere. They were really intended for smaller things -- an interpreter with a stack that it wants to 'pop' off of, so that piece of memory is no longer meaningfully defined, just as if you freed it.

Regardless, yes, this behaves the way it looks.

Nick

The intention is that if you have a loop, you could mark the memory lifetime.end at the end of the loop, and operations on the next run of the loop would see 'undef' there -- at least until you call lifetime.start.

Of course you can re-start memory that's been ended... if that isn't clear in the langref, please fix.

Nick

There are several things that aren't clear (at least when I read it).

Case 1:
block:
   %a = phi i32* ..., [ %a.next, %block ]
   llvm.lifetime.begin(4, %a)
   ...
   llvm.lifetime.end(4, %a) <-- (1)
   %a.next = getelementptr i32* %a, 1
   br %block

Does (1) mean the end of lifetime for the 4 bytes that %a happens to be pointing to in the current iteration? My guess is "yes".

Case 2:
   %a = i32* ...
   llvm.lifetime.begin(4, %a)
   ...
   llvm.lifetime.end(4, %a) <-- (2)
   %b = load i32 %ptr <-- (3)

Is (2) a guarantee that *%ptr and *%a are not aliased?
If (2) and (3) were in the opposite order, would it be illegal to move (3) past (2)?

-Krzysztof

Of course you can re-start memory that's been ended... if that isn't
clear in the langref, please fix.

There are several things that aren't clear (at least when I read it).

Case 1:
block:
%a = phi i32* ..., [ %a.next, %block ]
llvm.lifetime.begin(4, %a)
...
llvm.lifetime.end(4, %a) <-- (1)
%a.next = getelementptr i32* %a, 1
br %block

Does (1) mean the end of lifetime for the 4 bytes that %a happens to be
pointing to in the current iteration? My guess is "yes".

Yes. It ends the lifetime of the memory that %a happens to be pointing to, dynamically, at the moment that instruction is executed.

Case 2:
%a = i32* ...
llvm.lifetime.begin(4, %a)
...
llvm.lifetime.end(4, %a) <-- (2)
%b = load i32 %ptr <-- (3)

Is (2) a guarantee that *%ptr and *%a are not aliased?

Nope. Also, (2)+(3) together don't imply that %ptr and %a are NoAlias since loading after lifetime ends gives you undef, not undefined behaviour.

If (2) and (3) were in the opposite order, would it be illegal to move
(3) past (2)?

Not necessarily. If you can show that they're NoAlias, you're free to reorder them. If though some stroke of crazy, you can show that the program has the same behaviour when %b is replaced with undef, then you're free to move them. Otherwise, yes, it would be illegal to move (3) before (2).

Nick

Oh, I was reading "precedes/following" as having static (dominance)
meaning. That is, in the above example you could not delete the store
since it is not true that
llvm.lifetime.end dominates it.

Nick, is this what you had in mind? If not, then we must delete a
matching llvm.lifetime.end, but it is not clear how we define
"matching". In the following code some executions will hit the first
llvm.lifetime.end and others will hit the second one.

Yes, I meant at runtime.

OK, thanks. What is the meaning in the case of aliases? Should this work:

llvm.lifetime.start(%x)

...

llvm.lifetime.end(%y which alias %x sometimes)

If so, I guess that in order to delete a llvm.lifetime.start we have
to delete all llvm.lifetime.end that are "directly" reachable from it
and take an argument that may alias the one passed to
llvm.lifetime.start. Is that it? What about calling
llvm.lifetime.start in one function and llvm.lifetime.end in another?

It seems that deleting llvm.lifetime.start is impossible in general,
but it is safe to add one if at least one already exists, is that the
case?

On the other hand, removing llvm.lifetime.end should always be safe, right?

Cheers,
Rafael

Oh, I was reading "precedes/following" as having static (dominance)
meaning. That is, in the above example you could not delete the store
since it is not true that
llvm.lifetime.end dominates it.

Nick, is this what you had in mind? If not, then we must delete a
matching llvm.lifetime.end, but it is not clear how we define
"matching". In the following code some executions will hit the first
llvm.lifetime.end and others will hit the second one.

Yes, I meant at runtime.

OK, thanks. What is the meaning in the case of aliases? Should this work:

llvm.lifetime.start(%x)

...

llvm.lifetime.end(%y which alias %x sometimes)

That's only a matching pair iff %x == %y at run time.

(If I wanted to require statically analyzable pairings, I would've made start return type {} and have the end intrinsic take that as an argument, like I did for the invariant intrinsics.)

You can almost entirely model lifetime.start and lifetime.end as being a store of undef to the address. However, they're the tiniest bit stronger. With a store of undef, you can delete stores that precede (with no intervening load) and loads that follow (with no intervening store). On top of that, a start lets you delete loads that precede, and an end lets you delete stores that follow.

If so, I guess that in order to delete a llvm.lifetime.start we have
to delete all llvm.lifetime.end that are "directly" reachable from it
and take an argument that may alias the one passed to
llvm.lifetime.start. Is that it? What about calling
llvm.lifetime.start in one function and llvm.lifetime.end in another?

It seems that deleting llvm.lifetime.start is impossible in general,
but it is safe to add one if at least one already exists, is that the
case?

On the other hand, removing llvm.lifetime.end should always be safe, right?

I really only invented them for a specific case, so I haven't thought through all the cases where it may or may not be legal to add or delete them. Here goes!

Suppose you have four lifetime operations on the same address in memory, with loads and stores all around them:

   start1--end1 .. start2--end2

If you remove start1 then you have a bare pointer, the memory came from somewhere and you lose the optimization that loads before start1 become undef, but you don't miscompile.

If you remove end1 then the code between start1 and start2 is in trouble. We would miscompile start1+store+load+start2 by folding the load to undef.

If you remove start2, we miscompile again. Accesses between start2 and end2 could be transformed into loads of undef and dead stores, and deleted.

Removing end2 only means that you get to assume the memory is still live since you haven't been told otherwise.

So ultimately the problem is with removing either part of the end->start transition. We need to make sure we don't remove one of those.

This means that the optimizer can't consider lifetime intrinsics to be no-ops unless it can prove it's looking at the first start or last end of that memory address. That's much worse than I thought it was when I first added these intrinsics. Sorry.

Nick

>> Oh, I was reading "precedes/following" as having static (dominance)
>> meaning. That is, in the above example you could not delete the store
>> since it is not true that
>> llvm.lifetime.end dominates it.
>>
>> Nick, is this what you had in mind? If not, then we must delete a
>> matching llvm.lifetime.end, but it is not clear how we define
>> "matching". In the following code some executions will hit the first
>> llvm.lifetime.end and others will hit the second one.
>
>
> Yes, I meant at runtime.
>

OK, thanks. What is the meaning in the case of aliases? Should this work:

llvm.lifetime.start(%x)

...

llvm.lifetime.end(%y which alias %x sometimes)

I'm not sure if "sometimes" actually happens in the wild.
In recent patches to lifetime handling in ASan
I assumed that argument in llvm.lifetime intrinsic
can come from bitcasts or phi nodes, but all its possible values should
originate from the same alloca instruction.

If so, I guess that in order to delete a llvm.lifetime.start we have
to delete all llvm.lifetime.end that are "directly" reachable from it
and take an argument that may alias the one passed to
llvm.lifetime.start. Is that it? What about calling
llvm.lifetime.start in one function and llvm.lifetime.end in another?

I don't see how this can happen, assuming that arguments of llvm.lifetime
can only come from allocas.

Suppose you have four lifetime operations on the same address in memory,
with loads and stores all around them:

  start1--end1 .. start2--end2

If you remove start1 then you have a bare pointer, the memory came from
somewhere and you lose the optimization that loads before start1 become
undef, but you don't miscompile.

This is assuming no looping after end1 or end2, right?

If you remove end1 then the code between start1 and start2 is in trouble. We
would miscompile start1+store+load+start2 by folding the load to undef.

OK, my understanding was different. I was reading that a store before
all starts was invalid. BTW, can't we handle loads and stores
uniformly? That is, we can model them as

* We ask an oracle if a memory object will be used as an argument to
llvm.lifetime.start or llvm.lifetime.end. If it is, then the address
has an extra valid bit associated with it.

* At the creation of the object (stack or heap allocation) the bit is false.
* llvm.lifetime.start sets the bit to true. Doing it more than once is a nop.
* llvm.lifetime.end sets the bit to false.

With these rules, we can implement:

* Asan doesn't have an oracle, but can start tracking the bit when it
first gets to a llvm.lifetime.*. It can flag as invalid any operation
that touches memory with a false bit.
* Removing a llvm.lifetime.start is impossible in general, as we don't
know if some function will access that address.
* Removing a llvm.lifetime.end is always safe. It just extends the
life of the object, maybe until it is freed or the function that
called alloca returns.
* Adding a llvm.lifetime.start is always safe. It just extends the
life of the object.

If you remove start2, we miscompile again. Accesses between start2 and end2
could be transformed into loads of undef and dead stores, and deleted.

Agreed.

Removing end2 only means that you get to assume the memory is still live
since you haven't been told otherwise.

Agreed.

So ultimately the problem is with removing either part of the end->start
transition. We need to make sure we don't remove one of those.

This means that the optimizer can't consider lifetime intrinsics to be
no-ops unless it can prove it's looking at the first start or last end of
that memory address. That's much worse than I thought it was when I first
added these intrinsics. Sorry.

What do you think of the semantics I proposed above? I think they
still model what we want, but allow the optimizer to do any
optimizations it would do without them, as it can just add
llvm.lifetime.start and drop llvm.lifetime.end as needed.

Nick

Cheers,
Rafael

I don't see how this can happen, assuming that arguments of llvm.lifetime
can only come from allocas.

I think the intention was that it could also be used for
constructor/destructor calls on heap memory.

Cheers,
Rafael

It could happen after a cold section of a function was outlined, for example. Informational intrinsics should not prohibit optimizations, and barring provably wrong cases, we need to assume that an optimization (possibly a future one) can transform code in a way that is not predictable up front.

-Krzysztof

Suppose you have four lifetime operations on the same address in memory,
with loads and stores all around them:

   start1--end1 .. start2--end2

If you remove start1 then you have a bare pointer, the memory came from
somewhere and you lose the optimization that loads before start1 become
undef, but you don't miscompile.

This is assuming no looping after end1 or end2, right?

Right. There's no reason start2 and end2 couldn't be the exact same Instruction*'s as start1 and end1 here, but I'm starting with an example where those are the only starts and ends that run.

If you remove end1 then the code between start1 and start2 is in trouble. We
would miscompile start1+store+load+start2 by folding the load to undef.

OK, my understanding was different. I was reading that a store before
all starts was invalid. BTW, can't we handle loads and stores
uniformly? That is, we can model them as

* We ask an oracle if a memory object will be used as an argument to
llvm.lifetime.start or llvm.lifetime.end. If it is, then the address
has an extra valid bit associated with it.

* At the creation of the object (stack or heap allocation) the bit is false.
* llvm.lifetime.start sets the bit to true. Doing it more than once is a nop.
* llvm.lifetime.end sets the bit to false.

I'm not so sure about this.

First, stack and heap allocation really start with the bit true. It's safe to call alloca or malloc and then start using the pointer. I don't want to teach the frontend to put lifetime.start on every possible allocation point.

Making two llvm.lifetime.start call be a no-op is really bad for the optimizers to work with. It means that just because you see an llvm.lifetime.start, you can't optimize *anything* with it, unless you prove the absence of an earlier lifetime.start -- anywhere in the program, interprocedurally. Consider DSE doing its upwards walk to find dead stores, and what it means when it sees a lifetime.start.

While this has a simpler semantic model, I don't think we will ever be able to optimize much code with it.

As an alternative model, I propose replacing lifetime.start/end with a single intrinsic that does the equivalent of 'store undef, %ptr' and nothing more.

Nick

Couldn't we just insert the store? Unless it's a volatile (or otherwise ordered) location, the optimizer would be free to remove it.

-Krzysztof