lifetime.start/end clarification

The LRM (http://llvm.org/docs/LangRef.html#llvm-lifetime-start-intrinsic) essentially states that:

  • ptr is dead before a call to “lifetime.start size, ptr”

  • ptr is dead after a call to “lifetime.end size, ptr”

This is all good and fine, and the expected use case is that all “lifetime.end size, ptr” markers are matched with a preceding “lifetime.start size, ptr” in the CFG.

What is supposed to happen when a “lifetime.end size, ptr” is not matched with a “lifetime.start size, ptr” ? I think there are a few possible answers:

  • the memory area pointed to by ptr is assumed to be alive since function entry

  • the memory area pointed to by ptr is assumed to be dead since function entry, as it has not been marked alive

  • this is an unexpected situation

I think this ambiguity should be cleared in the LRM, because today’s implicit assumption may be broken at any time.

This is not a theoretical question: clang can generate such cases. For example, the following testcase:

struct X {

void doSomething();

char b[33];

};

void bar(X &);

void baz();

void test(int i) {

if (i==9) {

X x;

x.doSomething();

label:

bar(x);

} else {

baz();

if (i==0)

goto label;

}

}

Produces:

%struct.X = type { [33 x i8] }

define void @_Z4testi(i32 %i) {

entry:

%x = alloca %struct.X, align 1

%cmp = icmp eq i32 %i, 9

br i1 %cmp, label %if.then, label %if.else

if.then: ; preds = %entry

%0 = getelementptr inbounds %struct.X* %x, i64 0, i32 0, i64 0

call void @llvm.lifetime.start(i64 33, i8* %0)

call void @_ZN1X11doSomethingEv(%struct.X* %x)

br label %label

label: ; preds = %if.else.label_crit_edge, %if.then

%.pre-phi = phi i8* [ %.pre, %if.else.label_crit_edge ], [ %0, %if.then ]

call void @_Z3barR1X(%struct.X* dereferenceable(33) %x)

call void @llvm.lifetime.end(i64 33, i8* %.pre-phi)

br label %if.end3

if.else: ; preds = %entry

tail call void @_Z3bazv()

%cmp1 = icmp eq i32 %i, 0

br i1 %cmp1, label %if.else.label_crit_edge, label %if.end3

if.else.label_crit_edge: ; preds = %if.else

%.pre = getelementptr inbounds %struct.X* %x, i64 0, i32 0, i64 0

br label %label

if.end3: ; preds = %if.else, %label

ret void

}

Note that the path thru if.else.label_crit_edge has no lifetime start.

Cheers,

Hi Arnaud.

In that path, X x seems to be undefined, so the behaviour is anyone's
guess. If I'm not mistaken, the standard says that kind of code is
ill-formed (6.7-p3).

Clang folks might know better.

cheers,
--renato

Hi Renato,

Just so this conversation does not get too side-tracked, from what Arnaud told me at the developers' meeting, this general issue breaks self-hosting of Clang/LLVM if we enable lifetime intrinsics for smaller objects. So even if the specific example is not really a well-defined program, the general problem still needs to be resolved for well-defined programs.

-Hal

Hey Renato,

As a reduced testcase, I agree this looks weird, but I do not think this is undefined behaviour: this is allowed iff x have trivial ctors and dtors (and clang enforces it).

Bar(X &) may be able to (and even designed to) handle the uninitialized object.

Cheers,
Arnaud

From: "Arnaud A. de Grandmaison" <arnaud.degrandmaison@arm.com>
To: "LLVM Developers Mailing List" <llvmdev@cs.uiuc.edu>
Sent: Tuesday, November 4, 2014 5:59:28 AM
Subject: [LLVMdev] lifetime.start/end clarification

The LRM
(http://llvm.org/docs/LangRef.html#llvm-lifetime-start-intrinsic)
essentially states that:

- ptr is dead before a call to “lifetime.start size, ptr”

- ptr is dead after a call to “lifetime.end size, ptr”

This is all good and fine, and the expected use case is that all
“lifetime.end size, ptr” markers are matched with a preceding
“lifetime.start size, ptr” in the CFG.

What is supposed to happen when a “lifetime.end size, ptr” is not
matched with a “lifetime.start size, ptr” ? I think there are a few
possible answers:

- the memory area pointed to by ptr is assumed to be alive since
function entry

- the memory area pointed to by ptr is assumed to be dead since
function entry, as it has not been marked alive

- this is an unexpected situation

When you say "unmatched", do you mean not visible in any dominating block? Assuming we realize that the lifetime begin could also be inside a function called by any of these dominating blocks (assuming the pointer is an argument or is captured prior to the call), it seems most logical to say that the ptr is alive (and invariant) from the function entry.

I think this ambiguity should be cleared in the LRM, because today’s
implicit assumption

What is today's implicit assumption?

Thanks again,
Hal

From: Hal Finkel [mailto:hfinkel@anl.gov]
Sent: 04 November 2014 17:16
To: Arnaud De Grandmaison
Cc: LLVM Developers Mailing List
Subject: Re: [LLVMdev] lifetime.start/end clarification

> From: "Arnaud A. de Grandmaison" <arnaud.degrandmaison@arm.com>
> To: "LLVM Developers Mailing List" <llvmdev@cs.uiuc.edu>
> Sent: Tuesday, November 4, 2014 5:59:28 AM
> Subject: [LLVMdev] lifetime.start/end clarification
>
> The LRM
> (http://llvm.org/docs/LangRef.html#llvm-lifetime-start-intrinsic)
> essentially states that:
>
> - ptr is dead before a call to “lifetime.start size, ptr”
>
> - ptr is dead after a call to “lifetime.end size, ptr”
>
>
>
> This is all good and fine, and the expected use case is that all
> “lifetime.end size, ptr” markers are matched with a preceding
> “lifetime.start size, ptr” in the CFG.
>
>
>
> What is supposed to happen when a “lifetime.end size, ptr” is not
> matched with a “lifetime.start size, ptr” ? I think there are a few
> possible answers:
>
> - the memory area pointed to by ptr is assumed to be alive since
> function entry
>
> - the memory area pointed to by ptr is assumed to be dead since
> function entry, as it has not been marked alive
>
> - this is an unexpected situation
>

When you say "unmatched", do you mean not visible in any dominating
block? Assuming we realize that the lifetime begin could also be inside a
function called by any of these dominating blocks (assuming the pointer is an
argument or is captured prior to the call), it seems most logical to say that the
ptr is alive (and invariant) from the function entry.

Yes, I meant not visible in any dominating block. I believe assuming the ptr is alive from the function entry is conservatively correct, but not optimal. In my testcase, a lifetime.start could be inserted in the crit_edge bb.

Do we really want to handle cross-function stack lifetime analysis ? That's an interesting point. Until now, I supposed this all stayed inside the same function.

>
>
> I think this ambiguity should be cleared in the LRM, because today’s
> implicit assumption

What is today's implicit assumption?

To be honest: I don't know. I have to dig it out of the stack coloring pass, which I think (but I may be wrong) is the only user of the lifetime markers.

From: "Arnaud A. de Grandmaison" <arnaud.degrandmaison@arm.com>
To: "Hal Finkel" <hfinkel@anl.gov>
Cc: "LLVM Developers Mailing List" <llvmdev@cs.uiuc.edu>
Sent: Tuesday, November 4, 2014 10:39:45 AM
Subject: RE: [LLVMdev] lifetime.start/end clarification

> From: Hal Finkel [mailto:hfinkel@anl.gov]
> Sent: 04 November 2014 17:16
> To: Arnaud De Grandmaison
> Cc: LLVM Developers Mailing List
> Subject: Re: [LLVMdev] lifetime.start/end clarification
>
> > From: "Arnaud A. de Grandmaison" <arnaud.degrandmaison@arm.com>
> > To: "LLVM Developers Mailing List" <llvmdev@cs.uiuc.edu>
> > Sent: Tuesday, November 4, 2014 5:59:28 AM
> > Subject: [LLVMdev] lifetime.start/end clarification
> >
> > The LRM
> > (http://llvm.org/docs/LangRef.html#llvm-lifetime-start-intrinsic)
> > essentially states that:
> >
> > - ptr is dead before a call to “lifetime.start size, ptr”
> >
> > - ptr is dead after a call to “lifetime.end size, ptr”
> >
> >
> >
> > This is all good and fine, and the expected use case is that all
> > “lifetime.end size, ptr” markers are matched with a preceding
> > “lifetime.start size, ptr” in the CFG.
> >
> >
> >
> > What is supposed to happen when a “lifetime.end size, ptr” is not
> > matched with a “lifetime.start size, ptr” ? I think there are a
> > few
> > possible answers:
> >
> > - the memory area pointed to by ptr is assumed to be alive since
> > function entry
> >
> > - the memory area pointed to by ptr is assumed to be dead since
> > function entry, as it has not been marked alive
> >
> > - this is an unexpected situation
> >
>
> When you say "unmatched", do you mean not visible in any dominating
> block? Assuming we realize that the lifetime begin could also be
> inside a
> function called by any of these dominating blocks (assuming the
> pointer is an
> argument or is captured prior to the call), it seems most logical
> to say that the
> ptr is alive (and invariant) from the function entry.

Yes, I meant not visible in any dominating block. I believe assuming
the ptr is alive from the function entry is conservatively correct,
but not optimal. In my testcase, a lifetime.start could be inserted
in the crit_edge bb.

I think that the conservative behavior is what we want for the definition of the intrinsics themselves. Now if we can do better in some cases by deducing subset of the CFG where the allocation is actually used (either in the backend or by using some cleanup pass that adds them), that's fine.

Do we really want to handle cross-function stack lifetime analysis ?
That's an interesting point. Until now, I supposed this all stayed
inside the same function.

As currently defined, that's unavoidable. Philip, as I recall, ran into this recently. If you've not already done so, I recommend you read the recent thread on 'Optimization hints for "constant" loads'.

-Hal

Don't you mean "not visible from at least one dominating path"?

You may have many paths through (perhaps the same) blocks from the
beginning of the function to a lifetime.end call, and if at least one
of them doesn't have lifetime.start(), you're doomed. You'd have to
scan all paths, not just the blocks, where lifetime.start could be
defined.

I agree function entry is a safe assumption.

cheers,
--renato

From: Renato Golin [mailto:renato.golin@linaro.org]
Sent: 04 November 2014 18:17
To: Arnaud De Grandmaison
Cc: Hal Finkel; LLVM Developers Mailing List
Subject: Re: [LLVMdev] lifetime.start/end clarification

> Yes, I meant not visible in any dominating block. I believe assuming the ptr
is alive from the function entry is conservatively correct, but not optimal. In
my testcase, a lifetime.start could be inserted in the crit_edge bb.

Don't you mean "not visible from at least one dominating path"?

Ooops. Yes, I meant if there exists at least one path from function entry to lifetime.end without a the lifetime.start.

Short version: I think Clang needs some analysis capabilities to widen the lifetime.

Here are some comments.

It seems to me there are 2 (mostly separate) aspects:

  1. Teaching clang how to do domination / post-domination analysis, so that the lifetime information (alloca/dealloca, or lifetime markers) can be set in the right locations in the IR. Such an analysis should be able to understand the language scopes, as well as labels / gotos and EH. The results of this analysis would be used at IR codegen.

Caveat: I have no expertise in this area of Clang, so take everything with a grain of salt and please feel free to correct me.

Where should this analysis be run ? Presumably at the beginning of each function’s codegen’s time.

This analysis looks a bit special (at least to me), as it will work over the AST, but it also requires a pretty good understanding of LLVM’s IR, which sets it apart from other clang analyses.

Maybe another option (some may call it a poor’s man option) would be to enforce at IR codegen time the dominators / post-dominators on the multiple paths (normal & EH control flows) by inserting basic blocks around each statement which is codegened. Those would be fall-thru most of the time, the llvm optimizers can remove them easily. The obvious drawback is that it will insert lots of small or fall-thru BBs.

  1. How liveranges are represented in LLVM’s IR.

I like the idea of pairing alloca / dealloca (and removing the lifetime markers, at least for stack coloring) and I think it could even ease / improve some analysis.

Currently, allocas have to be located in the entry BB in order to get a chance to be promoted to registers by the Mem2Reg pass. Allocas in other BBs are considered to be dynamic. I have no idea how difficult it would be to teach Mem2Reg to consider alloca/dealloca in other basic blocks.

With the alloca / dealloca solution, in order to do stack colouring, the alloca must not be in the entry block, because all allocas defined there are alive at the same time and cannot be merged. All LLVM passes would need to be teached that those alloca / dealloca pairs correspond to stack slots — as the alloca in the entry. The pairing would also have to be preserved across transformations (same as lifetime.start/end).

Cheers,

Arnaud

Would one of you mind taking a step back and explaining what you believe the “stack colouring problem” to be? I’m familiar with the general meaning of the term and even some of LLVM’s implementation; I’m just not sure what specific issue you’re referring to. Having the context would make it much easier to assess your proposals.

The goal of stack coloring is to reduce stack usage by storing user data
with non-overlapping lifetimes in the same stack memory.

C/C++ source programs usually have a naturally scoped structure, where the
lifetime of data is bounded by curly braces. This information reduces the
lifetime that stack coloring sees, so it saves stack memory.

When we go to LLVM IR, we lose all that information. We currently try to
recapture it with calls to @lifetime.start / end intrinsic calls at the
point of declaration and when the variable falls out of scope. Basically
we're trying to figure out how to put that scoping information back into
the IR without turning it back into a tree.

Furthermore, if we had better information about this in the IR, we could
augment ASan to detect use-after-scope.

My reading of the current spec is that your first option is the only valid interpretation. My interpretation of the spec wording would be that each path in the program may contain either a lifetime.start, a lifetime.end, or both. If the path contains no marker, the location is assumed live (unless proven otherwise). The lifetime markers segment the path into live, and dead regions, but only on that specific path. To reason about global properties, the optimizer must form “for-all-paths” facts. (Dominance being the easiest form of this generally.) Just to note, this is rather different than the “invariant.*” family. In particular, the invariant family requires the “start” as an argument to the “end” intrinsic. I believe the existing implementation matches the semantics I specified above. If you find a case where it doesn’t, that’s a bug. This seems fine to me. The optimizer can (soundly) conclude that %p is dead after the “lifetime.end” (for the two instructions), and dead before the “lifetime.start” (for the single instruction in that basic block, not for the previous BB). This seems like the proper result for this example, am I missing something? Philip

Everything you say here is general goodness. What part of this is problematic today? My belief is that the lifetime markers give you exactly the support you need. Where does this break down? Is the analysis too hard? Is Clang getting the semantics wrong? What’s the actually blocking issue? Philip

This seems fine to me. The optimizer can (soundly) conclude that %p is
dead after the "lifetime.end" (for the two instructions), and dead before
the "lifetime.start" (for the *single* instruction in that basic block,
*not* for the previous BB). This seems like the proper result for this
example, am I missing something?

What if I put that in a loop, unroll it once, and prove that the
lifetime.start is unreachable? We would end up with IR like:

loop:
  ... use %p
  call void @lifetime.end( %p )
  ... use %p
  call void @lifetime.end( %p )
  br i1 %c, label %loop, label %exit

Are the second uses of %p uses of dead memory?

We have similar issues if the optimizer somehow removes the lifetime end
and keeps the start:

loop:
  call void @lifetime.start( %p )
  ... use %p
  call void @lifetime.start( %p )
  ... use %p
  br i1 %c, label %loop, label %exit

For this reason, it has been suggested that these intrinsics are horribly
broken, and both should be remodeled to just mean "store of undef bytes to
this memory". If "use %p" is a load, for example, in both cases we can
safely say it returns undef, because it's a use-after-scope.

I think coming up with a new representation with simpler semantics is the
way to go. One allocation or lifetime start, and one deallocation and end.

Implementing this in Clang will be tricky, though. Clang's IRGen is
supposed to be a dumb AST walk, but it has already strayed from that path.
Needs more thought...

The LRM (http://llvm.org/docs/LangRef.html#llvm-lifetime-start-intrinsic) essentially states that:

  • ptr is dead before a call to “lifetime.start size, ptr”

  • ptr is dead after a call to “lifetime.end size, ptr”

This is all good and fine, and the expected use case is that all “lifetime.end size, ptr” markers are matched with a preceding “lifetime.start size, ptr” in the CFG.

What is supposed to happen when a “lifetime.end size, ptr” is not matched with a “lifetime.start size, ptr” ? I think there are a few possible answers:

  • the memory area pointed to by ptr is assumed to be alive since function entry

  • the memory area pointed to by ptr is assumed to be dead since function entry, as it has not been marked alive

  • this is an unexpected situation

My reading of the current spec is that your first option is the only valid interpretation.

My interpretation of the spec wording would be that each path in the program may contain either a lifetime.start, a lifetime.end, or both. If the path contains no marker, the location is assumed live (unless proven otherwise). The lifetime markers segment the path into live, and dead regions, but only on that specific path. To reason about global properties, the optimizer must form “for-all-paths” facts. (Dominance being the easiest form of this generally.)

You worded the above as your interpretation of the spec. I agree your interpretation is the sanest one.

Just to note, this is rather different than the “invariant.*” family. In particular, the invariant family requires the “start” as an argument to the “end” intrinsic.

I think this ambiguity should be cleared in the LRM, because today’s implicit assumption may be broken at any time.

I believe the existing implementation matches the semantics I specified above. If you find a case where it doesn’t, that’s a bug.

I checked the stack coloring (the only real user of the lifetime markers), and it seems to match this semantics.

This is not a theoretical question: clang can generate such cases. For example, the following testcase:

struct X {

void doSomething();

char b[33];

};

void bar(X &);

void baz();

void test(int i) {

if (i==9) {

X x;

x.doSomething();

label:

bar(x);

} else {

baz();

if (i==0)

goto label;

}

}

Produces:

%struct.X = type { [33 x i8] }

define void @_Z4testi(i32 %i) {

entry:

%x = alloca %struct.X, align 1

%cmp = icmp eq i32 %i, 9

br i1 %cmp, label %if.then, label %if.else

if.then: ; preds = %entry

%0 = getelementptr inbounds %struct.X* %x, i64 0, i32 0, i64 0

call void @llvm.lifetime.start(i64 33, i8* %0)

call void @_ZN1X11doSomethingEv(%struct.X* %x)

br label %label

label: ; preds = %if.else.label_crit_edge, %if.then

%.pre-phi = phi i8* [ %.pre, %if.else.label_crit_edge ], [ %0, %if.then ]

call void @_Z3barR1X(%struct.X* dereferenceable(33) %x)

call void @llvm.lifetime.end(i64 33, i8* %.pre-phi)

br label %if.end3

if.else: ; preds = %entry

tail call void @_Z3bazv()

%cmp1 = icmp eq i32 %i, 0

br i1 %cmp1, label %if.else.label_crit_edge, label %if.end3

if.else.label_crit_edge: ; preds = %if.else

%.pre = getelementptr inbounds %struct.X* %x, i64 0, i32 0, i64 0

br label %label

if.end3: ; preds = %if.else, %label

ret void

}

Note that the path thru if.else.label_crit_edge has no lifetime start.

This seems fine to me. The optimizer can (soundly) conclude that %p is dead after the “lifetime.end” (for the two instructions), and dead before the “lifetime.start” (for the single instruction in that basic block, not for the previous BB). This seems like the proper result for this example, am I missing something?

With the clarification you made on the semantics, the above IR is correct, but could be improved when clang generates it: the label_crit_edge block should contain a lifetime.start for the alloca.

Philip

Would one of you mind taking a step back and explaining what you believe the “stack colouring problem” to be? I’m familiar with the general meaning of the term and even some of LLVM’s implementation; I’m just not sure what specific issue you’re referring to. Having the context would make it much easier to assess your proposals.

The goal of stack coloring is to reduce stack usage by storing user data with non-overlapping lifetimes in the same stack memory.

C/C++ source programs usually have a naturally scoped structure, where the lifetime of data is bounded by curly braces. This information reduces the lifetime that stack coloring sees, so it saves stack memory.

When we go to LLVM IR, we lose all that information. We currently try to recapture it with calls to @lifetime.start / end intrinsic calls at the point of declaration and when the variable falls out of scope. Basically we’re trying to figure out how to put that scoping information back into the IR without turning it back into a tree.

Furthermore, if we had better information about this in the IR, we could augment ASan to detect use-after-scope.

Everything you say here is general goodness. What part of this is problematic today? My belief is that the lifetime markers give you exactly the support you need. Where does this break down? Is the analysis too hard? Is Clang getting the semantics wrong? What’s the actually blocking issue?

Here is the actually blocking issue. Today, the lifetime markers are only inserted if the alloca is big enough (32+ bytes). I tried several times to add the lifetime markers for unnamed temporaries; the patches have been reviewed each time by people knowledgeable with this part of clang. But when we remove the size limit for the lifetime marker insertion, we get miscompiles when bootstrapping clang. This is why I was asking for clarification of the specification, as this is a part I found left room for interpretation.

Philip

It’s hard to discuss this without being specific about the starting IR and transforms. My general response is that either a) such a transform wouldn’t be valid or b) the behaviour of the original program was undefined. In the particular final IR result you gave, the second use of %p would be dead. Moreover, if the optimizer can prove this loop iterates at least once, all access to %p in the loop is dead. This is exactly the semantics you want. But we can’t just drop arbitrary calls. Doing so is unsound. I’m not actually intrinsically opposed to just an alternative. I would like to see a concrete example justifying the proposal though. Nothing said in this thread to date warrants redefining these intrinsics. Worth stating explicitly: your proposed semantics are strictly less powerful than the current ones. They may be “simpler”, but you loose optimization possibilities. At least so far, I disagree. I am open to being convinced though. :slight_smile: Do you have motivating cases other than a goto into a scoped region? That seems like a fairly straight forward “special case”. How far would you get by special casing a few cases that matter and leaving the general problem unsolved? At worst, you’re missing an optimization here.

Another approach would be have LLVM prove that %x is uninitialized along the path through if.else.label_crit_edge and then combine that with the lifetime.start along the other path. This would require reasoning about pointer escapes, but that’s fairly tractable. It actually doesn’t seem unreasonable that GVN would get this case. Do you have a counter example where an optimization is missed only when the lifetime.start is not present? More generally, are there specific optimizations which aren’t kicking in with the current placement strategy that you’d like to see? Philip