GlobalsModRef (and thus LTO) is completely broken

Ok folks,

I wrote up the general high-level thoughts I have about stateful AA in a separate thread. But we need to sort out the completely and horribly broken aspects of GlobalsModRef today, and the practical steps forward. This email is totally about the practical stuff.

Now, as to why I emailed this group of people and with this subject, the only pass pipeline that includes GlobalsModRef, is the LTO pipeline. So we have significantly less testing here than we do for stuff in the main pipeline. Also, I don’t have any benchmarks I can effectively run to tell me if my changes impacted performance. =/ So I may need your help to evaluate some of this. Now, onto the challenges…

First, GlobalsModRef as currently implemented completely abuses a loophole in the current pass manager to incorrectly stick around even while it is being “invalidated”. I don’t know of any way to fix this in the current pass manager without completely defeating the purpose of the analysis pass. The consequence is that whether passes claim to preserve AA or not is irrelevant, GlobalsModRef will be preserved anyways! =[[[[ So the only way to make things work correctly is to make GlobalsModRef survive any per-function changes to the IR. We cannot rely on AA updates at all.

Most of the updates that GlobalsModRef needs can be provided by a ValueHandle now that we have them. This will prevent ABA-style issues in its caches, etc. I plan to send out a patch soon that switches it over to this strategy.

It is also relying on a precomputed set of global variables whose address is never used by an instruction other than some very small set (gep, bitcast) as “non-address-taken”. It then runs GetUnderlyingObject on the two pointers in alias queries, and if that finds one of these “non-address-taken” globals for one of the memory locations but not the other, it concludes no-alias! This is broken for a number of reasons.

a) If the two locations merely have a different depth of instruction stack, because GetUnderlyingObject has a recursion cap, one side can fail while the other succeeds, and we erroneously produce no-alias.

b) If instcombine or any other pass for any reason introduces on one path an instruction that GetUnderlyingObject can’t look through (select, phi, load, …), we incorrectly conclude no-alias. This is what addEscapingUse was intended to solve, but we would literally have to call it from every pass because we can’t rely on analysis invalidation!

c) If any pass actually escapes a pointer from one function into another, we invalidate the underlying assumption of ‘non-address-taken’ that it relies upon.

Now, as I argued in my general AA thread, I think we might be able to assume that (c) doesn’t happen today. But both (a) and (b) seem like active nightmares to try to fix. I can see hacky ways to avoid (a) where we detect why GetUnderlyingObject fails, but I don’t see how to fix both (a) and (b) (or to fix (a) well) without just disabling this specific aspect of GloblasModRef.

So that’s what I’d like to do. It shouldn’t impact the mod/ref information provided by the analysis, just the alias sets.

However, even this may not be necessary. We may just not in practice see these issues, and I don’t really want to perturb the LTO generated code quality for a hypothetical issue until we actually have the tools in place to handle things reasonably.

So my plan is:

  1. Fix obvious issues with GloblasModRef and switch it to ValueHandles
  2. Mail out a patch to disable this part of GlobalsModRef. I can put it behind a flag or however folks would like it to work.
  3. Remove addEscapingUse() update API, which without #2 may regress some LTO test case I don’t have (because I don’t have any other than bootstrap)

Thoughts?
-Chandler

Ok folks,

I wrote up the general high-level thoughts I have about stateful AA in a separate thread. But we need to sort out the completely and horribly broken aspects of GlobalsModRef today, and the practical steps forward. This email is totally about the practical stuff.

Now, as to why I emailed this group of people and with this subject, the only pass pipeline that includes GlobalsModRef, is the LTO pipeline.

Ah, so it is just an LTO enabled benchmark hack then.

It is also relying on a precomputed set of global variables whose address is never used by an instruction other than some very small set (gep, bitcast) as "non-address-taken". It then runs GetUnderlyingObject on the two pointers in alias queries, and if that finds one of these "non-address-taken" globals for one of the memory locations but not the other, it concludes no-alias! This is broken for a number of reasons.

a) If the two locations merely have a different *depth* of instruction stack, because GetUnderlyingObject has a recursion cap, one side can fail while the other succeeds, and we erroneously produce no-alias.

Interesting. I’m sure it is no consolation, but GlobalsModRef probably predated the recursion cap :slight_smile:

b) If instcombine or any other pass for any reason introduces on one path an instruction that GetUnderlyingObject can't look through (select, phi, load, ....), we incorrectly conclude no-alias. This is what addEscapingUse was intended to solve, but we would literally have to call it from every pass because we can't rely on analysis invalidation!

c) If any pass actually escapes a pointer from one function into another, we invalidate the underlying assumption of 'non-address-taken' that it relies upon.

Yep, all of these are pretty nasty.

Now, as I argued in my general AA thread, I think we might be able to assume that (c) doesn't happen today. But both (a) and (b) seem like active nightmares to try to fix. I can see hacky ways to avoid (a) where we detect *why* GetUnderlyingObject fails, but I don't see how to fix both (a) and (b) (or to fix (a) well) without just disabling this specific aspect of GloblasModRef.

Ok, but we need performance information to make sure this doesn’t cause a regression in practice for LTO builds. For example, Spec 2K and 2K6 are a reasonable place to start.

1) Fix obvious issues with GloblasModRef and switch it to ValueHandles
2) Mail out a patch to disable this part of GlobalsModRef. I can put it behind a flag or however folks would like it to work.
3) Remove addEscapingUse() update API, which without #2 may regress some LTO test case I don't have (because I don't have any other than bootstrap)

Sounds great if we can do this without causing a regression in practice. Are you aware of any miscompiles that might be attributed to this, or are these “theoretical" concerns?

-Chris

From: "Chris Lattner" <clattner@apple.com>
To: "Chandler Carruth" <chandlerc@gmail.com>
Cc: "LLVM Developers Mailing List" <llvmdev@cs.uiuc.edu>, "Hal Finkel" <hfinkel@anl.gov>, "Justin Bogner"
<mail@justinbogner.com>, "Duncan Exon Smith" <dexonsmith@apple.com>, "Rafael Espíndola" <rafael.espindola@gmail.com>
Sent: Tuesday, July 14, 2015 12:18:36 AM
Subject: Re: [LLVMdev] GlobalsModRef (and thus LTO) is completely broken

>
> Ok folks,
>
> I wrote up the general high-level thoughts I have about stateful AA
> in a separate thread. But we need to sort out the completely and
> horribly broken aspects of GlobalsModRef today, and the practical
> steps forward. This email is totally about the practical stuff.
>
> Now, as to why I emailed this group of people and with this
> subject, the only pass pipeline that includes GlobalsModRef, is
> the LTO pipeline.

Ah, so it is just an LTO enabled benchmark hack then.

> It is also relying on a precomputed set of global variables whose
> address is never used by an instruction other than some very small
> set (gep, bitcast) as "non-address-taken". It then runs
> GetUnderlyingObject on the two pointers in alias queries, and if
> that finds one of these "non-address-taken" globals for one of the
> memory locations but not the other, it concludes no-alias! This is
> broken for a number of reasons.
>
> a) If the two locations merely have a different *depth* of
> instruction stack, because GetUnderlyingObject has a recursion
> cap, one side can fail while the other succeeds, and we
> erroneously produce no-alias.

Interesting. I’m sure it is no consolation, but GlobalsModRef
probably predated the recursion cap :slight_smile:

> b) If instcombine or any other pass for any reason introduces on
> one path an instruction that GetUnderlyingObject can't look
> through (select, phi, load, ....), we incorrectly conclude
> no-alias. This is what addEscapingUse was intended to solve, but
> we would literally have to call it from every pass because we
> can't rely on analysis invalidation!
>
> c) If any pass actually escapes a pointer from one function into
> another, we invalidate the underlying assumption of
> 'non-address-taken' that it relies upon.

Yep, all of these are pretty nasty.

> Now, as I argued in my general AA thread, I think we might be able
> to assume that (c) doesn't happen today. But both (a) and (b) seem
> like active nightmares to try to fix. I can see hacky ways to
> avoid (a) where we detect *why* GetUnderlyingObject fails, but I
> don't see how to fix both (a) and (b) (or to fix (a) well) without
> just disabling this specific aspect of GloblasModRef.

Ok, but we need performance information to make sure this doesn’t
cause a regression in practice for LTO builds. For example, Spec 2K
and 2K6 are a reasonable place to start.

> 1) Fix obvious issues with GloblasModRef and switch it to
> ValueHandles
> 2) Mail out a patch to disable this part of GlobalsModRef. I can
> put it behind a flag or however folks would like it to work.
> 3) Remove addEscapingUse() update API, which without #2 may regress
> some LTO test case I don't have (because I don't have any other
> than bootstrap)

Sounds great if we can do this without causing a regression in
practice.

Personally, I'm comfortable with moving to a contractual obligation for the escaping uses situation: No pass may capture the address of a (previously-uncaptured) global (even locally) without notifying the AA infrastructure. I can't think of any in-tree pass that does this now, although we might certainly have some in the future. Can you think of any we have now?

I'd really like to get the AA pass that Sam Parker has been working on (http://reviews.llvm.org/D10059) in, but it will add measurable compile-time overhead if it can't cache, and efficient caching seems to depend on the same property.

-Hal

I don’t really have any way of benchmarking SPEC with LTO. If that’s a configuration that is super important to people, I’m hoping they’ll let me know about any particular performance issues.

Ultimately, I’m happy making no performance impacting changes here. The thing is, I need to remove a broken abstraction (addEscapingUse) that was half-way added without any test case (literally, I have no test case that fails if I delete it. The regression tests don’t fail if I assert(0) in it). =/ I don’t want to do that if it causes miscompiles for folks that are relying on LTO. So my plan is to essentially post the patch that will fix the miscompiles but may regress performance. If the miscompiles show up for those relying on LTO and this (somewhat unsound) pass, they are in the best position to evaluate the cost/benefit of a performance reducing patch.

Honsetly, my hope is that it won’t even impact performance. But I have no realistic way to measure it in any useful way. And for my platforms, I would be perfectly happy to trade some performance for correctness here. That’s one of the reasons we’re not relying on LTO yet. But I understand that others in the community have more strict performance needs and so I’m doing what I can to give them options.

From: “Chris Lattner” <clattner@apple.com>
To: “Chandler Carruth” <chandlerc@gmail.com>
Cc: “LLVM Developers Mailing List” <llvmdev@cs.uiuc.edu>, “Hal Finkel” <hfinkel@anl.gov>, “Justin Bogner”
<mail@justinbogner.com>, “Duncan Exon Smith” <dexonsmith@apple.com>, “Rafael Espíndola” <rafael.espindola@gmail.com>
Sent: Tuesday, July 14, 2015 12:18:36 AM
Subject: Re: [LLVMdev] GlobalsModRef (and thus LTO) is completely broken

Ok folks,

I wrote up the general high-level thoughts I have about stateful AA
in a separate thread. But we need to sort out the completely and
horribly broken aspects of GlobalsModRef today, and the practical
steps forward. This email is totally about the practical stuff.

Now, as to why I emailed this group of people and with this
subject, the only pass pipeline that includes GlobalsModRef, is
the LTO pipeline.

Ah, so it is just an LTO enabled benchmark hack then.

It is also relying on a precomputed set of global variables whose
address is never used by an instruction other than some very small
set (gep, bitcast) as “non-address-taken”. It then runs
GetUnderlyingObject on the two pointers in alias queries, and if
that finds one of these “non-address-taken” globals for one of the
memory locations but not the other, it concludes no-alias! This is
broken for a number of reasons.

a) If the two locations merely have a different depth of
instruction stack, because GetUnderlyingObject has a recursion
cap, one side can fail while the other succeeds, and we
erroneously produce no-alias.

Interesting. I’m sure it is no consolation, but GlobalsModRef
probably predated the recursion cap :slight_smile:

b) If instcombine or any other pass for any reason introduces on
one path an instruction that GetUnderlyingObject can’t look
through (select, phi, load, …), we incorrectly conclude
no-alias. This is what addEscapingUse was intended to solve, but
we would literally have to call it from every pass because we
can’t rely on analysis invalidation!

c) If any pass actually escapes a pointer from one function into
another, we invalidate the underlying assumption of
‘non-address-taken’ that it relies upon.

Yep, all of these are pretty nasty.

Now, as I argued in my general AA thread, I think we might be able
to assume that (c) doesn’t happen today. But both (a) and (b) seem
like active nightmares to try to fix. I can see hacky ways to
avoid (a) where we detect why GetUnderlyingObject fails, but I
don’t see how to fix both (a) and (b) (or to fix (a) well) without
just disabling this specific aspect of GloblasModRef.

Ok, but we need performance information to make sure this doesn’t
cause a regression in practice for LTO builds. For example, Spec 2K
and 2K6 are a reasonable place to start.

  1. Fix obvious issues with GloblasModRef and switch it to
    ValueHandles
  2. Mail out a patch to disable this part of GlobalsModRef. I can
    put it behind a flag or however folks would like it to work.
  3. Remove addEscapingUse() update API, which without #2 may regress
    some LTO test case I don’t have (because I don’t have any other
    than bootstrap)

Sounds great if we can do this without causing a regression in
practice.

Personally, I’m comfortable with moving to a contractual obligation for the escaping uses situation: No pass may capture the address of a (previously-uncaptured) global (even locally) without notifying the AA infrastructure.

Note that I said escape, and you say capture here. I’ve not quite thought about it enough to generalize this to capture.

Anyways, I don’t generally disagree (see my lengthier mail about how I’m envisioning AA updates). However, this is not sufficient to fix GloblasModRef. You also have to somehow prevent asymmetric evaluation of GetUnderlyingObject on the two pointers queried. To fix this would require a substantially different implementation of GlobalsModRef which I think is possible, but which I don’t realistically have the time to implement right now.

I can’t think of any in-tree pass that does this now, although we might certainly have some in the future. Can you think of any we have now?

If you LTO the ASan runtime along with the instrumentation, then the instrumentation pass violates this today. Most instrumentation passes violate this if the optimizer can see the runtime.

I’m not aware of any non-instrumentation passes that violate it today.

I’d really like to get the AA pass that Sam Parker has been working on (http://reviews.llvm.org/D10059) in, but it will add measurable compile-time overhead if it can’t cache, and efficient caching seems to depend on the same property.

Sorry I haven’t chimed in there. I’m worried that there is no sound way to do this in the current pass manager, but I’ll read that patch and chime in with specific suggestions.

-Chandler

From: "Chandler Carruth" <chandlerc@google.com>
To: "Hal Finkel" <hfinkel@anl.gov>, "Chris Lattner" <clattner@apple.com>
Cc: "LLVM Developers Mailing List" <llvmdev@cs.uiuc.edu>
Sent: Tuesday, July 14, 2015 12:55:37 AM
Subject: Re: [LLVMdev] GlobalsModRef (and thus LTO) is completely broken

> From: "Chris Lattner" < clattner@apple.com >
> To: "Chandler Carruth" < chandlerc@gmail.com >
> Cc: "LLVM Developers Mailing List" < llvmdev@cs.uiuc.edu >, "Hal
> Finkel" < hfinkel@anl.gov >, "Justin Bogner"
> < mail@justinbogner.com >, "Duncan Exon Smith" <
> dexonsmith@apple.com >, "Rafael Espíndola" <
> rafael.espindola@gmail.com >
> Sent: Tuesday, July 14, 2015 12:18:36 AM
> Subject: Re: [LLVMdev] GlobalsModRef (and thus LTO) is completely
> broken
>
>
> >
> > Ok folks,
> >
> > I wrote up the general high-level thoughts I have about stateful
> > AA
> > in a separate thread. But we need to sort out the completely and
> > horribly broken aspects of GlobalsModRef today, and the practical
> > steps forward. This email is totally about the practical stuff.
> >
> > Now, as to why I emailed this group of people and with this
> > subject, the only pass pipeline that includes GlobalsModRef, is
> > the LTO pipeline.
>
> Ah, so it is just an LTO enabled benchmark hack then.
>
> > It is also relying on a precomputed set of global variables whose
> > address is never used by an instruction other than some very
> > small
> > set (gep, bitcast) as "non-address-taken". It then runs
> > GetUnderlyingObject on the two pointers in alias queries, and if
> > that finds one of these "non-address-taken" globals for one of
> > the
> > memory locations but not the other, it concludes no-alias! This
> > is
> > broken for a number of reasons.
> >
> > a) If the two locations merely have a different *depth* of
> > instruction stack, because GetUnderlyingObject has a recursion
> > cap, one side can fail while the other succeeds, and we
> > erroneously produce no-alias.
>
> Interesting. I’m sure it is no consolation, but GlobalsModRef
> probably predated the recursion cap :slight_smile:
>
> > b) If instcombine or any other pass for any reason introduces on
> > one path an instruction that GetUnderlyingObject can't look
> > through (select, phi, load, ....), we incorrectly conclude
> > no-alias. This is what addEscapingUse was intended to solve, but
> > we would literally have to call it from every pass because we
> > can't rely on analysis invalidation!
> >
> > c) If any pass actually escapes a pointer from one function into
> > another, we invalidate the underlying assumption of
> > 'non-address-taken' that it relies upon.
>
> Yep, all of these are pretty nasty.
>
> > Now, as I argued in my general AA thread, I think we might be
> > able
> > to assume that (c) doesn't happen today. But both (a) and (b)
> > seem
> > like active nightmares to try to fix. I can see hacky ways to
> > avoid (a) where we detect *why* GetUnderlyingObject fails, but I
> > don't see how to fix both (a) and (b) (or to fix (a) well)
> > without
> > just disabling this specific aspect of GloblasModRef.
>
> Ok, but we need performance information to make sure this doesn’t
> cause a regression in practice for LTO builds. For example, Spec 2K
> and 2K6 are a reasonable place to start.
>
> > 1) Fix obvious issues with GloblasModRef and switch it to
> > ValueHandles
> > 2) Mail out a patch to disable this part of GlobalsModRef. I can
> > put it behind a flag or however folks would like it to work.
> > 3) Remove addEscapingUse() update API, which without #2 may
> > regress
> > some LTO test case I don't have (because I don't have any other
> > than bootstrap)
>
> Sounds great if we can do this without causing a regression in
> practice.

Personally, I'm comfortable with moving to a contractual obligation
for the escaping uses situation: No pass may capture the address of
a (previously-uncaptured) global (even locally) without notifying
the AA infrastructure.

Note that I said escape, and you say capture here.

That was intentional. The problem is that I think you need to pick up all captures for a reasonably-general case, because otherwise you need separate logic to handle local queries. If you think that some local variable can't alias some global because you *know* that the global's address has not been captured, then you need to know if you generate a new local capture.

I've not quite
thought about it enough to generalize this to capture.

Anyways, I don't generally disagree (see my lengthier mail about how
I'm envisioning AA updates). However, this is *not sufficient* to
fix GloblasModRef. You also have to somehow prevent asymmetric
evaluation of GetUnderlyingObject on the two pointers queried. To
fix this would require a substantially different implementation of
GlobalsModRef which I think is possible, but which I don't
realistically have the time to implement right now.

You can turn off the depth limit on GetUnderlyingObject.

I can't think of any in-tree pass that does this now, although we
might certainly have some in the future. Can you think of any we
have now?

If you LTO the ASan runtime along with the instrumentation, then the
instrumentation pass violates this today. Most instrumentation
passes violate this if the optimizer can see the runtime.

I'm not aware of any non-instrumentation passes that violate it
today.

Okay, fair enough. Updating those passes to invalidate something in this regard seems manageable.

I'd really like to get the AA pass that Sam Parker has been working
on ( http://reviews.llvm.org/D10059 ) in, but it will add measurable
compile-time overhead if it can't cache, and efficient caching seems
to depend on the same property.

Sorry I haven't chimed in there. I'm worried that there is *no* sound
way to do this in the current pass manager, but I'll read that patch
and chime in with specific suggestions.

Thanks!

-Hal

From: "Chris Lattner" <clattner@apple.com>
To: "Chandler Carruth" <chandlerc@gmail.com>
Cc: "LLVM Developers Mailing List" <llvmdev@cs.uiuc.edu>, "Hal Finkel" <hfinkel@anl.gov>, "Justin Bogner"
<mail@justinbogner.com>, "Duncan Exon Smith" <dexonsmith@apple.com>, "Rafael Espíndola" <rafael.espindola@gmail.com>
Sent: Tuesday, July 14, 2015 12:18:36 AM
Subject: Re: [LLVMdev] GlobalsModRef (and thus LTO) is completely broken

Ok folks,

I wrote up the general high-level thoughts I have about stateful AA
in a separate thread. But we need to sort out the completely and
horribly broken aspects of GlobalsModRef today, and the practical
steps forward. This email is totally about the practical stuff.

Now, as to why I emailed this group of people and with this
subject, the only pass pipeline that includes GlobalsModRef, is
the LTO pipeline.

Ah, so it is just an LTO enabled benchmark hack then.

It is also relying on a precomputed set of global variables whose
address is never used by an instruction other than some very small
set (gep, bitcast) as "non-address-taken". It then runs
GetUnderlyingObject on the two pointers in alias queries, and if
that finds one of these "non-address-taken" globals for one of the
memory locations but not the other, it concludes no-alias! This is
broken for a number of reasons.

a) If the two locations merely have a different *depth* of
instruction stack, because GetUnderlyingObject has a recursion
cap, one side can fail while the other succeeds, and we
erroneously produce no-alias.

Interesting. I’m sure it is no consolation, but GlobalsModRef
probably predated the recursion cap :slight_smile:

b) If instcombine or any other pass for any reason introduces on
one path an instruction that GetUnderlyingObject can't look
through (select, phi, load, ....), we incorrectly conclude
no-alias. This is what addEscapingUse was intended to solve, but
we would literally have to call it from every pass because we
can't rely on analysis invalidation!

c) If any pass actually escapes a pointer from one function into
another, we invalidate the underlying assumption of
'non-address-taken' that it relies upon.

Yep, all of these are pretty nasty.

Now, as I argued in my general AA thread, I think we might be able
to assume that (c) doesn't happen today. But both (a) and (b) seem
like active nightmares to try to fix. I can see hacky ways to
avoid (a) where we detect *why* GetUnderlyingObject fails, but I
don't see how to fix both (a) and (b) (or to fix (a) well) without
just disabling this specific aspect of GloblasModRef.

Ok, but we need performance information to make sure this doesn’t
cause a regression in practice for LTO builds. For example, Spec 2K
and 2K6 are a reasonable place to start.

1) Fix obvious issues with GloblasModRef and switch it to
ValueHandles
2) Mail out a patch to disable this part of GlobalsModRef. I can
put it behind a flag or however folks would like it to work.
3) Remove addEscapingUse() update API, which without #2 may regress
some LTO test case I don't have (because I don't have any other
than bootstrap)

Sounds great if we can do this without causing a regression in
practice.

Personally, I'm comfortable with moving to a contractual obligation for the escaping uses situation: No pass may capture the address of a (previously-uncaptured) global (even locally) without notifying the AA infrastructure. I can't think of any in-tree pass that does this now, although we might certainly have some in the future. Can you think of any we have now?

MergeGlobals creates new globals which could escape. This is probably going to be treated differently from updating AA as there’s nothing tracking the new global yet, so nothing to update. But still best to make sure this is ok.

Pete

Ok folks,

I wrote up the general high-level thoughts I have about stateful AA in a
separate thread. But we need to sort out the completely and horribly broken
aspects of GlobalsModRef today, and the practical steps forward. This email
is totally about the practical stuff.

Now, as to why I emailed this group of people and with this subject, the
only pass pipeline that includes GlobalsModRef, is the LTO pipeline. So we
have significantly less testing here than we do for stuff in the main
pipeline. Also, I don't have any benchmarks I can effectively run to tell
me if my changes impacted performance. =/ So I may need your help to
evaluate some of this. Now, onto the challenges....

First, GlobalsModRef as currently implemented completely abuses a loophole
in the current pass manager to incorrectly stick around even while it is
being "invalidated". I don't know of any way to fix this in the current
pass manager without completely defeating the purpose of the analysis pass.
The consequence is that whether passes claim to preserve AA or not is
irrelevant, GlobalsModRef will be preserved anyways! =[[[[ So the only way
to make things work correctly is to make GlobalsModRef survive *any*
per-function changes to the IR. We cannot rely on AA updates at all.

Most of the updates that GlobalsModRef needs can be provided by a
ValueHandle now that we have them. This will prevent ABA-style issues in
its caches, etc. I plan to send out a patch soon that switches it over to
this strategy.

It is also relying on a precomputed set of global variables whose address
is never used by an instruction other than some very small set (gep,
bitcast) as "non-address-taken". It then runs GetUnderlyingObject on the
two pointers in alias queries, and if that finds one of these
"non-address-taken" globals for one of the memory locations but not the
other, it concludes no-alias! This is broken for a number of reasons.

a) If the two locations merely have a different *depth* of instruction
stack, because GetUnderlyingObject has a recursion cap, one side can fail
while the other succeeds, and we erroneously produce no-alias.

How about adding an optional argument to the interface to ignore limit?

b) If instcombine or any other pass for any reason introduces on one path
an instruction that GetUnderlyingObject can't look through (select, phi,
load, ....), we incorrectly conclude no-alias. This is what addEscapingUse
was intended to solve, but we would literally have to call it from every
pass because we can't rely on analysis invalidation!

I am not sure if this matters. If a pointer is loaded from the memory, then
either pointer points to heap or the the underlying object is address
taken. For the phi case, I wonder what transformation can introduce it
while the original source construct does not escape/addr-take the global
already.

c) If any pass actually escapes a pointer from one function into another,
we invalidate the underlying assumption of 'non-address-taken' that it
relies upon.

This can probably happen with function outlining etc.

thanks,

David

From: "Xinliang David Li" <xinliangli@gmail.com>
To: "Chandler Carruth" <chandlerc@gmail.com>
Cc: "LLVM Developers Mailing List" <llvmdev@cs.uiuc.edu>, "Hal Finkel" <hfinkel@anl.gov>, "Justin Bogner"
<mail@justinbogner.com>, "Duncan Exon Smith" <dexonsmith@apple.com>, "Rafael Espíndola" <rafael.espindola@gmail.com>
Sent: Tuesday, July 14, 2015 12:59:29 PM
Subject: Re: [LLVMdev] GlobalsModRef (and thus LTO) is completely broken

Ok folks,

I wrote up the general high-level thoughts I have about stateful AA
in a separate thread. But we need to sort out the completely and
horribly broken aspects of GlobalsModRef today, and the practical
steps forward. This email is totally about the practical stuff.

Now, as to why I emailed this group of people and with this subject,
the only pass pipeline that includes GlobalsModRef, is the LTO
pipeline. So we have significantly less testing here than we do for
stuff in the main pipeline. Also, I don't have any benchmarks I can
effectively run to tell me if my changes impacted performance. =/ So
I may need your help to evaluate some of this. Now, onto the
challenges....

First, GlobalsModRef as currently implemented completely abuses a
loophole in the current pass manager to incorrectly stick around
even while it is being "invalidated". I don't know of any way to fix
this in the current pass manager without completely defeating the
purpose of the analysis pass. The consequence is that whether passes
claim to preserve AA or not is irrelevant, GlobalsModRef will be
preserved anyways! =[[[[ So the only way to make things work
correctly is to make GlobalsModRef survive *any* per-function
changes to the IR. We cannot rely on AA updates at all.

Most of the updates that GlobalsModRef needs can be provided by a
ValueHandle now that we have them. This will prevent ABA-style
issues in its caches, etc. I plan to send out a patch soon that
switches it over to this strategy.

It is also relying on a precomputed set of global variables whose
address is never used by an instruction other than some very small
set (gep, bitcast) as "non-address-taken". It then runs
GetUnderlyingObject on the two pointers in alias queries, and if
that finds one of these "non-address-taken" globals for one of the
memory locations but not the other, it concludes no-alias! This is
broken for a number of reasons.

a) If the two locations merely have a different *depth* of
instruction stack, because GetUnderlyingObject has a recursion cap,
one side can fail while the other succeeds, and we erroneously
produce no-alias.

How about adding an optional argument to the interface to ignore
limit?

We have this already.

b) If instcombine or any other pass for any reason introduces on one
path an instruction that GetUnderlyingObject can't look through
(select, phi, load, ....), we incorrectly conclude no-alias. This is
what addEscapingUse was intended to solve, but we would literally
have to call it from every pass because we can't rely on analysis
invalidation!

I am not sure if this matters. If a pointer is loaded from the
memory, then either pointer points to heap or the the underlying
object is address taken. For the phi case, I wonder what
transformation can introduce it while the original source construct
does not escape/addr-take the global already.

The problem is determining whether both pointers derive from the same global. For that, you need to track dependencies through loads/stores. You could, however, just give up on those cases.

c) If any pass actually escapes a pointer from one function into
another, we invalidate the underlying assumption of
'non-address-taken' that it relies upon.

This can probably happen with function outlining etc.

Yes, exactly (and this is why I said we'd likely have such things in the future). As Chandler pointed out, our instrumentation passes already do this as well.

-Hal

Ok folks,

I wrote up the general high-level thoughts I have about stateful AA in a separate thread. But we need to sort out the completely and horribly broken aspects of GlobalsModRef today, and the practical steps forward. This email is totally about the practical stuff.

Now, as to why I emailed this group of people and with this subject, the only pass pipeline that includes GlobalsModRef, is the LTO pipeline.

Ah, so it is just an LTO enabled benchmark hack then.

It is also relying on a precomputed set of global variables whose address is never used by an instruction other than some very small set (gep, bitcast) as “non-address-taken”. It then runs GetUnderlyingObject on the two pointers in alias queries, and if that finds one of these “non-address-taken” globals for one of the memory locations but not the other, it concludes no-alias! This is broken for a number of reasons.

a) If the two locations merely have a different depth of instruction stack, because GetUnderlyingObject has a recursion cap, one side can fail while the other succeeds, and we erroneously produce no-alias.

Interesting. I’m sure it is no consolation, but GlobalsModRef probably predated the recursion cap :slight_smile:

Why not return the conservative result when the cap is hit?

b) If instcombine or any other pass for any reason introduces on one path an instruction that GetUnderlyingObject can’t look through (select, phi, load, …), we incorrectly conclude no-alias. This is what addEscapingUse was intended to solve, but we would literally have to call it from every pass because we can’t rely on analysis invalidation!

Dito

c) If any pass actually escapes a pointer from one function into another, we invalidate the underlying assumption of ‘non-address-taken’ that it relies upon.

Yep, all of these are pretty nasty.

Now, as I argued in my general AA thread, I think we might be able to assume that (c) doesn’t happen today. But both (a) and (b) seem like active nightmares to try to fix. I can see hacky ways to avoid (a) where we detect why GetUnderlyingObject fails, but I don’t see how to fix both (a) and (b) (or to fix (a) well) without just disabling this specific aspect of GloblasModRef.

Ok, but we need performance information to make sure this doesn’t cause a regression in practice for LTO builds. For example, Spec 2K and 2K6 are a reasonable place to start.

  1. Fix obvious issues with GloblasModRef and switch it to ValueHandles
  2. Mail out a patch to disable this part of GlobalsModRef. I can put it behind a flag or however folks would like it to work.
  3. Remove addEscapingUse() update API, which without #2 may regress some LTO test case I don’t have (because I don’t have any other than bootstrap)

Sounds great if we can do this without causing a regression in practice. Are you aware of any miscompiles that might be attributed to this, or are these “theoretical" concerns?

I don’t really have any way of benchmarking SPEC with LTO. If that’s a configuration that is super important to people, I’m hoping they’ll let me know about any particular performance issues.

We’ll get that data for ARM.

Ultimately, I’m happy making no performance impacting changes here. The thing is, I need to remove a broken abstraction (addEscapingUse) that was half-way added without any test case (literally, I have no test case that fails if I delete it. The regression tests don’t fail if I assert(0) in it). =/ I don’t want to do that if it causes miscompiles for folks that are relying on LTO. So my plan is to essentially post the patch that will fix the miscompiles but may regress performance. If the miscompiles show up for those relying on LTO and this (somewhat unsound) pass, they are in the best position to evaluate the cost/benefit of a performance reducing patch.

Honsetly, my hope is that it won’t even impact performance. But I have no realistic way to measure it in any useful way. And for my platforms, I would be perfectly happy to trade some performance for correctness here. That’s one of the reasons we’re not relying on LTO yet. But I understand that others in the community have more strict performance needs and so I’m doing what I can to give them options.

I wouldn’t be willing to give up performance for hypothetical issues. Please protect all your changes with options. For some of your concerns it is probably hard to provide a test case that shows an/the actual issue.

> From: "Xinliang David Li" <xinliangli@gmail.com>
> To: "Chandler Carruth" <chandlerc@gmail.com>
> Cc: "LLVM Developers Mailing List" <llvmdev@cs.uiuc.edu>, "Hal Finkel" <
hfinkel@anl.gov>, "Justin Bogner"
> <mail@justinbogner.com>, "Duncan Exon Smith" <dexonsmith@apple.com>,
"Rafael Espíndola" <rafael.espindola@gmail.com>
> Sent: Tuesday, July 14, 2015 12:59:29 PM
> Subject: Re: [LLVMdev] GlobalsModRef (and thus LTO) is completely broken
>
>
>
>
>
>
>
>
>
> Ok folks,
>
>
> I wrote up the general high-level thoughts I have about stateful AA
> in a separate thread. But we need to sort out the completely and
> horribly broken aspects of GlobalsModRef today, and the practical
> steps forward. This email is totally about the practical stuff.
>
>
> Now, as to why I emailed this group of people and with this subject,
> the only pass pipeline that includes GlobalsModRef, is the LTO
> pipeline. So we have significantly less testing here than we do for
> stuff in the main pipeline. Also, I don't have any benchmarks I can
> effectively run to tell me if my changes impacted performance. =/ So
> I may need your help to evaluate some of this. Now, onto the
> challenges....
>
>
> First, GlobalsModRef as currently implemented completely abuses a
> loophole in the current pass manager to incorrectly stick around
> even while it is being "invalidated". I don't know of any way to fix
> this in the current pass manager without completely defeating the
> purpose of the analysis pass. The consequence is that whether passes
> claim to preserve AA or not is irrelevant, GlobalsModRef will be
> preserved anyways! =[[[[ So the only way to make things work
> correctly is to make GlobalsModRef survive *any* per-function
> changes to the IR. We cannot rely on AA updates at all.
>
>
> Most of the updates that GlobalsModRef needs can be provided by a
> ValueHandle now that we have them. This will prevent ABA-style
> issues in its caches, etc. I plan to send out a patch soon that
> switches it over to this strategy.
>
>
> It is also relying on a precomputed set of global variables whose
> address is never used by an instruction other than some very small
> set (gep, bitcast) as "non-address-taken". It then runs
> GetUnderlyingObject on the two pointers in alias queries, and if
> that finds one of these "non-address-taken" globals for one of the
> memory locations but not the other, it concludes no-alias! This is
> broken for a number of reasons.
>
>
> a) If the two locations merely have a different *depth* of
> instruction stack, because GetUnderlyingObject has a recursion cap,
> one side can fail while the other succeeds, and we erroneously
> produce no-alias.
>
>
> How about adding an optional argument to the interface to ignore
> limit?
>

We have this already.

>
> b) If instcombine or any other pass for any reason introduces on one
> path an instruction that GetUnderlyingObject can't look through
> (select, phi, load, ....), we incorrectly conclude no-alias. This is
> what addEscapingUse was intended to solve, but we would literally
> have to call it from every pass because we can't rely on analysis
> invalidation!
>
>
>
>
> I am not sure if this matters. If a pointer is loaded from the
> memory, then either pointer points to heap or the the underlying
> object is address taken. For the phi case, I wonder what
> transformation can introduce it while the original source construct
> does not escape/addr-take the global already.
>

The problem is determining whether both pointers derive from the same
global. For that, you need to track dependencies through loads/stores. You
could, however, just give up on those cases.

My point is that if access-1 has untractable/unanalyzable access pattern
involving load, it can not possibly come from a non-address taken global
variable.

> c) If any pass actually escapes a pointer from one function into
> another, we invalidate the underlying assumption of
> 'non-address-taken' that it relies upon.
>
>
> This can probably happen with function outlining etc.

Yes, exactly (and this is why I said we'd likely have such things in the
future). As Chandler pointed out, our instrumentation passes already do
this as well.

yes, tsan does that -- however LTOing instrumented program together with
runtime library source is a very unlikely scenario though.

David

>
> Ok folks,
>
> I wrote up the general high-level thoughts I have about stateful AA in
a separate thread. But we need to sort out the completely and horribly
broken aspects of GlobalsModRef today, and the practical steps forward.
This email is totally about the practical stuff.
>
> Now, as to why I emailed this group of people and with this subject,
the only pass pipeline that includes GlobalsModRef, is the LTO pipeline.

Ah, so it is just an LTO enabled benchmark hack then.

> It is also relying on a precomputed set of global variables whose
address is never used by an instruction other than some very small set
(gep, bitcast) as "non-address-taken". It then runs GetUnderlyingObject on
the two pointers in alias queries, and if that finds one of these
"non-address-taken" globals for one of the memory locations but not the
other, it concludes no-alias! This is broken for a number of reasons.
>
> a) If the two locations merely have a different *depth* of instruction
stack, because GetUnderlyingObject has a recursion cap, one side can fail
while the other succeeds, and we erroneously produce no-alias.

Interesting. I’m sure it is no consolation, but GlobalsModRef probably
predated the recursion cap :slight_smile:

Why not return the conservative result when the cap is hit?

> b) If instcombine or any other pass for any reason introduces on one
path an instruction that GetUnderlyingObject can't look through (select,
phi, load, ....), we incorrectly conclude no-alias. This is what
addEscapingUse was intended to solve, but we would literally have to call
it from every pass because we can't rely on analysis invalidation!

Dito

That will make it too conservative. The underlying assumption is that if
you can not track the origin of a pointer, the pointer can not point to a
non-address taken global var. The real question is whether a
transformation can turn a tractable access to a non-address taken global
into an unanalyzable form ?

David

Chandler,

Given you say explicitly that this only effects the LTO pipeline, I was curious if you thought this is an issue that we could skip past for the new pass manager work. Getting the normal optimization pass manager - which doesn't have this issue - working seems like a very reasonable first step. Even if we had to add some hack to the old pass manager - like say, separating out the problematic interface and making the few passes that use it go through hoops to get to GlobalsModRef specifically as opposed to any AA pass with the interface - that seems like a worthwhile tradeoff. Would this type of approach work? Or am I missing something?

Philip

Chandler,

Given you say explicitly that this only effects the LTO pipeline, I was curious if you thought this is an issue that we could skip past for the new pass manager work. Getting the normal optimization pass manager - which doesn’t have this issue - working seems like a very reasonable first step. Even if we had to add some hack to the old pass manager - like say, separating out the problematic interface and making the few passes that use it go through hoops to get to GlobalsModRef specifically as opposed to any AA pass with the interface - that seems like a worthwhile tradeoff. Would this type of approach work? Or am I missing something?

This is one of the other approaches I tried first. =/ It didn’t work well.

The problem is that we use inheritance for all aspects of managing AA in LLVM today. It both serves as the tool for composing different aspects of AA, the tool for composing different AA passes, and as the common interface that the rest of the optimizer accepts on its interface boundaries. The problem I ran into with just leaving GlobalsModRef alone is that I need to change the interface that is threaded through the rest of the optimizer to be compatible with what the new PM uses.

I think it may be possible to do this by introducing some pretty horrible hacks in the AliasAnalysis base class that allow it to essentially behave as a shim for the new pass manager or as the integral part of the old pass manager. But I think that’ll be pretty horrible, hard to craft, brittle, and might fall apart at any point as I’m going when I hit some aspect that breaks the trick. =/ So I’m hoping to not go this route if its at all possible.

Some early comments from folks in IRC benchmarking with GlobalsModRef are encouraging that it may not actually be a particularly problematic performance regression for any benchmarks to disable the broken bits here.

-Chandler

Replying here, but several of the questions raised boil down to “couldn’t you make the usage of GetUnderlyingObject conservatively correct?”. I’ll try and address that.

I think this is the right approach, but I think it is very hard to do without effectively disabling this part of GlobalsModRef. That is, the easy ways are likely to fire very frequently IMO.

The core idea is to detect a “no information” state coming out of GetUnderlyingObject (likely be providing a custom version just for GlobalsModRef and tailored to its use cases). This is particularly effective at avoiding the problems with the recursion limit. But let’s look at what cases we wouldn’t return that. Here are the cases I see when I thought about this last night with Hal, roughly in descending likelihood I would guess:

  1. We detect some global or an alloca. In that case, even BasicAA would be sufficient to provide no-alias. GMR shouldn’t be relevant.

  2. We detect a phi, select, or inttoptr, and stop there.

  3. We detect a load and stop there.

  4. We detect a return from a function.

  5. We detect an argument to the function.

I strongly suspect the vast majority of queries hit #1. That’s why BasicAA is so effective. Both #4 and #5 I think are actually reasonable places for GMR to potentially say “no-alias” and provide useful definitive information. But I also suspect these are the least common.

So let’s look at #2 and #3 because I think they’re interesting. For these, I think it is extremely hard to return “no-alias”. It seems extremely easy for a reasonable and innocuous change to the IR to introduce a phi or a select into one side of the GetUnderlyingObject but not the other. If that ever happens, we can’t return “no-alias” for #2, or we need to add really expensive updates. It also seems reasonable (if not as likely) to want adding a store and load to the IR to not trigger a miscompile. If it is possible for a valid optimization pass to do reg2mem on an SSA value, then that could happen to only one side of the paired GetUnderlyingObject and break GMR with #3. If that seems like an unreasonable thing to do, consider loop re-rolling or other transformations which may need to take things in SSA form at move them out of SSA form. Even if we then try immediately to put it back into SSA form, before we do that we create a point where GMR cannot correctly return no-alias.

So ultimately, I don’t think we want to rely on GMR returning “no-alias” for either #2 or #3 because of the challenge of actually updating it in all of the circumstances that could break them. That means that only #4 and #5 are going to return “no-alias” usefully. And even then, function inlining and function outlining both break #4 and #5, so you have to preclude those transforms while GMR is active. And I have serious doubts about these providing enough value to be worth the cost.

I think the better way to approach this is the other way around. Rather than doing a backwards analysis to see if one location reaches and global and the other location doesn’t reach a global, I think it would be much more effective to re-cast this as a forward analysis that determines all the memory locations in a function that come from outside the function, and use that to drive the no-alias responses.

I wouldn’t be willing to give up performance for hypothetical issues. Please protect all your changes with options. For some of your concerns it is probably hard to provide a test case that shows an/the actual issue.

I certainly agree that it will be very hard to provide a test case and extremely rare to see this in the wild for most of these issues. As long as I can remove the problematic update API we currently have (which as its an API change can’t really be put behind flags), I’m happy to have flags control whether or not GMR uses the unsound / stale information to try to answer alias queries. Do you have any opinion about what the default value of the flags should be?

I’ll go ahead and prepare the patches, as it seems like we’re all ending up in the same position, and just wondering about the precise tradeoffs we want to settle on.

I’ve fixed the obvious bugs I spotted in r242281. These should be pure correctness improvements.

I’ve sent the two patches I’m imagining to address the core issue here:
http://reviews.llvm.org/D11213

http://reviews.llvm.org/D11214

Currently, I have the unsafe alias results disabled by default, but with a flag that can re-enable them if needed. I don’t feel really strongly about which way the default is set – but that may be because I don’t have lots of users relying on LTO. I’ll let others indicate which way they would be most comfortable.

Some IRC conversations indicated that early benchmark results with GMR completely disabled weren’t showing really significant swings, so maybe this relatively small reduction in power of GMR won’t be too problematic for folks. Either way, I’m open to the different approaches. It’s D11214 that I care a lot about. =]

Thanks for all the thoughts here!
-Chandler

Hi Chandler,

I would like to run some benchmarks on ARM hardware and to look at impact of your patches on LTO.

Kind regards,

Evgeny Astigeevich

Hi Chandler,

I ran couple benchmarks with LTO turned on and your patches on ARM hardware.

There were no performance degradation of one benchmark and 2% slowdown of another benchmark.

Kind regards,

Evgeny Astigeevich

Hey, thanks for benchmarking.

How stable is the 2% regression?

Michael ran some benchmarks with GlobalsModRef completely disabled and the only differences were in the noise. This was a complete spec2k6 run along with some others. Based on the number of benchmarks run there, I’m going to go ahead and submit these patches, but if you can clarify the impact here, we can look at potentially some other tradeoff. I’m not particularly set on one set of defaults, etc, I just don’t want to keep patches held up based on that. We can flip the default back and forth as new data arrives.

I’ve fixed the obvious bugs I spotted in r242281. These should be pure correctness improvements.

I’ve sent the two patches I’m imagining to address the core issue here:
http://reviews.llvm.org/D11213

http://reviews.llvm.org/D11214

Currently, I have the unsafe alias results disabled by default, but with a flag that can re-enable them if needed. I don’t feel really strongly about which way the default is set – but that may be because I don’t have lots of users relying on LTO. I’ll let others indicate which way they would be most comfortable.

Some IRC conversations indicated that early benchmark results with GMR completely disabled weren’t showing really significant swings, so maybe this relatively small reduction in power of GMR won’t be too problematic for folks. Either way, I’m open to the different approaches. It’s D11214 that I care a lot about. =]

Just FYI, I’m landing these patches based on reviews and some initial benchmarking. There is one report of some trouble on this thread though that after investigation may need further tweaks though. Just wanted to close the loop that this is moving forward based on the general sentiment of the thread. Happy to keep discussing and revising if more concerns come up!

-Chandler