[PM] I think that the new PM needs to learn about inter-analysis dependencies...

With D21921 and a couple other pending patches, we now have the full LTO pipeline converted to the new PM. I was trying it out on test-suite and SPEC2006. Yay!

This email is about one issue that I ran into testing the pipeline on test-suite. The issue arose in the wild as an interaction between lcssa and gvn. But the issue is extremely general.

What happened is that BasicAA uses TargetLibraryInfo. If it makes a change, then LCSSA marks BasicAA as preserved (if it made no change, it would preserve all). The new PM then invalidates everything not marked as preserved by LCSSA. So it does not invalidate BasicAA but it does invalidate TargetLibraryInfo. Now BasicAA is holding dangling pointers to TargetLibraryInfo. GVN then goes to query BasicAA and things explode.

I don’t think this is going to be maintainable. Essentially, it requires all passes to be aware of the transitive dependencies of every analysis they preserve and manually preserve all dependencies. And if an analysis A starts using a new analysis B, then every pass that preserves a pass that transitively depends on A must be updated or else there are use-after-free bugs. Not to mention out-of-tree passes.

Perhaps the worst part is that without some sort of transitive preservation (or the opposite, transitive invalidation (with the transitivity in the opposite direction)) these sorts of bugs are a dynamic property of both the pass pipeline and the code being run on. For example, the reproducer I reduced for this particular bug involved a situation where:

  1. A pass had to preserve BasicAA
  2. lcssa would make a change and thus only preserve a subset of passes (if it made no changes it would have preserved all)
  3. then LICM and MergedLoadStoreMotion had to run and make no changes (and hence preserve all).
  4. then GVN had to run and query BasicAA

(presumably LICM or MergedLoadStoreMotion didn’t make a query to BasicAA, or that query ended up not accessing the dangling TargetLibraryInfo).

How should we solve this? I see two potential solutions:

  1. Analyses must somehow list the analyses they depend on (either by overriding “invalidate” to make sure that they invalidate them, or something “declarative” that would allow the AnalysisManager to walk the transitive dependencies).
  2. The AnalysisManager must do a somewhat complicated dance to track when analyses call back into it in order to get other analyses.

Any other ideas? Any thoughts about how best to fix this?

For those interested, a reduced test case is
/home/sean/pg/release-asan/bin/opt -debug-pass-manager ‘-passes=require,invalidate,gvn’ -aa-pipeline=basic-aa $1 -disable-output

target datalayout = “e-m:e-i64:64-f80:128-n8:16:32:64-S128”
target triple = “x86_64-unknown-linux-gnu”

declare void @foo(i32)

define void @sendMTFValues(i32 %arg) {
entry:
call void @foo(i32 0)
%0 = load i32, i32
%arg
call void @foo(i32 %0)
ret void
}

This was reduced out of 401.bzip2, but I saw many other failures in the test-suite due to this issue (for some reason they seem to manifest as the “!Scanned” assertion in AssumptionCache::scanFunction or a ValueHandle issue in AssumptionCache (nondeterministically)).

– Sean Silva

From: "Sean Silva" <chisophugis@gmail.com>
To: "llvm-dev" <llvm-dev@lists.llvm.org>
Cc: "Chandler Carruth" <chandlerc@gmail.com>, "Davide Italiano"
<dccitaliano@gmail.com>, "David Li" <davidxl@google.com>, "Tim Amini
Golling" <mehdi.amini@apple.com>, "Hal Finkel" <hfinkel@anl.gov>,
"Sanjoy Das" <sanjoy@playingwithpointers.com>, "Pete Cooper"
<peter_cooper@apple.com>
Sent: Tuesday, July 12, 2016 8:15:22 PM
Subject: [PM] I think that the new PM needs to learn about
inter-analysis dependencies...

With D21921 and a couple other pending patches, we now have the full
LTO pipeline converted to the new PM. I was trying it out on
test-suite and SPEC2006. Yay!

Nice :slight_smile:

This email is about one issue that I ran into testing the pipeline on
test-suite. The issue arose in the wild as an interaction between
lcssa and gvn. But the issue is extremely general.

What happened is that BasicAA uses TargetLibraryInfo. If it makes a
change, then LCSSA marks BasicAA as preserved (if it made no change,
it would preserve all). The new PM then invalidates everything not
marked as preserved by LCSSA. So it does not invalidate BasicAA but
it does invalidate TargetLibraryInfo. Now BasicAA is holding
dangling pointers to TargetLibraryInfo. GVN then goes to query
BasicAA and things explode.

I don't think this is going to be maintainable. Essentially, it
requires all passes to be aware of the transitive dependencies of
every analysis they preserve and manually preserve all dependencies.
And if an analysis A starts using a new analysis B, then every pass
that preserves a pass that transitively depends on A must be updated
or else there are use-after-free bugs. Not to mention out-of-tree
passes.

Perhaps the worst part is that without some sort of transitive
preservation (or the opposite, transitive invalidation (with the
transitivity in the opposite direction)) these sorts of bugs are a
dynamic property of both the pass pipeline and the code being run
on. For example, the reproducer I reduced for this particular bug
involved a situation where:
1. A pass had to preserve BasicAA
2. lcssa would make a change and thus only preserve a subset of
passes (if it made no changes it would have preserved all)
2. then LICM and MergedLoadStoreMotion had to run and make no changes
(and hence preserve all).
3. then GVN had to run and query BasicAA

(presumably LICM or MergedLoadStoreMotion didn't make a query to
BasicAA, or that query ended up not accessing the dangling
TargetLibraryInfo).

How should we solve this? I see two potential solutions:
1. Analyses must somehow list the analyses they depend on (either by
overriding "invalidate" to make sure that they invalidate them, or
something "declarative" that would allow the AnalysisManager to walk
the transitive dependencies).
2. The AnalysisManager must do a somewhat complicated dance to track
when analyses call back into it in order to get other analyses.

Any other ideas? Any thoughts about how best to fix this?

I think that (2) is the right answer, but I'm not sure that the dance needs to be all that complicated. Each analysis can contain a list (small vector) of other analysis that depend on it, and the interface to get an analysis can take a pointer to add to this list. When an analysis is invalidated, the manager can queue the invalidation of the others in the list.

Thanks again,
Hal

Yea, this is a nasty problem.

One important thing to understand is that this is specific to analyses which hold references to other analyses. While this isn’t unheard of, it isn’t as common as it could be. Still, definitely something we need to address.

Some ideas about mitigating and fixing it below.

How should we solve this? I see two potential solutions:

  1. Analyses must somehow list the analyses they depend on (either by overriding “invalidate” to make sure that they invalidate them, or something “declarative” that would allow the AnalysisManager to walk the transitive dependencies).

I think this is the right approach. I would personally start by overriding the invalidate callback everywhere that it is necessary, and see how bad that becomes.

If it becomes common and burdensome, then we can change the way invalidation works such that the analysis manager is aware of the preserved analysis set in more detail, and have it build up the necessary data structures to know in-advance whether it must make an explicit invalidate call.

However, I suspect this may not be too bad for two reasons:

a) As I mentioned above, I’m hoping there aren’t too many handles between different analyses. But I’ve not done a careful examination, so we can check this.

b) For many analyses that might trigger this, I think we have a simpler option. If the analysis is immutable for any reason – that is, it overrides its invalidate routine to always return “false” the way TargetLibraryInfo should (although I’m not sure it does currently), we shouldn’t need to do this as it shouldn’t be getting cleared out. Does this make sense? Do others see anything I’m missing with that approach?

  1. The AnalysisManager must do a somewhat complicated dance to track when analyses call back into it in order to get other analyses.

I would really rather avoid this, as currently the analysis manager’s logic here is very simple, and in many cases we only need the analyses to compute our result, not to embed it. I’m tihnking of stuff like Dominators is used to build LoopInfo, but there isn’t a stale handle there.

There is another aspect of course in that if something is preserving LoopInfo, it really should be preserving Dominators too…

------------------------------

*From: *"Sean Silva" <chisophugis@gmail.com>
*To: *"llvm-dev" <llvm-dev@lists.llvm.org>
*Cc: *"Chandler Carruth" <chandlerc@gmail.com>, "Davide Italiano" <
dccitaliano@gmail.com>, "David Li" <davidxl@google.com>, "Tim Amini
Golling" <mehdi.amini@apple.com>, "Hal Finkel" <hfinkel@anl.gov>, "Sanjoy
Das" <sanjoy@playingwithpointers.com>, "Pete Cooper" <
peter_cooper@apple.com>
*Sent: *Tuesday, July 12, 2016 8:15:22 PM
*Subject: *[PM] I think that the new PM needs to learn about
inter-analysis dependencies...

With D21921 and a couple other pending patches, we now have the full LTO
pipeline converted to the new PM. I was trying it out on test-suite and
SPEC2006. Yay!

Nice :slight_smile:

This email is about one issue that I ran into testing the pipeline on
test-suite. The issue arose in the wild as an interaction between lcssa and
gvn. But the issue is extremely general.

What happened is that BasicAA uses TargetLibraryInfo. If it makes a
change, then LCSSA marks BasicAA as preserved (if it made no change, it
would preserve all). The new PM then invalidates everything not marked as
preserved by LCSSA. So it does not invalidate BasicAA but it does
invalidate TargetLibraryInfo. Now BasicAA is holding dangling pointers to
TargetLibraryInfo. GVN then goes to query BasicAA and things explode.

I don't think this is going to be maintainable. Essentially, it requires
all passes to be aware of the transitive dependencies of every analysis
they preserve and manually preserve all dependencies. And if an analysis A
starts using a new analysis B, then every pass that preserves a pass that
transitively depends on A must be updated or else there are use-after-free
bugs. Not to mention out-of-tree passes.

Perhaps the worst part is that without some sort of transitive
preservation (or the opposite, transitive invalidation (with the
transitivity in the opposite direction)) these sorts of bugs are a dynamic
property of both the pass pipeline and the code being run on. For example,
the reproducer I reduced for this particular bug involved a situation where:
1. A pass had to preserve BasicAA
2. lcssa would make a change and thus only preserve a subset of passes (if
it made no changes it would have preserved all)
2. then LICM and MergedLoadStoreMotion had to run and make no changes (and
hence preserve all).
3. then GVN had to run and query BasicAA

(presumably LICM or MergedLoadStoreMotion didn't make a query to BasicAA,
or that query ended up not accessing the dangling TargetLibraryInfo).

How should we solve this? I see two potential solutions:
1. Analyses must somehow list the analyses they depend on (either by
overriding "invalidate" to make sure that they invalidate them, or
something "declarative" that would allow the AnalysisManager to walk the
transitive dependencies).
2. The AnalysisManager must do a somewhat complicated dance to track when
analyses call back into it in order to get other analyses.

Any other ideas? Any thoughts about how best to fix this?

I think that (2) is the right answer, but I'm not sure that the dance
needs to be all that complicated. Each analysis can contain a list (small
vector) of other analysis that depend on it, and the interface to get an
analysis can take a pointer to add to this list. When an analysis is
invalidated, the manager can queue the invalidation of the others in the
list.

At least with the current arrangement, there is not just one
AnalysisManager, but rather one per IRUnitT (with the
<X>AnalysisManager<Y>Proxy analyses to go between them). So a pointer to
the analysis is not enough; you also need a pointer to the AnalysisManager
that is caching the result of that analysis. Since the invalidation method
is actually typed to take the IRUnitT, you now need a way to store a
reference to the other analysis manager in a type-erased way (since e.g. a
function analysis can depend on a module analysis or vice versa).

It's sort of hairy and potentially requires quite a bit of storage
(relative to the other PM data structures). E.g. a module analysis that
queries various function analyses on each function will create
O(NumFunctionPassesUsed * NumFunctionsInModule) back pointers to itself.
Also, what happens if the module analysis is invalidated? Do we need to
store even more pointers from the module analysis back to the function
analyses so that it can "unregister" itself as a dependency?

Having a single AnalysisManager that stores analyses for all the different
IRUnitT's might somewhat simplify this, but it would still be pretty
complicated I think.

-- Sean Silva

Yea, I largely agree. I’d try out overriding the invalidate method and triggering on the necessary set first, and see how that goes before we go down this road.

Yea, this is a nasty problem.

One important thing to understand is that this is specific to analyses
which hold references to other analyses. While this isn't unheard of, it
isn't as common as it could be. Still, definitely something we need to
address.

We can call this type of dependencies (holding references) hard-dependency.
The soft dependency refers to the case where analysis 'A' depends on 'B'
during computation, but does not need 'B' once it is computed.

There are actually quite a few examples of hard-dependency case. For
instance LoopAccessInfo, LazyValueInfo etc which hold references to other
analyses.

Problem involving hard-dependency is actually easier to detect, as it is
usually a compile time problem. Issues involving soft dependencies are more
subtle and can lead to wrong code gen.

David

Yea, this is a nasty problem.

One important thing to understand is that this is specific to analyses
which hold references to other analyses. While this isn't unheard of, it
isn't as common as it could be. Still, definitely something we need to
address.

We can call this type of dependencies (holding references)
hard-dependency. The soft dependency refers to the case where analysis 'A'
depends on 'B' during computation, but does not need 'B' once it is
computed.

There are actually quite a few examples of hard-dependency case. For
instance LoopAccessInfo, LazyValueInfo etc which hold references to other
analyses.

Problem involving hard-dependency is actually easier to detect, as it is
usually a compile time problem. Issues involving soft dependencies are more
subtle and can lead to wrong code gen.

Did you mean to say that soft-dependency problems are easier to detect? At
least my intuition is that soft-dependency is easier because there is no
risk of dangling pointers to other analyses.

-- Sean Silva

The issue is that the fact that there is any dependency isn’t clear.

However, I think the only real problem here are these “hard dependencies” (I don’t really like that term though). For others, only an analysis that is explicitly preserved survives. So I’m not worried about the fact that people have to remember this.

The question is how often there are cross-data-structure references. David mentions a few examples, and I’m sure there are more, but it isn’t clear to me yet whether this is pervasive or occasional.

And even then it isn’t clear how onerous explicitly managing this in invalidate overrides will be.

Yea, this is a nasty problem.

One important thing to understand is that this is specific to analyses
which hold references to other analyses. While this isn't unheard of, it
isn't as common as it could be. Still, definitely something we need to
address.

We can call this type of dependencies (holding references)
hard-dependency. The soft dependency refers to the case where analysis 'A'
depends on 'B' during computation, but does not need 'B' once it is
computed.

There are actually quite a few examples of hard-dependency case. For
instance LoopAccessInfo, LazyValueInfo etc which hold references to other
analyses.

Problem involving hard-dependency is actually easier to detect, as it is
usually a compile time problem. Issues involving soft dependencies are more
subtle and can lead to wrong code gen.

Did you mean to say that soft-dependency problems are easier to detect?
At least my intuition is that soft-dependency is easier because there is no
risk of dangling pointers to other analyses.

The issue is that the fact that there is *any* dependency isn't clear.

However, I think the only real problem here are these "hard dependencies"
(I don't really like that term though). For others, only an analysis that
is *explicitly* preserved survives. So I'm not worried about the fact that
people have to remember this.

The question is how often there are cross-data-structure references. David
mentions a few examples, and I'm sure there are more, but it isn't clear to
me yet whether this is pervasive or occasional.

I just did a quick run-through of PassRegistry.def and this is what I found:

Module analyses: 0/5 hold pointers to other analyses
CallGraph: No pointers to other analyses.
LazyCallGraph: No pointers to other analyses.
ProfileSummaryAnalysis: No pointers to other analyses.
TargetLibraryAnalysis: No pointers to other analyses.
VerifierAnalysis: No pointers to other analyses.

Module alias analyses: 1/1 keeps pointer to other analysis.
GlobalsAA: Result keeps pointer to TLI (this is a function analysis).

Function analyses: 9/17 keep pointers to other analysis
AAManager: Its Result holds TLI pointer and pointers to individual AA
result objects.
AssumptionAnalysis: No pointers to other analyses.
BlockFrequencyAnalysis: Its Result holds pointers to LoopInfo and BPI.
BranchProbabilityAnalysis: Stores no pointers to other analyses. (uses
LoopInfo to "recalculate" though)
DominatorTreeAnalysis: Stores no pointers to other analyses.
PostDominatorTreeAnalysis: Stores no pointers to other analyses.
DemandedBitsAnalysis: Stores pointers to AssumptionCache and DominatorTree
DominanceFrontierAnalysis: Stores no pointers to other analyses.
(uses DominatorTreeAnalysis for "recalculate" though).
LoopInfo: Uses DominatorTreeAnalysis for "recalculate" but stores no
pointers.
LazyValueAnalysis: Stores pointers to AssumptionCache, TargetLibraryInfo,
DominatorTree.
DependenceAnalysis: Stores pointers to AliasAnalysis, ScalarEvolution,
LoopInfo
MemoryDependenceAnalysis: Stores pointers to AliasAnalysis,
AssumptionCache, TargetLibraryInfo, DominatorTree
MemorySSAAnalysis: Stores pointers to AliasAnalysis, DominatorTree
RegionInfoAnalysis: Stores pointers to DomTree, PostDomTree, DomFrontier
ScalarEvolutionAnalysis: Stores pointers to TargetLibraryInfo,
AssumptionCache, DominatorTree, LoopInfo
TargetLibraryAnalysis: Has no dependencies
TargetIRAnalysis: Has no dependencies.

Function alias analyses: 3/5 keep pointers to other analyses
BasicAA: Keeps pointers to TargetLibraryInfo, AssumptionCache,
DominatorTree, LoopInfo
CFLAA: Keeps pointer to TargetLibraryInfo
SCEVAA: Keeps pointer to ScalarEvolution
ScopedNoAliasAA: No dependencies
TypeBasedAA: No dependencies

Total: 13/28 analyses (~50%) hold pointers to other analyses.
Of the 15/28 analyses that don't hold pointers, 12/15 simply have no
dependencies. Only 3/15 (BPI, LoopInfo, DominanceFrontier) have
dependencies that are used just for a "recalculate" step that retains no
pointers.
So I think it is fair to say that analyses which hold pointers to other
analyses is not an exceptional case. In fact, analyses that use other
analyses just for a "recalculate" step seems to be the exceptional case
(only 3/28 or about 10%)

Since I like to visualize things, here is a quick graph of the dependencies
between analyses which hold pointers to each other.
Edge A -> B indicates "A can/does hold a pointer to B".
dot file: http://reviews.llvm.org/P6603
rendering: http://reviews.llvm.org/F2161058

(I've been a bit loose with terminology here. A lot of the times that I say
"analysis" I mean "the analysis' result object" or I use the name of the
analysis interchangeably with the analysis result object)

-- Sean Silva

Yea, this is a nasty problem.

One important thing to understand is that this is specific to analyses
which hold references to other analyses. While this isn't unheard of, it
isn't as common as it could be. Still, definitely something we need to
address.

Some ideas about mitigating and fixing it below.

How should we solve this? I see two potential solutions:
1. Analyses must somehow list the analyses they depend on (either by
overriding "invalidate" to make sure that they invalidate them, or
something "declarative" that would allow the AnalysisManager to walk the
transitive dependencies).

I think this is the right approach. I would personally start by overriding
the invalidate callback everywhere that it is necessary, and see how bad
that becomes.

If it becomes common and burdensome, then we can change the way
invalidation works such that the analysis manager is aware of the preserved
analysis set in more detail, and have it build up the necessary data
structures to know in-advance whether it must make an explicit invalidate
call.

However, I suspect this may not be *too* bad for two reasons:

a) As I mentioned above, I'm hoping there aren't *too* many handles
between different analyses. But I've not done a careful examination, so we
can check this.

I've replied to one of the posts downthread with some info about this.

b) For many analyses that might trigger this, I think we have a simpler
option. If the analysis is *immutable* for any reason -- that is, it
overrides its invalidate routine to always return "false" the way
TargetLibraryInfo should (although I'm not sure it does currently),

TargetLibraryAnalysis has `run` overloaded for both Module and Function,
but TargetLibraryInfo only has the "return false" invalidate override for
Module. Probably just an oversight.

we shouldn't need to do this as it shouldn't be getting cleared out. Does

this make sense? Do others see anything I'm missing with that approach?

That makes sense, but I think this only applies to a relatively small
subset of analyses based on my run-through of PassRegistry.def.
TargetLibraryAnalysis and TargetIRAnalysis I think are the only ones.

-- Sean Silva

Interesting!

Most of these look like they hold a pointer to the root analysis as opposed to detailed objects inside the analysis?

It might make sense to try to handle this very specific pattern in a special way of overriding the invalidate routines is too error prone… We could try to make this work “automatically” but I’m worried this would be challenging to get right. Open to suggestions of course.

Any other ideas about what would make sense to handle this?

Does it make sense to override the invalidate routines now and iterate from there? I feel like you’ve done a lot of the research necessary for this already…

From: "Sean Silva" <chisophugis@gmail.com>
To: "Chandler Carruth" <chandlerc@gmail.com>
Cc: "Xinliang David Li" <davidxl@google.com>, "llvm-dev"
<llvm-dev@lists.llvm.org>, "Davide Italiano"
<dccitaliano@gmail.com>, "Tim Amini Golling"
<mehdi.amini@apple.com>, "Hal Finkel" <hfinkel@anl.gov>, "Sanjoy
Das" <sanjoy@playingwithpointers.com>, "Pete Cooper"
<peter_cooper@apple.com>
Sent: Wednesday, July 13, 2016 2:25:52 AM
Subject: Re: [PM] I think that the new PM needs to learn about
inter-analysis dependencies...

>

> >
>

> > > > Yea, this is a nasty problem.
> > >
> >
>

> > > > One important thing to understand is that this is specific to
> > > > analyses which hold references to other analyses. While this
> > > > isn't
> > > > unheard of, it isn't as common as it could be. Still,
> > > > definitely
> > > > something we need to address.
> > >
> >
>

> > > We can call this type of dependencies (holding references)
> > > hard-dependency. The soft dependency refers to the case where
> > > analysis 'A' depends on 'B' during computation, but does not
> > > need
> > > 'B' once it is computed.
> >
>

> > > There are actually quite a few examples of hard-dependency
> > > case.
> > > For
> > > instance LoopAccessInfo, LazyValueInfo etc which hold
> > > references
> > > to
> > > other analyses.
> >
>

> > > Problem involving hard-dependency is actually easier to detect,
> > > as
> > > it
> > > is usually a compile time problem. Issues involving soft
> > > dependencies are more subtle and can lead to wrong code gen.
> >
>

> > Did you mean to say that soft-dependency problems are easier to
> > detect? At least my intuition is that soft-dependency is easier
> > because there is no risk of dangling pointers to other analyses.
>

> The issue is that the fact that there is *any* dependency isn't
> clear.

> However, I think the only real problem here are these "hard
> dependencies" (I don't really like that term though). For others,
> only an analysis that is *explicitly* preserved survives. So I'm
> not
> worried about the fact that people have to remember this.

> The question is how often there are cross-data-structure
> references.
> David mentions a few examples, and I'm sure there are more, but it
> isn't clear to me yet whether this is pervasive or occasional.

I just did a quick run-through of PassRegistry.def and this is what I
found:

Module analyses: 0/5 hold pointers to other analyses
CallGraph: No pointers to other analyses.
LazyCallGraph: No pointers to other analyses.
ProfileSummaryAnalysis: No pointers to other analyses.
TargetLibraryAnalysis: No pointers to other analyses.

VerifierAnalysis: No pointers to other analyses.

Module alias analyses: 1/1 keeps pointer to other analysis.
GlobalsAA: Result keeps pointer to TLI (this is a function analysis).

Function analyses: 9/17 keep pointers to other analysis
AAManager: Its Result holds TLI pointer and pointers to individual AA
result objects.
AssumptionAnalysis: No pointers to other analyses.

BlockFrequencyAnalysis: Its Result holds pointers to LoopInfo and
BPI.

BranchProbabilityAnalysis: Stores no pointers to other analyses.
(uses LoopInfo to "recalculate" though)

DominatorTreeAnalysis: Stores no pointers to other analyses.

PostDominatorTreeAnalysis: Stores no pointers to other analyses.
DemandedBitsAnalysis: Stores pointers to AssumptionCache and
DominatorTree

DominanceFrontierAnalysis: Stores no pointers to other analyses.
(uses DominatorTreeAnalysis for "recalculate" though).

LoopInfo: Uses DominatorTreeAnalysis for "recalculate" but stores no
pointers.

LazyValueAnalysis: Stores pointers to AssumptionCache,
TargetLibraryInfo, DominatorTree.

DependenceAnalysis: Stores pointers to AliasAnalysis,
ScalarEvolution, LoopInfo
MemoryDependenceAnalysis: Stores pointers to AliasAnalysis,
AssumptionCache, TargetLibraryInfo, DominatorTree

MemorySSAAnalysis: Stores pointers to AliasAnalysis, DominatorTree

RegionInfoAnalysis: Stores pointers to DomTree, PostDomTree,
DomFrontier

ScalarEvolutionAnalysis: Stores pointers to TargetLibraryInfo,
AssumptionCache, DominatorTree, LoopInfo
TargetLibraryAnalysis: Has no dependencies

TargetIRAnalysis: Has no dependencies.

Function alias analyses: 3/5 keep pointers to other analyses
BasicAA: Keeps pointers to TargetLibraryInfo, AssumptionCache,
DominatorTree, LoopInfo
CFLAA: Keeps pointer to TargetLibraryInfo
SCEVAA: Keeps pointer to ScalarEvolution
ScopedNoAliasAA: No dependencies

TypeBasedAA: No dependencies

Total: 13/28 analyses (~50%) hold pointers to other analyses.
Of the 15/28 analyses that don't hold pointers, 12/15 simply have no
dependencies. Only 3/15 (BPI, LoopInfo, DominanceFrontier) have
dependencies that are used just for a "recalculate" step that
retains no pointers.
So I think it is fair to say that analyses which hold pointers to
other analyses is not an exceptional case. In fact, analyses that
use other analyses just for a "recalculate" step seems to be the
exceptional case (only 3/28 or about 10%)

Interesting. I'm not sure this is the right metric, however. There are lots of analyses that hold pointers to other analyses but don't need to. The analysis handle itself can be reacquired lazily if we care to do so. What's truly problematic is holding pointers into another analysis's data structures. To be concrete, holding a pointer to ScalarEvolution is not a fundamental problem because we could make the analysis reacquire the pointer at the start of every query. Holding SCEV* is the problem.

FWIW, I still think this is common enough to design a solution that makes it easy to get this right.

-Hal

Yea, this is a nasty problem.

One important thing to understand is that this is specific to analyses
which hold references to other analyses. While this isn't unheard of, it
isn't as common as it could be. Still, definitely something we need to
address.

We can call this type of dependencies (holding references)
hard-dependency. The soft dependency refers to the case where analysis 'A'
depends on 'B' during computation, but does not need 'B' once it is
computed.

There are actually quite a few examples of hard-dependency case. For
instance LoopAccessInfo, LazyValueInfo etc which hold references to other
analyses.

Problem involving hard-dependency is actually easier to detect, as it is
usually a compile time problem. Issues involving soft dependencies are more
subtle and can lead to wrong code gen.

Did you mean to say that soft-dependency problems are easier to detect? At
least my intuition is that soft-dependency is easier because there is no
risk of dangling pointers to other analyses.

I meant it is harder to detect. If 'A' soft-depends on 'B', when 'B' gets
invalidated, but 'A' survives (can be used without compile time problem
such as dangling pointer) -- we don't really know if 'A' is in fact still
in valid state -- as it may need to be recalculated too.

David

From: "Chandler Carruth" <chandlerc@gmail.com>
To: "Sean Silva" <chisophugis@gmail.com>
Cc: "Xinliang David Li" <davidxl@google.com>, "llvm-dev"
<llvm-dev@lists.llvm.org>, "Davide Italiano"
<dccitaliano@gmail.com>, "Tim Amini Golling"
<mehdi.amini@apple.com>, "Hal Finkel" <hfinkel@anl.gov>, "Sanjoy
Das" <sanjoy@playingwithpointers.com>, "Pete Cooper"
<peter_cooper@apple.com>
Sent: Wednesday, July 13, 2016 2:34:35 AM
Subject: Re: [PM] I think that the new PM needs to learn about
inter-analysis dependencies...

>

> >
>

> > >
> >
>

> > > > > Yea, this is a nasty problem.
> > > >
> > >
> >
>

> > > > > One important thing to understand is that this is specific
> > > > > to
> > > > > analyses which hold references to other analyses. While
> > > > > this
> > > > > isn't
> > > > > unheard of, it isn't as common as it could be. Still,
> > > > > definitely
> > > > > something we need to address.
> > > >
> > >
> >
>

> > > > We can call this type of dependencies (holding references)
> > > > hard-dependency. The soft dependency refers to the case where
> > > > analysis 'A' depends on 'B' during computation, but does not
> > > > need
> > > > 'B' once it is computed.
> > >
> >
>

> > > > There are actually quite a few examples of hard-dependency
> > > > case.
> > > > For
> > > > instance LoopAccessInfo, LazyValueInfo etc which hold
> > > > references
> > > > to
> > > > other analyses.
> > >
> >
>

> > > > Problem involving hard-dependency is actually easier to
> > > > detect,
> > > > as
> > > > it
> > > > is usually a compile time problem. Issues involving soft
> > > > dependencies are more subtle and can lead to wrong code gen.
> > >
> >
>

> > > Did you mean to say that soft-dependency problems are easier to
> > > detect? At least my intuition is that soft-dependency is easier
> > > because there is no risk of dangling pointers to other
> > > analyses.
> >
>

> > The issue is that the fact that there is *any* dependency isn't
> > clear.
>

> > However, I think the only real problem here are these "hard
> > dependencies" (I don't really like that term though). For others,
> > only an analysis that is *explicitly* preserved survives. So I'm
> > not
> > worried about the fact that people have to remember this.
>

> > The question is how often there are cross-data-structure
> > references.
> > David mentions a few examples, and I'm sure there are more, but
> > it
> > isn't clear to me yet whether this is pervasive or occasional.
>

> I just did a quick run-through of PassRegistry.def and this is what
> I
> found:

> Module analyses: 0/5 hold pointers to other analyses

> CallGraph: No pointers to other analyses.

> LazyCallGraph: No pointers to other analyses.

> ProfileSummaryAnalysis: No pointers to other analyses.

> TargetLibraryAnalysis: No pointers to other analyses.

> VerifierAnalysis: No pointers to other analyses.

> Module alias analyses: 1/1 keeps pointer to other analysis.

> GlobalsAA: Result keeps pointer to TLI (this is a function
> analysis).

> Function analyses: 9/17 keep pointers to other analysis

> AAManager: Its Result holds TLI pointer and pointers to individual
> AA
> result objects.

> AssumptionAnalysis: No pointers to other analyses.

> BlockFrequencyAnalysis: Its Result holds pointers to LoopInfo and
> BPI.

> BranchProbabilityAnalysis: Stores no pointers to other analyses.
> (uses LoopInfo to "recalculate" though)

> DominatorTreeAnalysis: Stores no pointers to other analyses.

> PostDominatorTreeAnalysis: Stores no pointers to other analyses.

> DemandedBitsAnalysis: Stores pointers to AssumptionCache and
> DominatorTree

> DominanceFrontierAnalysis: Stores no pointers to other analyses.
> (uses DominatorTreeAnalysis for "recalculate" though).

> LoopInfo: Uses DominatorTreeAnalysis for "recalculate" but stores
> no
> pointers.

> LazyValueAnalysis: Stores pointers to AssumptionCache,
> TargetLibraryInfo, DominatorTree.

> DependenceAnalysis: Stores pointers to AliasAnalysis,
> ScalarEvolution, LoopInfo

> MemoryDependenceAnalysis: Stores pointers to AliasAnalysis,
> AssumptionCache, TargetLibraryInfo, DominatorTree

> MemorySSAAnalysis: Stores pointers to AliasAnalysis, DominatorTree

> RegionInfoAnalysis: Stores pointers to DomTree, PostDomTree,
> DomFrontier

> ScalarEvolutionAnalysis: Stores pointers to TargetLibraryInfo,
> AssumptionCache, DominatorTree, LoopInfo

> TargetLibraryAnalysis: Has no dependencies

> TargetIRAnalysis: Has no dependencies.

> Function alias analyses: 3/5 keep pointers to other analyses

> BasicAA: Keeps pointers to TargetLibraryInfo, AssumptionCache,
> DominatorTree, LoopInfo

> CFLAA: Keeps pointer to TargetLibraryInfo

> SCEVAA: Keeps pointer to ScalarEvolution

> ScopedNoAliasAA: No dependencies

> TypeBasedAA: No dependencies

> Total: 13/28 analyses (~50%) hold pointers to other analyses.

> Of the 15/28 analyses that don't hold pointers, 12/15 simply have
> no
> dependencies. Only 3/15 (BPI, LoopInfo, DominanceFrontier) have
> dependencies that are used just for a "recalculate" step that
> retains no pointers.

> So I think it is fair to say that analyses which hold pointers to
> other analyses is not an exceptional case. In fact, analyses that
> use other analyses just for a "recalculate" step seems to be the
> exceptional case (only 3/28 or about 10%)

Interesting!

Most of these look like they hold a pointer to the root analysis as
opposed to detailed objects *inside* the analysis?

It might make sense to try to handle this very specific pattern in a
special way of overriding the invalidate routines is too error
prone.... We could try to make this work "automatically" but I'm
worried this would be challenging to get right. Open to suggestions
of course.

I think that Sean laid out pretty clearly in response to my initial comment what would be required, in terms of infrastructure (type-erased invalidation objects and a registration scheme to cancel them).

Regardless of how we move forward here, we should probably try to limit the scope of the problem by, as a matter of policy, only having passes that hold pointers to analysis-generated data structures (e.g. SCEV*) hold pointers to the analysis objects themselves (in between queries).

Any other ideas about what would make sense to handle this?

Does it make sense to override the invalidate routines now and
iterate from there?

I'm happy to see what this looks like.

Thanks again,
Hal

Interesting. I’m not sure this is the right metric, however. There are lots of analyses that hold pointers to other analyses but don’t need to. The analysis handle itself can be reacquired lazily if we care to do so. What’s truly problematic is holding pointers into another analysis’s data structures. To be concrete, holding a pointer to ScalarEvolution is not a fundamental problem because we could make the analysis reacquire the pointer at the start of every query. Holding SCEV* is the problem.

Agreed. I suspect this is the dominant pattern.

FWIW, I still think this is common enough to design a solution that makes it easy to get this right.

I somewhat suspect that it would be better to pass these in along the query path rather than have results cache them. The reason is that if the user of the analysis is aware of the other analyses it depends on, it will also be aware of that cost and the reality of potentially wanting to preserve them.

I’d be interested in us spending some time trying to make this refactoring and at least understanding how hard it is before we engineer a solution that makes the analysis management more complex.

So as you seemed to agree in your other email (sorry for the heavily branching thread) perhaps we should start with the potentially error prone approach now, and see how much iteration can move toward “nothing to do” by lifting things into the query path rather than the result object.

From: "Chandler Carruth" <chandlerc@gmail.com>
To: "Hal Finkel" <hfinkel@anl.gov>, "Sean Silva"
<chisophugis@gmail.com>
Cc: "Xinliang David Li" <davidxl@google.com>, "llvm-dev"
<llvm-dev@lists.llvm.org>, "Davide Italiano"
<dccitaliano@gmail.com>, "Tim Amini Golling"
<mehdi.amini@apple.com>, "Sanjoy Das"
<sanjoy@playingwithpointers.com>, "Pete Cooper"
<peter_cooper@apple.com>
Sent: Wednesday, July 13, 2016 3:05:26 AM
Subject: Re: [PM] I think that the new PM needs to learn about
inter-analysis dependencies...

> Interesting. I'm not sure this is the right metric, however. There
> are lots of analyses that hold pointers to other analyses but don't
> need to. The analysis handle itself can be reacquired lazily if we
> care to do so. What's truly problematic is holding pointers into
> another analysis's data structures. To be concrete, holding a
> pointer to ScalarEvolution is not a fundamental problem because we
> could make the analysis reacquire the pointer at the start of every
> query. Holding SCEV* is the problem.

Agreed. I suspect this is the dominant pattern.

> FWIW, I still think this is common enough to design a solution that
> makes it easy to get this right.

I somewhat suspect that it would be better to pass these in along the
query path rather than have results cache them. The reason is that
if the user of the analysis is aware of the other analyses it
depends on, it will also be aware of that cost and the reality of
potentially wanting to preserve them.

I'd be interested in us spending some time trying to make this
refactoring and at least understanding how hard it is before we
engineer a solution that makes the analysis management more complex.

I'm not too concerned about making analysis management more complex for this purpose; handling transitive analysis invalidation was always going to have been necessary.

So as you seemed to agree in your other email (sorry for the heavily
branching thread) perhaps we should start with the potentially error
prone approach now, and see how much iteration can move toward
"nothing to do" by lifting things into the query path rather than
the result object.

Yes, I'm happy to see where this goes. The more I think about it, the more I suspect that regardless of how easy the scheme is to use the main conceptual challenge will be remembering/knowing that it needs to be used.

-Hal

The only way that ‘A’ is still around is if a pass specifically said it preserved ‘A’. So I think it is reasonable to say that even if ‘B’ is gone, ‘A’ remains trustworthy here.

The issue with the “hard dependency” that Sean pointed out is that there are analyses which are trivial to update, but somewhat incidentally have references to other analyses stashed away that are no longer valid. This isn’t actually a “hard dependency”, in that there is no fundamental reason why this layering was enforced.

Yet another reason to prefer passing auxiliary analyses into the query path rather than modeling these as transitive invalidation is dramatically less invalidation.

------------------------------

*From: *"Sean Silva" <chisophugis@gmail.com>
*To: *"Chandler Carruth" <chandlerc@gmail.com>
*Cc: *"Xinliang David Li" <davidxl@google.com>, "llvm-dev" <
llvm-dev@lists.llvm.org>, "Davide Italiano" <dccitaliano@gmail.com>, "Tim
Amini Golling" <mehdi.amini@apple.com>, "Hal Finkel" <hfinkel@anl.gov>,
"Sanjoy Das" <sanjoy@playingwithpointers.com>, "Pete Cooper" <
peter_cooper@apple.com>
*Sent: *Wednesday, July 13, 2016 2:25:52 AM
*Subject: *Re: [PM] I think that the new PM needs to learn about
inter-analysis dependencies...

Yea, this is a nasty problem.

One important thing to understand is that this is specific to analyses
which hold references to other analyses. While this isn't unheard of, it
isn't as common as it could be. Still, definitely something we need to
address.

We can call this type of dependencies (holding references)
hard-dependency. The soft dependency refers to the case where analysis 'A'
depends on 'B' during computation, but does not need 'B' once it is
computed.

There are actually quite a few examples of hard-dependency case. For
instance LoopAccessInfo, LazyValueInfo etc which hold references to other
analyses.

Problem involving hard-dependency is actually easier to detect, as it
is usually a compile time problem. Issues involving soft dependencies are
more subtle and can lead to wrong code gen.

Did you mean to say that soft-dependency problems are easier to detect?
At least my intuition is that soft-dependency is easier because there is no
risk of dangling pointers to other analyses.

The issue is that the fact that there is *any* dependency isn't clear.

However, I think the only real problem here are these "hard dependencies"
(I don't really like that term though). For others, only an analysis that
is *explicitly* preserved survives. So I'm not worried about the fact that
people have to remember this.

The question is how often there are cross-data-structure references.
David mentions a few examples, and I'm sure there are more, but it isn't
clear to me yet whether this is pervasive or occasional.

I just did a quick run-through of PassRegistry.def and this is what I
found:

Module analyses: 0/5 hold pointers to other analyses
CallGraph: No pointers to other analyses.
LazyCallGraph: No pointers to other analyses.
ProfileSummaryAnalysis: No pointers to other analyses.
TargetLibraryAnalysis: No pointers to other analyses.
VerifierAnalysis: No pointers to other analyses.

Module alias analyses: 1/1 keeps pointer to other analysis.
GlobalsAA: Result keeps pointer to TLI (this is a function analysis).

Function analyses: 9/17 keep pointers to other analysis
AAManager: Its Result holds TLI pointer and pointers to individual AA
result objects.
AssumptionAnalysis: No pointers to other analyses.
BlockFrequencyAnalysis: Its Result holds pointers to LoopInfo and BPI.
BranchProbabilityAnalysis: Stores no pointers to other analyses. (uses
LoopInfo to "recalculate" though)
DominatorTreeAnalysis: Stores no pointers to other analyses.
PostDominatorTreeAnalysis: Stores no pointers to other analyses.
DemandedBitsAnalysis: Stores pointers to AssumptionCache and DominatorTree
DominanceFrontierAnalysis: Stores no pointers to other analyses.
(uses DominatorTreeAnalysis for "recalculate" though).
LoopInfo: Uses DominatorTreeAnalysis for "recalculate" but stores no
pointers.
LazyValueAnalysis: Stores pointers to AssumptionCache, TargetLibraryInfo,
DominatorTree.
DependenceAnalysis: Stores pointers to AliasAnalysis, ScalarEvolution,
LoopInfo
MemoryDependenceAnalysis: Stores pointers to AliasAnalysis,
AssumptionCache, TargetLibraryInfo, DominatorTree
MemorySSAAnalysis: Stores pointers to AliasAnalysis, DominatorTree
RegionInfoAnalysis: Stores pointers to DomTree, PostDomTree, DomFrontier
ScalarEvolutionAnalysis: Stores pointers to TargetLibraryInfo,
AssumptionCache, DominatorTree, LoopInfo
TargetLibraryAnalysis: Has no dependencies
TargetIRAnalysis: Has no dependencies.

Function alias analyses: 3/5 keep pointers to other analyses
BasicAA: Keeps pointers to TargetLibraryInfo, AssumptionCache,
DominatorTree, LoopInfo
CFLAA: Keeps pointer to TargetLibraryInfo
SCEVAA: Keeps pointer to ScalarEvolution
ScopedNoAliasAA: No dependencies
TypeBasedAA: No dependencies

Total: 13/28 analyses (~50%) hold pointers to other analyses.
Of the 15/28 analyses that don't hold pointers, 12/15 simply have no
dependencies. Only 3/15 (BPI, LoopInfo, DominanceFrontier) have
dependencies that are used just for a "recalculate" step that retains no
pointers.
So I think it is fair to say that analyses which hold pointers to other
analyses is not an exceptional case. In fact, analyses that use other
analyses just for a "recalculate" step seems to be the exceptional case
(only 3/28 or about 10%)

Interesting. I'm not sure this is the right metric, however. There are
lots of analyses that hold pointers to other analyses but don't need to.
The analysis handle itself can be reacquired lazily if we care to do so.

Are you thinking of instead holding a pointer to the analysis manager?

What's truly problematic is holding pointers into another analysis's data
structures. To be concrete, holding a pointer to ScalarEvolution is not a
fundamental problem because we could make the analysis reacquire the
pointer at the start of every query. Holding SCEV* is the problem.

Looks like SCEV* at least is held only by LoopAccessInfo. (Looks like LAA
holds Loop* too)
New updated rendering at http://reviews.llvm.org/F2161258
(DependenceAnalysis was missing some edges in my previous rendering and I
didn't have and I've added LoopAccessAnalysis; I've updated
http://reviews.llvm.org/P6603). Which other analyses vend data objects that
others might hold pointers to?

-- Sean Silva

I'll keep pushing forward tomorrow with building test-suite successfully
using the new PM for the LTO pipeline (I was doing some unrelated LLD stuff
for most of today). It will be interesting to see how many `invalidate`
overrides will be needed to avoid these issues for just the LTO pipeline on
test-suite.

-- Sean Silva

I’m really concerned with using this approach as the common case. It triggers the run of the analyses at very strange points (mid-query of some other analysis) and forces us to pay the analysis manager lookup overhead on every query. For many analyses, this overhead is larger than the actual query.

There may be cases where this is the only sane way to manage things, but this should be the exception rather than the rule IMO.

Note that Loop (and SCC) are somewhat special as they are IRUnitTs and might as a consequence be more reasonable to hold on to and expect definitive invalidation to occur. But I say “might”. I think this will be case-by-case depending on how they’re being used.

SCEV, Loop, SCC, DomTreeNode, and Region leap immediately to mind. and 3 of those are what would be IRUnitTs (Region being the third, and its weird and likely won’t be in the new PM ever).