Existing studies on the benefits of pointer analysis

Side note: The key limiting factor for load hoisting is most often being able to prove dereferenceability. I only mention that because the OP had asked where other optimizer changes might help.

Er, this is actually fairly close to what the code does. It just does it in a really oddly structured manner. But yes, I agree, this code could be radically improved. Out of curiosity, what would you suggest instead?

For context, I have a patch in my tree which falls back to O(n^2) queries for small loops. We found this to be rather important for the optimization of IR derived from our Java frontend.

Philip

It’s sounding more and more like “get CFL-AA turned on by default” might be a viable GSoC project for the right student. It would require someone with existing knowledge of AA and a willingness to debug nasty problems, but it sounds like there’s definitely interest in the community in seeing this happen.

If the student finished early (unlikely), they could start on SCEV-AA as well.

Philip

FWIW, I’d be happy to help mentor such a project if a student is interested.

During the last LLVM Dev Meeting (the US one), I think better AA was number one in the list of requested features during the BoF on “Sophisticated Program Analysis on LLVM IR” ( http://devmtg15.llvm.org/event/4VNY/sophisticated-program-analysis-on-llvm-ir ).
CC John, I don’t remember if there was a summary of the BoF posted online.

Also out of curiosity: why did LLVM choose to expose pointer analysis information through alias-query APIs rather than APIs that let the client query points-to sets? This question has bothered me ever since I started looking into LLVM’s alias analysis framework.

It seems to me that alias queries may yield less precise results than points-to queries. To put it another way, it is easy to convert points-to information into aliasing information (just check the points-to set intersection for emptiness), but the reverse is much harder and sometimes not possible at all. The alias set tracker also introduces an additional source of imprecision: if a may alias b, and b may alias c, it does not follow that a may alias c, yet the AST would merge all three into one set. Nor does alias information seem cheaper to compute (e.g. Andersen-style analysis) or cheaper to query (e.g. the O(n^2) query problem).

– Best Regards, – Jia Chen

It’s something that I am certainly interested in and qualified to do.

However, the way I read Daniel’s reply in this thread is: “LLVM, in its current form, is unlikely to benefit from a more precise AA”. He did mention that cfl-aa is “more understandable and maintainable”, and is “fast enough”, but nothing is said about the benefits. There were some discussions I could find back in 2014, which basically say that cfl-aa offers no significant improvement in performance. Things may have greatly improved since then, yet it is not clear to me what the current situation is.

With the benefits being unclear, a GSoC proposal on this topic may not look sufficiently motivated.

– Best Regards, – Jia Chen

No pointer analysis will show any benefits until it is on by default and tuning starts :slight_smile:

or everything is tuned and then it’s turned on, whichever way you want to do it.

From: "Jia Chen via llvm-dev" <llvm-dev@lists.llvm.org>
To: "Philip Reames" <listmail@philipreames.com>, dberlin@dberlin.org
Cc: "llvm-dev" <llvm-dev@lists.llvm.org>
Sent: Tuesday, March 22, 2016 4:33:34 PM
Subject: Re: [llvm-dev] Existing studies on the benefits of pointer analysis


I think there's wide interest in getting CFL-AA into a usable state. As I recall, the percentage of noalias results during self-hosting went up by a huge amount (for example). More noalias results obviously do not guarantee better code performance (the performance could even be worse because of register allocation / scheduling deficiencies). Nevertheless, we'll never do better if we're hamstrung by poor aliasing results.

Moreover, we currently rely heavily on BasicAA, which is a huge collection of local heuristics that catches many things, but is quite slow. It is slow because it needs to do local recursive walks for each pointwise query. If CFL-AA, in addition to increasing precision, provides us with a path away from our current BasicAA design toward something with better complexity, that's a definite win.

-Hal

Great! Are they published somewhere? Can the data be shared somehow?

My results are published here:
http://llvm.org/pubs/2005-05-04-LattnerPHDThesis.html

Sadly, they are over a decade out of date, and some things have changed since then :slight_smile:

I agree that nothing takes advantage of context sensitivity. But I would argue for flow sensitivity, field sensitivity, heap models, and external function models. Flow sensitivity is helpful when the optimization pass itself is flow-sensitive (e.g. ADCE, GVN), and field sensitivity is helpful when a struct contains multiple pointers. Heap sensitivity is basically what motivates Chris Lattner’s PLDI’07 paper, and external function models are helpful because without them the analysis has to be extremely conservative and conclude that everything external functions touch may alias everything else.
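To illustrate the field-sensitivity point with a toy C++ model (the maps, set contents, and field names are invented; real analyses track this symbolically over IR): a struct with two pointer fields, where a field-sensitive analysis keeps one points-to set per field, while a field-insensitive one lumps every store through the struct into a single set.

```cpp
#include <cassert>
#include <iostream>
#include <map>
#include <set>
#include <string>

using PTSet = std::set<std::string>;

int main() {
  // Pretend source: struct S { int *a; int *b; };  s.a = &x;  s.b = &y;
  // Field-sensitive: one points-to set per field.
  std::map<std::string, PTSet> fieldSensitive = {
      {"s.a", {"x"}}, {"s.b", {"y"}}};
  // Field-insensitive: one merged set for all of s.
  PTSet fieldInsensitive = {"x", "y"};

  // Field-sensitive: the two fields' sets are disjoint, so loads of
  // *s.a and *s.b provably do not alias.
  bool fsAlias = false;
  for (const auto &obj : fieldSensitive["s.a"])
    if (fieldSensitive["s.b"].count(obj))
      fsAlias = true;
  assert(!fsAlias);

  // Field-insensitive: both fields read from the same merged set,
  // so the analysis must answer "may alias".
  assert(fieldInsensitive.count("x") && fieldInsensitive.count("y"));
  std::cout << "field sensitivity separates s.a and s.b\n";
  return 0;
}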

I’m still a big fan of context sensitive, flow insensitive, unification based models. Contrary to your claim, context sensitivity is useful for mod-ref analysis, e.g. “can I hoist a load across this call”? Context sensitivity improves the precision of the mod/ref set of the call.

-Chris

I’m still a big fan of context sensitive, flow insensitive, unification based models.

CFL can emulate this in the same time bound.

Contrary to your claim, context sensitivity *is* useful for mod-ref analysis, e.g. “can I hoist a load across this call”? Context sensitivity improves the precision of the mod/ref set of the call.

-Chris

Yeah. It depends entirely on your goal. In reality, often what you really want is something to say "hey, i've got this pointer over here, and i really want to hoist it up here. Do something, tell me if that is possible".

(This is actually why i'm a fan of CFL-AA. You can essentially make it as expensive or not expensive as you want, and it still does really well in practice, time-wise.)

I’m still a big fan of context sensitive, flow insensitive, unification based models.

Interestingly, I find the unification approach quite unsatisfactory sometimes. What happens there is that pointers with the same “depth” too often get lumped together unless they are really unrelated to each other.

Contrary to your claim, context sensitivity is useful for mod-ref analysis, e.g. “can I hoist a load across this call”? Context sensitivity improves the precision of the mod/ref set of the call.

I’m not sure about that. How often does mod-ref information change across callsites? Isn’t a good context-insensitive function summary good enough?

-Jia

Yeah.
It depends entirely on your goal. In reality, often what you really want is something to say "hey, i've got this pointer over here, and i really want to hoist it up here. Do something, tell me if that is possible".

And this is one motivation of my current research: how can various precision dimensions of a pointer analysis be effectively driven by its client.

(This is actually why i'm a fan of CFL-AA. You can essentially make it as expensive or not expensive as you want, and it still does really well in practice, time-wise.)

Again, "making it as expensive or not expensive as you want" is not something unique to cfl-aa. With the right tweaks one can also do it with a traditional solver. The really interesting question here is how to find out which locations are most likely to matter and are worth the extra expense.

- Jia

With no offense meant, I’ve been writing solvers for 10+ years. What you are suggesting is not just a tweak. Doing it on a per-pointer basis is neither trivial nor easy to keep sound. If you think it is, I would suggest taking GCC’s implementation and trying to do it. If what you say were true, production compilers would be doing it. But they don’t.

The other problem you mention is, IMHO, not actually as interesting. We already have traditional methods (value profiling, etc.) of knowing which things matter. Static prediction of this has a long history of over-promising and under-delivering.

With no offense meant, I've been writing solvers for 10+ years. What you are suggesting is not just a tweak. Doing it on a per pointer basis is neither trivial nor easy to keep sound. If you think it is, I would suggest taking GCC's implementation and trying to do it. If what you say was true, production compilers would be doing it. But they don't.

None taken. I didn't say it was an easy problem. Pointer analysis on a C-ish language is hard no matter what approach one takes, let alone tuning the precision on a per-pointer basis.

The other problem you mention is, IMHO, not actually as interesting. We already have traditional methods (value profiling, etc) of knowing which things matter. Static prediction of this has a long history of over promise and under delivery.

I agree that it is easier to get some quick hints with profiling, yet like any other dynamic approach, profiling has its own drawbacks: it is benchmark-dependent, offers no soundness guarantees, etc. I believe it is too early to conclude that we already have a perfect solution to this problem and to pronounce a death sentence on any attempt to find a good static prediction mechanism.

- Jia

I’m still a big fan of context sensitive, flow insensitive, unification based models.

Interestingly, I find the unification approach quite unsatisfactory sometimes. What happens there is that pointers with the same “depth” too often get lumped together unless they are really unrelated to each other.

Contrary to your claim, context sensitivity is useful for mod-ref analysis, e.g. “can I hoist a load across this call”? Context sensitivity improves the precision of the mod/ref set of the call.

I’m not sure about that. How often does mod-ref information change across callsites? Isn’t a good context-insensitive function summary good enough?

It changes all the time. Here’s a trivial example, assuming no inlining and no AA other than the one in question:

std::vector<int> V1 = { 1, 2, 3 };
std::vector<int> V2 = { 4, 5, 6 };
V1.pop_back(); // Mutates *this
auto length = V1.size();
V2.pop_back(); // Mutates *this
auto zero = length - V1.size();

In this case, the compiler should “obviously” be able to CSE length, allowing further simplification to substitute zero with 0.

However, with a context-insensitive AA, both &V1 and &V2 end up aliasing the “this” pointer in std::vector::pop_back. As such, without context sensitivity, you would falsely assume that “V2.pop_back();” could modify “V1”. This is unfortunate, particularly for OO languages that frequently use static dispatch (like C++, Swift, and others).

That said, I have no idea what you’re referring to by "context-insensitive function summary”. If you’re talking about something context sensitive, then ya, it can handle this. :slight_smile:

-Chris

For the example to work here, the CSE pass itself needs to be flow-sensitive and context-sensitive. I don’t think that’s how most optimizations in LLVM work. If it is, then I agree with all you said. But if it isn’t, there’s no point in bumping up the context sensitivity just for the pointer analysis.

As Daniel mentioned earlier in this thread, the alias analysis framework in LLVM doesn’t provide any APIs for flow-sensitive or context-sensitive queries. This design choice all but eliminates any possibility for a flow-sensitive or context-sensitive pointer analysis to be useful. Strangely, the set of APIs does support 1-CFA context-sensitive mod-ref queries (so I guess one could somehow reap some context-sensitive benefits out of them after all). To me that design incoherence looks confusing, but I’m pretty sure you know much better than me why it should work that way :slight_smile:

- Jia


In general, I would never assume someone sat down and thought *really hard* about the design of an API in a piece of software to get it right for all time. It happens, but it's actually pretty rare. One of the most limited resources most software teams have is time, and they often use that time to get done what they need, and design APIs based on current and easily predictable future needs.

The same thing is true of LLVM's APIs. You shouldn't assume we sat down, thought really hard about AA by meditating upon a mountain, and then designed this API. Instead, some thought was put into it, based on current and easily predicted needs, and then people moved on.

I can't say this was the wrong choice, either, given how long it has been since someone has even come along considering implementing context-sensitive or flow-sensitive AA (and having talked to tons of people over the years, this is *not* a chicken-and-egg problem. Literally nobody has had a *strong* desire). It wouldn't make much sense to try to design an API without any implementations that want to use those APIs.

As that changes, I expect the AA API will be redesigned and reworked to accommodate the needs of those things.


From: "Jia Chen via llvm-dev" <llvm-dev@lists.llvm.org>
To: "Chris Lattner" <clattner@apple.com>
Cc: "llvm-dev" <llvm-dev@lists.llvm.org>
Sent: Monday, March 28, 2016 10:10:12 AM
Subject: Re: [llvm-dev] Existing studies on the benefits of pointer analysis


For the example to work here the CSE pass itself needs to be flow-sensitive and context-sensitive. I don't think that's how most optimizations in LLVM work. If it is, then I agree with all you said. But if it isn't, there's no point in bumping up the context sensitivity just for the pointer analysis.

Can you elaborate on what you mean by flow sensitive? We have a mod/ref query interface that can return answers specific to a particular instruction/call pair. The code above could easily live in a single basic block, and if we had function attribute deduction on the 'argmemonly' attribute, we could probably do this now.

-Hal

What I meant is that the CSE needs to be aware of the execution order, i.e. that the call to V1.pop_back() must not sit between the two calls to V1.size() for zero to be 0. If there are more complicated control flows, CSE needs to be able to make the same kind of argument across basic blocks.

I haven’t followed LLVM development closely enough to be familiar with how LLVM handles CSE. If what I said above is exactly how it works today, then yes, we could probably do this now.

But still, there is no API that answers “are p and q aliases before this instruction x?”. The same can be done for mod-ref today (if I remember correctly, this wasn’t even the case before the AAResult class came into existence), but not for aliases.

Can you elaborate on what you mean by flow sensitive? We have a mod/ref query interface that can return answers specific to a particular instruction/call pair. The code above could easily live in a single basic block, and if we had function attribute deduction on the ‘argmemonly’ attribute, we could probably do this now.

-Hal

What I meant is that the CSE needs to be aware of the execution order, i.e. that the call to V1.pop_back() must not sit between the two calls to V1.size() for zero to be 0. If there are more complicated control flows, CSE needs to be able to make the same kind of argument across basic blocks.

I haven’t followed LLVM development closely enough to be familiar with how LLVM handles CSE. If what I said above is exactly how it works today, then yes, we could probably do this now.

The existing LLVM passes and infrastructure already do this.

But still, there is no API that answers “are p and q aliases before this instruction x?”. The same can be done for mod-ref today (if I remember correctly, this wasn’t even the case before the AAResult class came into existence), but not for aliases.

This is already handled correctly by LLVM. The sequence in question is:

auto length = V1.size();

“V1.size()” (which is effectively a load) is available after this instruction.

V2.pop_back(); // Mutates *this

Alias analysis is queried to say “does std::vector::pop_back(&V2) mod/ref &V1”?

If it returns mod, then “length” is removed from the available set.

auto zero = length - V1.size();

The access in V1.size() is also a load; if it is in the available set, it is replaced with the SSA value for “length”. A later simplification pass turns “length - length” into 0, allowing the original load to be removed as dead.

-Chris

FWIW, DS-AA is context sensitive in this way, and LLVM definitely does take advantage of it. Even circa 2005. This is the whole point of chapter 4 of http://llvm.org/pubs/2005-05-04-LattnerPHDThesis.html

-Chris