GSoC 2012 Ideas - Alias Analysis

Hi guys,

I'm an undergraduate computer science student and I've been working
with pointer analysis this semester under the orientation of professor
Calvin Lin, at The University of Texas at Austin. I'm interested on
helping to develop LLVM's alias analysis infrastructure. I have two
ideas for a possible proposal and I'd like to know if you have
interest on any of them.

My first idea is to improve LLVM's existing alias analysis
infrastructure. I saw on the open projects page that there are a
couple of things that need to be done on LLVM's Pointer and Alias
Analysis[1]. However, some things seem to be solved
(http://llvm.org/PR1604 and http://llvm.org/PR452) and it made me a
little confused because I don't know exactly what should be addressed
by a possible proposal.

My second idea is to implement a pointer analysis algorithm in LLVM
using Value Flow Graph, as described by the paper "Boosting the
Performance of Flow-sensitive Points-to Analysis using Value Flow"[2].

Thus, I'd like to know whether or not you think improving LLVM's alias
analysis infrastructure using one of those two suggestions would be a
reasonable thing to do as a GSoC.

[1] http://llvm.org/OpenProjects.html#pointeranalysis
[2] http://labs.oracle.com/docs/2011/150-299/2011-0189/esec053-li.pdf

Best,

Douglas

Hi guys,

I'm an undergraduate computer science student and I've been working
with pointer analysis this semester under the orientation of professor
Calvin Lin, at The University of Texas at Austin. I'm interested on
helping to develop LLVM's alias analysis infrastructure. I have two
ideas for a possible proposal and I'd like to know if you have
interest on any of them.

My first idea is to improve LLVM's existing alias analysis
infrastructure. I saw on the open projects page that there are a
couple of things that need to be done on LLVM's Pointer and Alias
Analysis[1]. However, some things seem to be solved
(http://llvm.org/PR1604 and http://llvm.org/PR452) and it made me a
little confused because I don't know exactly what should be addressed
by a possible proposal.

My second idea is to implement a pointer analysis algorithm in LLVM
using Value Flow Graph, as described by the paper "Boosting the
Performance of Flow-sensitive Points-to Analysis using Value Flow"[2].

Thus, I'd like to know whether or not you think improving LLVM's alias
analysis infrastructure using one of those two suggestions would be a
reasonable thing to do as a GSoC.

If you want to work on points-to analysis, then you need to have some feature which will benefit from using that points-to analysis. Example applications of points-to analysis would be optimization and static backwards slicing.

If you write a proposal to work on points-to analysis, then I recommend that your proposal also include work on code that will use that points-to analysis. For example, you could implement a points-to analysis and then add an AliasAnalysis interface to it so that LLVM optimizations can use it without modification. Alternatively, you could implement some new optimization that uses the points-to results from your analysis directly.

If you can provide evidence in your proposal that LLVM is missing optimization opportunities because of a lack of good points-to analysis, that would make your proposal even stronger.

-- John T.

Hi John, thanks for your feedback. I'll try to answer your comments bellow.

Hi guys,

I'm an undergraduate computer science student and I've been working
with pointer analysis this semester under the orientation of professor
Calvin Lin, at The University of Texas at Austin. I'm interested on
helping to develop LLVM's alias analysis infrastructure. I have two
ideas for a possible proposal and I'd like to know if you have
interest on any of them.

My first idea is to improve LLVM's existing alias analysis
infrastructure. I saw on the open projects page that there are a
couple of things that need to be done on LLVM's Pointer and Alias
Analysis[1]. However, some things seem to be solved
(http://llvm.org/PR1604 and http://llvm.org/PR452) and it made me a
little confused because I don't know exactly what should be addressed
by a possible proposal.

My second idea is to implement a pointer analysis algorithm in LLVM
using Value Flow Graph, as described by the paper "Boosting the
Performance of Flow-sensitive Points-to Analysis using Value Flow"[2].

Thus, I'd like to know whether or not you think improving LLVM's alias
analysis infrastructure using one of those two suggestions would be a
reasonable thing to do as a GSoC.

If you want to work on points-to analysis, then you need to have some
feature which will benefit from using that points-to analysis. Example
applications of points-to analysis would be optimization and static
backwards slicing.

I think implementing a program slicer would be doable. We have a
prototype implemented; extending it and porting it to the newer
version of LLVM shouldn't be hard.

As for the optimizations, is there any optimization (maybe a client of
pointer information, maybe not) that you would like to see implemented
in LLVM that is not implemented yet?

I'd like to point out that, although it'd be easier for me working
with pointer analysis, if there is anything that you think has more
priority or is more appealing for the LLVM community, I'd be happy to
give it a try.

If you write a proposal to work on points-to analysis, then I recommend that
your proposal also include work on code that will use that points-to
analysis. For example, you could implement a points-to analysis and then
add an AliasAnalysis interface to it so that LLVM optimizations can use it
without modification.

Yes, that would be part of the proposal.

Alternatively, you could implement some new
optimization that uses the points-to results from your analysis directly.

If you can provide evidence in your proposal that LLVM is missing
optimization opportunities because of a lack of good points-to analysis,
that would make your proposal even stronger.

Well, off the top of my head, I can't think of any missing
optimizations related to points-to analysis. However, I'm developing a
study right now to investigate which clients would benefit from a
stronger pointer analysis. There is a famous paper about this "Which
Pointer Analysis Should I Use?" where they conclude that most of the
clients do not need a really strong pointer analysis. However, it has
been 12 years since they developed that study, they only analyzed very
small benchmarks, and they experimented just a few clients. So,
hopefully, we'll be able to find potencial clients that benefit from a
strong pointer analysis.

Douglas

Hi guys,

I've gone ahead and written my proposal to GSoC. Comments are welcome! :slight_smile:

=== Introduction ===

Pointer analysis has the power of increasing the effectiveness and the
efficiency of other compiler optimizations, and it is a prerequisite
for many of them. The quality of pointer information can considerably
affect the performance and the quality of those optimizations. The
most precise kind of pointer analysis is context-sensitive and
flow-sensitive. However, it is hard to scale context-sensitive pointer
analysis to big programs. On the other hand, it has been shown that it
is possible to scale flow-sensitive pointer analysis to programs with
millions of lines of code[1].

There are several passes on LLVM to perform pointer analysis: -no-aa,
-basic-aa, -globalsmodref-aa, -scev-aa, ds-aa, and steens-aa -- the
last two ones are not part of the LLVM main tree; they are part of
poolalloc and work as a separated module of LLVM. None of these passes
implements flow-sensitive pointer analysis, though.

Thus, I propose to implement a flow-sensitive pointer analysis
algorithm on LLVM, as part of the Google Summer of Code program. The
implementation that I propose is based on the value flow graph and was
created by Li et. al.[2] Additionally to the pointer analysis
implementation, I propose to implement a program slicer that uses the
results of flow-sensitive pointer analysis. I'll explain bellow the
details about both implementations. Finally, I propose to add an
AliasAnalysis interface to the new implementation of pointer analysis
so that other optimizations can use its results.

=== Implementation ===

== Value-Flow-Based Pointer Analysis ==

The paper "Boosting the performance of flow-sensitive points-to
analysis using value flow"[2] introduces a novel method to perform
flow-sensitive pointer analysis. This method transforms the points-to
analysis problem to a general graph reachability problem. And
according to the authors, the runtime of their implementation is
comparable to the state-of-the-art flow-insensitive points-to
analysis. Implementing such a technique on LLVM would potentially
benefit several optimizations. Besides, the authors claim that their
approach is intuitive and easy to implement, which, in my opinion, is
good for a summer project.

== The Alias Analysis Interface ==

All of the LLVM's alias analysis implementations have a common
interface to which the clients make queries regarding to pointer
information. One of the objectives of this work would be implement
this same interface in the proposed flow-sensitive pointer analysis
algorithm. It would make possible to chain the other LLVM's alias
analysis implementations with the flow-sensitive implementation and
get the best of both.

== The Program Slicer ==

A program slicer aims to find a set of program instructions that may
affect a variable at a given program point. Program slicing uses
points-to information. And its performance and precision depend on the
performance and precision of pointer analysis. A program slicer builds
a System Dependence Graph (SDG) based on the points-to information. If
that information is imprecise -- points-to set bigger than necessary
-- the SDG become too big, making program slicing inefficient. For
example, one of the students in our research group performed an
experiment with program slicing on the Vim benchmark, using a
Andersen-style pointer analysis. It produced a SDG with more than 30
millions of nodes. We believe that with a more precise pointer
information we can reduce the size of the graph and therefore make
program slicing more efficient.

=== Timeline ===

* Weeks 1 - 4: Implement the value-flow-based pointer analysis algorithm;

* Weeks 5 - 6: Implement the AliasAnalysis interface;

* Weeks 7 - 10: Implement the program slicer;

* Week 11: Run tests to compare the impact of the flow-sensitive
pointer analysis and the other LLVM pointer analyses on the
effectiveness and efficiency of the clients. As clients, I'd suggest:
  - Agressive Dead Code Elimination;
  - Loop Invariant Code Motion;
  - Global Value Numbering;
  - Program Slicing.

* Week 12: Write a report with the results of the experiments and
explaining the implementation.

=== Background ===

I'm a fourth year undergraduate student at the Federal University of
Minas Gerais (UFMG), in Brazil. I've been working with C++ for almost
four years and with LLVM for almost three years. I've worked on UFMG's
compiler group in a project to design and implement an algorithm to
perform range analysis, and in a number of other small projects using
LLVM.

Nowadays I'm working as research assistant at The University of Texas
at Austin, under the orientation of professor Calvin Lin. I'm working
on two projects at UT: one is to profile several pointer analysis
algorithms and explore algorithmic improvements, and the other is to
compare the impact of different pointer analysis algorithms in the
precision and efficiency of other analyses and optimizations.

=== References ===

[1] Hardekopf and Lin: The Ant and the Grasshopper: Fast and Accurate
Pointer Analysis for Millions of Lines of Code

[2] Li et. al.: Boosting the performance of flow-sensitive points-to
analysis using value flow

[3] LLVM Alias Analysis Infrastructure: http://llvm.org/docs/AliasAnalysis.html