Hi guys,
I've gone ahead and written my proposal to GSoC. Comments are welcome! 
=== Introduction ===
Pointer analysis has the power of increasing the effectiveness and the
efficiency of other compiler optimizations, and it is a prerequisite
for many of them. The quality of pointer information can considerably
affect the performance and the quality of those optimizations. The
most precise kind of pointer analysis is context-sensitive and
flow-sensitive. However, it is hard to scale context-sensitive pointer
analysis to big programs. On the other hand, it has been shown that it
is possible to scale flow-sensitive pointer analysis to programs with
millions of lines of code[1].
There are several passes on LLVM to perform pointer analysis: -no-aa,
-basic-aa, -globalsmodref-aa, -scev-aa, ds-aa, and steens-aa -- the
last two ones are not part of the LLVM main tree; they are part of
poolalloc and work as a separated module of LLVM. None of these passes
implements flow-sensitive pointer analysis, though.
Thus, I propose to implement a flow-sensitive pointer analysis
algorithm on LLVM, as part of the Google Summer of Code program. The
implementation that I propose is based on the value flow graph and was
created by Li et. al.[2] Additionally to the pointer analysis
implementation, I propose to implement a program slicer that uses the
results of flow-sensitive pointer analysis. I'll explain bellow the
details about both implementations. Finally, I propose to add an
AliasAnalysis interface to the new implementation of pointer analysis
so that other optimizations can use its results.
=== Implementation ===
== Value-Flow-Based Pointer Analysis ==
The paper "Boosting the performance of flow-sensitive points-to
analysis using value flow"[2] introduces a novel method to perform
flow-sensitive pointer analysis. This method transforms the points-to
analysis problem to a general graph reachability problem. And
according to the authors, the runtime of their implementation is
comparable to the state-of-the-art flow-insensitive points-to
analysis. Implementing such a technique on LLVM would potentially
benefit several optimizations. Besides, the authors claim that their
approach is intuitive and easy to implement, which, in my opinion, is
good for a summer project.
== The Alias Analysis Interface ==
All of the LLVM's alias analysis implementations have a common
interface to which the clients make queries regarding to pointer
information. One of the objectives of this work would be implement
this same interface in the proposed flow-sensitive pointer analysis
algorithm. It would make possible to chain the other LLVM's alias
analysis implementations with the flow-sensitive implementation and
get the best of both.
== The Program Slicer ==
A program slicer aims to find a set of program instructions that may
affect a variable at a given program point. Program slicing uses
points-to information. And its performance and precision depend on the
performance and precision of pointer analysis. A program slicer builds
a System Dependence Graph (SDG) based on the points-to information. If
that information is imprecise -- points-to set bigger than necessary
-- the SDG become too big, making program slicing inefficient. For
example, one of the students in our research group performed an
experiment with program slicing on the Vim benchmark, using a
Andersen-style pointer analysis. It produced a SDG with more than 30
millions of nodes. We believe that with a more precise pointer
information we can reduce the size of the graph and therefore make
program slicing more efficient.
=== Timeline ===
* Weeks 1 - 4: Implement the value-flow-based pointer analysis algorithm;
* Weeks 5 - 6: Implement the AliasAnalysis interface;
* Weeks 7 - 10: Implement the program slicer;
* Week 11: Run tests to compare the impact of the flow-sensitive
pointer analysis and the other LLVM pointer analyses on the
effectiveness and efficiency of the clients. As clients, I'd suggest:
- Agressive Dead Code Elimination;
- Loop Invariant Code Motion;
- Global Value Numbering;
- Program Slicing.
* Week 12: Write a report with the results of the experiments and
explaining the implementation.
=== Background ===
I'm a fourth year undergraduate student at the Federal University of
Minas Gerais (UFMG), in Brazil. I've been working with C++ for almost
four years and with LLVM for almost three years. I've worked on UFMG's
compiler group in a project to design and implement an algorithm to
perform range analysis, and in a number of other small projects using
LLVM.
Nowadays I'm working as research assistant at The University of Texas
at Austin, under the orientation of professor Calvin Lin. I'm working
on two projects at UT: one is to profile several pointer analysis
algorithms and explore algorithmic improvements, and the other is to
compare the impact of different pointer analysis algorithms in the
precision and efficiency of other analyses and optimizations.
=== References ===
[1] Hardekopf and Lin: The Ant and the Grasshopper: Fast and Accurate
Pointer Analysis for Millions of Lines of Code
[2] Li et. al.: Boosting the performance of flow-sensitive points-to
analysis using value flow
[3] LLVM Alias Analysis Infrastructure: http://llvm.org/docs/AliasAnalysis.html