Dear all,
I am Raghesh, a student pursuing M.Tech at Indian Institute of
Technology, Madras, India.
I would like to make contribution to the Polly project
(http://wiki.llvm.org/Polyhedral_optimization_framework) as part of
GSOC 2011. I have gained some experience working in OpenMP Code
generation for Polly. This is almost stable now and planning to test
with the polybench benchmarks.
Some of the ideas that can be implemented in Polly is listed below.
Please let me know your comments on this.
Hi Raghesh,
great that you decided to apply for the summer of code. I put a list of Polly related ideas on the LLVM wiki, in case someone needs further ideas (http://wiki.llvm.org/Polly).
In respect of your ideas.
1. Increasing the Stability and Coverage of Polly
-------------------------------------------------------------
Polly can show good speedup on several test cases. there are still
many programs that cannot be optimized by it. One reason is that Polly
does not yet support casts like (sext, zext, trunk). Those often
appear implicit in programs, as they often have 32 bit induction
variables, but require 64 bit indexes for array subscripts.
For example:
for (int i = 0; i< N; i++)
A[i] =
If we translate this to LLVM-IR and keep i an i32 but use an i64 to
calculate the access to A[i] there will be a sext necessary. Polly
currently do not handle this.
I think this is a very important part. As Polly now shows some benefits, we should work on increasing its impact. Especially the extension of our handling of affine expressions is important for other interesting and important features, like the detection of variable size, multi dimensional arrays.
2. Testing Real Programs.
---------------------------------
Testing with well known benchmarks like SPEC CPU2006. This will help
us to find out cases where Polly cannot optimize and improve the
coverage of Polly on existing Programs.
I also like this part and we should combine it with some public Polly regression testers. I had one at Ohio State, but it was just testing make polly-test, but not the test-suite. (Which is just tested internally be me and Andreas). As I now move back to Europe, we should make sure the new tester runs both and is located at a public place.
3. Profiling in polly
-----------------------
The idea is explained below with a few examples. Consider the following code.
scanf ( ”%d” ,&b ) ;
for ( i = 0 ; i< N; i +=b) {
body ;
}
Polly may not detect this as a SCoP because the variable b is read as
an user input. So to detect this as a SCoP we instrument the IR with
the information provided by profiling.
Suppose using profiling we figure out that most of the time the value
of b is say 2. we can convert the
above code as follows.
scanf ( ”%d” ,&b ) ;
if ( b == 2 ) {
for ( i = 0 ; i< N; i += 2 ) {
body ;
}
} else {
f o r ( i = 0 ; i< N; i += b ) {
body ;
}
}
Now with the transformed code the for loop inside ’if’ will be
detected as a SCoP and can be parallelised.
Since value of N is 100 most of the time, the overall performance will
be improved.
Consider another scenario.
f o r ( i = 0 ; i< N; i ++) {
body ;
}
Suppose using profiling we know that N is always very small. So there
wont be much gain from parallelising
it. So we have to tell polly that don’t detect this as a SCoP if N is
less than a specific value.
Some other immediate applications would be
* Automatially derive the best scheduling strategy supported by OpenMP.
* Adding simple profiling support, to understand how much time is
spent inside each scop.
Andreas Simbuerger has done some significant work on this and can be extended.
Yes, profiling is also interesting, though I personally have not thought too much about this topic, yet. You should probably get input from other LLVM developers. We may also try to not be too Polly centric. Those kind of optimizations seem to be useful in other parts of LLVM too. It would be great to work on a generic infrastructure, where Polly is just one place where we do function/loop/region versioning.
5. Porting Polly to Various architectures.
-------------------------------------------------
Currently Polly generates everything as 64 bit integer, which is
problamatic for embedded platforms.
Yes. See the answer to ether.
6. Vectorization in Polly
------------------------------
Lot of work needed to be done in this area.
You are right, even if we already got basic vectorization, this needs still a lot of work. My personal plan is to first integrate Pluto as a library into Polly. Like this we already get a good locality optimizer. Than we improve the Pluto algorithm to take SIMDization and cache line locality into account. (And than we consider alignment, data layout transformations, ...)
I propose you choose one or two related topics and you extend the application a little. It would be good to see what problems exist at the moment and how you plan to solve them. I think it is also helpful to have a time line that gives an impression how much time you estimate for each feature.
Cheers
Tobi