Contributing to Polly with GSOC 2011

Dear all,

I am Raghesh, a student pursuing M.Tech at Indian Institute of
Technology, Madras, India.

I would like to contribute to the Polly project
(http://wiki.llvm.org/Polyhedral_optimization_framework) as part of
GSOC 2011. I have gained some experience working on OpenMP code
generation for Polly. This is almost stable now, and I am planning to
test it with the polybench benchmarks.

Some of the ideas that could be implemented in Polly are listed below.
Please let me know your comments on them.

1. Increasing the Stability and Coverage of Polly

hi raghesh,

5. Porting Polly to Various architectures.
-------------------------------------------------

Currently Polly generates all integers as 64-bit, which is
problematic for embedded platforms.

you may try something like this:
http://lists.cs.uiuc.edu/pipermail/llvmdev/2011-January/037277.html

I am already planning to implement this, and it would be great if you join :)

best regards
ether

Great. How do you plan to derive the minimal size needed for an induction variable? Do you already have concrete plans?

Cheers
Tobi

hi tobi,

Great. How do you plan to derive the minimal size needed for an
induction variable?

Hmm, I have never thought about this. Can we compute the minimal size
of the induction variable from the size of the iteration domain and
the wrapping information?

Do you already have concrete plans?

Plans to derive the minimal size needed for an induction variable, or
to allow Polly (or even LLVM) to support more parallel runtime
libraries?

best regards
ether

hi tobi,

Great. How do you plan to derive the minimal size needed for an
induction variable?

Hmm, I have never thought about this. Can we compute the minimal size
of the induction variable from the size of the iteration domain and
the wrapping information?

I think this is a good approach.

Here is an example:

   for (int i = 0; i < N; i++)
     Stmt(i);

We can use the type information to restrict the loop dimensions to the limits of the types. This means that for each loop we add something like
-(2^31) <= i <= (2^31)-1 to the domain.

After code generation we can then derive that an i32 is sufficient to
store s. However, the generated code is still overly complex. To ensure
CLooG simplifies it, we can store the constraints on N in the context.

Context := [N] -> { [] : N <= (2^31)-1 }

This allows CLooG to simplify the code, while still letting us derive the optimal type for s.

for (type? s = 0; s < N; s++)
   Stmt(s);

In this case this was simple, however for more complex schedules I expect this will become pretty interesting.

We also need to see how this affects compile time performance.
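A hedged C sketch of what the end result could look like under these assumptions (Stmt, scop, and the shape of the guard are illustrative names, not Polly's actual output):

```c
#include <stdint.h>
#include <limits.h>

/* Illustrative sketch: because the context guarantees N <= (2^31)-1,
   an int32_t induction variable cannot wrap, so an i32 is sufficient
   for s. Stmt is a placeholder statement. */
int64_t sum;                        /* records what Stmt observed */

void Stmt(int32_t s) { sum += s; }

void scop(int64_t N) {
    if (N <= (int64_t)INT32_MAX) {  /* guard derived from the context */
        for (int32_t s = 0; s < (int32_t)N; s++)
            Stmt(s);
    }
    /* otherwise a fallback with a wider induction variable is needed */
}
```

The guard mirrors the context constraint: inside it, the narrower type is provably safe.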

Do you already have concrete plans?

Plans to derive the minimal size needed for an induction variable, or
to allow Polly (or even LLVM) to support more parallel runtime
libraries?

I was talking about minimizing the type of the generated induction variables. However, I am also interested in adding support for mpc.sf.net as another OpenMP library. Do you plan to add support for other parallel libraries like MPI?

Cheers
Tobi

Dear all,

I am Raghesh, a student pursuing M.Tech at Indian Institute of
Technology, Madras, India.

I would like to contribute to the Polly project
(http://wiki.llvm.org/Polyhedral_optimization_framework) as part of
GSOC 2011. I have gained some experience working on OpenMP code
generation for Polly. This is almost stable now, and I am planning to
test it with the polybench benchmarks.

Some of the ideas that could be implemented in Polly are listed below.
Please let me know your comments on them.

Hi Raghesh,

Great that you decided to apply for the Summer of Code. I put a list of Polly-related ideas on the LLVM wiki, in case someone needs further ideas (http://wiki.llvm.org/Polly).

Regarding your ideas:

1. Increasing the Stability and Coverage of Polly
-------------------------------------------------------------

Polly can show good speedups on several test cases, but there are
still many programs that it cannot optimize. One reason is that Polly
does not yet support casts (sext, zext, trunc). Those often appear
implicitly in programs, which typically have 32-bit induction
variables but require 64-bit indexes for array subscripts.

For example:

for (int i = 0; i < N; i++)
  A[i] =

If we translate this to LLVM-IR and keep i an i32, but use an i64 to
calculate the access to A[i], a sext becomes necessary. Polly
currently does not handle this.
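A small C sketch of the pattern in question (init and the explicit cast are illustrative; on an LP64 target the cast below is where LLVM-IR inserts the `sext i32 ... to i64`):

```c
#include <stdint.h>

/* A 32-bit induction variable used as a 64-bit array subscript.
   The explicit cast models the implicit sign extension that the
   front end emits and that Polly would need to understand. */
void init(float *A, int32_t N) {
    for (int32_t i = 0; i < N; i++) {
        int64_t idx = (int64_t)i;   /* the implicit sext, made explicit */
        A[idx] = (float)i;
    }
}
```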

I think this is a very important part. As Polly now shows some benefits, we should work on increasing its impact. Especially the extension of our handling of affine expressions is important for other interesting and important features, like the detection of variable-size, multi-dimensional arrays.

2. Testing Real Programs.
---------------------------------
Testing with well-known benchmarks like SPEC CPU2006. This will help
us find cases where Polly cannot optimize and improve Polly's
coverage of existing programs.

I also like this part, and we should combine it with some public Polly regression testers. I had one at Ohio State, but it was only running 'make polly-test', not the LLVM test-suite (which is just tested internally by me and Andreas). As I am now moving back to Europe, we should make sure the new tester runs both and is located in a public place.

3. Profiling in polly
-----------------------
The idea is explained below with a few examples. Consider the following code.

   scanf("%d", &b);

   for (i = 0; i < N; i += b) {
           body;
   }

Polly may not detect this as a SCoP because the variable b is read as
user input. To detect it as a SCoP, we instrument the IR with
information provided by profiling.
Suppose profiling tells us that most of the time the value of b is,
say, 2. We can then convert the above code as follows.

   scanf("%d", &b);
   if (b == 2) {
           for (i = 0; i < N; i += 2) {
                   body;
           }
   } else {
           for (i = 0; i < N; i += b) {
                   body;
           }
   }

Now, with the transformed code, the for loop inside the 'if' will be
detected as a SCoP and can be parallelised.
Since the value of b is 2 most of the time, the overall performance
will be improved.

Consider another scenario.

   for (i = 0; i < N; i++) {
           body;
   }

Suppose profiling tells us that N is always very small, so there
won't be much gain from parallelising the loop. We then have to tell
Polly not to detect this as a SCoP if N is less than a specific
value.
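The same versioning trick could handle this case by guarding the parallel path behind a profile-derived threshold (a sketch; the threshold value, run, and body are illustrative assumptions, not existing Polly machinery):

```c
/* Hypothetical trip-count versioning: only the first branch would be
   treated (and parallelised) as a SCoP; small trip counts take the
   cheap sequential path. The threshold would come from profiling. */
#define PAR_THRESHOLD 1024

extern void body(int i);

void run(int N) {
    if (N >= PAR_THRESHOLD) {
        for (int i = 0; i < N; i++)   /* worth parallelising */
            body(i);
    } else {
        for (int i = 0; i < N; i++)   /* stay sequential */
            body(i);
    }
}
```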

Some other immediate applications would be

* Automatically derive the best scheduling strategy supported by OpenMP.
* Adding simple profiling support, to understand how much time is
spent inside each scop.

Andreas Simbuerger has done some significant work on this, which can be extended.
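For the second bullet, a minimal sketch of such per-SCoP timing (the polly_scop_* hooks are hypothetical names, not an existing Polly API):

```c
#include <time.h>

/* Hypothetical runtime hooks the code generator could emit around
   each SCoP to measure the time spent inside it. */
static clock_t scop_start;

void polly_scop_enter(void) {
    scop_start = clock();
}

/* Returns the CPU time in milliseconds since polly_scop_enter(). */
double polly_scop_exit(void) {
    return (double)(clock() - scop_start) * 1000.0 / CLOCKS_PER_SEC;
}
```

A real implementation would also need to aggregate per-SCoP totals and handle nested regions.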

Yes, profiling is also interesting, though I personally have not thought much about this topic yet. You should probably get input from other LLVM developers. We should also try not to be too Polly-centric: these kinds of optimizations seem to be useful in other parts of LLVM too. It would be great to work on a generic infrastructure, where Polly is just one place where we do function/loop/region versioning.

5. Porting Polly to Various architectures.
-------------------------------------------------

Currently Polly generates all integers as 64-bit, which is
problematic for embedded platforms.

Yes. See the answer to ether.

6. Vectorization in Polly
------------------------------
A lot of work still needs to be done in this area.

You are right; even though we already have basic vectorization, this still needs a lot of work. My personal plan is to first integrate Pluto as a library into Polly. This way we already get a good locality optimizer. Then we improve the Pluto algorithm to take SIMDization and cache-line locality into account. (And then we consider alignment, data layout transformations, ...)
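To illustrate one transformation such a SIMD-aware schedule aims for, here is a hedged sketch of strip-mining a loop by the vector width (VW = 4 and vadd are illustrative assumptions; a real pass would pick the target's width):

```c
/* Strip-mining by the SIMD width: the inner loop gets a fixed trip
   count that maps naturally onto one vector operation. */
#define VW 4

void vadd(float *restrict A, const float *restrict B, int N) {
    int i = 0;
    for (; i + VW <= N; i += VW)       /* vectorizable strip */
        for (int j = 0; j < VW; j++)
            A[i + j] += B[i + j];
    for (; i < N; i++)                 /* scalar epilogue */
        A[i] += B[i];
}
```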

I propose you choose one or two related topics and extend the application a little. It would be good to see what problems exist at the moment and how you plan to solve them. I think it is also helpful to have a timeline that gives an impression of how much time you estimate for each feature.

Cheers
Tobi