GSoC2012 proposal -- A new Back-end for polyhedral Optimization framework for LLVM (Polly)

Hi all,

My name is Qingrui Liu, a student at Sun Yat-sen University, China. I have been working on high level synthesis project which is implemented as a back-end of LLVM, in the last two years.When I want to use Polly to generate parallel LLVM-IR from Polly-IR for me as input to my high level synthesis tools at the begining of this year, I found that Polly’s back-end is not flexible and modular enough to allow user adding new platform support. So I am going to improve the existing back-end to an adaptive and modular back-end which will unleash the power of Polly to other projects.

To be short, I plan to implement my proposal in steps as follows:

  1. refactor the existing back-end to a preliminary modular back-end.
  2. Add A CodeGen adapter, abstract away the detail of CLAst.
  3. Implementation of Specific code generation class to the new back-end. In this phase, I am going to implement a SIMD code generation class.

An ultimate construction of the new backend will be like the figure below:

[Original Polly codegeneration pass] [Click codegeneration[1]] [whole function vectorize codegeneration[2]], …

\ | /
\ | /
\ | /
\ | /
\ | /
CodeGen adapter
/
/
/
Cloog AST infrastructure XXX AST infrastructure …
\ /
\ /
\ /
SCOP (Polly IR)

After the new backend is done, I think it will easy for other developers to implement a code generation pass which will meet the requirements of their platforms or LLVM back-ends. So the optimized LLVM-IR could be passed to the ordinary LLVM back-ends, such as the high level synthesis back-end I mentioned above. It will be a great help to both the developers and LLVM.

Tobias Grosser and Ether, who are the contributors of Polly, have assented to be my mentor, If my proposal is approved. Is this idea good enough for the Google summer of code? If it is, I am going to write a proposal for it. Any suggestion is appreciated.

Sincerely,
Qingrui Liu

[1]http://supertech.csail.mit.edu/cilk/
[2]http://www.cdl.uni-saarland.de/projects/wfv/

Hi Qingrui Liu,

sorry for replying slowly.

Here some comments:

Hi all,
    My name is Qingrui Liu, a student at Sun Yat-sen University, China.
I have been working on high level synthesis project which is implemented
as a back-end of LLVM, in the last two years. When I want to use Polly to
generate parallel LLVM-IR from Polly-IR for me as input to my high level
synthesis too ls at the begining of this year, I found that Polly's
back-end is not flexible and modular enough to allow user adding new
platform support.

You are right adding new platforms is probably not as straightforward as it could be and especially two months ago the Polly SIMD/OpenMP/Scalar code generation was strongly coupled. That made the code difficult to understand and to adapt. In the last weeks I restructured the code generation to remove the strong connection between those three. The situation should now be a lot better, even though there may still be possibilities to improve it.

So I am going to improve the existing back-end to an
adaptive and modular back-end which will unleash the power of Polly to
other projects.

This sentence sounds very nice, but it is actually not very descriptive.
The problem here is that it stands in the empty space. There are a many ways to modularize code. Some of them may be helpful and good as they
can improve the structure of the code and can simplify the addition of new features. However, modularization can also be counterproductive as it may add unnecessary overhead that complicates the understanding of the code (See first answer here [1]).

Making the back end structure easier to extend is a good thing. However, we should make sure we understand the needs of possible back ends such that we can choose a modularization that matches these needs.
This means simple to understand, but still flexible and well structured.

To be short, I plan to implement my proposal in steps as follows:
1. refactor the exis ting back-end to a preliminary modular back-end.

I think here and later in your proposal you should explain:

- What are the problems with the current code?

I do not say there are no problems, but to judge your proposal it is necessary to understand what you want to improve.

- What changes you want to perform here?

Do you want to move classes, define new interfaces, split some functionality into new classes, ...?

(You do not even propose the perfect solution. Feel free to say there are several solutions, which benefits they have and what still needs
to be investigated to decide which is best. You can even state you have no solution for a problem, but you plan a week to think about it. For such a proposal it is more important that people see you understood the problems and that you have an idea of how to approach them. In case you are unsure about something, it is better to say this and give a plan how you will proceed to find a solution than to propose a solution you are not yet sure about)

2. Add A CodeGen adapter, abstract away the detail of CLAst.

How would this look like? What would be the difference to the clast?

Again, I do not say we don't need this, but it would be helpful if you can express why this is needed and what features it should have. In general I prefer to have as little code as possible. That means for every code we add there needs to be an obvious benefit.

A solution without a CodeGen adapter may be to extract all the functionality within the clast into helper classes, create a very simple clast walker and call the helper classes there. We can do exactly the same for another new_codegen_ast, without having to define an CodeGen adapter. This does not need to be the better solution, but this is a possible alternative.

3. Implementation of Specific code generation class to the new back-end.
In this phase, I am going to implement a SIMD code generation class.

It would be great if you could make sure that your plan consists of gradual improvements. This means to build up as little parallel infrastructure as possible. At best, you add your improvements directly the current code generation. This will ensure it is properly tested and
it matches the needs we have. I especially want to be sure we do not end up in a situation where the OpenMP code generation still works in the old infrastructure and SIMD is done in the new one. Better make the old infrastructure gradually become the new one.

An ultimate construction of the new backend will be like the figure below:

[Original Polly codegeneration pass] [Click codegeneration[1]] [whole
function vectorize codegeneration[2]], ....
                                               \
        > /
                                                   \
        > /
                                                      \
         > /
                                                          \
         > /
                                                              \
         > /

   CodeGen adapter

/ \
                                                                      /
                    \
                                                                   /
                      \
                                            Cloog AST infrastructure
      XXX AST infrastructure ...
                                                                \
                         /
                                                                   \
                     /
                                                                      \
                   /

  SCOP (Polly IR)

I have currently no idea how your CodeGen adapter will look like. And I have especially no idea how you plan to perform whole function vectorization and click code generation based on Polly.

Some examples would definitely help.

After the new backend is done, I think it will easy for other developers
to implement a code generation pass which will meet the requirements of
their platforms or LLVM back-ends. So the optimized LLVM-IR could be
passed to the ordinary LLVM back-ends, such as the high level synthesis
back-end I mentioned above. It will be a great help to both the
developers and LLVM.

In general I would rather emphasize in your proposal the individual steps necessary, than to build everything around a currently not yet described 'CodeGen adapter'.

Tobias Grosser and Ether, who are the contributors of Polly, have
assented to be my mentor, If my proposal is approved. Is this idea good
enough for the Google summer of code? If it is, I am going to write a
proposal for it. Any suggestion is appreciated.

I think working on the back end of Polly is a good idea. There is still a lot of stuff that needs to be done there and I am sure a good proposal can be written for this. However, the best thing you can do is to start to write your proposal. Because only by doing this you will start to get into Polly and to understand what kind of problems exist and how you could solve them. Listing the problems, finding examples, possible solutions and creating some timeline to implement this will help here a lot. I am glad to give feedback (and promise to be reactive the days up to the deadline).

Besides your own ideas, you may also consider the following points:

- Extract the clast logic in a simple helper class (as proposed by ether and you) [4]

- Removing the independent blocks pass [5]

- Integrate a new code generator

We are planning to work on a new code generator that will solve several issues. Using your modularization to integrate this code generator would
be a perfect use case.

To have an idea what I think a proposal might look like you can have a look at my proposal a couple of years ago [2]. Ether should also have his old proposal. As it was accepted it was apparently good enough. :wink:

Another hint. It is in general very good if you already contributed some code. Fixing one of the existing Polly bugs would be a good start.
This one might not be too difficult [3]. (Let me know if you need help).

Cheers
Tobi

[1] language agnostic - When is abstraction and modularization a bad practice in programming? - Stack Overflow
[2] http://students.fim.uni-passau.de/~grosser/gcc_soc/
[3] http://llvm.org/bugs/show_bug.cgi?id=12311
[4] http://llvm.org/bugs/show_bug.cgi?id=12406
[5] http://llvm.org/bugs/show_bug.cgi?id=12398

Hi Tobi,

Thank you very much for the elaborate advises and sorry for the long delay. I have kept thinking your advises and investigate more detail of the existing code of Polly. Now, I re-evaluate my proposal and something interesting is added to the new one. :)

Hi Qingrui Liu,

sorry for replying slowly.

Here some comments:

Hi all,
My name is Qingrui Liu, a student at Sun Yat-sen University, China.
I have been working on high level synthesis project which is implemented

as a back-end of LLVM, in the last two years. When I want to use Polly to
generate parallel LLVM-IR from Polly-IR for me as input to my high level

synthesis too ls at the begining of this year, I found that Polly’s
back-end is not flexible and modular enough to allow user adding new
platform support.

You are right adding new platforms is probably not as straightforward as it could be and especially two months ago the Polly SIMD/OpenMP/Scalar code generation was strongly coupled. That made the code difficult to understand and to adapt. In the last weeks I restructured the code generation to remove the strong connection between those three. The situation should now be a lot better, even though there may still be possibilities to improve it.

Yes, I see the appreciable improvement you did. That’s great.

So I am going to improve the existing back-end to an
adaptive and modular back-end which will unleash the power of Polly to
other projects.

This sentence sounds very nice, but it is actually not very descriptive.
The problem here is that it stands in the empty space. There are a many ways to modularize code. Some of them may be helpful and good as they
can improve the structure of the code and can simplify the addition of new features. However, modularization can also be counterproductive as it may add unnecessary overhead that complicates the understanding of the code (See first answer here [1]).

Making the back end structure easier to extend is a good thing. However, we should make sure we understand the needs of possible back ends such that we can choose a modularization that matches these needs.
This means simple to understand, but still flexible and well structured.

Sorry, maybe it is too abstract for people to understand what is the power underlying the Polly. Now, Polly do optimization on the Loop in LLVM and generate a optimized loop back to LLVM. However, something like CodeGenForOpenMP should be independent from Polly which means, even without Polly, CodegenForOpenMP could applied individually in LLVM as a Pass which only accept the basic elements in LLVM,e.g. Loop, BasicBlock and Module etc… The situation is the same to the PTX back-end which only accept basic elements in LLVM e.g. Loop, BasicBlock and module etc… This is the way Polly should go and the power it should unleash.

But how to achieve this goal? It is absolutely an impasse with the existing back-end. So an adaptive and robust backend with an adapter is proposed to provide the LLVM basic element i.e. Loop in Polly to various back-ends in LLVM such as PTX back-end and OpenCL etc.

So It is the most pressing problem for Polly to provide an adaptive and robust backend for various back-end in LLVM. I will elaborate the adapter below.

To be short, I plan to implement my proposal in steps as follows:

  1. refactor the exis ting back-end to a preliminary modular back-end.

I think here and later in your proposal you should explain:

  • What are the problems with the current code?

I do not say there are no problems, but to judge your proposal it is necessary to understand what you want to improve.

  • What changes you want to perform here?

Do you want to move classes, define new interfaces, split some functionality into new classes, …?

(You do not even propose the perfect solution. Feel free to say there are several solutions, which benefits they have and what still needs
to be investigated to decide which is best. You can even state you have no solution for a problem, but you plan a week to think about it. For such a proposal it is more important that people see you understood the problems and that you have an idea of how to approach them. In case you are unsure about something, it is better to say this and give a plan how you will proceed to find a solution than to propose a solution you are not yet sure about)

The propose of the first step is as I mentioned above that some code like codegenForSequential, codegenForVector and codegenForOpenMP should be independent form the current Polly which mean they can be an individual pass applied in LLVM without Polly.

Also, some code refactor could be done by the way. such as addressing the bugs in polly while refatoring the existing backend into a modular one. such as:

  1. we can extract more independent functionality in the existing back-end into the helper classes in different files rather than assemble them in a single file (this will take time to do so and need to be well tested), I have made the first step of it in patch[1]. More steps will be done in the coming summer of code.

  2. as you have post in the Todo list, the existing Polly back-end is currently centered on the CLAST. This will be another challenge to this project.

  3. Another problem come up to me while I was fixing the bug of Polly[2]. The existing back-end use i64 everywhere to ensure a temporal correct result. Other project which may use date type larger than i64 may cause failure in Polly. So in the new code generator, we should be able to calculate the minimal type that is always large enough to ensure the correctness of the program.

  1. Add A CodeGen adapter, abstract away the detail of CLAst.

How would this look like? What would be the difference to the clast?

Again, I do not say we don’t need this, but it would be helpful if you can express why this is needed and what features it should have. In general I prefer to have as little code as possible. That means for every code we add there needs to be an obvious benefit.

A solution without a CodeGen adapter may be to extract all the functionality within the clast into helper classes, create a very simple clast walker and call the helper classes there. We can do exactly the same for another new_codegen_ast, without having to define an CodeGen adapter. This does not need to be the better solution, but this is a possible alternative.

Here, we come the most controversial part of this proposal. As I mentioned above, the adapter proposed here could provide user-friendly interfaces to various back-ends in LLVM to get the Loop optimized by Polly. Also, some basic interfaces which can be a virtual method to be inherited by the back-ends and passes in LLVM could be established such as doAPIDefinition,doInitializationLoop, doFinalizationLoop, runOnLoop, doInitializationStmt, doFinalizationStmt and runOnStmt, etc… these methods (interfaces) provide back-ends or passes in LLVM to implement their platform-specifc requirement.

Take the recently hot topic on PTX back-end for example, the PTX back-end coiuld call the doAPIDefinition method to applied their specific API to the Loop in Polly. It is greatly benefit developers who would like to make use of Polly and their parallel computing platforms.

  1. Implementation of Specific code generation class to the new back-end.
    In this phase, I am going to implement a SIMD code generation class.

It would be great if you could make sure that your plan consists of gradual improvements. This means to build up as little parallel infrastructure as possible. At best, you add your improvements directly the current code generation. This will ensure it is properly tested and
it matches the needs we have. I especially want to be sure we do not end up in a situation where the OpenMP code generation still works in the old infrastructure and SIMD is done in the new one. Better make the old infrastructure gradually become the new one.

An ultimate construction of the new backend will be like the figure below:

[Original Polly codegeneration pass] [Click codegeneration[1]] [whole
function vectorize codegeneration[2]], …
\

/

/

/

/

/

CodeGen adapter

/
/

/

Cloog AST infrastructure
XXX AST infrastructure …

/

/

/

SCOP (Polly IR)

I have currently no idea how your CodeGen adapter will look like. And I have especially no idea how you plan to perform whole function vectorization and click code generation based on Polly.

Some examples would definitely help.

After the new backend is done, I think it will easy for other developers
to implement a code generation pass which will meet the requirements of
their platforms or LLVM back-ends. So the optimized LLVM-IR could be
passed to the ordinary LLVM back-ends, such as the high level synthesis
back-end I mentioned above. It will be a great help to both the
developers and LLVM.

In general I would rather emphasize in your proposal the individual steps necessary, than to build everything around a currently not yet described ‘CodeGen adapter’.

Tobias Grosser and Ether, who are the contributors of Polly, have
assented to be my mentor, If my proposal is approved. Is this idea good
enough for the Google summer of code? If it is, I am going to write a
proposal for it. Any suggestion is appreciated.

I think working on the back end of Polly is a good idea. There is still a lot of stuff that needs to be done there and I am sure a good proposal can be written for this. However, the best thing you can do is to start to write your proposal. Because only by doing this you will start to get into Polly and to understand what kind of problems exist and how you could solve them. Listing the problems, finding examples, possible solutions and creating some timeline to implement this will help here a lot. I am glad to give feedback (and promise to be reactive the days up to the deadline).

Besides your own ideas, you may also consider the following points:

  • Extract the clast logic in a simple helper class (as proposed by ether and you) [4]

  • Removing the independent blocks pass [5]

  • Integrate a new code generator

We are planning to work on a new code generator that will solve several issues. Using your modularization to integrate this code generator would
be a perfect use case.

Thank you for these valuable advises. My proposal will be much better with your advises.

To have an idea what I think a proposal might look like you can have a look at my proposal a couple of years ago [2]. Ether should also have his old proposal. As it was accepted it was apparently good enough. :wink:

Another hint. It is in general very good if you already contributed some code. Fixing one of the existing Polly bugs would be a good start.
This one might not be too difficult [3]. (Let me know if you need help).

Cheers
Tobi

In conclusion, Polly with a adaptive and robust back-end would greatly help developers who would like to use Polly with their parallel computing platfroms and it is the most pressing problem for Polly to make such a back-end.:)

Sincerely,
Tsingray

PS. We have come to an agreement that all these should be done patch by patch upon the current back-end.

Hi Tobi,

Thank you very much for the elaborate advises and sorry for the long
delay. I have kept thinking your advises and investigate more detail of
the existing code of Polly. Now, I re-evaluate my proposal and something
interesting is added to the new one. :)

:slight_smile:

        So I am going to improve the existing back-end to an
        adaptive and modular back-end which will unleash the power of
        Polly to
        other projects.

    This sentence sounds very nice, but it is actually not very descriptive.
    The problem here is that it stands in the empty space. There are a
    many ways to modularize code. Some of them may be helpful and good
    as they
    can improve the structure of the code and can simplify the addition
    of new features. However, modularization can also be
    counterproductive as it may add unnecessary overhead that
    complicates the understanding of the code (See first answer here [1]).

    Making the back end structure easier to extend is a good thing.
    However, we should make sure we understand the needs of possible
    back ends such that we can choose a modularization that matches
    these needs.
    This means simple to understand, but still flexible and well structured.

Sorry, maybe it is too abstract for people to understand what is the
power underlying the Polly. Now, Polly do optimization on the Loop in
LLVM and generate a optimized loop back to LLVM. However, something like
  CodeGenForOpenMP should be independent from Polly which means, even
without Polly, CodegenForOpenMP could applied individually in LLVM as a
Pass which only accept the basic elements in LLVM,e.g. Loop, BasicBlock
and Module etc.. The situation is the same to the PTX back-end which
only accept basic elements in LLVM e.g. Loop, BasicBlock and module
etc.. This is the way Polly should go and the power it should unleash.

This is a possible way, but to me it is not entirely clear this the way to go. If you believe so you should state the benefits and the drawbacks and explain why the benefits out-weight the drawbacks.

Possible problems:

o Compile time overhead

Generating sequential code, reanalyzing it and transforming it may
need more compile time than directly emitting the right code.

o How to derive necessary information

If you work directly on LLVM-IR you need to derive information is readily available in Polly. This requires additional code to be written
and/or the definition of some meta-data nodes. When using meta-data we need to find a way to ensure that the communication Polly -> later optimizer works well.

o It needs engineering time to develop this.

For the moment I try to keep as much of the functionality as possible generic, but try to avoid going back to LLVM-IR and reanalysing it. Mainly because because of point 2).

I would love to be shown that even more code can be factored out of Polly.

But how to achieve this goal? It is absolutely an impasse with the
existing back-end. So an adaptive and robust backend with an adapter is
proposed to provide the LLVM basic element i.e. Loop in Polly to various
back-ends in LLVM such as PTX back-end and OpenCL etc.

'Adaptive and robust backend' does not state anything. The question is how you will solve the problems seen above.

So It is the most pressing problem for Polly to provide an adaptive and
robust backend for various back-end in LLVM. I will elaborate the
adapter below.

>

        To be short, I plan to implement my proposal in steps as follows:
        1. refactor the exis ting back-end to a preliminary modular
        back-end.

    I think here and later in your proposal you should explain:

    - What are the problems with the current code?

    I do not say there are no problems, but to judge your proposal it is
    necessary to understand what you want to improve.

    - What changes you want to perform here?

    Do you want to move classes, define new interfaces, split some
    functionality into new classes, ...?

    (You do not even propose the perfect solution. Feel free to say
    there are several solutions, which benefits they have and what still
    needs
    to be investigated to decide which is best. You can even state you
    have no solution for a problem, but you plan a week to think about
    it. For such a proposal it is more important that people see you
    understood the problems and that you have an idea of how to approach
    them. In case you are unsure about something, it is better to say
    this and give a plan how you will proceed to find a solution than to
    propose a solution you are not yet sure about)

The propose of the first step is as I mentioned above that some code
like codegenForSequential, codegenForVector and codegenForOpenMP should
be independent form the current Polly which mean they can be an
individual pass applied in LLVM without Polly.

What would codegenForSequential do without Polly?

What would codegenForVector do without Polly? Is it some kind of LoopVectorizer or a BasicBlockVectorizer? I doubt you can reuse any of the existing code for this?

Having an LLVM-IR OpenMP generation pass may be interesting to several people, but you should definitely describe how you would pass information about parallel loops, shared/private variables to this pass.

Having some of them separate may be a good thing, but this needs more planning. Please allocate time for planning instead of proposing a solution that is not yet 100% clear.

Also, some code refactor could be done by the way. such as addressing
the bugs in polly while refatoring the existing backend into a modular
one. such as:

1. we can extract more independent functionality in the existing
back-end into the helper classes in different files rather than assemble
them in a single file (this will take time to do so and need to be well
tested), I have made the first step of it in patch[1]. More steps will
be done in the coming summer of code.

Good.

2. as you have post in the Todo list, the existing Polly back-end is
currently centered on the CLAST. This will be another challenge to this
project.

Yes, we can do better here.

3. Another problem come up to me while I was fixing the bug of Polly[2].
The existing back-end use i64 everywhere to ensure a temporal correct
result. Other project which may use date type larger than i64 may cause
failure in Polly. So in the new code generator, we should be able to
calculate the minimal type that is always large enough to ensure the
correctness of the program.

This problem should be solved somehow.

        2. Add A CodeGen adapter, abstract away the detail of CLAst.

    How would this look like? What would be the difference to the clast?

    Again, I do not say we don't need this, but it would be helpful if
    you can express why this is needed and what features it should have.
    In general I prefer to have as little code as possible. That means
    for every code we add there needs to be an obvious benefit.

    A solution without a CodeGen adapter may be to extract all the
    functionality within the clast into helper classes, create a very
    simple clast walker and call the helper classes there. We can do
    exactly the same for another new_codegen_ast, without having to
    define an CodeGen adapter. This does not need to be the better
    solution, but this is a possible alternative.

Here, we come the most controversial part of this proposal. As I
mentioned above, the adapter proposed here could provide user-friendly
  interfaces to various back-ends in LLVM to get the Loop optimized by
Polly.

What do you mean by LLVM back-ends? Can you give examples of such interfaces?

Also, some basic interfaces which can be a virtual method to be
inherited by the back-ends and passes in LLVM could be established such
as doAPIDefinition,doInitializationLoop, doFinalizationLoop, runOnLoop,
doInitializationStmt, doFinalizationStmt and runOnStmt, etc.. these
methods (interfaces) provide back-ends or passes in LLVM to implement
their platform-specifc requirement.

OK. This needs probably more discussion, but it may be interesting.

This is adapter related in some way to your 'running passes without Polly' proposal. I am asking because if you would have passes that run without Polly, Polly could just always generate sequential code, annotate this code with additional information and offload all the other work to the polly independent passes. (I am not convinced this is the way to go, but in case you want to go there, is there still a need for a CodeGenAdapter?)

Take the recently hot topic on PTX back-end for example, the PTX
back-end coiuld call the doAPIDefinition method to applied their
specific API to the Loop in Polly. It is greatly benefit developers who
would like to make use of Polly and their parallel computing platforms.

This is probably a very good example. (Or the OpenMP stuff)

A short description of how this should look like might be good.

    To have an idea what I think a proposal might look like you can have
    a look at my proposal a couple of years ago [2]. Ether should also
    have his old proposal. As it was accepted it was apparently good
    enough. :wink:

    Another hint. It is in general very good if you already contributed
    some code. Fixing one of the existing Polly bugs would be a good start.
    This one might not be too difficult [3]. (Let me know if you need help).

    Cheers
    Tobi

In conclusion, Polly with a adaptive and robust back-end would greatly
help developers who would like to use Polly with their parallel
computing platfroms and it is the most pressing problem for Polly to
make such a back-end.:)

:slight_smile:

Sincerely,
Tsingray

PS. We have come to an agreement that all these should be done patch by
patch upon the current back-end.

Yes, please. :wink: