"Graphite" for llvm

Hi ether,

hi,

does anyone plan to add something like
Graphite (http://gcc.gnu.org/wiki/Graphite) from GCC to LLVM (or should
that be implemented at the level of Clang)?

I already looked into implementing something like Graphite for LLVM. However just recently, so I have not released any code yet. As soon as some code is available I will post patches.

Anybody who wants to work on the polyhedral model in LLVM is invited to the Graphite mailing list, so we can share ideas.

Here is some information about Graphite-like optimizations in LLVM.

A short introduction to the current state of GCC/Graphite:

hi Tobi,

that sounds great :smiley:

I already looked into implementing something like Graphite for LLVM. However just recently, so I have not released any code yet. As soon as some code is available I will post patches.

what's its status? do you need any help?

A general plan to implement polyhedral transformations in LLVM:

1. The identity transformation (LLVM->polyhedral->LLVM)

Create the polyhedral representation of the LLVM IR, do nothing with it, and generate LLVM IR from the polyhedral representation. (Enough to attach external optimizers)

1.1 Detect regions
1.2 Translate LLVM IR to polyhedral model
-----------------------------------------

The first step will be to analyze the LLVM intermediate language and extract control flow regions that can be analyzed using the polyhedral model. This is mainly based on the scalar evolution analysis and should be more or less like the detection in Graphite.
One point I do not yet fully understand is how to get array access functions from the LLVM-IR. (Probably based on getElementPtr)

As far as I know, getelementptr combines the base pointer and the offsets to generate the pointer to the given element, like:
&(b[i]) ==> %8 = getelementptr i32* %b, i32 %i.03 ;1-dimension
and &(c[i][j]) ==> %4 = getelementptr [4 x i32]* %c, i32 %i.0.reg2mem.0.ph.ph, i32 %j.0.reg2mem.0.ph ;2-dimension
so maybe you could check the non-pointer operands of the getelementptr instruction.

Another question is which polyhedral library can be used inside LLVM. One option would be ISL from Sven Verdoolaege (LGPL), another the PPL from Roberto Bagnara (GPLv3).

1.3 Generate LLVM IR from the polyhedral model
-----------------------------------------

For code generation the CLooG/isl library can be used. It is LGPL licensed. Sven will also work on a CLooG version using the PPL, so this could also be an option.

2. Optimize on the polyhedral representation

2.1 Use external optimizers
---------------------------

The polyhedral loop description is simple and not compiler dependent. Therefore external tools like LooPo (automatic parallelization) or Pluto (optimization in general)

i contacted the author a week ago, and if we use it, we need an IR generator on the llvm side to extract SCoPs and a corresponding parser on the Pluto side to read them, then a backend for cloog and a corresponding parser in llvm to read that output back into llvm. :slight_smile:

or even Graphite

what's the status of PCP? could we just use the PCP language with annotations?

might be used to optimize code. This could give a first impression what to expect from the polyhedral model in LLVM.

There are also efforts to establish an interchangeable polyhedral format (scoplib - Louis-Noel Pouchet) and to create a polyhedral compilation package. These will allow sharing/exchanging optimizations between different compilers and research tools.

Furthermore an external interface will enable researchers to use LLVM for their work on polyhedral optimizations. This might be useful as there is no polyhedral compiler for any dynamic language yet. Also recent work on optimizing for GPUs has started in the polyhedral community, so the LLVM OpenCL implementation might be interesting too.

if this targets entertainment applications instead of scientific computing, implementing the polyhedral framework on top of the llvm "support" and "system" libraries, so that everything can run as a native windows application, would be a good idea, i think.

2.2 Implement optimizations in LLVM
-----------------------------------

Useful optimizations could be imported into / rewritten in LLVM. For optimizations that transform the LLVM-IR, like vectorization or automatic parallelization, at least some part of the optimization has to be in LLVM anyway.

the information computed in the polyhedral framework, such as the iteration domain and affine schedule, may allow llvm to do some interesting optimizations, like generating stream-processing code from normal c code (and optimizing pointer-access bounds checks?)

Enjoy your holidays

the same to you guys :slight_smile:

best regards

--ether

ether wrote:

The polyhedral loop description is simple and not compiler dependent. Therefore external tools like LooPo (automatic parallelization) or Pluto (optimization in general)

i contacted the author a week ago, and if we use it, we need an IR generator on the llvm side to extract SCoPs and a corresponding parser on the Pluto side to read them, then a backend for cloog and a corresponding parser in llvm to read that output back into llvm. :slight_smile:

or even Graphite

what's the status of PCP? could we just use the PCP language with annotations?

Hi all,

PCP is only partially implemented: conversion out of PCP to Graphite is not implemented, but the existing code would definitely help anybody working in interfacing other tools with PCP. The main people to contact are
"Sjodin, Jan" <Jan.Sjodin@amd.com>
Sebastian Pop <sebpop@gmail.com>

Work on PCP has stalled because Jan has left for another group.

In the short term, Tobias is right that the best way to interface tools is to use the scoplib format and library from Louis-Noel Pouchet (PoCC).

In any case, work on either PCP or scoplib will be beneficial. The community of polyhedral compilation experts is small, and its openly available output is already diminished by two proprietary development projects at IBM and Reservoir Labs. I strongly hope that any effort within LLVM will be wise enough to join forces with Graphite :slight_smile:

Albert

PCP is only partially implemented: conversion out of PCP to Graphite is not
implemented,

Actually Gimple to PCP to Graphite is implemented to some extent,
but there are still plenty of bugs, and we should work on the path out of
Graphite to PCP to Gimple/LLVM if we want to get rid of all these bugs.
Also the patches for these are not yet in the Graphite branch,
just in local trees for now.

but the existing code would definitely help anybody working in
interfacing other tools with PCP. The main people to contact are
"Sjodin, Jan" <Jan.Sjodin@amd.com>
Sebastian Pop <sebpop@gmail.com>

Work on PCP has stalled because Jan has left for another group.

Jan, what's the plan to integrate PCP in GCC and/or LLVM?

Jan, what’s the plan to integrate PCP in GCC and/or LLVM?

Currently there is no concrete plan, or at least nothing immediate.
I believe a modular approach to loop transformations is desirable.
The main purpose of PCP is to get a common interface between a lower level
IR like SSA and higher level optimizations done on structured code, but one
could also directly translate from a front-end to PCP. On the
analysis/transformation side there could be several implementations,
including the polyhedral model. What needs to be done is to clearly specify
how different information should be encoded. Debug info, costs for
heuristics etc. The second benefit of having PCP is to be able to read and
write code fragments to a file. LLVM has some of these features already,
but SSA is not the right level. Implementing views on top of SSA might be
possible, using program structure trees and a lot of bookkeeping, but that
will also reduce the modularity of the code.

Front End <--+      +--> Polyhedral Model
              \    /
            ==> PCP <==
              /  ^  \
SSA <--------+   |   +--> Classical Loop transforms
                 v
                Text

I believe it would not be too much work to LLVM-ify the current
implementation of PCP since it is all written in C++. Most of the work
would be to translate the implementations of the PCP ADTs to use the LLVM ones.
There are however a few areas that need to be addressed:

  1. Memory management.

  2. Error handling.

  3. Improved type system, since we need to be able to reason better about
    induction variables and size of data. Alignment may come in here as well.

  4. Different properties must be encoded, and some should perhaps be put into
    the PCP language itself, e.g. reduction operations.

I can look into how the current code can be released, or contributed in order
for other people to work on it.

  • Jan

hi Tobi,

that sounds great :smiley:

I already looked into implementing something like Graphite for LLVM.
However just recently, so I have not released any code yet. As soon as
some code is available I will post patches.

what's its status? do you need any help?

Very recently started. That's why I did not talk about this on my own.

However help is always needed. As already said. Getting affine access functions from the memory accesses would be a big help.

A general plan to implement polyhedral transformations in LLVM:

1. The identity transformation (LLVM->polyhedral->LLVM)

Create the polyhedral representation of the LLVM IR, do nothing with
it, and generate LLVM IR from the polyhedral representation. (Enough
to attach external optimizers)

1.1 Detect regions
1.2 Translate LLVM IR to polyhedral model
-----------------------------------------

The first step will be to analyze the LLVM intermediate language and
extract control flow regions that can be analyzed using the polyhedral
model. This is mainly based on the scalar evolution analysis and
should be more or less like the detection in Graphite.
One point I do not yet fully understand is how to get array access
functions from the LLVM-IR. (Probably based on getElementPtr)

As far as I know, getelementptr combines the base pointer and the offsets to
generate the pointer to the given element, like:
&(b[i]) ==> %8 = getelementptr i32* %b, i32 %i.03 ;1-dimension
and &(c[i][j]) ==> %4 = getelementptr [4 x i32]* %c, i32
%i.0.reg2mem.0.ph.ph, i32 %j.0.reg2mem.0.ph ;2-dimension
so maybe you could check the non-pointer operands of the getelementptr
instruction.

Probably. I think for single-dimensional arrays it will not be too difficult to get the access functions using scalar evolution. I think multi-dimensional arrays will get complicated.

Another question is which polyhedral library can be used inside LLVM.
One option would be ISL from Sven Verdoolaege (LGPL), another the PPL
from Roberto Bagnara (GPLv3).

1.3 Generate LLVM IR from the polyhedral model
-----------------------------------------

For code generation the CLooG/isl library can be used. It is LGPL
licensed. Sven will also work on a CLooG version using the PPL, so this could
also be an option.

2. Optimize on the polyhedral representation

2.1 Use external optimizers
---------------------------

The polyhedral loop description is simple and not compiler dependent.
Therefore external tools like LooPo (automatic parallelization), Pluto
(optimization in general)

i contacted the author a week ago, and if we use it, we need an IR
generator on the llvm side to extract SCoPs and a corresponding parser on the
Pluto side to read them, then a backend for cloog and a corresponding
parser in llvm to read that output back into llvm. :slight_smile:

The way to go is the scoplib format (probably extended by quantified variables). This format could be extracted from Graphite easily and could also be created in LLVM.
What we need to get back into LLVM is only the new optimized schedule, described e.g. as CLooG-like scattering functions. These can be parsed easily. The real code generation could be done internally, so it is not necessary to parse the code generated by external tools.

or even Graphite

what's the status of PCP? could we just use the PCP language with annotations?

See Jan's mail.
The first steps we have to take are not yet dependent on PCP; later on it might be useful for debugging and better readability of the test cases.

might be used to optimize code. This could give a first impression
what to expect from the polyhedral model in LLVM.

There are also efforts to establish an interchangeable polyhedral
format (scoplib - Louis-Noel Pouchet) and to create a polyhedral
compilation package. These will allow sharing/exchanging optimizations
between different compilers and research tools.

Furthermore an external interface will enable researchers to use LLVM
for their work on polyhedral optimizations. This might be useful as
there is no polyhedral compiler for any dynamic language yet. Also
recent work on optimizing for GPUs has started in the polyhedral
community, so the LLVM OpenCL implementation might be interesting too.

if this targets entertainment applications instead of scientific
computing, implementing the polyhedral framework on top of the llvm "support"
and "system" libraries, so that everything can run as a native windows
application, would be a good idea, i think.

Sure.

2.2 Implement optimizations in LLVM
-----------------------------------

Useful optimizations could be imported into / rewritten in LLVM. For
optimizations that transform the LLVM-IR like vectorization or
automatic parallelization at least some part of the optimizations has
to be in LLVM anyway.

the information computed in the polyhedral framework, such as the iteration
domain and affine schedule, may allow llvm to do some interesting
optimizations, like generating stream-processing code from normal c code
(and optimizing pointer-access bounds checks?)

Yes

See you

Tobi

Tobias Grosser wrote:

The way to go is the scoplib format (probably extended by quantified variables). This format could be extracted from Graphite easily and could also be created in LLVM.
What we need to get back into LLVM is only the new optimized schedule, described e.g. as CLooG-like scattering functions. These can be parsed easily. The real code generation could be done internally, so it is not necessary to parse the code generated by external tools.

By the way, Konrad Trifunovic is interested (for his own research) in improving the current read/write capabilities of Graphite to process the full scoplib format instead. Dealing with the iteration domain and schedule/scattering is easy, but processing array subscripts will require more work. We will keep you informed of the progress on the Graphite mailing list, but collaboration from outside the Graphite community is welcome.

Albert

If you want to know how the address is calculated as a function of
each enclosing loop, ScalarEvolution should be quite usable for
multiple dimensions. This is represented with an add-recurrence
with another add-recurrence as its "start" operand. For example:

  for (i=0; i<m; ++i) // X
    for (j=0; j<n; ++j) // Y
      A[i][j];

The store address has this expression:

  {{A,+,sizeof(double)*n}<X>,+,sizeof(double)}<Y>

This says that the value starts at A, steps by sizeof(double)*n
(address units) with the iteration of loop X, and steps by
sizeof(double) with the iteration of loop Y.

However, if you want to recognize this as a high-level array
reference on A with subscripts "i" and then "j", there are some
missing pieces.

On a related note, analyzing getelementptr yourself directly is
doable, but there are several major complications. Using a helper
library such as ScalarEvolution can protect you from many
low-level artifacts (though not all).

Dan

Probably. I think for single-dimensional arrays it will not be too
difficult to get the access functions using scalar evolution. I think
multi-dimensional arrays will get complicated.

If you want to know how the address is calculated as a function of
each enclosing loop, ScalarEvolution should be quite usable for
multiple dimensions. This is represented with an add-recurrence
with another add-recurrence as its "start" operand. For example:

   for (i=0; i<m; ++i) // X
     for (j=0; j<n; ++j) // Y
       A[i][j];

The store address has this expression:

   {{A,+,sizeof(double)*n}<X>,+,sizeof(double)}<Y>

This says that the value starts at A, steps by sizeof(double)*n
(address units) with the iteration of loop X, and steps by
sizeof(double) with the iteration of loop Y.

However, if you want to recognize this as a high-level array
reference on A with subscripts "i" and then "j", there are some
missing pieces.

You are right, starting with ScalarEvolution is the right approach. However, I believe the high-level array analysis might be useful to get simpler expressions. I have to think about this later on.

On a related note, analyzing getelementptr yourself directly is
doable, but there are several major complications. Using a helper
library such as ScalarEvolution can protect you from many
low-level artifacts (though not all).

Sure, it is a great tool.

Tobias

Dear colleagues,

Happy New Year!

Just wanted to mention that since the cTuning community is interested in
performance/code size/power tuning using empirical feedback-directed techniques
with different compilers including GCC and LLVM, we would be interested in
adding support for the fine-grain optimizations from GRAPHITE to the
Interactive Compilation Interface at some point at the beginning
of this year, to be able to use the cTuning optimization framework directly
and share optimization cases with the community.

Will keep in touch,
Grigori

hi Tobi,

i just added the Poly library (http://wiki.llvm.org/Polyhedral_optimization_framework) to the llvm build system; it only contains a toy pass "Poly".
i think we could add the polyhedral optimization stuff into this library.

it was tested under cmake + visual studio 2009, and i also added the library build rules to the MAKEFILEs, but i'm not sure if it works under linux/cygwin/mac, sorry.
hope this helps

best regards

--ether

polylib.patch (7.04 KB)

hi ether,

I pushed your work to our git repository at
http://repo.or.cz/w/llvm-complete/pofl.git

So we can work on a first version that could be committed to the LLVM svn repository.

I just had some discussions on the LLVM IRC channel concerning the integration of Poly.

[the complete mail]

hi ether,

great start.

I pushed your work to our git repository at
http://repo.or.cz/w/llvm-complete/pofl.git

This repository is meant to track work on a first version of LLVM Poly that could be proposed for integration into LLVM. Feel free to register as a user and commit to the repository.

I just had some discussions on the LLVM IRC channel concerning the integration of Poly.

The preferred way to implement this seems to be like the tools/clang integration in llvm: a separate repository (that might also be hosted on the llvm svn server) that is only built if available. So users can decide if they want to try LLVMPoly and check it out on demand.

What we need to achieve before proposing a patchset:

1. Integrate ClooG/isl
2. A working pass Frontend/Backend (very limited, no optimizations, but able to handle real world code)

Thanks for working on this

Tobi

hi Tobi,

hi Tobi,

i just added the Poly
library (http://wiki.llvm.org/Polyhedral_optimization_framework) to the llvm
build system; it only contains a toy pass "Poly".
i think we could add the polyhedral optimization stuff into this library.

it was tested under cmake + visual studio 2009, and i also added the library
build rules to the MAKEFILEs, but i'm not sure if it works under
linux/cygwin/mac, sorry.
hope this helps

best regards

--ether

[the complete mail]

hi ether,

great start.

I pushed your work to our git repository at
http://repo.or.cz/w/llvm-complete/pofl.git

This repository is meant to track work on a first version of LLVM Poly that could be proposed for integration into LLVM. Feel free to register as a user and commit to the repository.

ok :slight_smile: i just learned how to use git yesterday

I just had some discussions on the LLVM IRC channel concerning the integration of Poly.

The preferred way to implement this seems to be like the tools/clang integration in llvm: a separate repository (that might also be hosted on the llvm svn server) that is only built if available. So users can decide if they want to try LLVMPoly and check it out on demand.

got it. i think we could move it out of opt after we finish the first implementation. right now we could play with this dirty library first.

What we need to achieve before proposing a patchset:

1. Integrate ClooG/isl
2. A working pass Frontend/Backend (very limited, no optimizations, but able to handle real world code)

Thanks for working on this

you are welcome.

Tobi

best regards

--ether

hi all,

In LLVM we could add support for generalized CFG regions and RegionPasses. A region is a part of the CFG. The only information we have is that it has one entry and one exit, thus it can be optimized separately.
I think this is the best way to add region analysis. I must admit this approach
helps me on another, similar project I'm working on in parallel (no pun intended).
Tobias, is this how you are architecting your region analysis already?

John

i just implemented the skeleton of Region/RegionInfo like the existing LoopBase and LoopInfoBase[1] in llvm, and found that there's a lot in common between "Region" and "Loop":

1. both of them consist of several BasicBlocks
2. both of them have some kind of nested structure, so both a loop and a region can have parents or children
3. both of them have a BasicBlock (the header of a loop and the "entry" of a region) that dominates all the others

and the Region class will have most of the stuff very similar to LoopBase, like: ParentRegion, SubRegions, Blocks, getRegionDepth(), getExitBlock(), getExitingBlock() ......

so, could we just treat "Loop" as some kind of general "Region" of BasicBlocks, and make Loop and Region inherit from "RegionBase"?

[1] http://llvm.org/doxygen/LoopInfo_8h-source.html

best regards
--ether

sorry that i forgot to change the subject


Hi ether,

sounds interesting. Actually a loop is/may be some kind of region. If you want you can have a look at the analysis that I wrote. It is not yet finished, not completely documented, and work in progress. However the first big comment might be interesting for you. Or see the results of
opt -regions -analyze

The git repo to see it is here:
http://repo.or.cz/w/llvm-complete/tobias-sandbox.git/shortlog/refs/heads/region

I will think about this and maybe reply again.

Tobi

hi Tobi

sorry that i forgot to change the subject

Hi ether,

sounds interesting. Actually a loop is/may be some kind of region. If you want you can have a look at the analysis that I wrote. It is not yet finished, not completely documented, and work in progress. However the first big comment might be interesting for you. Or see the results of
opt -regions -analyze

The git repo to see it is here:
http://repo.or.cz/w/llvm-complete/tobias-sandbox.git/shortlog/refs/heads/region

that makes sense to me, and if you make your Region class a subclass of LoopBase, code like "addChildLoop" and "getLoopDepth()" from LoopBase may help you a lot when manipulating regions in the later optimization passes (of course, we should give them more meaningful names like "addChildRegion") :slight_smile:

and i think if we ignore the "goto" statement and the "return" statement (i remember there's a pass in llvm that makes a function return in only one basicblock) in loops, loops will also have only one exit block, so we can treat a loop as a special region that has a back edge, and we can say a loop must be a region but a region is not necessarily a loop.

we can read something about this in <<Compilers: Principles, Techniques, and Tools, Second Edition>>, 9.7 region-based analysis.

best regards
--ether

sorry that i forgot to change the subject

hi all,

Hi ether,

now a somewhat more complete answer.

In LLVM we could add support for generalized CFG regions and
RegionPasses. A region is a part of the CFG. The only information we
have is that it has one entry and one exit, thus it can be optimized
separately.
I think this is the best way to add region analysis. I must admit this
approach
helps me on another, similar project I'm working on in parallel (no
pun intended).
Tobias, is this how you are architecting your region analysis already?

John

i just implemented the skeleton of Region/RegionInfo like the existing
LoopBase and LoopInfoBase[1] in llvm, and found that there's a lot
in common between "Region" and "Loop":

1. both of them consist of several BasicBlocks

Correct.

2. both of them have some kind of nested structure, so both a loop and
a region can have parents or children

Correct.

3. both of them have a BasicBlock (the header of a loop and the "entry" of a
region) that dominates all the others

Correct.

and the Region class will have most of the stuff very similar to LoopBase,
like: ParentRegion, SubRegions, Blocks, getRegionDepth(),

Correct.

getExitBlock(), getExitingBlock() ......

This might need some thought.

so, could we just treat "Loop" as some kind of general "Region" of
BasicBlocks, and make Loop and Region inherit from "RegionBase"?

I would like to do so, as I like the structure of this approach.
However until now my pass was written on the side, as a proof of concept.

I wrote two Passes:

1. Regions
Detect the regions and print the regions tree. Try it with:
opt -regions -analyze file.bc

2. RegionsWithoutLoops
Find the maximal regions that do not contain any loops. Try it with:
opt -regions-without-loops file.bc
opt -view-regions-without-loops file.bc (needs graphviz)

Both ATM only work on BasicBlocks. However I have seen the patches in
your sandbox and I really like the idea of keeping the analysis general.

If you are interested you could have a look at my sandbox (not yet well
documented and cleanly formatted).

We might want to think about how to merge our work.

Tobi

Why not use the "standard" algorithm for detecting SESE regions and building a program structure tree?
It should handle everything you want. It also becomes much simpler to specify a connected SESE region
by entry/exit edges, while a disconnected region is specified by entry/exit blocks. Defining regions only on
blocks is not enough to quickly determine how to replace/move a region in a CFG.

The algorithm can be found in this paper:
The Program Structure Tree: Computing Control Regions in Linear Time
by Richard Johnson, David Pearson, Keshav Pingali

- Jan