Array Dependence Analysis

As part of the advanced compilers course semester project (at UIUC), we are starting to implement array dependence analysis for LLVM.
As of now we are considering GCD and Banerjee tests.

Any suggestion on features, tests and/or interface are welcome.

Our deadline is at the beginning of may so hopefully by then we will have a working prototype to submit.

Hi,

As part of the advanced compilers course semester project (at UIUC), we
are starting to implement array dependence analysis for LLVM.

I'm currently working on a similar project and hoping to finish it in
about two weeks. I am going to share the code when it's ready. I've
spent some time analyzing LLVM code for scientific and "ordinary"
programs to find out what is necessary to make the IR-level array
dependence analysis usable. I've designed techniques that I know will
work for chosen programs, but the general precision of the pass
implementation is still a mystery to me. If it appears acceptable, you
may find some concepts useful for your project. Otherwise, you'll at
least know what isn't useful:)

As for now, I'm focusing on the dependency analysis engine and publish
only a simplistic interface (it only allows to check whether a loop
carries a memory dependence). However, the engine is quite
self-contained and should be easy to attach to a different interface.
For instance, the engine will be able to find dependence direction
vectors for an instruction pair.

Any suggestion on features, tests and/or interface are welcome.

From my experience I can say it's essential to use a precise alias

analysis as the base for the array dependence analysis. Fortunately,
using Data Structure or Andersen's AA for the whole program can provide
really good results.

Also, on this list I was advised some time ago to look at
the Omega test:
http://lists.cs.uiuc.edu/pipermail/llvmdev/2008-January/012185.html
Having taken a look at the Omega library interface I think it shouldn't
be a complex task to use it for dependence testing. I'm thinking of
adding it to my implementation in the future. It could probably replace
Banerjee test. If you're interested, I advise to use the version patched
by Jerry James: http://jjames.fedorapeople.org/omega/

Wojtek

As part of the advanced compilers course semester project (at UIUC), we
are starting to implement array dependence analysis for LLVM.

Great! This is something we've needed for a long time.

I'm currently working on a similar project and hoping to finish it in
about two weeks.

Cool! I think the most critical part of this is to get a good interface for dependence analysis. There are lots of interesting implementations that have various time/space tradeoffs.

For example, it would be great if Omega was available as an option, even if the compiler didn't use it by default. This argues for making dependence analysis implementations pluggable like alias analyses.

As for now, I'm focusing on the dependency analysis engine and publish
only a simplistic interface (it only allows to check whether a loop
carries a memory dependence). However, the engine is quite
self-contained and should be easy to attach to a different interface.
For instance, the engine will be able to find dependence direction
vectors for an instruction pair.

Sounds good. If you have the interface nailed down, it would be good to post what the query interface looks like. Vikram and others have a lot of experience with dependence analysis and know what sorts of queries various clients are interested in. They can help give feedback on the direction you're pursuing independent of the implementation.

Any suggestion on features, tests and/or interface are welcome.

I'd suggest looking at:

Using the chains of recurrences algebra for data dependence testing and induction variable substitution
MS Thesis, Johnie Birch

Array Data Dependence Testing with the Chains of Recurrences Algebra
http://citeseer.ist.psu.edu/vanengelen04array.html

An empirical evaluation of chains of recurrences for array dependence testing

etc.

From my experience I can say it's essential to use a precise alias

analysis as the base for the array dependence analysis. Fortunately,
using Data Structure or Andersen's AA for the whole program can provide
really good results.

Yep, but any particular dep analysis implementation should just require AliasAnalysis instead of a specific implementation. This lets the client (e.g. llvm-gcc) decide what passes to plug together.

-Chris

LLVM loop transformer operates at loop level, which is not what many optimizers do in general. So, a loop level interface (i.e. based on LoopPass in llvm) to find loop-carried dependence is preferable to loop optimizer.

Hi,

Devang Patel wrote:

LLVM loop transformer operates at loop level, which is not what many
optimizers do in general. So, a loop level interface (i.e. based on
LoopPass in llvm) to find loop-carried dependence is preferable to
loop optimizer.

Do you mean making Array Dependence Analysis a loop-level analysis?
Would its results be available for some function-level pass then?

Wojtek

We could extend pass manager framework to let function-level pass use loop-level analysis info. However that is not ideal.

A loop level pass operates on a loop only. If a function level pass needs array dependence analysis info then it expects info. to cover entire function. Which means, it will need function level array dependence analysis pass, which is natural. If loop level pass, say LP, is interested in array dependence info, then in many cases it is more interested in loop-carried dependence info for a given loop. If such info. is made available to through a loop level pass then it'd allow loop pass manager to execute allow loop pass manager to handle LP together with other loop passes.

Hi,

Cool! I think the most critical part of this is to get a good
interface for dependence analysis. There are lots of interesting
implementations that have various time/space tradeoffs.

For example, it would be great if Omega was available as an option,
even if the compiler didn't use it by default. This argues for making
dependence analysis implementations pluggable like alias analyses.

Yes, I also thought about it that way. I think we may look at the
dependence analysis in LLVM at three levels (from the lowest to the
highest one):

1) Testing for dependence given two sequences of GEP indices assuming
that base pointers are the same. The indices could have a SCEV form or
even be preprocessed to something simpler (affine expressions for example).

2) Testing for dependence of two instructions (that access memory). It
would use alias analysis to answer some queries immediately, or to check
whether two GEP base pointers equal. If the latter is the case, 1) would
be used to check for dependences.

3) Complex queries (for example: does the given loop carry dependence?).
It would use 2) and summarize its results.

Only the first level could be pluggable allowing to interchange or chain
core dependency testing techniques. I think there will be no use in
making pluggable the higher ones (please, correct me if I am wrong).
This approach would require to divide the analysis structure into two
parts, say; DependenceAnalysis and IndexingTester.

That said, I must admit I haven't made it that way in my prototype. I
have it in mind, but I'm currently trying to keep things simple and just
to check whether the precision of my implementation is worth anything.

Sounds good. If you have the interface nailed down, it would be good
to post what the query interface looks like. Vikram and others have a
lot of experience with dependence analysis and know what sorts of
queries various clients are interested in. They can help give
feedback on the direction you're pursuing independent of the
implementation.

Ok. I'll post it when it crystallizes (what should happen soon).

I'd suggest looking at:

Using the chains of recurrences algebra for data dependence testing
and induction variable substitution
MS Thesis, Johnie Birch

Array Data Dependence Testing with the Chains of Recurrences Algebra
http://citeseer.ist.psu.edu/vanengelen04array.html

An empirical evaluation of chains of recurrences for array dependence
testing
http://doi.acm.org/10.1145/1152154.1152198

I've read the second one, but am not sure if it's easy to implement if
overflow and unknown signedness are taken into account. For now, I will
stick to the classical Banerjee test. If time allows I'll return to the
article.

From my experience I can say it's essential to use a precise alias

analysis as the base for the array dependence analysis. Fortunately,
using Data Structure or Andersen's AA for the whole program can
provide
really good results.

Yep, but any particular dep analysis implementation should just
require AliasAnalysis instead of a specific implementation. This lets
the client (e.g. llvm-gcc) decide what passes to plug together.

You're right. From the implementation point of view the choice of alias
analysis doesn't matter at all.

Wojtek

I agree. I think the dependence analysis pass should be a FunctionPass. LoopPasses can use function passes, but not visaversa.

-Chris

I would not assume that a loop xform would mainly be interested in
loop-*carried* dependencies. Some need loop-independent deps also, and
some may even need deps outside loops. So making array dep analysis a
function pass seems best.

--Vikram
http://www.cs.uiuc.edu/~vadve
http://llvm.org

...... Original Message .......

How can I build the front-end to generate 16-bit integers?

Hi,

Cool! I think the most critical part of this is to get a good
interface for dependence analysis. There are lots of interesting
implementations that have various time/space tradeoffs.

For example, it would be great if Omega was available as an option,
even if the compiler didn't use it by default. This argues for making
dependence analysis implementations pluggable like alias analyses.

Yes, I also thought about it that way. I think we may look at the
dependence analysis in LLVM at three levels (from the lowest to the
highest one):

1) Testing for dependence given two sequences of GEP indices assuming
that base pointers are the same. The indices could have a SCEV form or
even be preprocessed to something simpler (affine expressions for example).

2) Testing for dependence of two instructions (that access memory). It
would use alias analysis to answer some queries immediately, or to check
whether two GEP base pointers equal. If the latter is the case, 1) would
be used to check for dependences.

3) Complex queries (for example: does the given loop carry dependence?).
It would use 2) and summarize its results.

Only the first level could be pluggable allowing to interchange or chain
core dependency testing techniques. I think there will be no use in
making pluggable the higher ones (please, correct me if I am wrong).
This approach would require to divide the analysis structure into two
parts, say; DependenceAnalysis and IndexingTester.

This all sounds great to me!

That said, I must admit I haven't made it that way in my prototype. I
have it in mind, but I'm currently trying to keep things simple and just
to check whether the precision of my implementation is worth anything.

I'm fine with starting simple and generalizing it out from there. I'd actually recommend against trying to implement a maximally precise dependence analyzer without a client. With no client, there is no way to test that you're getting correct results and whether the improvements in precision are actually useful.

I'd suggest starting out with a simple checker, e.g. that handles simple affine cases, and some client (a loop xform?). This should be enough to get the transformation working on simple loops, and when the client is tested and working there, you can use it to evaluate whether the precision improvements are worthwhile.

If you're looking for a simple client, I'd suggest something like loop reversal. This requires a pretty simple dependence test and can be useful for some targets. The idea is to turn:

for (i = 0 .. 100)
   A[i] = 4;

into:

for (i = 100 .. 0)
   A[i] = 4;

Another transform that needs dependence analysis which would be even more useful (but still very simple) is a "loops to memset/memcpy" pass. This optimization would speed up the 'viterbi' program in the test suite a *lot* for the various 'history' loops. A simple version of this is:

for (i = 0 .. 100)
   A[i] = 0;

->

memset(A, 0, sizeof(A[0])*100);

I'd suggest looking at:

Using the chains of recurrences algebra for data dependence testing
and induction variable substitution
MS Thesis, Johnie Birch

Array Data Dependence Testing with the Chains of Recurrences Algebra
http://citeseer.ist.psu.edu/vanengelen04array.html

An empirical evaluation of chains of recurrences for array dependence
testing
http://doi.acm.org/10.1145/1152154.1152198

I've read the second one, but am not sure if it's easy to implement if
overflow and unknown signedness are taken into account. For now, I will
stick to the classical Banerjee test. If time allows I'll return to the
article.

Ok. Starting simple is good. Thanks again for working on this, this is a major hole in llvm,

-Chris

How can I build the front-end to generate 16-bit integers?

Please clarify your question.
If you are asking how to build llvm-gcc for a 16 bit target,
I think the answer is: (1) gcc itself doesn't support 16 bit
targets; (2) llvm doesn't currently support any 16 bit targets
(but could with a little work). So you are out of luck unless
you are willing to work on it.

Ciao,

Duncan.

Note that if you only care about C/ObjC (not C++, fortran, ada, etc) that clang may be a good option for you.

-Chris

Yes, I am working on an 8-bit target and I am only interested in C. We
have made some progress in adapting llvm to lower and generate the
target instructions from the current llvm-gcc output, however, we would
like to have 16-bit int type for this target, and currently, llvm-gcc
assumes 32-bit int, and of course, 32-bit integer promotions.
I know some other ports of gcc have 16-bit int type, so I am looking for
a way to configure (or if needed to modify) the front-end to generate
16-bit int type and only promote the integer calculation to 16-bit.
Ideally I would like to disable the integer promotions in the front-end
(going out of the standard for performance purposes) and take care of
them in my backend as needed.

Thanks,
A.

gcc does, in fact, support some 16-bit targets.

-eric

Chris Lattner wrote:

I'm fine with starting simple and generalizing it out from there. I'd
actually recommend against trying to implement a maximally precise
dependence analyzer without a client. With no client, there is no way
to test that you're getting correct results and whether the
improvements in precision are actually useful.

I'd suggest starting out with a simple checker, e.g. that handles
simple affine cases, and some client (a loop xform?). This should be
enough to get the transformation working on simple loops, and when the
client is tested and working there, you can use it to evaluate whether
the precision improvements are worthwhile.

In fact, I've a client to test dependence analysis. It's a simple loop
parallelization pass. Basically, it tries to divide the set of
iterations into smaller sets and execute them in separate threads. The
loop is parallelized if it has appropriate shape (i.e. is rotated, has
canonical indvar), has known iteration count (at runtime) and doesn't
carry any dependence. The new dependence analysis is used to check if
there are no memory dependences. Register dependences are simpler to
detect as they are introduced only by the loop header PHI nodes. Some
register dependences can be eliminated (in case of scalar reduction, for
example). This pass is almost finished with some minor FIXMEs for corner
cases. If there is an interest I can share/contribute it soon. However,
I wouldn't expect much from it and would rather treat it as a toy.

After your advice, I am going to prepare (locally) custom tests for
llvm-test testsuite. The number of loops not carrying memory
dependences, or parallelizable ones can be a good measure of dependence
analysis precision. Comparing output of parallelized programs to the
original ones would, probably, help to check its correctness.

If you're looking for a simple client, I'd suggest something like loop
reversal.

I think it would exercise dependence analysis in the same way as the
above pass. Off course, it's still worthy to implement it.
Unfortunately, I am not sure if I'll have time for it in the nearest
future. Maybe someone else is interested?:slight_smile:

Another transform that needs dependence analysis which would be even
more useful (but still very simple) is a "loops to memset/memcpy"
pass. This optimization would speed up the 'viterbi' program in the
test suite a *lot* for the various 'history' loops.

Ok, but, unfortunately, as above...

Wojtek

Makes sense, sounds fun!

-Chris

Wojtek,

If you like, I can help guide this SoC project.

I would also like to see if we can coordinate with Alex and Albert, who are
doing the class project here.

As a first comment, your 3 layers are a good organization but two comments:

1. Layer 1 shd also look at loop bounds and array bounds: those can be used
to disprove some deps.

2. The interface will also need to compute direction and perhaps distance
vectors. Those may or may not be easy to put in one of your layers (say,
layer 3, where they belong).

--Vikram
http://www.cs.uiuc.edu/~vadve
http://llvm.org

...... Original Message .......

Vikram,

If you like, I can help guide this SoC project.

Unfortunately, I am not eligible for the SoC programme as I will be
graduating in April. In fact, data dependence analysis (in basic version
at least) is one of my master thesis' parts. I should get the basic
version working in a few days. Internally, it is more or less the
algorithm described in the textbook by Allen and Kennedy, but externally
it has very limited interface. I was planning to improve it after the
graduation, if my time allows. But now, as there is more people to work
on the issue it may be more interesting to develop something better.

I would also like to see if we can coordinate with Alex and Albert, who are
doing the class project here.

Although I'm willing to help, I suppose Alex and Albert would feel more
comfortable if the schedule of their course project was independent of a
guy living on another hemisphere:) My intention was to signalize that
some efforts towards implementing dependence analysis have already been
made. I will try to post my first prototype in the beginning of April.
I'll also try to write down some thoughts about the pass interface, its
internal structure and, also, problems I've encountered during my
"research". Some of the problems aren't directly connected to dependence
analysis but block it from being precise. For example, in codes coming
from translating Fortran programs, arrays that are declared in COMMON
blocks and passed as function arguments become a problem for alias
analysis. Often a conservative "there is a dependence" answer must be
returned without, even, checking indices. But returning to the main
subject, maybe some parts of my prototype and some thoughts would be
useful for the "next generation" implementation...

As for my role, I'm ready to share the experience I've gained while
implementing the prototype. Also, if there is a self-contained part that
can be written independently I may try to do it. I'd be happy to take
advantage of your guidance then.

As a first comment, your 3 layers are a good organization but two comments:

1. Layer 1 shd also look at loop bounds and array bounds: those can be used
to disprove some deps.

Currently, I do use loop bounds to disprove deps. However I don't take
into account trapezoidal or triangular loops.

2. The interface will also need to compute direction and perhaps distance
vectors. Those may or may not be easy to put in one of your layers (say,
layer 3, where they belong).

Yes. "The layers" is only a concept describing which other analysis
queries the given query depends on. Internally, the analysis can be one
class so propagating direction vector from the lowest layer (where they
are computed) to the highest (where they are given to the client) is not
a problem.

Wojtek

I was able to build and play with clang today.
Clang also promotes integer arithmetic to 32-bit.

Any pointers on how to tweak it so that it generate i16 for “int” and also promote int operations to 16-bit operations only?

Thanks,
Sanjiv