Flow-Sensitive AA

I'm not quite understanding how one would use the existing alias analysis
framework to represent a flow-sensitive analysis.

Let's say I have a load and a store and I want to determine whether the load
loads from the same place the store stores to. Today we'd do something like
this:

AA.alias(load->getPointerOperand(), /* size */,
         store->getPointerOperand(), /* size */);

There is nothing here that tells AliasAnalysis which program point we're at.

It doesn't seem to be enough to say:

AA.alias(load, /* size */, store, /* size */);

because AliasAnalysis wouldn't know which operands we're interested in.

So does AliasAnalysis need an updated interface, something like:

  virtual AliasResult alias(const Value *V1, unsigned V1Size,
                            const Instruction *V1Inst,
                            const Value *V2, unsigned V2Size,
                            const Instruction *V2Inst);

? Or am I missing something?

                                                 -Dave

Yep, the current infrastructure isn't set up to support this.

I haven't seen a real-world case where flow-sensitive AA is useful. Normally, the conversion to SSA form is sufficient. Can you talk about cases where this matters to you?

-Chris

> I'm not quite understanding how one would use the existing alias
> analysis
> framework to represent a flow-sensitive analysis.

> Yep, the current infrastructure isn't set up to support this.
>
> I haven't seen a real world case where flow sensitive AA is useful.

Yes, in the general case, flow-sensitive AA isn't worth it.

> Normally, the conversion to SSA form is sufficient. Can you talk
> about cases where this matters to you?

Mostly it involves tying into our memory dependence analysis which
annotates things on program points. I need a way to translate back
to our optimizer data structures.

So it's not "flow-sensitive AA" in the strictest sense but it does require
program point information.

I looked at MemoryDependenceAnalysis and it's not quite what I want.
I need to know things like "does this load and store pair interfere," not
"give me the most recent instruction that interferes with this memory
operation." It's the kinds of things you'd want to ask for scheduling
purposes, where no particular instruction ordering is assumed.

                                         -Dave

Ok, so this is not the ideal way to do things. AliasAnalysis was just a
convenient way to hook into something. But as I've gone through this
I've realized it's just not going to work very well.

What I really need is a dependence analysis interface. I need to know
about loop-carried dependencies and that sort of thing, whether two memory
operations reference the same data, distance information, etc. As far as I
can tell, there's no infrastructure for this in LLVM.

I don't actually need the analysis because we have it in our optimizer. What
I need is some kind of interface akin to AliasAnalysis that can chain
analyses, etc.

I'm sure there are others working on this. I believe Vikram mentioned his
group is working on a parallelizing compiler based on LLVM.

I would think it would be straightforward to take the AliasAnalysis interface
and essentially duplicate it and call it DependenceAnalysis or some such
thing. But if someone's already done this I'd rather not duplicate the
effort.

                                               -Dave

> What I really need is a dependence analysis interface. I need to know
> about loop-carried dependencies and that sort of thing, whether two memory
> operations reference the same data, distance information, etc. As far as I
> can tell, there's no infrastructure for this in LLVM.

Right, this is something we've wanted for a long time, but no one has stepped up to add it.

> I don't actually need the analysis because we have it in our optimizer. What
> I need is some kind of interface akin to AliasAnalysis that can chain
> analyses, etc.

Ok.

> I'm sure there are others working on this. I believe Vikram mentioned his
> group is working on a parallelizing compiler based on LLVM.

That project is just barely starting to get off the ground and will take a while before it starts doing much, as far as I know.

> I would think it would be straightforward to take the AliasAnalysis interface
> and essentially duplicate it and call it DependenceAnalysis or some such
> thing. But if someone's already done this I'd rather not duplicate the
> effort.

In theory, it should be pretty easy to create a new DependenceAnalysis analysis class and start piping it around. It would be nice if there was a trivial implementation so that we can write regtests.

-Chris

Wojtek Matyjewicz has written a simple DependenceAnalysis interface and sent email about it to llvmdev in June -- the message is attached. He said he wrote several tests behind that interface -- ZIV, strong SIV, Banerjee, and some form of the Delta test -- and two students in my Spring class added the Omega test. I have not reviewed his interface yet because I've been traveling almost nonstop since then.

At Illinois, we are working on a parallelizing compiler but we're at an extremely early stage. We too will need a dependence analysis interface that can support fairly aggressive analysis, including strong tests, direction vectors, perhaps distance vectors, and dependence breaking conditions. We were going to start with Wojtek's interface as a strawman. Collaborations on extending the interface or implementation would be very welcome. I'm copying Matthieu Delahaye who is our lead programmer on this effort (he's also on llvmdev).

--Vikram
Associate Professor, Computer Science
University of Illinois at Urbana-Champaign
http://llvm.org/~vadve

[LLVMdev] Data dependence analysis.eml (5.53 KB)

Is there some code of the interface we can discuss? I'm more than happy to
provide input, requirements, etc. I'll contribute code where I can
(management decision there, but I'm pretty sure I can sell it).

If there's something that's been used for simple things now, let's get it
checked in and start refining it.

                                                     -Dave

Wojtek,

Please see David's message below. Have you or can you check in your code, perhaps as a project for now? That will allow us to start looking at it and perhaps collaborating on it.

--Vikram
Associate Professor, Computer Science
University of Illinois at Urbana-Champaign
http://llvm.org/~vadve

> Wojtek,
>
> Please see David's message below. Have you or can you check in your
> code, perhaps as a project for now? That will allow us to start
> looking at it and perhaps collaborating on it.

Sure. For now, I am posting it as an attachment, because it does not
build against the current SVN version. It is really basic (for example,
it cannot produce distance vectors or breaking conditions, and it does
not handle affine loop bounds), but I hope you find some pieces of it
useful.

The attached version does not reflect the recent changes in the LLVM IR
(larger set of first-class types) yet, so it should be built against one
of the previous revisions (it works with r50174 for sure). It has to be
configured with the --with-llvmsrc and --with-llvmobj switches. The
build process will produce a loopmemdep.so shared library that can be
loaded into opt. In the archive, there is also a 'test' directory with
simple programs to test the pass. I suggest a toolchain like this:

$ llvm-gcc -c -emit-llvm -O3 <program.c> -o - | opt -load=loopmemdep.so
-loopsimplify -anders-aa -loop-memdep -debug-only=loop-memdep

(Debug output is quite verbose.)

I am investigating what changes are necessary to add support for
first-class structs and arrays, and will prepare a version to check in
as an LLVM project if there is still interest.

Wojtek

loopmemdep.tar.bz2 (35.7 KB)

Great, thanks! How much work do you think it will take to bring it up to date with the LLVM IR, except *ignoring* first-class structs and arrays for now? I believe llvm-gcc does not generate those in most cases and we can do a lot without supporting those. What else is missing relative to the current LLVM IR?

Thanks,

--Vikram
Associate Professor, Computer Science
University of Illinois at Urbana-Champaign
http://llvm.org/~vadve

Right, the design of first-class aggregates is that optimizers should generally assume they don't exist. This is the same sort of thing as how most optimizers assume that mem2reg has been run. Of course, this should only affect effectiveness, not correctness.

-Chris

> Great, thanks! How much work do you think it will take to bring it up
> to date with the LLVM IR, except *ignoring* first-class structs and
> arrays for now? I believe llvm-gcc does not generate those in most
> cases and we can do a lot without supporting those. What else is
> missing relative to the current LLVM IR?

To my surprise, it took almost no time (only some minor changes
unrelated to the IR were necessary). Furthermore, it seems that the
lack of support for first-class structs and arrays is not a matter of
correctness but a matter of precision. I think the analysis should give
correct answers even for bitcode that uses the recently introduced
instructions. I have attached a version that can be built against the
SVN version.

The code contains many FIXME notes. In general, they indicate
possibilities for efficiency or precision improvements rather than bugs.
However, there is one issue I have ignored - the possibility of overflow
in the index expression. Suppose we have a loop like this:
  for (i8 i = 0; i != 200; ++i) {
    A[2 * i + 5] = ...
    ... = A[2 * i + 3]
  }
If both index expressions are evaluated in 8-bit arithmetic,
then the dependence equation should be solved in modular arithmetic:
  2 * i + 5 == 2 * (i + delta) + 3 (mod 256)
So the dependence distance is delta == 1 or delta == -127, which gives
two direction vectors: (<), (>). My version will produce only the first
one. Taking the overflow issue into account during dependence analysis
seems very difficult to me. Does anybody have experience in this area?
Fortunately, programmers generally do not write programs in such a
nasty way.

And now a small explanation of the debug output for mandel.c (included
in the attachment):

   [ ... ]

   Testing DMA dependence between:
        DMA to %tmp95 of size 1 (m)
        DMA to %tmp95 of size 1 (m)
// Testing for an output dependence between pict[py * pict_width + px]
// assignments in different iterations. DMA stands for Direct Memory
// Access, which means an LLVM instruction that directly accesses memory
// (as opposed to a call site).

   [ ... ]

      CommonNestSize: 2
// Number of loops that are common for both instructions
// (In this example the first and second instructions are the same -
// we are testing for self-dependence)

      LBounds1: (1049, 1679)
// Bounds for all loops containing the first instruction

      LBounds2: (1049, 1679)
// Bounds for all loops containing the second instruction

      ALengths: (0)
// Dimensions of accessed array, 0 = unknown

      Indices1: (1680*i_0 + 1*i_1 + 0)
// Affine formula for index expression in the first instruction
// i_0 means induction variable for loop at level 0 (outermost),
// i_1 means IV for loop at level 1, etc...

      Indices2: (1680*i_0 + 1*i_1 + 0)
// Affine formula for the index expression in the second instruction

       Group: (0)
        Pos 0:
         Banerjee test:
          U1: (1049, 1679); U2: (1049, 1679)
          A1: (1680, 1), B1: 0
          A2: (1680, 1), B2: 0
           Trying (*,*): (L = -1763999, M = 0, R = 1763999)1
           Trying (<,*): (L = -1763999, M = 0, R = -1)0
           Trying (=,*): (L = -1679, M = 0, R = 1679)1
           Trying (=,<): (L = -1679, M = 0, R = -1)0
           Trying (=,=): (L = 0, M = 0, R = 0)1
           Trying (=,>): (L = 1, M = 0, R = 1679)0
           Trying (>,*): (L = 1, M = 0, R = 1763999)0
         possible dependence; direction vectors : { (=,=) }
        Delta test: possible dependence; direction vectors : { (=,=) }
// Dependence tester internals

    Array dependence tester: dependence; direction vectors: { (=,=) }
// The answer.

-Wojtek

loopmemdep.tar.bz2 (37.8 KB)

Hi Wojtek,

This is a great start. I'll focus on understanding your interface design and choices. Here are my initial thoughts based on a quick read.

The only interface provided by LoopMemDepAnalysis is "bool carriesDependence()" which may not be sufficient. But we can extend this.

The ArrayDepTest provides testDependence() as well as testPositions(). I'm not very clear about testPositions(). Would it be possible for you to explain it?

One nit-pick: I see that some of the interfaces use tons of parameters, which is something I'd like to reduce for ease of use.

Thanks for posting your work.

Hi,

> The only interface provided by LoopMemDepAnalysis is "bool carriesDependence()" which may not be sufficient. But we can extend this.

Right. It was the only method I used for my purposes. I believe that most of the high-level functionality could be built on top of the testDependence() method in the LMDAImplementation class.

I see one more major missing feature - lack of support for loop-independent dependences.

> The ArrayDepTest provides testDependence() as well as testPositions(). I'm not very clear about testPositions(). Would it be possible for you to explain it?

The idea was to provide infrastructure for different dependence tests. The ArrayDepTester class can be seen as a manager. Each concrete test (e.g. the Delta test) is implemented as a subclass of ArrayDepTest. Tests are inserted into the ArrayDepTester object in order of increasing complexity. ArrayDepTester::testDependence() first partitions the indexing positions (GEP operands) into separable positions and coupled position groups (according to the terminology in "Optimizing Compilers for Modern Architectures"). Then the dependence tests (ArrayDepTest objects) are run independently for each coupled position group (a separable position can be seen as a special case of a coupled position group). This is the reason the ArrayDepTest::testPositions() method exists. It is given the same arguments as ArrayDepTester::testDependence(), but with additional information about the particular positions to test. It returns one of three answers: NoDependence, ExactDependence, PossibleDependence. The first two tell ArrayDepTester there is no need to run more complex tests on this position group. The last one causes ArrayDepTester to try the next (more complex) test to refine the answer.

An example:
   for i:
     for j:
       for k:
         A[i][j][k] = A[i+1][j][k+j]

There is one separable indexing position - the first one. The second and third create a coupled group. Suppose we have two tests - the Delta test (which incorporates ZIV and SIV) and the Omega test. ArrayDepTester will first run the Delta test on the first position. It will easily detect an exact dependence, so there will be no use in running the Omega test. However, it may be the case that for the group consisting of the second and third positions the Delta test is not sufficient, and the Omega test will produce better results.

To be honest, I am not sure this infrastructure isn't overkill for my simple prototype. I was just trying to code concepts from the "Optimizing Compilers for Modern Architectures" book.

> One nit-pick: I see that some of the interfaces use tons of parameters, which is something I'd like to reduce for ease of use.

Right. It was my concern as well, but I eventually decided to write it this way. Feel free to change it.

-Wojtek

We want to model this as an analysis and make the following changes.

- Rename LoopMemDepAnalysis as DataDependenceAnalysis. Various transformation passes will use this interface to access data dependence info. This is an external interface. Put this in include/llvm/Analysis.
- Make DirectionVector (and later on DistanceVector) an independent interface and put them in ADT.
- Put various tests, e.g. DeltaTest, in the lib/Analysis folder. The transformation passes do not need to see these details.
- DataDependenceAnalysis will select various dependence tests based on user selection. We want an interface similar to the AnalysisGroup used by Alias Analysis, but we also want to allow the possibility of running multiple tests at the same time.
- Make ArrayDepTester a private implementation detail of DataDependenceAnalysis.

What do you think?

Hi,

[...]

> - Put various tests, DeltaTest, in lib/Analysis folder. The
> transformation pass does not need to see these details.

I believe some low-level tests should actually not be implemented as
separate Analysis passes but placed into Support. For instance,
DeltaTest would use GCD or other tests on a different set of indexes
once constraints are propagated. This means these tests are called on
expressions that are not necessarily present in the code itself.

> - DataDependenceAnalysis will select various dependence tests based on
> user selection. We want a interface similar to AnalysisGroup used by
> Alias Analysis, but we also want to allow the possibility of running
> multiple tests at the same time.

That will probably be the most difficult part. But with all the people
who have shown interest in using or adding new analyses here, I am
hopeful we will arrive at an acceptably stable API.

Regards,
Matthieu Delahaye

>
> I believe some low-level tests should actually not be implemented as a
> separate Analysis but placed into Support. For instance, DeltaTest would
> use GCD or other tests on a different set of indexes once constraints
> are propagated. This means these tests are called on expressions that
> are not necessarily present in the code itself.

This requires the GCD test to provide an interface to evaluate expressions on demand. SCEV does the same thing.
-
Devang

The more I think about it, the more I believe that putting these tests
together is much more complex than chaining them the way the different
AAs are chained. To make my case, I'll first have to go through the
different surveys of existing memory dependence analyses, and I'll get
back to the list with something more concrete.

Matthieu

> We want to model this as an analysis and make the following changes.
>
> - Rename LoopMemDepAnalysis as DataDependenceAnalysis. Various
> transformation passes will use this interface to access data
> dependence info. This is an external interface. Put this in
> include/llvm/Analysis.
> - Make DirectionVector (and later on DistanceVector) an independent
> interface and put them in ADT.
> - Put various tests, e.g. DeltaTest, in the lib/Analysis folder. The
> transformation pass does not need to see these details.
> - DataDependenceAnalysis will select various dependence tests based on
> user selection. We want an interface similar to the AnalysisGroup used
> by Alias Analysis, but we also want to allow the possibility of running
> multiple tests at the same time.
> - Make ArrayDepTester a private implementation detail of
> DataDependenceAnalysis.
>
> What do you think?

It makes sense to me. However, I have no idea how to organize the
dependence analysis internally to allow for flexibility and precision
at the same time.

The second hard part is, I guess, designing a good analysis interface.
For instance: how should distance vectors be represented? As separate
objects, or as annotations? The number of distance (direction) vectors
grows roughly quadratically with the size of the loop, which may be a
problem for really large loops.

Unfortunately, I won't have time to help you with the implementation
now. But feel free to use any pieces of the code I have posted.

Wojtek

I believe that Matthieu Delahaye is planning to work on this project, so that should not be a problem. Your code will be helpful to him as a starting point, so thanks for that! If you ever want to get back into this, please let us know.

--Vikram
http://www.cs.uiuc.edu/~vadve
http://llvm.org/