SIV tests in LoopDependence Analysis, Sanjoy's patch

Gents,

I spent some time reading over Sanjoy’s patch for LoopDependenceAnalysis. Unfortunately, an early version of these notes escaped; this is the complete review.

First off, I agree with his choice to implement the SIV tests. For scientific Fortran, the SIV (and the simpler ZIV) tests cover about 85% of the cases in practice. For C and C++, I expect the percentage will be much higher. Someday we might like to see the general SIV test and handling of MIV subscripts, but we can get a long ways without them. It was my intention to implement exactly these tests, but Sanjoy is way ahead of me.

My biggest problem is with the choice (not Sanjoy’s, I think) of implementing LoopDependenceAnalysis as a LoopPass. Dependence analysis needn’t be restricted to loop nests. I’d like to see DependenceAnalysis for an entire function, so we can do things like loop fusion, scheduling, etc. In particular, we should be able to test for dependence between references in disjoint loops. Here’s an (incomplete) description of what I’m thinking: https://sites.google.com/site/parallelizationforllvm/

In the large, I think we want to build a dependence graph for an entire function, with edges for dependences, annotated with direction/distance info. I imagine the code divided into 2 big chunks:

  1. the dependence graph builder, that walks around the code looking for interesting pairs of references, calling the dependence tester, and assembling the results into a dependence graph
  2. the dependence tester, that takes a pair of references and tests them for dependence, computing direction/distance vectors when possible (very close to what Sanjoy has built).

In the meantime, I agree with Sanjoy’s idea that a great next step would be to compute direction vectors (distance vectors when possible, as in strong SIV). I’d reorganize things a bit, maybe, so we return NULL for proven independence and otherwise a pointer to a dependence description, including a direction/distance vector with an entry for all common loops (or potentially common loops, in the case of loop fusion). Entries in the vector should include <, =, > (and combos like <=) and distances, but also entries for Scalar and Unknown.

Remember that a single pair can end up with several dependences.

Remember loop-independent dependences.

We might also make provision for finding input dependences. They’re expensive (there are typically lots of them), but very useful for guiding restructuring to improve use of cache and registers.

More detailed comments follow below.

Preston

LoopDependenceAnalysis.h

  • comment about isAffine() seems wrong. Consider A[2i + 3j + 10], where i and j are both induction variables in the loop nest. Isn’t that affine?
  • findOrInsertDependencePair() - insert into what? Could use some comments explaining what’s going on here
  • the cache should perhaps not be based on pairs of instructions, but on pairs of subscripts, since the analysis for A[i] and A[i + 1] is exactly the same as the analysis for B[i] and B[i + 1]

LoopDependenceAnalysis.cpp

  • AnalyzePair
      • should check for mod-ref info with calls, so we can take advantage of any available interprocedural analysis. For example, if we have a load from A[i] and a call, you immediately give up and call it Unknown… That’s pessimistic; we should check whether A can actually be modified by the call.
      • when analyzing subscript pairs, if a result is Unknown, you give up. That’s pessimistic; you should look at the remaining subscript pairs too. If one proves Independent, then there’s no dependence.
      • of course, this is the place to accumulate and merge direction vectors
  • isSIVPair
      • only works with a single loop nest. We’d prefer that it also work with disjoint loops, so we can do loop fusion. See Wolfe’s “Optimizing Supercompilers for Supercomputers”, page 18 and chapter 5. Instead of a set of Loop *, accumulate a set of loop levels (ints).
  • analyzeSIV
      • annoying to search for the common loop again, since we had to find it to arrive here
      • want to be able to analyze references in disjoint loops as if already fused; will need to adapt the SIV tests, since loop bounds aren’t always available
      • the actual tests look ok

Result types for analyzePair and analyzeSubscript (and analyzeSIV, et al.) should probably be different.

Hi Hal, Preston!

Sorry for the delay! Got busy with some offline work.

I've worked on my previous code to calculate direction and distance
vectors whenever possible (strong SIV, basically). I think the
current code is much clearer and would like your opinions on it.

I have attached the patch and also pushed to the github repo I
mentioned [1].

Thanks!

[1] https://github.com/sanjoy/llvm/tree/lda

lda.diff (34.4 KB)

Hi Sanjoy,

Thanks for the update.

Reading through LoopDependenceAnalysis.h

  • I’d sure like to see some design documentation (this is true for most of the LLVM code I’ve seen). Tell us what the plan is, only then talk about the implementation
  • I hate the names Subscript and Subscripts. They imply each entry corresponds to a subscript, but that’s not true. Each entry corresponds to a level in the loop nest. There may be several subscript pairs summarized in a single level entry. I’d suggest Level, or LoopLevel, or some such, more closely related to the literature, indicating that entries are related to the loop nest, rather than suggesting that they somehow correspond to subscripts (which are associated with memory references).

Your definition for Subscript has a Kind that’s either Independent, Dependent, or Unknown. This seems all wrong. If some level is independent, then no dependence exists at all (not just at this level). Otherwise, there will absolutely be a direction (3 bits) and perhaps a distance. I’d also like to see a bit for Scalar, and perhaps a few more (see here). Unknown isn’t special; it just means there’s a dependence and any direction is possible. We indicate Unknown by setting all 3 bits in the direction vector (same for Scalar). In this way, we can test for the validity of many xforms by simply examining the direction vectors; we don’t need to look at other flags.

The definition for Dependence should probably be a class with a number of methods.
You currently have a field for Result. Don’t think it’s useful.
On the other hand, I’d include Kind, one of Flow, Anti, Output, and Input.
You absolutely need a flag indicating the possibility of a loop-independent dependence.
You need your vector of levels.
I think I’d include pointers to the source and destination here.
I’d have methods for things like:

  • bool confused(), returns true if all levels are <=>
  • bool consistent(), returns true if all levels are Scalar or have a distance
  • unsigned loopCarried(), returns the level of the left-most non-equal direction

The methods could be implemented by a computation or by looking at a flag that’s computed once.

I’d very much like to see the dependence representation and tests separated from the collection of dependences associated with a loop nest. People working on loop nests will need to be able to access the associated dependences in various ways (e.g., give me all the dependences carried by a certain level) that you can’t yet predict. Make 'em define their own structures. You should focus on the tests.

I’d encourage more discussion (indeed, any discussion) of all this before hacking up more code. Read the stuff I’ve written here and comment, or I’ll get you access and you can write directly. That’s not to say what you’ve written is a waste; I think it’s quite wonderful. You’ve implemented most of the tough parts and I’m confident it will all prove useful in the final solution.

I’ll review the actual implementation in LoopDependenceAnalysis.cpp in another note.

Thanks,
Preston

Hi Sanjoy,

In LoopDependenceAnalysis::AnalyzePair, what’s going to happen if we have something like this

for (i = 0; i < n; i++)
  for (j = 0; j < n; j++)
    A[i][j]++;

versus

for (i = 0; i < n; i++)
  for (j = 0; j < n; j++)
    A[j][i]++;

That is, will the ordering of the entries in Subscripts be different?
I don’t think they should be. The ordering should correspond to the loops.
That is, results related to the i loop should appear first, then results related to the j loop.

And what’s going to happen here?

for (i = 0; i < n; i++)
  A[i+1][i+2] = A[i][i];

where there’s 2 subscripts, but only one loop level?
We should analyze each pair of subscripts separately
and merge (intersect) results applying to the same loop level.
In this case, we should notice that there’s no possible dependence.

And here?

for (i = 0; i < n; i++)
  for (j = 0; j < n; j++)
    A[i] = A[i] + B[i][j];

where A subscripts don’t mention the j loop at all?
I would say there are 3 dependences:

  1. a loop-independent anti dependence with direction vector [0 S|<]
  2. a loop-carried flow dependence with direction vector [0 S|0]
  3. a loop-carried output dependence with direction vector [0 S|0]

You don’t seem to have any scheme for handling these cases.
We should come up with something.

Preston

Hi Sanjoy & Hal,

Looking at LoopDependenceAnalysis::analyzeZIV …
I don’t understand much about SCEV, but the code seems inadequate.
Consider these three examples:

int a = …; // we can’t be sure of the relationship between a and b
int b = …; // perhaps they are parameters, or the result of I/O

for (int i = 0; i < n; i++) {
v[a][i] = …;
v[a + 1][i] = …;
}

for (int i = 0; i < n; i++) {
v[a][i] = …;
v[a][i] = …;
}

for (int i = 0; i < n; i++) {
v[a][i] = …;
v[b][i] = …;
}

In the first loop, we can be confident that the two stores are to different locations and there is no dependence.

In the second loop, we can be sure that the two references are to the same location and that there’s a consistent loop-independent output dependence between the two stores.

In the third loop, we can’t be sure what’s going on and must therefore assume an inconsistent loop-independent output dependence.

The ZIV analysis code doesn’t do the right thing. I believe, but am not certain, that it will get the first two loops right, but will return Independent for the third case. Better code might look something like:

SCEV *delta = a - b;
if (delta is zero) {
  consistent &= true;
  return Dependent;
} else if (delta is a known constant) {
  return Independent;
} else { // delta is some symbolic term
  consistent = false;
  return Dependent;
}

By the way, when I assert all these things about dependence analysis, I’m not always correct. If you disagree, please let me know and we’ll discuss it.

Thanks,
Preston

Hi Preston,

Thanks for the feedback!

In LoopDependenceAnalysis::AnalyzePair, what's going to happen if we
have something like this

for (i = 0; i < n; i++)
  for (j = 0; j < n; j++)
    A[i][j]++;

versus

for (i = 0; i < n; i++)
  for (j = 0; j < n; j++)
    A[j][i]++;

I think this can be fixed by ordering the subscripts on the loop
nesting. This should be doable, since the add recurrences remember
the loop whose induction variable they're representing.

Secondly, I think we can specialize on dealing only with linear add
recurrences and constants (symbolic or actual constants) at this
point. Since LLVM represents indices like

for (i = 0; i < n; i++)
  for (j = 0; j < n; j++)
    a[i + j]++

quite nicely as add recurrences of add recurrences, we should be
able to get quite far along with this approach.

And what's going to happen here?

for (i = 0; i < n; i++)
    A[i+1][i+2] = A[i][i];

I left out handling coupled subscripts at this point, because ignoring
them only results in more conservative (safer) inferences about
dependence. More sophisticated analysis like this can be added
incrementally, I think.

And here?

for (i = 0; i < n; i++)
  for (j = 0; j < n; j++)
    A[i] = A[i] + B[i][j];

I'm not too clear about this. Assuming A and B point to two different
locations, there seems to be only one loop independent dependence --
from A[i] to A[i].

You are right about the ZIV issue, it needs to be fixed.

I think the following would be a good start at modelling dependencies
between two Values:

class Dependence {
public:
  struct Level {
    enum { LT = 1, EQ = 2, GT = 4, ALL = 7 } direction;
    bool scalar;
    const SCEV *distance; // NULL implies no distance available
    const Loop *loop;
  };

  enum Kind { Flow, Anti, Output, Input };

  Kind getKind() const {
    return kind;
  }

  const Value *getSource() const {
    return source;
  }

  const Value *getDestination() const {
    return destination;
  }

  typedef SmallVector<const Level *, 4>::const_iterator const_iterator;
  const_iterator begin() const { return levels.begin(); }
  const_iterator end() const { return levels.end(); }

private:
  Value *source, *destination;
  Kind kind;
  SmallVector<const Level *, 4> levels;

  friend class LoopDependenceAnalysis;
};

More methods, like `isConsistent` and `isConfused`, can be added easily
to this interface, since the raw data required to compute them is
available.

Given two Values, then, we can first break them up into their
respective indices. We can then order them based on their loop
nesting. To actually compute the dependence, we can traverse down the
subscript lists of the two Values and keep checking individual
subscript pairs.

I'm not sure on how to deal with cases such as these:

for (i = 0; i < n; i++) {
  for (j = 0; j < n; j++) {
    t = a[i][j];
    a[i + 1][j + 1] = t;
  }
  for (j = 0; j < n; j++) {
    t = a[i][j];
    a[i + 1][j + 1] = t;
  }
}

My gut feeling is that we should stop checking individual subscripts
once the loop nestings have diverged. But I don't know how to
concretely represent such dependencies.

I'm assuming in cases such as these:

for (i = 0; i < n; i++)
  for (j = 0; j < n; j++)
    a[i][0] = a[j][i]

we can have a dependence edge [S U | 0].

Let me know what you think!

Hi Sanjoy,

I wondered:

In LoopDependenceAnalysis::AnalyzePair, what's going to happen if we
have something like this

for (i = 0; i < n; i++)
  for (j = 0; j < n; j++)
    A[i][j]++;

versus

for (i = 0; i < n; i++)
  for (j = 0; j < n; j++)
    A[j][i]++;

I think this can be fixed by ordering the subscripts on the loop
nesting. This should be doable, since the add recurrences remember
the loop whose induction variable they're representing.

Yes, we need to record the results of the individual subscript tests in
a vector indexed by the loop-nesting depth.
The order of the subscripts is irrelevant.

Secondly, I think we can specialize on dealing only with linear add
recurrences and constants (symbolic or actual constants) at this
point. Since LLVM represents indices like

for (i = 0; i < n; i++)
  for (j = 0; j < n; j++)
    a[i + j]++

quite nicely as add recurrences of add recurrences we should be
able to get quite far along with this approach.

Yes, this will be fine. Indeed, I don't think we'll ever be able to do more.

And what's going to happen here?

for (i = 0; i < n; i++)
  A[i+1][i+2] = A[i][i];

I left out handling coupled subscripts at this point because ignoring
them only results in more conservative safer inferences about
dependence. More sophisticated analysis like this can be added
incrementally, I think.

Yes, though my point was that you are currently recording
two different results, when there's actually only one loop level.

And here?

for (i = 0; i < n; i++)
  for (j = 0; j < n; j++)
    A[i] = A[i] + B[i][j];

I'm not too clear about this. Assuming A and B point to two different
locations, there seems to be only one loop independent dependence --
from A[i] to A[i].

Again, I'm trying to point out that the number and order of subscripts has
nothing to do with the loop levels. What we want to find here is that
there are 3 dependences:
1) a loop-carried anti dependence from the load of A to the store,
with a direction vector of [= S|<]
2) a loop-carried flow dependence from the store to the load of A,
with direction vector [= S|0]
3) a loop-carried output dependence from the store to the store, with
direction vector [= S|0]
(see Section 8.2.3 in Allen & Kennedy)

I think the following would be a good start at modelling dependencies
between two Values:

class Dependence {
public:
  struct Level {
    enum { LT = 1, EQ = 2, GT = 4, ALL = 7 } direction;
    bool scalar;
    const SCEV *distance; // NULL implies no distance available
    const Loop *loop;
  };

What is the loop value used for?
Seems redundant, in that we can get to the statements given the
source and destination pointers, and from there to the loop nest.
The relative position in the loop nest is given by the index of
the levels vector.

  enum Kind { Flow, Anti, Output, Input };

  Kind getKind() const {
    return kind;
  }

  const Value *getSource() const {
    return source;
  }

  const Value *getDestination() const {
    return destination;
  }

  typedef SmallVector<const Level *, 4>::const_iterator const_iterator;
  const_iterator begin() const { return levels.begin(); }
  const_iterator end() const { return levels.end(); }

private:
  Value *source, *destination;
  Kind kind;
  SmallVector<const Level *, 4> levels;

I'd malloc an ordinary vector of the appropriate length,
since we know the length at allocation time.

friend class LoopDependenceAnalysis;
};

More methods, like `isConsistent` and `isConfused` can be added easily
to this interface, since the raw data required to compute them is
available.

You'll also need flags for loop-independent, consistent, and confused.
I thought earlier that the latter two could be computed, but I was wrong.
Can't compute them when the source or destination isn't in the loop.

When a dependence is completely confused, we needn't represent the
direction vector (what you call levels), so we can perhaps save a bit of
storage in that common case.

Given two Values, then, we can first break them up into their
respective indices. We can then order them based on their loop
nesting. To actually compute the dependence, we can traverse down the
subscript lists of the two Values and keep checking individual
subscript pairs.

Sort of. I would break them up into two lists of subscripts.
Don't sort these lists! Instead, find the common nesting depth for
the 2 statements and allocate distance vectors of the appropriate length.
Then, as we get results from testing individual subscript pairs,
we record them in the direction vector at the appropriate level.

Initialize each level in the direction vector with ALL.
Set the Scalar flag for each level.

Imagine we're starting simple, with no attempt to take advantage of
coupled subscripts.
Examine each pair of subscripts in sequence.
If a pair is ZIV, we test hoping to prove independence.
If we can't, we move on to the next subscript.

If a pair is MIV, we ignore it and move on to the next subscript.

If a pair is SIV, we check to see if it's something we can handle
(strongSIV, weakzeroSIV, ...). If not, skip to next subscript.

If we can handle it, compute direction (if StrongSIV, compute distance).
Find the appropriate level based on the index variable, turn off the
Scalar flag,
and intersect the new direction with whatever is already stored there.
If the result is NONE, then there's no dependence and we're done.
If we have a distance, set it (if a distance already exists, compare to
see if they are the same. If not, then no dependence exists, and we're done).
Probably don't bother to record symbolic distances.
Move on to the next subscript.

That's the general idea.
I'll spend some time over the next week or so
trying to get it written up in more detail on my
website: Dependence Tests - Parallelization for LLVM

Does this make sense?
If not, please get back to me.

I'm not sure on how to deal with cases such as these:

for (i = 0; i < n; i++) {
  for (j = 0; j < n; j++) {
    t = a[i][j];
    a[i + 1][j + 1] = t;
  }
  for (j = 0; j < n; j++) {
    t = a[i][j];
    a[i + 1][j + 1] = t;
  }
}

My gut feeling is that we should stop checking individual subscripts
once the loop nestings have diverged. But I don't know how to
concretely represent such dependencies.

We can compute dependence for imperfectly nested loops like this;
indeed, we must if we want to try and parallelize the outer loop.
But there's also the possibility of fusing the two inner loops.
If we want to do that (and we often do), then we need to look
for dependences between them.

Generally, dependences between 2 statements only have to do with
the common loops. So dependences between these two inner loops
should pay attention to subscripts that mention "i".
So we'd say that values flow from the store in the first loop to the
load in the second loop, with a distance vector [1|<].

However, in this special case, where we have two loops next to each
other with identical loop bounds, we'd like to fuse the loops, if it's legal.
Often very useful as a way to save loads and cache misses.
Now some other part of the compiler would notice the potential
for fusing (the adjacent loops with identical headers), but it would rely
on the dependence analyzer to make sure fusion was legal, i.e., that
no dependences would be violated. To do so, we need to pretend
that the fusion has already taken place, and check subscripts for
both the i and j loops.

Does this make sense?

I'd arrange for it by providing an extra entry point with an extra parameter
specifying how many loop levels to check.

I'm assuming in cases such as these:

for (i = 0; i < n; i++)
  for (j = 0; j < n; j++)
    a[i][0] = a[j][i]

we can have a dependence edge [S U | 0].

Scalar directions occur when a loop level isn't mentioned in any subscript.
For example,

for (i = 0; i < n; i++)
  for (j = 0; j < n; j++)
    a[i + 1] = a[i] + 1;

Here we see two dependences:
1) a loop-carried flow dependence from the store of A to the load,
with dv of [1 S|0]
2) a loop-carried output dependence from the store to the store, with
dv of [0 S|0]
I don't think there's an anti dependence in this case.

Your example is more complicated. Let's look at it in detail

for (i = 0; i < n; i++)
  for (j = 0; j < n; j++)
    a[i][0] = a[j][i]

Comparing the store against itself, we find a loop-carried output
dependence with dv of [0 S|0].
Do you see how it happens? We have two stores with subscripts [i][0] and [i][0].
The j loop isn't mentioned, so its level is S.
The i loop is mentioned in the first subscript, so that level is 0 (or =).
And since we're talking about the same instruction, it can't be a
loop-independent dependence.

Comparing the load against the store, we see [j][i] and [i][0].
The subscripts are coupled, with an MIV and an SIV.
Looking at the SIV, we see that there can only be a dependence when i = 0.
Propagating that fact into the other subscript pair,
we find that the dependence can only exist when j = i = 0.
So there can't be a loop-carried dependence, so no flow dependence.
On the other hand, we can have an inconsistent loop-independent anti
dependence from the load to the store, with dv [0 0|<]

Preston

Hi,

private:
  Value *source, *destination;
  Kind kind;
  SmallVector<const Level *, 4> levels;

I'd malloc an ordinary vector of the appropriate length,
since we know the length at allocation time.

if the number of levels is usually small, it is better to use a
SmallVector (like in the code above) and do:
   levels.reserve(known_size);
That way you avoid a malloc when known_size <= 4.

Ciao, Duncan.

Hi Sanjoy,

Reading through LoopDependenceAnalysis::analyseWeakZeroSIV(), it appears you give up some information. Sometimes it’s possible to prove that certain directions aren’t possible.

Consider this example

for (i = 0; i < N; i++) {
  t = A[0];  // (1)
  A[i] = t;  // (2)
}

There’s a loop-carried flow dependence from (2) to (1) with direction vector [<=|0] and a loop-independent anti dependence from (1) to (2) with direction vector [=|<]. We should be able to find these using the Weak-Zero SIV test.

If the distance (difference of the two constant parts) is 0, as it is in this case, the direction vector is either <= (if the bCoefficient is 0) or = (if the aCoefficient is 0).

What’s cool about this case is that we can peel the first iteration and remove the loop-carried dependence.

Here’s similar case that happens a lot:

for (i = 0; i < N; i++) {
  t = A[N - 1];  // (1)
  A[i] = t;      // (2)
}

This time there’s a loop-carried anti dependence from (1) to (2), with direction vector [<=|<]. Again, we should be able to find this with the Weak-Zero SIV test.

If the distance equals iterationCount - 1, as it does in this case, the direction vector is either <= (if the aCoefficient is 0) or = (if the bCoefficient is 0).

Again, we can do useful things here by peeling off the last iteration.

Make sense?

Thanks,
Preston

Hi Sanjoy,

Reading through LoopDependenceAnalysis::analyseStrongSIV(), I noticed one problem and one confusion.

My confusion relates to your naming of the two instructions as A and B. It’s consistent all through LoopDependenceAnalysis. I’d prefer something like source and destination, so I can keep track of which is which. It didn’t matter so much when you were simply proving or disproving dependence, but when you compute direction, it’s suddenly crucial.

The problem is the computation of direction from distance. The code says:

if (distance->isZero())
  S->Direction = Subscript::EQ;
else if (isGT)
  S->Direction = Subscript::GT;
else
  S->Direction = Subscript::LT;

While it looks sensible, it’s incorrect. Correct is

if (distance->isZero())
  S->Direction = Subscript::EQ;
else if (isGT)
  S->Direction = Subscript::LT; // !!
else
  S->Direction = Subscript::GT; // !!

The Delta test paper (Section 1.3) and the 1st printing of Allen & Kennedy (Definition 2.10) are similarly incorrect.
See http://www.elsevierdirect.com/companion.jsp?ISBN=9781558602861
particularly the replacement for page 46.

I’m also confused about the math, but I’ll keep working on it.

Thanks,
Preston

Wait. Are you really setting isGT to true if the distance is less than zero?
If so, I think your code is ok, but the naming could use a rework.

Preston

Hi Sanjoy,

I reworked the code for analyzeStrongSIV to fix a couple of mistakes, plus squeeze all the advantage possible from the symbolic manipulation provided by the SCEVs. It’s sketched out here: https://sites.google.com/site/parallelizationforllvm/strong-siv-test

Does it make sense to you?

Thanks,
Preston

Hi,

Here is a preliminary (monolithic) version you can comment on. This
is still buggy, however, and I'll be testing for and fixing bugs over
the next few days. I've used your version of the strong siv test.

Thanks!

patch.diff (39.7 KB)

Hi Sanjoy,

Thanks for the update.

I reworked the code for analyseWeakZeroSIV, fixing bugs and exploiting the two special cases mentioned in the paper.
It’s here: https://sites.google.com/site/parallelizationforllvm/weak-zero-siv-test

I’ll rework analyseWeakCrossingSIV soon. I’ll send you a note when it’s ready.

The current code for analyseZIV is wrong, in that it sets

level->direction = Level::ALL;
level->distance = diffSCEV;
level->scalar = true;

This doesn’t make sense: ZIV tests have no induction variable and therefore no associated loop level.
That suggests some rework of analyseSubscript (where it’s assumed that the ZIV test relates to a loop)
and of analysePair, which makes the same assumption.

In analysePair, there’s some code that finds the innermost loop containing the source value and the innermost loop containing the destination value. Then it decides that if neither contains the other, then no dependence exists. That’s not correct. Consider this example:

for (i = 0; i < n; i++) {
  for (j = 0; j < m; j++)
    A[j][i] = …
  for (k = 0; k < m; k++)
    A[k][i] = …
}

Wolfe writes:

Data dependence distances and directions between the two bodies are computed only for the common loops, those that surround both bodies.

So, we’ll expect to find a loop-independent output dependence from the 1st store to the 2nd store, with direction vector [= | <]. And, since the outer loop carries no dependence, we could safely run it in parallel. Even better, we could fuse the two inner loops, then collapse the resulting inner loop with the outer loop and run the whole thing in parallel :-)

Embellishing the example slightly

for (i = 0; i < n; i++) {
  for (j = 0; j < m; j++)
    A[j][i][0] = …
  for (j = 0; j < m; j++)
    for (k = 0; k < n; k++)
      A[j][i][k] = …
}

we again find a loop-independent output dependence from the 1st store to the 2nd store, with direction vector [= | <]. The extra loop nesting doesn’t change anything.

Here’s a final variation:

for (i = 0; i < n; i++) {
  for (j = 0; j < m; j++)
    A[j][i][0] = …
  for (j = 0; j < m; j++)
    for (k = 0; k < n; k++)
      A[j][i + 1][k] = …
}

This time there’s a loop-carried output dependence from the 2nd store back to the 1st store, with distance vector [1 | 0]. We found this by running the Strong SIV test on the middle subscript and ignoring the other two, since they don’t refer to a common loop.

Finally, there’s still no mention of loop-independent dependences. They’re important! When the dependence test is invoked, we need to pass in a flag indicating if a loop-independent dependence is possible; that is, if control can flow from the source to the destination without traversing a back edge. At the end of the dependence test, if dependence has not been disproved and the flag is set, we need to examine the direction vector. If all levels include the = direction, then we mark the dependence as loop independent.

Preston

Hi Sanjoy,

Here’s a slightly fixed & improved version of the weak-crossing SIV test: https://sites.google.com/site/parallelizationforllvm/weak-crossing-siv-test

Preston

Hi Sanjoy,

Here’s a version of Banerjee and Wolfe’s Exact SIV test: https://sites.google.com/site/parallelizationforllvm/weak-siv-test
It assumes you’ve already filtered out the easy cases handled by ZIV, strong SIV, etc.

I’m not confident about my uses of APInt. If you have any comments, I’d love to hear them.

Thanks,
Preston

Hi Sanjoy,

Here's a version of Banerjee and Wolfe's Exact SIV test:
Exact SIV test - Parallelization for LLVM
It assumes you've already filtered out the easy cases handled by ZIV,
strong SIV, etc.

I'm not confident about my uses of APInt. If you have any comments,
I'd love to hear them.

Are you worried about properly setting the bit lengths, or about loss
of generality by restricting to constants? I've not thought about this
too deeply, but are there cases where some or all of the relevant
inputs are not constants, but yet we can still determine whether the
expressions involving them satisfy the required inequalities? One of
the nice things about SCEV is its ability, to some extent, to handle
and simplify "algebraic" expressions, and if appropriate, I think we
should use that ability.

Thanks again,
Hal

> Here's a version of Banerjee and Wolfe's Exact SIV test:
> Exact SIV test - Parallelization for LLVM
> It assumes you've already filtered out the easy cases handled by ZIV,
> strong SIV, etc.
>
> I'm not confident about my uses of APInt. If you have any comments,
> I'd love to hear them.

Are you worried about properly setting the bit lengths,

Yes, among other things. There are several inputs. Do I need to make
efforts to ensure they're all the same bit length? Is it appropriate
to do so much computation with APInts, or should I be working with
64-bit integers?

or about loss of generality by restricting to constants? I've not thought about this
too deeply, but are there cases where some or all of the relevant
inputs are not constants, but yet we can still determine whether the
expressions involving them satisfy the required inequalities? One of
the nice things about SCEV is its ability, to some extent, to handle
and simplify "algebraic" expressions, and if appropriate, I think we
should use that ability.

Yes, I agree. This particular test is not very flexible about symbolic
computation (computing the GCD of non-constant inputs seems tough);
one of the advantages of the more specialized tests (Strong SIV, et
al.) is that they let us get a bit further with symbolic inputs.

I've been studying the SCEV code and have ideas about how to rework
some of the SIV tests to be a bit more flexible in this respect. I
think they'll turn out well; the SCEV package is really quite
wonderful.

Preston

Hi all,

Sorry for having been quiet for so long, I have my university exams
going on, and will be able to contribute only after the coming Friday.

Thanks!