[Polly] Analysis of the expensive compile-time overhead of Polly Dependence pass

Hi Sebastian,

Recently, I found the “Polly - Calculate dependences” pass would lead to significant compile-time overhead when compiling some loop-intensive sourc! e code. Tobias told me you found similar problem as follows:
http://llvm.org/bugs/show_bug.cgi?id=14240

My evaluation shows that “Polly - Calculate dependences” pass consumes 96.4% of total compile-time overhead when compiling the program you proposed. It seems that the expensive compile-time overhead comes from those loop operations in ISL library. Some preliminary results ! can be seen on [http://llvm.org/bugs/show_bug.cgi?id=14240](http://llvm.org/bugs/show_bug.cgi?id=14
240).

Sebastian, I wonder whether you have further results or suggestions besides those words posted on the Bugzilla page? Any information or suggestion would be appreciated.

Thanks,
Star Tan

Hi Star Tan,

thanks for looking into this. Just to sum up, the test case that
shows this slowness is the following:

void bar();
void foo(short *input) {
   short x0, x1;
   char i, ctr;

   for(i = 0; i < 8; i++) {

     // SCoP begin
     for (ctr = 0; ctr < 8; ctr++) {
       x0 = input[i*64 + ctr*8 + 0] ;
       x1 = input[i*64 + ctr*8 + 1] ;

       input[i*64 + ctr*8 + 0] = x0 - x1;
       input[i*64 + ctr*8 + 1] = x0 + x1;
       input[i*64 + ctr*8 + 2] = x0 * x1;
     }
     // SCoP end

     bar(); // Unknown function call stops further expansion of SCoP
   }
}

Which is translated to the following scop:

     Context:
     [p_0, p_1, p_2] -> { : p_0 >= -2147483648 and p_0 <= 2147483647 and p_1 >= -2147483648 and p_1 <= 2147483647 and p_2 >= -2147483648 and p_2 <= 2147483647 }
     p_0: {0,+,128}<%for.cond2.preheader>
     p_1: {2,+,128}<%for.cond2.preheader>
     p_2: {4,+,128}<%for.cond2.preheader>
[...]

The interesting observation is, that Polly introduces three parameters (p_0, p_1, p_2) for this SCoP, even though in the C source code only the variable 'i' is SCoP invariant. However, due to the way SCEVExpr(essions) in LLVM are nested, Polly sees three scop-invariant SCEVExpr(essions) which are all translated into independent parameters.
However, as we can see, the only difference between the three
parameters is a different constant in the base of the AddRecExpr. If we would just introduce p_0 (the parameter where the scev-base is zero) and express any use of p_1 as p_0 + 2 and p_2 as p_0 + 4, isl could solve this problem very quickly.
There are other ways to improve performance which include further tuning isl or reducing the precision of the analysis, but in this case
I don't think we need to look into them. The above fix should give us good performance and additionally also increases the precision of the result (as isl now understands the relation between the different parameters).

To fix this, you probably would like to look into the SCEVAffinator class and the way parameters are handled.

Cheers,
Tobias

>On 07/25/2013 09:01 PM, Star Tan wrote:
>> Hi Sebastian,
>>
>>
>> Recently, I found the "Polly - Calculate dependences" pass would lead to significant compile-time overhead when compiling some loop-intensive source code. Tobias told me you found similar problem as follows:
>> http://llvm.org/bugs/show_bug.cgi?id=14240
>>
>>
>> My evaluation shows that "Polly - Calculate dependences" pass consumes 96.4% of total compile-time overhead when compiling the program you proposed. It seems that the expensive compile-time overhead comes from those loop operations in ISL library. Some preliminary results can be seen on http://llvm.org/bugs/show_bug.cgi?id=14240.
>>
>>
>> Sebastian, I wonder whether you have further results or suggestions besides those words posted on the Bugzilla page? Any information or suggestion would be appreciated.
>
>Hi Star Tan,
>
>thanks for looking into this. Just to sum up, the test case that
>shows this slowness is the following:
>
>void bar();
>void foo(short *input) {
>   short x0, x1;
>   char i, ctr;
>
>   for(i = 0; i < 8; i++) {
>
>     // SCoP begin
>     for (ctr = 0; ctr < 8; ctr++) {
>       x0 = input[i*64 + ctr*8 + 0] ;
>       x1 = input[i*64 + ctr*8 + 1] ;
>
>       input[i*64 + ctr*8 + 0] = x0 - x1;
>       input[i*64 + ctr*8 + 1] = x0 + x1;
>       input[i*64 + ctr*8 + 2] = x0 * x1;
>     }
>     // SCoP end
>
>     bar(); // Unknown function call stops further expansion of SCoP
>   }
>}
>
>Which is translated to the following scop:
>
>     Context:
>     [p_0, p_1, p_2] -> {  : p_0 >= -2147483648 and p_0 <= 2147483647 
>and p_1 >= -2147483648 and p_1 <= 2147483647 and p_2 >= -2147483648 and 
>p_2 <= 2147483647 }
>     p_0: {0,+,128}<%for.cond2.preheader>
>     p_1: {2,+,128}<%for.cond2.preheader>
>     p_2: {4,+,128}<%for.cond2.preheader>
>[...]
>
>The interesting observation is, that Polly introduces three parameters 
>(p_0, p_1, p_2) for this SCoP, even though in the C source code only the 
>variable 'i' is SCoP invariant. However, due to the way 
>SCEVExpr(essions) in LLVM are nested, Polly sees three scop-invariant 
>SCEVExpr(essions) which are all translated into independent parameters.
>However, as we can see, the only difference between the three
>parameters is a different constant in the base of the AddRecExpr. If we 
>would just introduce p_0 (the parameter where the scev-base is zero) and 
>express any use of p_1 as p_0 + 2 and p_2 as p_0 + 4, isl could solve 
>this problem very quickly.
>There are other ways to improve performance which include further tuning 
>isl or reducing the precision of the analysis, but in this case
>I don't think we need to look into them. The above fix should give us 
>good performance and additionally also increases the precision of the 
>result (as isl now understands the relation between the different 
>parameters).
>
>To fix this, you probably would like to look into the SCEVAffinator 
>class and the way parameters are handled.
>
Thank you so much. Maybe I was in the wrong direction at first -:)

Yes, you are right. I should first investigate why Polly introduces three parameters rather than just one parameter.
Best wishes,
Star Tan

Hi Tobias,

I tried to investigated the problem related to ScopInfo, but I need your
help on handling some problems about ISL and SCEV.

I copied the list as the discussion may be helpful for others.

@Sven, no need to read all. Just search for your name.

[..]

The interesting observation is, that Polly introduces three parameters
(p_0, p_1, p_2) for this SCoP, even though in the C source code only the
variable 'i' is SCoP invariant. However, due to the way
SCEVExpr(essions) in LLVM are nested, Polly sees three scop-invariant
SCEVExpr(essions) which are all translated into independent parameters.
However, as we can see, the only difference between the three
parameters is a different constant in the base of the AddRecExpr. If we
would just introduce p_0 (the parameter where the scev-base is zero) and
express any use of p_1 as p_0 + 2 and p_2 as p_0 + 4, isl could solve
this problem very quickly.

Currently, Polly introduces three parameters because it treats (base, step) as a whole for each parameter. You are right, it seems we could usea single parameter to represent all these accesses.

I have attached a patch file for this purpose. This patch file is just a hack file to see whether three parameters can be merged into one parameter and whether the compile-time could be reduce after that. The key of this patch is the added source code inScop::addParams, which only add parameters if the new parameter has different "step" value. In other ways, we skip the "base" value when compare parameters.

With this patch file, the Scop becomes:

Context:
     p0: {0,+,128}<%for.cond2.preheader>
     Statements {
       Stmt_for_body6
             Domain :=
                 { Stmt_for_body6[i0] : i0 >= 0 and i0 <= 7 };
             Scattering :=
                 { Stmt_for_body6[i0] -> scattering[0, i0, 0] };
             ReadAccess :=
                 [p_0] -> { Stmt_for_body6[i0] -> MemRef_input[o0] : 2o0 = p_0 + 16i0 };
             ReadAccess :=
                 [p_0] -> { Stmt_for_body6[i0] -> MemRef_input[o0] : 2o0 = 2 + p_0 + 16i0 };
             MustWriteAccess :=
                 [p_0] -> { Stmt_for_body6[i0] -> MemRef_input[o0] : 2o0 = p_0 + 16i0 };
             MustWriteAccess :=
                 [p_0] -> { Stmt_for_body6[i0] -> MemRef_input[o0] : 2o0 = 2 + p_0 + 16i0 };
             MustWriteAccess :=
                 [p_0] -> { Stmt_for_body6[i0] -> MemRef_input[o0] : 2o0 = 4 + p_0 + 16i0 };
     }

Unfortunately, the compile-time is still very high!

This _looks_ already a lot better. Very nice!

I have investigated a little about it. Maybe this is becauseisl_union_map_add_mapin Polly-dependencewould still introduces new parameters based on the resulted produced by ScopInfo.

Without this patch file, the Read, Write, MayWrite (input to ISL library)calculated by collectInfo are:

Read:[p_0, p_1, p_2] -> { Stmt_for_body6[i0] -> MemRef_input[o0] : (2o0 = p_0 + 16i0 and i0 >= 0 and i0 <= 7) or (2o0 = p_1 + 16i0 and i0 >= 0 and i0 <= 7) }
Write:[p_0, p_1, p_2] -> { Stmt_for_body6[i0] -> MemRef_input[o0] : (2o0 = p_0 + 16i0 and i0 >= 0 and i0 <= 7) or (2o0 = p_1 + 16i0 and i0 >= 0 and i0 <= 7) or (2o0 = p_2 + 16i0 and i0 >= 0 and i0 <= 7) }
Maywrite:[p_0, p_1, p_2] -> { }
Schedule:[p_0, p_1, p_2] -> { Stmt_for_body6[i0] -> scattering[0, i0, 0] }

But with our patch file, they become:

Read: [p_0, p_0'] -> { Stmt_for_body6[i0] -> MemRef_input[o0] : (2o0 = p_0' + 16i0 and i0 >= 0 and i0 <= 7) or (2o0 = 2 + p_0 + 16i0 and i0 >= 0 and i0 <= 7) }
Write: [p_0, p_0', p_0''] -> { Stmt_for_body6[i0] -> MemRef_input[o0] : (2o0 = p_0'' + 16i0 and i0 >= 0 and i0 <= 7) or (2o0 = 2 + p_0' + 16i0 and i0 >= 0 and i0 <= 7) or (2o0 = 4 + p_0 + 16i0 and i0 >= 0 and i0 <= 7) }
MayWrite: { }
Schedule: { Stmt_for_body6[i0] -> scattering[0, i0, 0] }
It seems that theisl_union_map_add_map function called from Dependences::collectInfoautomatically introduces extra parameters, but I have no idea why this happens.

Could you give me some further suggestion on this problem?

Oh, you are on the right way. Even though we only see parameters 'p_0', isl seems to believe those values 'p_0' are not identical. We see this as isl renames them such that the difference becomes obvious. Unfortunately, as long as isl does not believe those parameters are identical, it will still be stuck in long calculations.

To understand the above behaviour we need to understand how isl determines if two parameter dimensions refer to the same parameter. What you would probably expect is that the union of the two sets [p] -> {[i] = p} and [p] -> {[i] = p} is a single set [p] -> {[i] = p} as the two parameters have the very same name. However, this is not always true. Even though parameters share the same name, they may in fact be different isl_id objects. isl_id_alloc() takes a parameter name and a pointer and only if both of them are identical, the very same isl_id object is returned.

Sven: In terms of making the behaviour of isl easier to understand,
it may make sense to fail/assert in case operands have parameters that
are named identical, but that refer to different pointer values.

So the question is why do you obtain such different isl_ids and what
can be done to fix this. I don't have a solution ready, but I have
some questions in the patch that may help you when looking into this.

diff --git a/include/polly/ScopInfo.h b/include/polly/ScopInfo.h
index 8c56582..bab7763 100755
--- a/include/polly/ScopInfo.h
+++ b/include/polly/ScopInfo.h
@@ -445,6 +445,7 @@ class Scop {
    /// The isl_ids that are used to represent the parameters
    typedef std::map<const SCEV *, int> ParamIdType;
    ParamIdType ParameterIds;
+ ParamIdType ParameterIdsSpace;

Why are you introducing another member variable here? Why don't you keep
using ParameterIds? What is the this variable supposed to track?

diff --git a/lib/Analysis/Dependences.cpp b/lib/Analysis/Dependences.cpp
index 5a185d0..9f918f3 100644
--- a/lib/Analysis/Dependences.cpp
+++ b/lib/Analysis/Dependences.cpp
@@ -30,6 +30,9 @@
  #include <isl/map.h>
  #include <isl/set.h>

+#define DEBUG_TYPE "polly-dependence"
+#include "llvm/Support/Debug.h"
+
  using namespace polly;
  using namespace llvm;

@@ -88,8 +91,15 @@ void Dependences::collectInfo(Scop &S, isl_union_map **Read,
  void Dependences::calculateDependences(Scop &S) {
    isl_union_map *Read, *Write, *MayWrite, *Schedule;

+ DEBUG(dbgs() << "Scop: " << S << "\n");
+
    collectInfo(S, &Read, &Write, &MayWrite, &Schedule);

+ DEBUG(dbgs() << "Read: " << Read << "\n";
+ dbgs() << "Write: " << Write << "\n";
+ dbgs() << "MayWrite: " << MayWrite << "\n";
+ dbgs() << "Schedule: " << Schedule << "\n");
+
    if (OptAnalysisType == VALUE_BASED_ANALYSIS) {
      isl_union_map_compute_flow(
          isl_union_map_copy(Read), isl_union_map_copy(Write),
@@ -131,6 +141,8 @@ void Dependences::calculateDependences(Scop &S) {
    RAW = isl_union_map_coalesce(RAW);
    WAW = isl_union_map_coalesce(WAW);
    WAR = isl_union_map_coalesce(WAR);
+
+ DEBUG(printScop(dbgs()));
  }

This patch looks good/helpful by itself. I propose to submit it separately for commit.

  bool Dependences::runOnScop(Scop &S) {
diff --git a/lib/Analysis/ScopInfo.cpp b/lib/Analysis/ScopInfo.cpp
index aa72f3e..6fc838e 100644
--- a/lib/Analysis/ScopInfo.cpp
+++ b/lib/Analysis/ScopInfo.cpp
@@ -86,6 +86,14 @@ public:
        isl_aff *Affine =
            isl_aff_zero_on_domain(isl_local_space_from_space(Space));
        Affine = isl_aff_add_coefficient_si(Affine, isl_dim_param, 0, 1);
+ if (Scev->getSCEVType() == scAddRecExpr) {
+ const SCEVAddRecExpr *AR = cast<SCEVAddRecExpr>(Scev);

The canonical pattern for this is:

if (SCEVAddRecExpr *AR = dyn_cast<SCEVAddRecExpr>(Scev)) {

+ const SCEVConstant *c = cast<SCEVConstant>(AR->getOperand(0));

This is obviously a hack. The base is not always a constant.

You can probably just call use something like,

isl_pw_aff *BaseValue = visit(AR->getOperand(0))
Affine = isl_pw_aff_sum(Affine, BaseValue);

+ isl_int t;
+ isl_int_init(t);
+ isl_int_set_si(t, c->getValue()->getSExtValue());

We now use isl_val instead of isl_int.

        return isl_pw_aff_alloc(Domain, Affine);
      }
@@ -717,6 +725,34 @@ void Scop::addParams(std::vector<const SCEV *> NewParameters) {
      if (ParameterIds.find(Parameter) != ParameterIds.end())
        continue;

+ if (Parameter->getSCEVType() == scAddRecExpr) {
+ int index, maxIndex = Parameters.size();
+ const SCEVAddRecExpr *AR = cast<SCEVAddRecExpr>(Parameter);
+
+ for (index = 0; index < Parameters.size(); ++index) {
+ const SCEV *pOld = Parameters[index];
+
+ if (pOld->getSCEVType() != scAddRecExpr)
+ continue;
+
+ const SCEVAddRecExpr *OAR = cast<SCEVAddRecExpr>(pOld);
+ if (AR->getNumOperands() != OAR->getNumOperands())
+ continue;
+
+ unsigned i, e;
+ for (i = 1, e = AR->getNumOperands(); i != e; ++i){
+ if (AR->getOperand(i) != OAR->getOperand(i))
+ break;
+ }
+
+ if (i == e) {
+ // Parameters.push_back(Parameter);
+ ParameterIds[Parameter] = index;
+ return;
+ }
+ }
+ }

I think this is the right idea, but probably the wrong place to put it.

I would put this into SCEVValidator::visitAddRecExpr. This function always adds the AddRecExpr itself as a parameter, whenever it is found to be parametric. However, what we should do is to create a new ScevExpr
that starts at zero and is otherwise identical. We then add this as a parameter. When doing this, we now also need to keep all the parameters
that have been found previously in the base expression.

      int dimension = Parameters.size();

      Parameters.push_back(Parameter);
@@ -777,9 +813,9 @@ void Scop::addParameterBounds() {

  void Scop::realignParams() {
    // Add all parameters into a common model.
- isl_space *Space = isl_space_params_alloc(IslCtx, ParameterIds.size());
+ isl_space *Space = isl_space_params_alloc(IslCtx, ParameterIdsSpace.size());

- for (ParamIdType::iterator PI = ParameterIds.begin(), PE = ParameterIds.end();
+ for (ParamIdType::iterator PI = ParameterIdsSpace.begin(), PE = ParameterIdsSpace.end();

I do not see why those changes are necessary.

         PI != PE; ++PI) {
      const SCEV *Parameter = PI->first;
      isl_id *id = getIdForParam(Parameter);

@@ -787,7 +823,7 @@ void Scop::realignParams() {
    }

    // Align the parameters of all data structures to the model.
- Context = isl_set_align_params(Context, Space);
+// Context = isl_set_align_params(Context, Space);

Also no idea why this one is necessary.

Hope this helps,
Tobias

Yes, this is how it is today. I wondered if there is actually a need to
allow the use of different identifiers with the same name (except all being unnamed?). I personally do not see such a need and would prefer isl to assert/fail in case someone tries to do so. This may avoid confusions as happened here. Do you see a reason why isl should allow this?

Cheers,
Tobias

Sven: In terms of making the behaviour of isl easier to understand,
it may make sense to fail/assert in case operands have parameters that
are named identical, but that refer to different pointer values.

No, you are allowed to have different identifiers with the same name.
I could optionally print the pointer values, but then I'd have
to think about what to do with them when reading a textual
representation of a set with such pointer values in them.

Yes, this is how it is today.

No, the pointer values are currently not printed.

I was referring to the first sentence. I do not think printing pointer values is what we want. It would make the output unpredictable not only when address space randomisation is involved.

I wondered if there is actually a need to
allow the use of different identifiers with the same name (except all being
unnamed?). I personally do not see such a need and would prefer isl to
assert/fail in case someone tries to do so. This may avoid confusions as
happened here. Do you see a reason why isl should allow this?

Removing this feature would break existing users.

Even if it would, the benefits for future users may outweigh this.
Also, are you aware of a user that actually breaks?

Anyway, on the Polly side we know the behaviour and can handle it. So this is nothing I am very strong about. I just mentioned it as it seemed to be a good idea.

Cheers,
Tobias

Hi Tobias and Sven,

Thanks for your discussion and suggestion.

@Sven: ISL actually allows users to have different identifiers with the same name. The problem that we have discussed is caused by incorrect usage of isl_space in Polly, so please do not worry about ISL library. You can skip the following information related to Polly implementation.

@Tobias and Polly developers:
I have attached a patch file to fix this! problem. The key idea is to keep ISL library always seeing the same single parameter for those memory accesses that use the same index variables even though they have different constant base values.

For the example we have seen before:
for(i = 0; i < 8; i++) { for (ctr = 0; ctr < 8; ctr++) { x1 = input[i*64 + ctr*8 + 1] ; x0 = input[i*64 + ctr*8 + 0] ; input[i*64 + ctr*8 + 0] = x0 - x1; input[i*64 + ctr*8 + 1] = x0 + x1; input[i*64 + ctr*8 + 2] = x0 * x1; }

Without this patch file, Polly would produce the Context as follows:

Context:
[p_0, p_1, p_2] → { : p_0 >= -9223372036854775808 and p_0 <= 9223372036854775807 and p_1 >= -9223372036854775808 and p_1 <= 9223372036854775807 and p_2 >= -9223372036854775808 and p_2 <= 9223372036854775807 } p0: {0,+,128}<%for.cond2.preheader> p1: {2,+,128}<%for.cond2.preheader> p2: {4,+,128}<%for.cond2.preheader> Statements { Stmt_for_body6 Domain := [p_0, p_1, p_2] → { Stmt_for_body6[i0] : i0 >= 0 and i0 <= 7 }; Scattering := [p_0, p_1, p_2] → { Stmt_for_body6[i0] → scattering[0, i0, 0] }; ReadAccess := [p_0, p_1, p_2] → { Stmt_for_body6[i0] → MemRef_input[o0] : 2o0 = p_0 + 16i0 }; ReadAccess := [p_0, p_1, p_2] → { Stmt_for_body6[i0] → MemRef_input[o0] : 2o0 = p_1 + 16i0 }; MustWriteAccess := [p_0, p_1, p_2] → { Stmt_for_body6[i0] → MemRef_input[o0] : 2o0 = p_0 + 16i0 }; MustWriteAccess := [p_0, p_1, p_2] → { Stmt_for_body6[i0] → MemRef_input[o0] : 2o0 = p_1 + 16i0 }; MustWriteAccess := [p_0, p_1, p_2] → { Stmt_for_body6[i0] → MemRef_input[o0] : 2o0 = p_2 + 16i0 }; }

Furthermore, it will produce very complex RAW, WAW and WAR dependence as follows:

RAW dependences: [p_0, p_1, p_2] → { Stmt_for_body6[i0] → Stmt_for_body6[o0] : (exists (e0 = [(p_1)/2], e1 = [(7p_1 + p_2 + 112i0)/16]: 16o0 = -p_0 + p_1 + 16i0 and 2e0 = p_1 and 16i0 >= -p_1 + p_2 and 16i0 <= 112 + p_0 - p_1 and p_1 >= 16 + p_0 and i0 >= 0 and 16e1 >= -15 + 7p_1 + p_2 + 112i0 and 16e1 <= -1 + 7p_1 + p_2 + 112i0 and p_2 >= 16 + p_0)) or … or (exists (e0 = [(p_1)/2], e1 = [(-p_1 + p_2)/16]: 16o0 = p_0 - p_1 + 16i0 and 2e0 = p_1 and 16e1 = -p_1 + p_2 and 16i0 >= -p_0 + p_2 and 16i0 <= 112 - p_0 + p_1 and p_1 <= -16 + p_0 and p_2 >= 16 + p_0)) }

It not only leads to significant compile-time overhead, but also produces incorrect dependence results. As we can see from the source code, there should be no RAW, WAW and WAR across loop iterations at all.

With this patch file, Polly would produce the following Context and RAW, WAW, WAR dependence:

!
Context:
[p_0] → { : p_0 >= -9223372036854775808 and p_0 <= 9223372036854775807 }
p0: {0,+,128}<%for.cond2.preheader>
Statements {
Stmt_for_body6
Domain :=
[p_0] → { Stmt_for_body6[i0] : i0 >= 0 and i0 <= 7 };
Scattering :=
[p_0] → { Stmt_for_body6[i0] → scattering[0, i0, 0] };
ReadAccess :=
[p_0] → { Stmt_for_body6[i0] → MemRef_input[o0] : 2o0 = p_0 + 16i0 };
Rea! dAccess :=
& nbsp; [p_0] → { Stmt_for_body6[i0] → MemRef_input[o0] : 2o0 = 2 + p_0 + 16i0 };
MustWriteAccess :=
[p_0] → { Stmt_for_body6[i0] → MemRef_input[o0] : 2o0 = p_0 + 16i0 };
MustWriteAccess :=
[p_0] → { Stmt_for_body6[i0] → MemRef_input[o0] : 2o0 = 2 + p_0 + 16i0 };
MustWriteAccess :=
[p_0] → { Stmt_for_body6[i0] → MemRef_input[o0] : 2o0 = 4 + p_0 + 16i0 };
}

RAW dependences:
[p_0] → { }
WAR dependences:
[p_0] → { }
WAW dependences:
[p_0] → { }

We can see it only creates a single parameter and it can catch the exact RAR, WAW and WAW dependence.

To see whether our patch file could catch normal data dependence, I have constructed another example as follows:

for(i = 0; i < 8; i++) {
for (ctr = 0; ctr < 8; ctr++) {
x0 = input[i*64 + ctr + 0] ;
x1 = input[i*64 + ctr + 1] ;
input[i*64 + ctr + 0] = x0 ! - x1;
input[i*64 + ctr + 1] = x0 + x1;< /div>
input[i*64 + ctr + 2] = x0 * x1;
}

The original Polly would produce similar complex results. However, with the attached patch file, Polly would produce the following Context and RAW, WAW, WAR dependence:

Context:
[p_0] → { : p_0 >= -2147483648 and p_0 <= 2147483647 }
p0: {0,+,128}<%for.cond2.preheader>
Statements {
Stmt_for_body6
Domain :=
[p_0] → { Stmt_for_body6[i0] : i0 >= 0 and i0 <= 7 };
Scattering :=
! ; [p_0] → { Stmt_for_body6[i0] → scattering[0, i0, 0] };
ReadAccess :=
[p_0] → { Stmt_for_body6[i0] → MemRef_input[o0] : 2o0 = p_0 + 2i0 };
ReadAccess :=
[p_0] → { Stmt_for_body6[i0] → MemRef_input[o0] : 2o0 = 2 + p_0 + 2i0 };
MustWriteAccess :=
[p_0] → { Stmt_for_body6[i0] → MemRef_input[o0] : 2o0 = p_0 + 2i0 };
MustWriteAccess :=
[p_0] → { Stmt_for_body6[i0] → MemRef_input[o0] : 2o0 = 2 + p_0 + 2i0 };
! MustWriteAccess :=
[p_0] → { Stmt_for_body6[i0] → MemRef_input[o0] : 2o0 = 4 + p_0 + 2i0 };
}

RAW dependences:
[p_0] → { Stmt_for_body6[i0] → Stmt_for_body6[1 + i0] : exists (e0 = [(p_0)/2]: 2e0 = p_0 and i0 >= 0 and i0 <= 6) }
WAR dependences:
[p_0] → { }
WAW dependences:
[p_0] → { Stmt_for_body6[i0] → Stmt_for_body6[1 + i0] : exists (e0 = [(p_0)/2]: 2e0 = p_0 and i0 >= 0 and i0 <= 6)

Results show that it still creates a single p! arameter and catches the exact RAR, WAW and WAW dependence. Of course, this patch file makes the compiling very faster, i.e., compile-time is reduced from several minutes to less then 1 second.

@Tobias:

>This is obviously a hack. The base is not always a constant.
>You can probably just call use something like,
>isl_pw_aff *BaseValue = visit(AR->getOperand(0))
>Affine = isl_pw_aff_sum(Affine, BaseValue);

Currently, I only handle constant base values because I have not found a good way to handle general base values. isl_pw_aff_add requires two isl_pw_aff parameters, but Affine is actually of isl_aff type. Perhaps we could first commit a patch file to handle common cases, then we can considering submitting another patch file to handle general cases.

>I think this is the right idea, but probably the wrong place to put it.
>I would put this into SCEVValidator::visitAddRecExpr. This function 
>always adds the AddRecExpr itself as a parameter, whenever it is found 
>to be parametric. However, what we should do is to create a new ScevExpr
>that starts at zero and is otherwise identical. We then add this as a 
>parameter. When doing this, we now also need to keep all the parameters
>that have been found previously in the base expression.
A lot of Polly functions access ParameterIds and Parameters using existing ScevExpr. If we create new ScevExpr and put them into Parameters and ParameterIds, it may require some extra tricks to handle the mapping from existing ScevExpr to newly created ScevExpr. I think we can consider fixing it later.
Cheers,
Star Tan

[0001-ScopInfo-Create-shared-parameter-for-memory-accesses.patch|attachment](upload://kXTOxlPTAtdlKMZF96IA3tCqxtM.patch) (6.54 KB)

Hi Tobias and Sven,

Thanks for your discussion and suggestion.>
@Tobias and Polly developers:
I have attached a patch file to fix this problem. The key idea is to
keep ISL library always seeing the same single parameter for those
memory accesses that use the same index variables even though they have
different constant base values.

Good.

[.. Very nice improvements .. ]

This is obviously a hack. The base is not always a constant.
You can probably just call use something like,
isl_pw_aff *BaseValue = visit(AR->getOperand(0))
Affine = isl_pw_aff_sum(Affine, BaseValue);

Currently, I only handle constant base values because I have not found a
good way to handle general base values. isl_pw_aff_add requires two
isl_pw_aff parameters, but Affine is actually of isl_aff type. Perhaps
we could first commit a patch file to handle common cases, then we can
considering submitting another patch file to handle general cases.

No, we should get this right from the start (and also add test cases).

isl provides a function to create an isl_pw_aff from a isl_aff. You can just use that.

I think this is the right idea, but probably the wrong place to put it.
I would put this into SCEVValidator::visitAddRecExpr. This function
always adds the AddRecExpr itself as a parameter, whenever it is found
to be parametric. However, what we should do is to create a new ScevExpr
that starts at zero and is otherwise identical. We then add this as a
parameter. When doing this, we now also need to keep all the parameters
that have been found previously in the base expression.

A lot of Polly functions access ParameterIds and Parameters using existing ScevExpr. If we create new ScevExpr and put them into Parameters and ParameterIds, it may require some extra tricks to handle the mapping from existing ScevExpr to newly created ScevExpr. I think we can consider fixing it later.

Mh, I would think similar tricks are needed to perform your parameter remapping. I would prefer we think this through now and then choose a solution that works for us.

Tobi