help with pointer-to-array conversion

I now understand that IndVarSimplify.cpp is capable of reproducing array references when the pointer initialization from the array address is found inside the immediately enclosing loop, such that in the following code:

int A[20000], B[100], Z;
int main()
{
         int i, j, *a, *b;
         for ( a = &A[0], i = 0; i != 200; i++ )
                 for ( b = &B[0], j = 0; j != 100; j++ )
                         Z += *a++ * *b++;
}

the pointer b can be transformed into an offset from B, but a cannot be transformed into an offset from A because the initialization is two levels up.

I started poking around in IndVarSimplify.cpp and I have a patch which should (at least partially) enable the transformation for a as well, but I find that although the code looks OK in debug printouts, the gccas output is unchanged. So two questions: 1) Am I on the right track with this transformation (patch enclosed) and 2) why does the output not reflect my changes?

Index: lib/Transforms/Scalar/IndVarSimplify.cpp

I now understand that IndVarSimplify.cpp is capable of reproducing array references when the pointer initialization from the array address is found inside the immediately enclosing loop, such that in the following code:

Ok.

int A[20000], B[100], Z;
int main()
{
       int i, j, *a, *b;
       for ( a = &A[0], i = 0; i != 200; i++ )
               for ( b = &B[0], j = 0; j != 100; j++ )
                       Z += *a++ * *b++;
}

the pointer b can be transformed into an offset from B, but a cannot be transformed into an offset from A because the initialization is two levels up.

Hrm, that should work.

I started poking around in IndVarSimplify.cpp and I have a patch which should (at least partially) enable the transformation for a as well, but I find that although the code looks OK in debug printouts, the gccas output is unchanged.

Ok

So two questions: 1) Am I on the right track with this transformation (patch enclosed) and

I haven't taken a look at the code. Could you please give a very simple example showing what it is supposed to change that is currently not handled? Also, can you make sure to wrap the lines to fit in 80-columns? Finally, please send diffs as "unified" diffs, this makes it easier to read. You can get unified diffs by passing "-u" to cvs diff or normal diff.

2) why does the output not reflect my changes?

There are a couple of different possibilities. I'm not sure exactly what you're trying to accomplish here, but one of the following may happen:

1. Something in the compiler is running after indvars that undoes what you
    are trying to do.
2. The input to indvars in not what you think it is, so running the pass
    in the middle of gccas is not giving you the same output as running it
    logically at the end of gccas with the opt program.

To isolate #2, it might be useful to get the full set of passes that gccas is doing. gccas will dump the passes if you run it like this:

gccas /dev/null -o /dev/null -debug-pass=Arguments

That way you can use the passes with 'opt' to see exactly what code is given to indvars.

-Chris

Index: lib/Transforms/Scalar/IndVarSimplify.cpp

RCS file: /var/cvs/llvm/llvm/lib/Transforms/Scalar/IndVarSimplify.cpp,v
retrieving revision 1.78
diff -r1.78 IndVarSimplify.cpp
310c310
< runOnLoop(*I);
---

        runOnLoop(LI, *I);

322,323c322,323
< void runOnLoop(Loop *L);
< void EliminatePointerRecurrence(PHINode *PN, BasicBlock *Preheader,
---

    void runOnLoop(LoopInfo *LI, Loop *L);
    void EliminatePointerRecurrence(LoopInfo *LI, Loop *L, PHINode *PN,

BasicBlock *Preheader,
361c361
< void IndVarSimplify::EliminatePointerRecurrence(PHINode *PN,
---

void IndVarSimplify::EliminatePointerRecurrence(LoopInfo *LI, Loop *L,

PHINode *PN,
366a367

Value *saved_trip_count = L->getTripCount(); // save before it's

clobbered
392c393,408
<
---

std::cerr << "\tGEPI: " << *GEPI;
      if (PHINode *phi = dyn_cast<PHINode>(GEPI->getOperand(0)))
      {
              if (ConstantExpr *CE =

dyn_cast<ConstantExpr>(phi->getIncomingValue(0)))

              if (CE->getOpcode() == Instruction::GetElementPtr) // We

want this to become explicit...

              {
std::cerr << "Block Before: " << *GEPI->getParent();
                      Instruction *mul =

BinaryOperator::createMul(saved_trip_count,

                              LI->getLoopFor( phi->getIncomingBlock(1) )
                              ->getCanonicalInductionVariable(), "mul.",

GEPI);

                      Instruction *add = BinaryOperator::createAdd(mul,

NewAdd, "add.", GEPI);

                      GEPI->setOperand( 0, phi->getIncomingValue(0) );
                      GEPI->setOperand( 1, add );
std::cerr << "Block After: " << *GEPI->getParent();
              }
      }

605c621,622
< void IndVarSimplify::runOnLoop(Loop *L) {
---

void IndVarSimplify::runOnLoop(LoopInfo *LI, Loop *L) {

617c634
< EliminatePointerRecurrence(PN, Preheader, DeadInsts);
---

      EliminatePointerRecurrence(LI, L, PN, Preheader, DeadInsts);

626c643
< runOnLoop(*I);
---

    runOnLoop(LI,*I);

652,653c669,670
< if (SCEVAddRecExpr *AR = dyn_cast<SCEVAddRecExpr>(SCEV))
< if (AR->getNumOperands() == 2 && isa<SCEVConstant>(AR->getOperand(1)))
---

        if (1)//SCEVAddRecExpr *AR = dyn_cast<SCEVAddRecExpr>(SCEV))
          if (1)//AR->getNumOperands() == 2 &&

isa<SCEVConstant>(AR->getOperand(1)))

Naftali

_______________________________________________
LLVM Developers mailing list
LLVMdev@cs.uiuc.edu http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

-Chris

OK, thanks Chris, I've found that running

   opt -loopsimplify -instcombine -indvars -stats

gives me the setup I need for this transformation, and a small patch makes it happen in the simple case we discussed. However, I'm having some trouble when things get a bit more complicated with 3 nesting levels:

int A[3000000], B[20000], C[100], Z;
int main()
{
         int i, j, k, *a, *b, *c;
         for ( a = &A[0], i = 0; i != 300; i++ )
                 for ( b = &B[0], j = 0; j != 200; j++ )
                         for ( c = &C[0], k = 0; k != 100; k++ )
                                 Z += *a++ * *b++ * *c++;
}

My main problem is maintaining the loop nesting structure, which some pass in llvm seems to want to collapse, and then scalar evolution can't find all the predictable loop counts. However, when I insert a volatile memory operation inside of each loop, this preserves the loop structure and scalar evolution finds all the loop counts. What to do?

Thanks,
Naftali

OK, thanks Chris,

Hi Naftali, sorry I left you hanging. This dropped off my radar.

I've found that running
  opt -loopsimplify -instcombine -indvars -stats

gives me the setup I need for this transformation, and a small patch makes it happen in the simple case we discussed.

I have been working on a strength reduction pass for LLVM, which recently got the ability to reduce induction variables with variable strides.

Because of this, I took out the limitation of -indvars that prevented it from promoting indvars with variable strides (which, without strength reduction, would result in it introducing multiplies into the bodies of loops in the generated code, very bad).

However, I'm having some trouble when things get a bit more complicated with 3 nesting levels:

int A[3000000], B[20000], C[100], Z;
int main()
{
       int i, j, k, *a, *b, *c;
       for ( a = &A[0], i = 0; i != 300; i++ )
               for ( b = &B[0], j = 0; j != 200; j++ )
                       for ( c = &C[0], k = 0; k != 100; k++ )
                               Z += *a++ * *b++ * *c++;
}
My main problem is maintaining the loop nesting structure, which some pass in llvm seems to want to collapse, and then scalar evolution can't find all the predictable loop counts. However, when I insert a volatile memory operation inside of each loop, this preserves the loop structure and scalar evolution finds all the loop counts. What to do?

With current CVS (including the recent change to indvars), this now becomes three seperate loops:

$ llvm-gcc big.c -c -o - | opt -loopsimplify | analyze -loops
Printing analysis 'Natural Loop Construction' for function 'main':
Loop Containing: %loopentry.1, %loopexit.1, %loopexit.2, %no_exit.2, %no_exit.2.outer
     Loop Containing: %no_exit.2.outer, %loopexit.2, %no_exit.2
         Loop Containing: %no_exit.2

Note that loop simplify is needed to put the loops into canonical form (unmerge them). Without it, we only get two natural loops (one has two backedges):

$ llvm-gcc big.c -c -o - | analyze -loops
Printing analysis 'Natural Loop Construction' for function 'main':
Loop Containing: %loopentry.1, %loopexit.1, %loopexit.2, %no_exit.2
     Loop Containing: %no_exit.2, %loopexit.2

Please give current CVS a try and lemme know if you have any problems. I'm going to look at your GEP promotion change next.

Thanks,

-Chris