Computing loop trip counts with Scalar evolution

Hello.
     I tried to get the trip count of a loop with Scalar evolution. I got inspired from How to get loop bounds in LLVM? - Stack Overflow .
     However the analysis described there doesn't work well for the second inner loop of thes function below (although if we declare Bcols a short it works well):
       void MatMul(int Arows, int Acols, int Brows, int Bcols) {
         short i, j, k;
         for (i = 0; i < Arows; ++i) {
           for (j = 0; j < Bcols; ++j) {
             C[i][j] = 0;
             for (k = 0; k < Acols; ++k) {
               C[i][j] += A[i][k] * B[j][k];
             }
           }
         }
       }

     However, I discovered in LoopVectorize.cpp (http://llvm.org/docs/doxygen/html/LoopVectorize_8cpp_source.html) we have the method InnerLoopVectorizer::getOrCreateTripCount() that seems to do a better job at computing the trip count, even if the implementation differences are not big. The differences are subtle - first of all the method getOrCreateTripCount() doesn't call hasLoopInvariantBackedgeTakenCount().

     Please don't hesitate to comment why InnerLoopVectorizer::getOrCreateTripCount() works better. I will try to come back myself with more info.

   Thank you,
     Alex

PS: Could you please recommend me one important paper for Scalar evolutions?

Your function is weird because the compiler can't prove whether the induction variables overflow. Here's the output of "clang -O2 -emit-llvm -S | opt -analyze -scalar-evolution":

Determining loop execution counts for: @MatMul
Loop %for.body13: Unpredictable backedge-taken count.
Loop %for.body13: Unpredictable max backedge-taken count.
Loop %for.body13: Predicated backedge-taken count is (-1 + %Acols)
  Predicates:
     {1,+,1}<%for.body13> Added Flags: <nssw>

Loop %for.body6: Unpredictable backedge-taken count.
Loop %for.body6: Unpredictable max backedge-taken count.
Loop %for.body6: Predicated backedge-taken count is (-1 + %Bcols)
  Predicates:
     {1,+,1}<%for.body6> Added Flags: <nssw>

Loop %for.cond2.preheader: Unpredictable backedge-taken count.
Loop %for.cond2.preheader: Unpredictable max backedge-taken count.
Loop %for.cond2.preheader: Predicated backedge-taken count is (-1 + %Arows)
  Predicates:
     {1,+,1}<%for.cond2.preheader> Added Flags: <nssw>

-Eli

We have the predicated backedge taken count, so the loop should be vectorizable.

I think the fact that getOrCreateTripCount doesn’t use the predicated backedge taken count is probably a bug (and might blow up in certain conditions).

-Silviu