Question on induction variable simplification pass

pankajchawla · April 13, 2017, 12:21am

Hi all,

It looks like the induction variable simplification pass prefers doing a zero-extension to compute the wider trip count of loops when extending the IV. This can sometimes result in loss of information making ScalarEvolution’s analysis conservative which can lead to missed performance opportunities.

For example, consider this loopnest-

int i, j;
for(i=0; i< 40; i++)
  for(j=0; j< i-1; j++)
    A[i+j] = j;

We are mainly interested in the backedge taken count of the inner loop.

Before indvars, the backedge information computed by ScalarEvolution is as follows-

Outer loop-
backedge-taken count is 39
max backedge-taken count is 39

Inner loop-
backedge-taken count is {-2,+,1}<%for.cond1.preheader>
max backedge-taken count is 37

After indvars, the backedge information computed by ScalarEvolution is as follows-

Outer loop-
backedge-taken count is 39
max backedge-taken count is 39

Inner loop-
backedge-taken count is (-1 + (zext i32 {-1,+,1}<%for.cond1.preheader> to i64))
max backedge-taken count is -1

If we generate sext instead of zext, the information computed by ScalarEvolution is as follows-

Outer loop-
backedge-taken count is 39
max backedge-taken count is 39

Inner loop-
backedge-taken count is {-2,+,1}<%for.cond1.preheader>
max backedge-taken count is -2

We now have a simplified backedge taken count for the inner loop which is almost as precise as before indvars, except for the <nw> flag instead of the <nsw> flag. I think ScalarEvolution should be able to precisely deduce wrap flags in the presence of sext, but this may require a separate fix. The reason for the conservative max backedge taken count may be the same.
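To make the zext/sext difference concrete, here is a minimal C++ sketch (not LLVM code; the helper names are made up for illustration) that evaluates the printed expression -1 + ext(i - 1) both ways. In the first outer iteration the 32-bit limit i - 1 is -1: zero-extension turns it into 4294967295, while sign-extension keeps it at -1.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers modelling the two ways of widening the 32-bit
// inner-loop limit (i - 1) to 64 bits.
std::int64_t widen_zext(std::int32_t v) {
    return static_cast<std::int64_t>(static_cast<std::uint32_t>(v));
}
std::int64_t widen_sext(std::int32_t v) {
    return static_cast<std::int64_t>(v);
}

// The backedge-taken count expression in the form printed after indvars:
// -1 + ext(i - 1), where i is the outer induction variable.
std::int64_t be_count_zext(std::int32_t i) { return -1 + widen_zext(i - 1); }
std::int64_t be_count_sext(std::int32_t i) { return -1 + widen_sext(i - 1); }
```

For i = 0 the two forms disagree (4294967294 versus -2); for every i >= 1 they produce the same value, for example 37 at i = 39.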

Thanks,

Pankaj

Finkel_Hal_J · April 13, 2017, 12:38am

[+Sanjoy]

The fact that we lose information by widening during induction-variable simplification is certainly a known problem. I don’t recall if we’ve ever really decided on a path forward. I personally suspect that, as an information-destroying transformation, the widening should be moved to the lowering phase (i.e. near where we do vectorization, etc.). Unless the widening itself enables other transformations, I don’t see why we should do it early. The one exception I can think of is where it might enable us to collapse redundant PHIs, as in:

int i = 0; long j = 0;
for (; i < n; ++i, ++j) { … using i and j … }

but that seems like a special case we could handle separately.
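As a rough illustration of that special case, the sketch below (hypothetical function names, plain C++ rather than IR) shows a loop carrying a narrow and a wide counter that always hold the same value, next to the form with a single 64-bit counter that would remain once the narrow one is widened and the duplicate PHI is collapsed.

```cpp
#include <cassert>

// Two induction variables, one int and one long, stepping in lockstep.
long sum_with_two_ivs(int n) {
    long s = 0;
    int i = 0;
    long j = 0;
    for (; i < n; ++i, ++j)
        s += static_cast<long>(i) * j;   // uses both i and j
    return s;
}

// After widening i, both counters are the same recurrence {0,+,1} in the
// wide type, so a single counter is enough.
long sum_with_one_iv(int n) {
    long s = 0;
    for (long iv = 0; iv < n; ++iv)
        s += iv * iv;
    return s;
}
```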

-Hal

Sanjoy_Das1 · April 14, 2017, 3:34am

Hi Pankaj,

It looks like the induction variable simplification pass prefers doing a zero-extension
to compute the wider trip count of loops when extending the IV. This can sometimes result
in loss of information making ScalarEvolution's analysis conservative which can lead
to missed performance opportunities.

For example, consider this loopnest-

int i, j;
for(i=0; i< 40; i++)
for(j=0; j< i-1; j++)
A[i+j] = j;

We are mainly interested in the backedge taken count of the inner loop.

Before indvars, the backedge information computed by ScalarEvolution is as follows-

Outer loop-
backedge-taken count is 39
max backedge-taken count is 39

Inner loop-
backedge-taken count is {-2,+,1}<%for.cond1.preheader>
max backedge-taken count is 37

After indvars, the backedge information computed by ScalarEvolution is as follows-

Outer loop-
backedge-taken count is 39
max backedge-taken count is 39

Inner loop-
backedge-taken count is (-1 + (zext i32 {-1,+,1}<%for.cond1.preheader> to i64))
max backedge-taken count is -1

One approach is to use the facts:

- The inner loop will not be entered in the 0th iteration of
<%for.cond1.preheader>
- {-1,+,1}<%for.cond1.preheader> is s< 40

to simplify the above to {-2,+,1}<%for.cond1.preheader> (in i64). The
original expression was not -2 in the 0th iteration of
<%for.cond1.preheader>, but we don't care about that iteration of
<%for.cond1.preheader> since we won't enter the inner loop.
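A brute-force check of this argument, written as a small C++ sketch (the function names are illustrative, and this is not how SCEV would prove it): over the 40 outer iterations, whenever the inner loop is actually entered, the zext-based expression equals the i64 recurrence {-2,+,1}, and the largest value it takes is 37. The guard used here also skips i = 1, where the inner loop likewise does not run.

```cpp
#include <cassert>
#include <cstdint>

// Zero-extend a 32-bit value to 64 bits.
std::int64_t zext32(std::int32_t v) {
    return static_cast<std::int64_t>(static_cast<std::uint32_t>(v));
}

// True if -1 + zext(i - 1) equals i - 2 for every outer iteration in which
// the inner loop `for (j = 0; j < i - 1; j++)` is entered.
bool zext_matches_addrec_when_entered() {
    for (std::int32_t i = 0; i < 40; ++i) {
        if (!(0 < i - 1)) continue;               // inner loop not entered
        std::int64_t zext_form = -1 + zext32(i - 1);
        std::int64_t addrec = static_cast<std::int64_t>(i) - 2;
        if (zext_form != addrec) return false;
    }
    return true;
}

// Largest backedge-taken count over the iterations that enter the inner loop.
std::int64_t max_backedge_taken() {
    std::int64_t best = 0;
    for (std::int32_t i = 0; i < 40; ++i) {
        if (!(0 < i - 1)) continue;
        std::int64_t count = static_cast<std::int64_t>(i) - 2;
        if (count > best) best = count;
    }
    return best;
}
```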

The other option is to widen IVs late, as a "lowering"
transformation, like Hal said. That's a more invasive change, but if
you have time and resources, it would be nice to at least give it a
shot, measure and see what falls over.

If we generate sext instead of zext, the information computed by ScalarEvolution is
as follows-

Outer loop-
backedge-taken count is 39
max backedge-taken count is 39

Inner loop-
backedge-taken count is {-2,+,1}<%for.cond1.preheader>
max backedge-taken count is -2

We now have a simplified backedge taken count for the inner loop which is almost as precise
as before indvars, except for the flag instead of flag. I think ScalarEvolution

(JFYI: My mail client's compose ate the <nsw> and <nw>)

Can you please share the IR that you piped through SCEV?

My guess is that SCEV did not "try to" infer a more aggressive no-wrap
flag for {-2,+,1} -- most of the no-wrap inferring logic kicks in when
you try to sign/zero extend an add recurrence.
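The link between extension and no-wrap flags can be illustrated with a toy model (an 8-bit recurrence simulated with plain ints; the names are hypothetical and this is not SCEV's inference code): sign-extending {start,+,step} gives the same values as the wide recurrence {sext start,+,sext step} exactly when the narrow recurrence never signed-wraps over the iterations considered.

```cpp
#include <cassert>

// Reduce x to the signed 8-bit range [-128, 127], as two's-complement
// wrap-around would.
int wrap_i8(int x) {
    int m = ((x + 128) % 256 + 256) % 256;
    return m - 128;
}

// Value of the narrow (8-bit) recurrence {start,+,step} at iteration k,
// already sign-extended to int.
int sext_of_narrow_addrec(int start, int step, int k) {
    return wrap_i8(start + step * k);
}

// Value of the wide recurrence {sext start,+,sext step} at iteration k.
int wide_addrec(int start, int step, int k) {
    return start + step * k;
}

// True if the sext distributes over the recurrence for iterations 0..trips-1,
// which is the case when the narrow recurrence does not signed-wrap.
bool sext_distributes(int start, int step, int trips) {
    for (int k = 0; k < trips; ++k)
        if (sext_of_narrow_addrec(start, step, k) != wide_addrec(start, step, k))
            return false;
    return true;
}
```

For example, {-2,+,1} over 38 iterations stays in range and the two forms agree, while {120,+,1} over 10 iterations wraps past 127 and they diverge.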

One suspicious bit here is the "max backedge-taken count is -2" bit.
I expected SCEV to have inferred the max trip count to be 37 in
this second case.

-- Sanjoy

pankajchawla · April 14, 2017, 11:55pm

Hi Sanjoy,

I have attached the IR I got by compiling with -O2. This is just before we widen the IV.

To get the backedge taken count info I ran indvars on it and then replaced zext with sext.

I think regardless of where we decide to add this transformation in the pipeline, it should try to preserve as much information as it can. This means that we should generate sext for signed IVs and vice-versa. I believe this is a better approach as it preserves the information directly in the IR as opposed to relying on ScalarEvolution to deduce it.

Moving it to a different location can be done separately.

Do you agree?

Thanks,
Pankaj

indvars.ll (1.4 KB)

Sanjoy_Das1 · April 17, 2017, 4:29am

Hi Pankaj,

I have attached the IR I got by compiling with -O2. This is just before we widen the IV.

Thanks!

To get the backedge taken count info I ran indvars on it and then replaced zext with sext.

I think regardless of where we decide to add this transformation in the pipeline, it should
try to preserve as much information as it can. This means that we should generate sext
for signed IVs and vice-versa. I believe this is a better approach as it preserves the
information directly in the IR as opposed to relying on ScalarEvolution to deduce it.

I'll be happy to review patches making indvars behave better here
(i.e. not "break" loop trip counts like this).

I don't think the IV is the most relevant bit here though -- it looks
like (only a guess) indvars is faltering here:

https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/Scalar/IndVarSimplify.cpp#L2240

    return false;

  const SCEV *Step = dyn_cast<SCEVConstant>(AR->getStepRecurrence(*SE));
  if (!Step || !Step->isOne())
    return false;

  int LatchIdx = Phi->getBasicBlockIndex(L->getLoopLatch());
  Value *IncV = Phi->getIncomingValue(LatchIdx);
  return (getLoopPhiForCounter(IncV, L) == Phi);
}

/// Search the loop header for a loop counter (an add rec w/step of one)
/// suitable for use by LFTR.  If multiple counters are available, select the
/// "best" one based profitable heuristics.
///
/// BECount may be an i8* pointer type. The pointer difference is already
/// valid count without scaling the address stride, so it remains a pointer
/// expression as far as SCEV is concerned.
static PHINode *FindLoopCounter(Loop *L, BasicBlock *ExitingBB,
                                const SCEV *BECount,
                                ScalarEvolution *SE, DominatorTree *DT) {

and that logic needs to be made smarter to account for how much the
RHS of the LFTR'ed exit condition is simplified after extension.
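For readers unfamiliar with LFTR (linear function test replace), the sketch below is a hand-written C++ approximation of what happens to the inner loop from this thread: the signed compare `j < i - 1` is replaced by an inequality test of a widened counter against a precomputed limit. The function names and the `wide_trip_count` variable are assumptions made for illustration, not output copied from indvars.

```cpp
#include <cassert>
#include <cstdint>

// Inner loop as written in the source: 32-bit counter, signed compare.
std::int64_t iterations_original(std::int32_t i) {
    std::int64_t n = 0;
    for (std::int32_t j = 0; j < i - 1; ++j)
        ++n;
    return n;
}

// Approximation of the loop after IV widening and LFTR: a guard, then a
// 64-bit counter compared with != against the zero-extended limit.
std::int64_t iterations_after_lftr(std::int32_t i) {
    if (!(0 < i - 1))
        return 0;                                  // loop guard
    // zext is safe here only because the guard proved i - 1 > 0.
    std::int64_t wide_trip_count =
        static_cast<std::int64_t>(static_cast<std::uint32_t>(i - 1));
    std::int64_t n = 0;
    std::int64_t iv = 0;
    do {
        ++n;                                       // loop body
        ++iv;
    } while (iv != wide_trip_count);               // rewritten exit test
    return n;
}
```

Both versions run the body the same number of times for every i in [0, 40). The zero-extended limit on the right-hand side of the exit test is the operand whose form determines how simple the resulting backedge-taken count is.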

Moving it to a different location can be done separately.

Do you agree?

Sounds good!

Thanks!
-- Sanjoy

pankajchawla · April 17, 2017, 6:10pm

Hi Sanjoy,

Thanks for pointing me in the right direction. I am not really familiar with this piece of code. I will study it and then put up a patch for review.

Thanks,
Pankaj

Sanjoy_Das1 · April 17, 2017, 6:16pm

Btw, I just noticed the triple response; sorry about that. Not sure
what happened -- possibly a combination of operator error and spotty
internet connection.

-- Sanjoy
