Question on induction variable simplification pass

Hi all,

It looks like the induction variable simplification pass prefers doing a zero-extension to compute the wider trip count of loops when extending the IV. This can sometimes result in loss of information making ScalarEvolution’s analysis conservative which can lead to missed performance opportunities.

For example, consider this loopnest-

int i, j;

for(i=0; i< 40; i++)

for(j=0; j< i-1; j++)

A[i+j] = j;

We are mainly interested in the backedge taken count of the inner loop.

Before indvars, the backedge information computed by ScalarEvolution is as follows-

Outer loop-

backedge-taken count is 39

max backedge-taken count is 39

Inner loop-

backedge-taken count is {-2,+,1}<%for.cond1.preheader>

max backedge-taken count is 37

After indvars, the backedge information computed by ScalarEvolution is as follows-

Outer loop-

backedge-taken count is 39

max backedge-taken count is 39

Inner loop-

backedge-taken count is (-1 + (zext i32 {-1,+,1}<%for.cond1.preheader> to i64))

max backedge-taken count is -1

If we generate sext instead of zext, the information computed by ScalarEvolution is as follows-

Outer loop-

backedge-taken count is 39

max backedge-taken count is 39

Inner loop-

backedge-taken count is {-2,+,1}<%for.cond1.preheader>

max backedge-taken count is -2

We now have a simplified backedge taken count for the inner loop which is almost as precise as before indvars, except for the flag instead of flag. I think ScalarEvolution should be able to precisely deduce wrap flags in the presence of sext but this may require a separate fix. The reason for the conservative max backedge taken count may be the same.

Thanks,

Pankaj

[+Sanjoy]

The fact that we lose information by widening during induction-variable simplification is certainly a known problem. I don’t recall if we’ve ever really decided on a path forward. I personally suspect that, as an information-destroying transformation, the widening should be moved to the lowering phase (i.e. near where we do vectorization, etc.). Unless the widening itself enables other transformations, I don’t see why we should do it early. The one exception I can think of is where it might enable us to collapse redundant PHIs, as is:

int i = 0; long j = 0;
for (; i < n; ++i, ++j) { … using i and j … }

but that seems like a special case we could handle separately.

 -Hal

Hi Pankaj,

It looks like the induction variable simplification pass prefers doing a zero-extension
to compute the wider trip count of loops when extending the IV. This can sometimes result
in loss of information making ScalarEvolution's analysis conservative which can lead
to missed performance opportunities.

For example, consider this loopnest-

int i, j;
for(i=0; i< 40; i++)
for(j=0; j< i-1; j++)
A[i+j] = j;

We are mainly interested in the backedge taken count of the inner loop.

Before indvars, the backedge information computed by ScalarEvolution is as follows-

Outer loop-
backedge-taken count is 39
max backedge-taken count is 39

Inner loop-
backedge-taken count is {-2,+,1}<%for.cond1.preheader>
max backedge-taken count is 37

After indvars, the backedge information computed by ScalarEvolution is as follows-

Outer loop-
backedge-taken count is 39
max backedge-taken count is 39

Inner loop-
backedge-taken count is (-1 + (zext i32 {-1,+,1}<%for.cond1.preheader> to i64))
max backedge-taken count is -1

One approach is to use the facts:

- The inner loop will not be entered in the 0th iteration of
<%for.cond1.preheader>
- {-1,+,1}<%for.cond1.preheader> is s< 40

to simplify the above to {-2,+,1}<%for.cond1.preheader> (in i64). The
original expression was not -2 in the 0th iteration of
<%for.cond1.preheader>, but we don't care about that iteration of
<%for.cond1.preheader> since we won't enter the inner loop.

The other option is to widen of IVs late, as a "lowering"
transformation, like Hal said. That's a more invasive change, but if
you have time and resources, it would be nice to at least give it a
shot, measure and see what falls over.

If we generate sext instead of zext, the information computed by ScalarEvolution is
as follows-

Outer loop-
backedge-taken count is 39
max backedge-taken count is 39

Inner loop-
backedge-taken count is {-2,+,1}<%for.cond1.preheader>
max backedge-taken count is -2

We now have a simplified backedge taken count for the inner loop which is almost as precise
as before indvars, except for the flag instead of flag. I think ScalarEvolution

(JFYI: My mail client's compose ate the <nsw> and <nw>)

Can you please share the IR that you piped through SCEV?

My guess is that SCEV did not "try to" infer a more aggressive no-wrap
flag for {-2,+,1} -- most of the no-wrap inferring logic kicks in when
you try to sign/zero extend an add recurrence.

One suspicious bit here is the "max backedge-taken count is -2" bit.
I expected SCEV to have inferred the max trip count to be 37 in
this second case.

-- Sanjoy

Hi Sanjoy,

I have attached the IR I got by compiling with -O2. This is just before we widen the IV.

To get the backedge taken count info I ran indvars on it and then replaced zext with sext.

I think regardless of where we decide to add this transformation in the pipeline, it should try to preserve as much information as it can. This means that we should generate sext for signed IVs and vice-versa. I believe this is a better approach as it preserves the information directly in the IR as opposed to relying on ScalarEvolution to deduce it.

Moving it to a different location can be done separately.

Do you agree?

Thanks,
Pankaj

indvars.ll (1.4 KB)

Hi Pankaj,

I have attached the IR I got by compiling with -O2. This is just before we widen the IV.

Thanks!

To get the backedge taken count info I ran indvars on it and then replaced zext with sext.

I think regardless of where we decide to add this transformation in the pipeline, it should
try to preserve as much information as it can. This means that we should generate sext
for signed IVs and vice-versa. I believe this is a better approach as it preserves the
information directly in the IR as opposed to relying on ScalarEvolution to deduce it.

I'll be happy to review patches making indvars behave better here
(i.e. not "break" loop trip counts like this).

I don't think the IV is the most relevant bit here though -- it looks
like (only a guess) indvars is faltering here:
https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/Scalar/IndVarSimplify.cpp#L2240
and that logic needs to be made smarter to account for how much the
RHS of the LFTR'ed exit condition is simplified after extension.

Moving it to a different location can be done separately.

Do you agree?

Sounds good!

Thanks!
-- Sanjoy

Hi Pankaj,

I have attached the IR I got by compiling with -O2. This is just before we widen the IV.

Thanks!

To get the backedge taken count info I ran indvars on it and then replaced zext with sext.

I think regardless of where we decide to add this transformation in the pipeline, it should
try to preserve as much information as it can. This means that we should generate sext
for signed IVs and vice-versa. I believe this is a better approach as it preserves the
information directly in the IR as opposed to relying on ScalarEvolution to deduce it.

I'll be happy to review patches making indvars behave better here
(i.e. not "break" loop trip counts like this).

I don't think the IV is the most relevant bit here though -- it looks
like (only a guess) indvars is faltering here:
https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/Scalar/IndVarSimplify.cpp#L2240
and that logic needs to be made smarter to account for how much the
RHS of the LFTR'ed exit condition is simplified after extension.

Moving it to a different location can be done separately.

Do you agree?

Sounds good!

Thanks!
-- Sanjoy

Hi Pankaj,

I have attached the IR I got by compiling with -O2. This is just before we widen the IV.

Thanks!

To get the backedge taken count info I ran indvars on it and then replaced zext with sext.

I think regardless of where we decide to add this transformation in the pipeline, it should
try to preserve as much information as it can. This means that we should generate sext
for signed IVs and vice-versa. I believe this is a better approach as it preserves the
information directly in the IR as opposed to relying on ScalarEvolution to deduce it.

I'll be happy to review patches making indvars behave better here
(i.e. not "break" loop trip counts like this).

I don't think the IV is the most relevant bit here though -- it looks
like (only a guess) indvars is faltering here:
https://github.com/llvm-mirror/llvm/blob/master/lib/Transforms/Scalar/IndVarSimplify.cpp#L2240

and that logic needs to be made smarter to account for how much the
RHS of the LFTR'ed exit condition is simplified after extension.

Moving it to a different location can be done separately.

Do you agree?

Sounds good!

-- Sanjoy

Hi Sanjoy,

Thanks for pointing me in the right direction. I am not really familiar with this piece of code. I will study it and then put up a patch for review.

Thanks,
Pankaj

Btw, I just noticed the triple response; sorry about that. Not sure
what happened -- possibly a combination of operator error and spotty
internet connection.

-- Sanjoy