"trunc"s generated by LSR cause problems for SCEV

Hi

I am working on a bug (PR 28363) caused by Scalar Evolution being unable to compute the iteration count of an unrolled loop. While I believe SCEV has enough information to do its job, I think the code generated by earlier transformations could be simpler. There is one bug in IndVarSimplify, for which Sanjoy Das suggested a fix. With that fix applied, disabling loop strength reduction (LSR) makes the problem go away. I have copied the code before and after LSR below.

For this code pattern, it is possible to prove that the truncs generated by LSR can be avoided (see the bottom of this email). Andy Trick says that LSR generally assumes trunc is free, but there might be ways to work around that or to improve the LSR target hooks.

1- Does anyone have any suggestions on how to fix this in LSR?

2- Is there any reason we should not fix LSR, and should instead focus on Scalar Evolution so that it can handle more complicated code patterns properly?

Before LSR:

for.body.preheader:
  %xtraiter = and i32 %m, 7

for.body.preheader.new:
  %unroll_iter = sub i32 %m, %xtraiter

for.body:
  %niter = phi i32 [ %unroll_iter, %for.body.preheader.new ], [ %niter.nsub.7, %for.body ]
  %indvars.iv = phi i64 [ 0, %for.body.preheader.new ], [ %indvars.iv.next.7, %for.body ]
  %indvars.iv.next.7 = add nsw i64 %indvars.iv, 8
  %niter.nsub.7 = add nsw i32 %niter, -8
  %niter.ncmp.7 = icmp eq i32 %niter.nsub.7, 0

After LSR:

for.body.preheader:
  %xtraiter = and i32 %m, 7

for.body.preheader.new:
  %unroll_iter = sub i32 %m, %xtraiter
  %2 = zext i32 %unroll_iter to i64

for.body:
  %indvars.iv = phi i64 [ 0, %for.body.preheader.new ], [ %indvars.iv.next.7, %for.body ]
  %indvars.iv.next.7 = add nsw i64 %indvars.iv, 8
  %tmp = trunc i64 %indvars.iv.next.7 to i32
  %tmp80 = trunc i64 %2 to i32
  %niter.ncmp.7 = icmp eq i32 %tmp80, %tmp

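Why the truncs are not needed: %indvars.iv starts at 0 and increments by 8, and %2 is divisible by 8. %2 is a zext from i32, so its upper 32 bits are zero. If %indvars.iv.next.7 ever reached a value with a non-zero bit in its upper 32 bits, it could no longer equal %2, and it would keep incrementing past the bound until it overflowed. But the definition of %indvars.iv.next.7 is marked nsw, so that overflow is not allowed to happen.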

Adding a couple of points just to make sure I have been clear:

1- Without any trunc, the code after LSR would compare %2 and %indvars.iv.next.7 directly in the loop control logic.

2- The argument for why the truncs are not needed basically says that if we compare %2 and %indvars.iv.next.7 directly, the loop will finish while the upper 32 bits of %indvars.iv.next.7 are still all zero. So the behavior stays the same as it is today.
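
To sanity-check the argument above, here is a small Python sketch (not LLVM code) that models the exit test with and without the truncs and returns the iteration at which it first fires. It assumes the unrolled loop is only entered when m >= 8, so that %unroll_iter is a nonzero multiple of 8. The function name exit_iteration and the iteration limit are made up for illustration.

```python
MASK32 = (1 << 32) - 1  # keeps the low 32 bits, i.e. models trunc i64 -> i32


def exit_iteration(m, use_trunc, limit=1 << 20):
    """Return the 1-based iteration at which the exit compare first holds.

    Assumption (not stated in the IR excerpt): m >= 8, so unroll_iter != 0.
    Returns None if the compare does not hold within `limit` iterations.
    """
    unroll_iter = (m - (m & 7)) & MASK32  # %unroll_iter = sub i32 %m, %xtraiter
    bound = unroll_iter                   # %2 = zext i32 %unroll_iter to i64
    iv = 0                                # %indvars.iv starts at 0
    for n in range(1, limit + 1):
        iv += 8                           # %indvars.iv.next.7 = add nsw i64 ..., 8
        if use_trunc:
            # Compare only the low 32 bits, as the code after LSR does.
            done = (iv & MASK32) == (bound & MASK32)
        else:
            # Compare the full 64-bit values directly.
            done = iv == bound
        if done:
            return n
    return None
```

Under these assumptions both variants should return unroll_iter / 8; for example, exit_iteration(1000, True) and exit_iteration(1000, False) both give 125. This is only a spot check of the reasoning, not a proof.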

I am going to look into LSR a little bit to see if I can teach it not to generate those truncs. If those truncs are needed for some reason, please let me know.