missing simplification in ScalarEvolution?

Hi,

I have this small test case-

%struct1 = type { i32, i32 }

@glob_const = internal constant [4 x %struct1] [%struct1 { i32 4, i32 5 }, %struct1 { i32 8, i32 9 }, %struct1 { i32 16, i32 0 }, %struct1 { i32 32, i32 10 }], align 16

define void @foo() {
entry:
  br label %loop

loop: ; preds = %loop, %entry
  %iv = phi %struct1* [ getelementptr inbounds ([4 x %struct1], [4 x %struct1]* @glob_const, i64 0, i64 0), %entry ], [ %iv.inc, %loop ]
  %gep = getelementptr inbounds %struct1, %struct1* %iv, i64 0, i32 0
  %ld = load i32, i32* %gep, align 8
  %iv.inc = getelementptr inbounds %struct1, %struct1* %iv, i64 1
  %cmp = icmp ult %struct1* %iv.inc, getelementptr inbounds ([4 x %struct1], [4 x %struct1]* @glob_const, i64 1, i64 0)
  br i1 %cmp, label %loop, label %exit

exit:
  ret void
}

I was expecting the backedge taken count of the loop to be evaluated to 3 by ScalarEvolution but instead it evaluates to this-

Loop %loop: backedge-taken count is ((-1 + (-1 * @glob_const) + ((8 + @glob_const)<nuw><nsw> umax (32 + @glob_const)<nsw>)) /u 8)
Loop %loop: max backedge-taken count is 3, actual taken count either this or zero.

Can we deduce the final value of the IV (32 + @glob_const) to be greater than the initial value (8 + @glob_const) and simplify the backedge taken count to 3?

Thanks,
Pankaj

Try adding a datalayout with pointer size information to your reduced case, and see what happens. Not sure this is your problem, but I've been bitten by this before with hand reduced examples...

Philip

Thanks for the suggestion but datalayout info did not solve the problem!

-Pankaj

I didn't step through this in a debugger, but I suspect we could fix
the problem by:

1. Teaching SCEV to infer nuw for (32 + @glob_const)<nsw> (if legal)
2. Generalize ScalarEvolution::isKnownPredicateViaNoOverflow to work
with this case

That should fold the umax and give us a more precise trip count.

-- Sanjoy

Hi Sanjoy,

Thanks for the reply!
Your approach sounds good to me!

I think 1) is legal as address wraparound in unsigned range doesn't make sense given a positive offset, but I am not sure.
I think umax will not be added if we can prove the predicate as known.
I am not sure whether umax will get simplified if we add nuw to the expressions.

-Pankaj

Hi Sanjoy,

Thanks for the reply!
Your approach sounds good to me!

I think 1) is legal as address wraparound in unsigned range doesn't make sense given a positive offset, but I am not sure.

Non-inbounds GEPs are just modular arithmetic so it is ok for a
non-inbounds GEP to have unsigned / signed wrap. So at least the GEP
will have to be an inbounds GEP. And you'll also have to check
`isSCEVExprNeverPoison`.

-- Sanjoy

Okay, that makes sense.
Thanks for the helpful tips!