Rewriting compare instructions to avoid materializing previous induction variable

I've noticed cases where LSR generates IR like the following:

for.cond:                                         ; preds = %for.body
  = add nuw nsw i64 %indvars.iv, 1 ;; i++
  %2 = add i64, -1 ;; previous i for cmp
  %tmp = trunc i64 %2 to i32
  %cmp = icmp slt i32 %tmp, %0 ;; i < e
  br i1 %cmp, label %for.body, label %for.end.loopexit

Basically, the comparison happens after the induction variable is
incremented, so LSR derives the previous induction variable by subtracting
1. (Without LSR we actually use a register to save the previous value of
the induction variable, so I think deriving the value from the incremented
induction variable is a good thing: there is no need to keep a register
live across loop iterations.)

For my test case (on AArch64), we generate assembly like this:

        ldr w12, [x10, x11, lsl #2]
        cbz w12, .LBB0_4
        add x11, x11, #1
        sub w12, w11, #1
        cmp w12, w9
        b.lt .LBB0_2

However, I believe this is equivalent to:

        ldr w12, [x10, x11, lsl #2]
        cbz w12, .LBB0_4
        add x11, x11, #1
        cmp w11, w9
        b.le .LBB0_2

We transform the comparison from (i < e) -> (i+1 <= e), so that we don't
have to materialize the previous value of i.
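
One way to sanity-check the equivalence is to model both latch conditions on raw bit patterns. The C++ sketch below mirrors the IR: `next` is the already-incremented i64 induction variable and `e` is the i32 bound; the function names are illustrative only, not anything in LLVM. Because trunc(next - 1) equals trunc(next) - 1 modulo 2^32, (i < e) and (i+1 <= e) agree for every value except when trunc(next) is INT32_MIN. In that case the decrement wraps to INT32_MAX, so the original compare is false while the rewritten one is true. The rewrite therefore presumably needs some guarantee (from the loop bounds or a no-signed-wrap fact) that this case cannot occur.

```cpp
#include <cassert>
#include <cstdint>

// LLVM integers are signless, so values are modeled as raw bit patterns.
// Signed reading of an i32 bit pattern, without relying on narrowing casts.
int64_t as_signed32(uint32_t bits) {
  return bits >= 0x80000000u ? static_cast<int64_t>(bits) - 0x100000000LL
                             : static_cast<int64_t>(bits);
}

// Latch as LSR emits it:
//   %2 = add i64 <next>, -1 ; %tmp = trunc i64 %2 to i32 ; icmp slt i32 %tmp, %e
bool latch_original(uint64_t next, uint32_t e) {
  uint32_t tmp = static_cast<uint32_t>(next - 1);  // wraps mod 2^64, then truncates
  return as_signed32(tmp) < as_signed32(e);
}

// Proposed latch: icmp sle i32 (trunc i64 <next> to i32), %e; no decrement.
bool latch_rewritten(uint64_t next, uint32_t e) {
  return as_signed32(static_cast<uint32_t>(next)) <= as_signed32(e);
}

// Spot-check over boundary values: the two agree unless trunc(next) is
// INT32_MIN, where the 32-bit decrement wraps around to INT32_MAX.
void check_samples() {
  const int64_t samples[] = {INT32_MIN, INT32_MIN + 1LL, -2, -1, 0,
                             1,         2,               100, INT32_MAX};
  for (int64_t n : samples) {
    for (int64_t e : samples) {
      uint64_t next = static_cast<uint64_t>(n);  // sign-extended i64 pattern
      uint32_t ev = static_cast<uint32_t>(e);
      bool same = latch_original(next, ev) == latch_rewritten(next, ev);
      assert(same == (n != INT32_MIN));
    }
  }
}
```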

If my assumptions are correct, my question is how this should be
implemented. My first thought was to try something in CodeGenPrepare (as
LSR is run rather late), but I have limited experience with this pass.
Alternatively, I think I could write this as an InstCombine, which I
believe will be called by the CodeGenPrepare pass.

FWIW, InstCombineCmp already has a similar solution, but it isn't able to
handle the intervening trunc. Working on a fix now.
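
As a rough sketch of the fold itself, the code below rewrites `icmp slt (add X, -1), E` to `icmp sle X, E` and also looks through an intervening `trunc`. It operates on a hypothetical toy expression type, not on LLVM's `Instruction` or `PatternMatch` API. The `noWrap` parameter is an assumption standing in for whatever analysis would show that the decrement cannot signed-wrap in the compared width; without that guarantee the fold is unsound when `trunc X` is the minimum signed value.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical toy IR for illustration only; this is NOT LLVM's API.
struct Expr {
  enum class Kind { Value, AddConst, Trunc, ICmpSLT, ICmpSLE };
  Kind kind;
  std::string name;            // Kind::Value: the SSA name
  long long imm;               // Kind::AddConst: the constant operand
  std::shared_ptr<Expr> a, b;  // operands; b is only used by the compares
};
using ExprPtr = std::shared_ptr<Expr>;

ExprPtr mk(Expr::Kind k, ExprPtr a = nullptr, ExprPtr b = nullptr,
           long long imm = 0, std::string name = "") {
  auto e = std::make_shared<Expr>();
  e->kind = k;
  e->a = std::move(a);
  e->b = std::move(b);
  e->imm = imm;
  e->name = std::move(name);
  return e;
}

// Folds  icmp slt (add X, -1), E          -> icmp sle X, E
// and    icmp slt (trunc (add X, -1)), E  -> icmp sle (trunc X), E.
// Both forms are only sound if the decrement cannot signed-wrap in the
// compared width; `noWrap` is a stand-in (assumption) for whatever analysis
// would prove that. Returns the original compare when nothing matches.
ExprPtr foldDecrementCompare(const ExprPtr& cmp, bool noWrap) {
  if (!noWrap || cmp->kind != Expr::Kind::ICmpSLT) return cmp;
  assert(cmp->a && cmp->b && "compare needs two operands");
  const ExprPtr& lhs = cmp->a;
  const bool viaTrunc = lhs->kind == Expr::Kind::Trunc;  // look through trunc
  const ExprPtr& dec = viaTrunc ? lhs->a : lhs;
  if (!dec || dec->kind != Expr::Kind::AddConst || dec->imm != -1) return cmp;
  ExprPtr x = dec->a;  // the already-incremented induction variable
  ExprPtr newLhs = viaTrunc ? mk(Expr::Kind::Trunc, x) : x;
  return mk(Expr::Kind::ICmpSLE, newLhs, cmp->b);
}
```

In real LLVM code this would presumably be expressed with `PatternMatch` matchers such as `m_Trunc` and `m_Add` inside InstCombine's compare folding; the sketch does not try to pin down the exact hook.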