RFC: inductive range check elimination via iteration space splitting

Hi all,

I've been working on a pass that tries to eliminate range checks like
this:

  void foo(int *a, int n, int a_len) {
    assert(a_len >= 0);
    for (int i = 0; i < n; i++ /* wrapping add */) {
      if (0 <= (i - 2) && (i - 2) < a_len) {
        a[i - 2] = 42;
      } else {
        out_of_bounds_error(i + 2); // noreturn
        unreachable;
      }
    }
  }

Since n is unrelated to a_len in the above loop, we cannot eliminate
the range checks in the loop as-is. But if we split the iteration
space for `i` into:

[0, min(n, 2)) \cup [min(n, 2), min(a_len + 2, n)) \cup [min(a_len + 2, n), n).

(with the convention that for "a >= b", [a, b) is an empty set) we can
elide the range checks in the middle split. We realize this split
iteration space by breaking up the original loop into three loops; and
only have the first and third loop have range checks.

This approach can be extended to range checks on expressions like (P +
Q * I) where I is the canonical induction variable -- there will be a
set of disjoint contiguous ranges for which (P + Q * I) is provably
within bounds. If one of those contiguous ranges can be "naturally
split out" from I's normal iteration space, we can do so and elide the
range checks in that subspace. It can also be extended to multiple
range checks in the loop body.

The first question I have: am I reinventing the wheel here? Has this
approach been tried in LLVM and discarded because of fundamental
issues? Are there better ways to eliminate range checks on affine
functions on the canonical induction variable? The closest thing I
could find to this is the LoopIndexSplit pass which was removed
in 2010.

Implementing this requires some design choices I'd like input on:

  1. computing the splits in the iteration range -- the math is
     slightly tricky because we have to deal with wrapping addition,
     but in principle this can be done. The output of this phase is
     to split the iteration space of the canonical IV into [b0, e0)
     \cup [e0, b2) \cup [b2, e2).

  2. code to transform the loop -- I could not find any helper
     functions in llvm to base this on. As far as I can tell there
     are two approaches:

      a. write a helper function that creates three (maybe two in some
         cases) copies of the loop and constrains their ranges as
         needed.

      b. abuse the loopunroll pass -- instead of creating the three
         loops "by hand", wrap the loop to be transformed into an
         outer loop that runs exactly thrice, add an exit from the
         inner to the outer loop on

            (outer_i == 0 && inner_i \in [b0, e0)) ||
            (outer_i == 1 && inner_i \in [e0, b2)) ||
            (outer_i == 2 && inner_i \in [b2, e2))

(the expression can be optimized, but semantically this is
what we want)

and ask llvm::UnrollLoop to completely unroll this outer
loop.

     The advantage of (a) is that it feels cleaner; and potential
     code-sharing with loop unrolling and loop unswitching. (b)'s
     advantages are simplicity, but it feels like somewhat of a hack;
     and we still have the problem of transforming (a < x && a < y) to
     (a < min(x, y)). I'm personally leaning towards (a).

  3. eliding the range checks -- once the transform has been done, we
     need to teach llvm to remove the now redundant range checks.
     LLVM does a lot of this already, but it may need some more help.
     In any case, there is a design choice:

       a. run a pass (passes?) that eliminates the redundant range
          checks from loop bodies

       b. have the transform elide the range checks itself

     (b) is faster while (a) is safer. What is the idiomatic solution
     here?

Thanks,
-- Sanjoy