Hi all,

I've been working on a pass that tries to eliminate range checks like

this:

void foo(int *a, int n, int a_len) {

assert(a_len >= 0);

for (int i = 0; i < n; i++ /* wrapping add */) {

if (0 <= (i - 2) && (i - 2) < a_len) {

a[i - 2] = 42;

} else {

out_of_bounds_error(i + 2); // noreturn

unreachable;

}

}

}

Since n is unrelated to a_len in the above loop, we cannot eliminate

the range checks in the loop as-is. But if we split the iteration

space for `i` into:

[0, min(n, 2)) \cup [min(n, 2), min(a_len + 2, n)) \cup [min(a_len + 2, n), n).

(with the convention that for "a >= b", [a, b) is an empty set) we can

elide the range checks in the middle split. We realize this split

iteration space by breaking up the original loop into three loops; and

only have the first and third loop have range checks.

This approach can be extended to range checks on expressions like (P +

Q * I) where I is the canonical induction variable -- there will be a

set of disjoint contiguous ranges for which (P + Q * I) is provably

within bounds. If one of those contiguous ranges can be "naturally

split out" from I's normal iteration space, we can do so and elide the

range checks in that subspace. It can also be extended to multiple

range checks in the loop body.

The first question I have: am I reinventing the wheel here? Has this

approach been tried in LLVM and discarded because of fundamental

issues? Are there better ways to eliminate range checks on affine

functions on the canonical induction variable? The closest thing I

could find to this is the LoopIndexSplit pass which was removed

in 2010.

Implementing this requires some design choices I'd like input on:

1. computing the splits in the iteration range -- the math is

slightly tricky because we have to deal with wrapping addition,

but in principle this can be done. The output of this phase is

to split the iteration space of the canonical IV into [b0, e0)

\cup [e0, b2) \cup [b2, e2).

2. code to transform the loop -- I could not find any helper

functions in llvm to base this on. As far as I can tell there

are two approaches:

a. write a helper function that creates three (maybe two in some

cases) copies of the loop and constrains their ranges as

needed.

b. abuse the loopunroll pass -- instead of creating the three

loops "by hand", wrap the loop to be transformed into an

outer loop that runs exactly thrice, add an exit from the

inner to the outer loop on

(outer_i == 0 && inner_i \in [b0, e0)) ||

(outer_i == 1 && inner_i \in [e0, b2)) ||

(outer_i == 2 && inner_i \in [b2, e2))

(the expression can be optimized, but semantically this is

what we want)

and ask llvm::UnrollLoop to completely unroll this outer

loop.

The advantage of (a) is that it feels cleaner; and potential

code-sharing with loop unrolling and loop unswitching. (b)'s

advantages are simplicity, but it feels like somewhat of a hack;

and we still have the problem of transforming (a < x && a < y) to

(a < min(x, y)). I'm personally leaning towards (a).

3. eliding the range checks -- once the transform has been done, we

need to teach llvm to remove the now redundant range checks.

LLVM does a lot of this already, but it may need some more help.

In any case, there is a design choice:

a. run a pass (passes?) that eliminates the redundant range

checks from loop bodies

b. have the transform elide the range checks itself

(b) is faster while (a) is safer. What is the idiomatic solution

here?

Thanks,

-- Sanjoy