Why not eliminate range checks for non-indvars?

Our current approach to range-check elimination is very IV-centric. We can remove something like

for (int i = start; i < end; i += step) {
  if (i < 0 || i >= length) {
    // break to error code
  }
  // do some work
}
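To make the IV-centric case concrete, here is a rough C++ sketch (invented for this post, not actual compiler output) of what elimination buys us when the check is against an IV with step 1: the per-iteration check collapses into a single precondition on the loop bounds.

```cpp
#include <cassert>
#include <vector>

// Baseline: range check executed on every iteration.
int sumChecked(const std::vector<int> &a, int start, int end) {
  int sum = 0;
  for (int i = start; i < end; ++i) {
    if (i < 0 || i >= (int)a.size())
      return -1; // error path
    sum += a[i];
  }
  return sum;
}

// After IV-based elimination (step == 1): if any iteration would be out of
// bounds, the first out-of-bounds i is either start (when start < 0) or
// length (when end > length), so one check up front covers the whole loop.
int sumSplit(const std::vector<int> &a, int start, int end) {
  int length = (int)a.size();
  if (start < end && (start < 0 || end > length))
    return -1; // error path would be taken on some iteration
  int sum = 0;
  for (int i = start; i < end; ++i)
    sum += a[i]; // range check eliminated
  return sum;
}
```

Both functions return the same result for any inputs; SCEV can justify this because the IV's value range over the loop is known in closed form.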

For that purpose, we use the full power of SCEV to deal with range checks, and by now this machinery is very, very powerful. Range checks are very common in Java, and I guess the C++ world faces them a lot too. However, it seems that our compiler completely ignores range checks against anything that is not an IV, no matter how simple they are.

Here is a simple motivating example from the Java world:

char array[];
int indices[];
...
if (array.length < 128)  // [1]
  throw Exception("too short");

for (int index : indices) {
  if (index >= 128) // [2]
    throw Exception("bad index");
  char c = array[index];
  // do some work with c
}

In Java, all array accesses are preceded by range checks, so before accessing array there will be a check that index <u array.length. This check can be easily eliminated, because [1] and [2] give us the chain index < 128 <= array.length.

In IR, the example (slightly more generalized, I used %min_length instead of 128) looks like this:

; Here we can remove range check because:
; length >= min_length (from entry)
; idx < min_length (from in_bounds check)
; therefore, idx < length is trivially true.
define i32 @test_01(ptr %p, ptr %array, i32 %length, i32 %min_length) {
entry:
  %length_check = icmp uge i32 %length, %min_length
  br i1 %length_check, label %loop, label %failed

loop:                                             ; preds = %backedge, %entry
  %iv = phi i32 [ 0, %entry ], [ %iv.next, %backedge ]
  %elem_ptr = getelementptr i32, ptr %p, i32 %iv
  %idx = load i32, ptr %elem_ptr, align 4
  %in_bounds = icmp ult i32 %idx, %min_length
  br i1 %in_bounds, label %range_check_block, label %out_of_bounds

range_check_block:                                ; preds = %loop
  %range_check = icmp ult i32 %idx, %length
  br i1 %range_check, label %backedge, label %range_check_failed

backedge:                                         ; preds = %range_check_block
  %arr_ptr = getelementptr i32, ptr %array, i32 %idx
  store i32 %iv, ptr %arr_ptr, align 4
  %iv.next = add i32 %iv, 1
  %loop_cond = call i1 @cond()
  br i1 %loop_cond, label %loop, label %exit

exit:                                             ; preds = %backedge
  %iv.lcssa = phi i32 [ %iv, %backedge ]
  ret i32 %iv.lcssa

failed:                                           ; preds = %entry
  unreachable

out_of_bounds:                                    ; preds = %loop
  ret i32 -1

range_check_failed:                               ; preds = %range_check_block
  %iv.lcssa.rc = phi i32 [ %iv, %range_check_block ]
  call void @failed_range_check(i32 %iv.lcssa.rc)
  unreachable
}

declare i1 @cond()

declare void @failed_range_check(i32)

The fact we need to prove is a trivial implication. Were %idx an induction variable, the check would easily be eliminated. However, because the range check is against a non-indvar, we don’t even try to do anything. Should we ask SCEV “can you prove this check is redundant?”, it would say “yes”, and we’d be happy. But no one even asks SCEV about this.

If we generalize this idea, the question becomes: why don’t we prove all implied conditions, based on guards/ranges/SCEV knowledge, instead of only doing it for induction variable checks?

Maybe someone has tried this before, but conceptually GVN seems like a good place to do it. Just “if check A dominates check B and we can prove the implication A → B, then B can be eliminated”. The proof can be done by SCEV means.
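Here is a toy mock-up of that idea, outside LLVM. Everything here (the `Facts` structure, the one-step transitivity “proof engine”) is invented for illustration; in the real thing, the facts would be the dominating branch conditions and the proof would be delegated to SCEV or ConstraintElimination.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>

// A tiny fact base of unsigned comparisons established by dominating checks.
struct Facts {
  std::set<std::pair<std::string, std::string>> lt; // known "a <u b"
  std::set<std::pair<std::string, std::string>> le; // known "a <=u b"

  void addLt(const std::string &a, const std::string &b) { lt.insert({a, b}); }
  // "a >=u b" is recorded as "b <=u a".
  void addGe(const std::string &a, const std::string &b) { le.insert({b, a}); }

  // Can we prove "x <u y"? Either it is a recorded fact, or one transitivity
  // step applies: x <u z and z <=u y for some z.
  bool impliesLt(const std::string &x, const std::string &y) const {
    if (lt.count({x, y}))
      return true;
    for (const auto &f : lt)
      if (f.first == x && le.count({f.second, y}))
        return true;
    return false;
  }
};
```

For the example above: recording `length >=u min_length` (from entry) and `idx <u min_length` (the in_bounds check) lets `impliesLt("idx", "length")` succeed, which is exactly the %range_check we want to remove.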

Any concerns why it’s a bad idea?

It’s worth mentioning that there is a ConstraintElimination pass (which is not yet enabled by default), which does reasoning about variable constraints, while other passes like CVP/LVI tend to reason only about constant ranges. cc @fhahn

Regarding SCEV, I think you can already guess the answer, and it’s compile time. SCEV implied-condition reasoning is extremely, embarrassingly slow, and I think every attempt to increase its usage has resulted in significant compile-time regressions, and those were generally minor extensions. I don’t think anyone has tried it yet, but I would expect that performing an implied-condition check between every pair of dominating conditions would have catastrophic effects.

I think there needs to be some kind of fundamental change here before this becomes viable. Many parts of SCEV already use applyLoopGuards() instead of known predicates (which ultimately go through implied conditions of loop guards) to perform certain checks, both because it is more powerful in practice (for those kinds of checks) and because it does not seem to have the same compile-time issues.

I don’t think compile time has to be a blocking factor. Of course we don’t want to regress the default pipeline, but let’s think about other, non-default pipelines. There could be a separate pass, or a separate mode that says “I don’t care about compile time, I just want maximum performance”. I agree that we should do everything we can to speed up the inference engine, but sometimes we just want the best code we can get, no matter the price.

BTW, I overestimated SCEV: it turns out it’s unable to prove this particular fact. Nevertheless, we can construct simpler examples of this, or improve the inference engine to handle this one.