Does LLVM hoist bounds checks out of inner loops?
LLVM does loop unswitching, so yes in some cases it does.
-Chris
LLVM itself doesn’t do any bounds checks, but if the language front end puts in any, they may be optimized (and hoisted) just as other comparison operators and branches might. In other words, the LLVM passes don’t recognize any bounds checks specifically.
Does LLVM hoist bounds checks out of inner loops?
In practice, LLVM is frequently going to have conservative behaviors which will prevent hoisting the explicit bounds checks your front-end must generate. Specifically, LLVM must use alias analysis to disprove the hypothesis that the array length variable is updated by the body of the loop. Consider:
struct array { int length; int values; };
extern void x(void);
void f(struct array *arr) {
for (int i = 0; i < arr->length; ++i) {
if (i < 0 || i >= arr->length)
throw; // trivially unreachable
x();
if (i < 0 || i >= arr->length)
throw; // unreachable only if ‘x() changed arr->length’ is provably false
}
}
You can make alias analysis stronger by:
• using -anders-aa instead of -basic-aa
• performing link-time optimization
• using the new ‘readonly’ and ‘readnone’ function attributes whenever possible
Since you’re working with a functional language, you may be able to know that ‘arr’ itself and ‘arr.length’ are both immutable (based on properties of your language/object model). If so, use only a single SSA value for each rather than 'load’ing them over again at each use. This eliminates the dependency on alias analysis (at the cost of increased register pressure). With that out of the way, it becomes a matter of building a pass pipeline which either hoists or eliminates the checks. opt -mem2reg -gvn -predsimplify -licm -dce -simplifycfg may be a starting point for elimination. Any of the passes in lib/Transforms/Scalar which include the string ‘SCEV’ may be helpful for hoisting.
— Gordon
Brilliant, thanks. I'm extremely busy ATM but my plan is to implement a
compiler for a pure first-order language and use it to benchmark various
different approaches. I might even reuse my ray tracer benchmark for
this...