Evan Cheng <evan.cheng <at> apple.com> writes:
> Eli is right. We do need to see some benchmark numbers and understand how the
> pass will fit in the target independent optimizer. While we encourage
> contribution, we typically don't commit new passes unless it introduce new
> functionalities that have active clients. It would also help if you provide us
> with compile time numbers.
Let's restart this conversation ...
I propose that I test OSR versus LSR compile-time performance via:
opt shell.ll -mem2reg -licm -osr -instcombine -stats -time-passes -disable-output
opt shell.ll -mem2reg -licm -loop-reduce -instcombine -stats -time-passes -disable-output
Caveats which may skew any compile time testing:
1. LSR is target dependent and was developed to run closer to code generation time. Thus, we are testing LSR outside its normal context. Also, as mentioned previously, -instcombine is not normally run after LSR (though LSR does create opportunities for -instcombine).
2. OSR finds loops in the SSA graph and will find more reduction opportunities than LSR. Thus, OSR may spend more time doing strength reduction, since OSR does more work.
3. From Dan Gohman's reply, "LSR's primary goal is to reduce integer register pressure, with strength reduction being just one of its available tools." Thus, LSR may be doing more than just strength reduction.
4. It is nontrivial to compare OSR and LSR optimizations in a given program, especially the larger tests in the test suite where the compile time differences may be the largest.
These caveats make it a bit like comparing apples and oranges.
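For readers less familiar with the transformation being compared, here is a minimal sketch in C of what strength reduction does to a loop. It is not taken from shell.ll, and the function names are hypothetical. The first function multiplies the induction variable by the stride on every iteration; the second is roughly the form a strength reducer aims for, where the multiply is replaced by a running offset that is bumped with an add.

```c
#include <stddef.h>

/* Before: the index i * stride is recomputed with a multiply
   on every iteration. */
long sum_strided_mul(const long *a, size_t n, size_t stride) {
    long sum = 0;
    for (size_t i = 0; i < n; ++i)
        sum += a[i * stride];
    return sum;
}

/* After: the multiply is replaced by a second induction variable
   (off) that advances by stride each iteration. Both functions
   read the same elements and return the same sum. */
long sum_strided_add(const long *a, size_t n, size_t stride) {
    long sum = 0;
    size_t off = 0;
    for (size_t i = 0; i < n; ++i) {
        sum += a[off];
        off += stride;
    }
    return sum;
}
```

Whether a given pass actually rewrites a loop this way depends on the pass and, for LSR, on the target, which is part of why the comparison above is hard to make cleanly.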
I created a simple test: a shell sort with a triply nested loop. I verified by hand that OSR and LSR performed the same strength reductions. Here are the results:
% opt shell.ll -mem2reg -licm -osr -instcombine -stats -time-passes -disable-output