My team and I also observe improvements in our internal benchmarks.
It is also worth mentioning that this change does not strongly affect compilation time: we see a 3.65% increase (for the LSR pass) in the worst case.
What is your opinion: should the default value of ComplexityLimit be changed (at least for arm/aarch64), or is it better to leave it as it is to save compilation time?
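For context, the knob in question is a `cl::opt` in LoopStrengthReduce.cpp. The sketch below is paraphrased from my reading of upstream (the flag name and default value should be double-checked against your tree), just to make clear what we are tuning:

```cpp
// Sketch of the option roughly as declared in LoopStrengthReduce.cpp
// (paraphrased; verify the exact flag name and default against your checkout).
#include "llvm/Support/CommandLine.h"
#include <limits>

using namespace llvm;

// LSR's search-space pruning kicks in once the estimated number of candidate
// solutions exceeds this limit, so a larger value lets LSR explore more
// formulae at the cost of extra compile time.
static cl::opt<unsigned> ComplexityLimit(
    "lsr-complexity-limit", cl::Hidden,
    // Default shown here from memory of upstream; check your tree.
    cl::init(std::numeric_limits<uint16_t>::max()),
    cl::desc("LSR search space complexity limit"));
```

This also means the limit can be raised for benchmarking without patching the default, e.g. `clang -O3 -mllvm -lsr-complexity-limit=<N>` for a full build, or `-lsr-complexity-limit=<N>` when running `opt` on a single test case.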
I think the performance looks promising as the LSR pass is doing a
heuristic search over possible fix-ups. Do you also have data on the
compile-time increase when ComplexityLimit is increased?
I seem to remember playing with this limit because we unroll aggressively for Arm Cortex-M, and LSR just wasn't doing as well as it could with larger loops. I can't recall the exact gains we saw, but we stuck with a higher limit (I don't remember which value) for our downstream toolchain.
I think it’s definitely worth exploring increasing the default.
An experiment on the RISC-V backend with an increased complexity limit shows worse performance than the original limit. I suspect there are still problems with the IR transformed by LSR for RISC-V, causing some of the rewrites to make things worse.
Overall performance decreased by 0.19%, with sjeng decreasing the most at 0.40%. From the perspective of a RISC-V backend developer, I would discourage changing the default for now, until we achieve better codegen in the backend.