I tried using the opt pipeline option, but it doesn’t show which pass transforms the code like this, only that it’s still good in the output of LoopRotatePass but rewritten in the input of the following SimplifyCFGPass.
Any thoughts on what’s going on here and why LLVM (much like the other compilers, except for GCC with -Os) seems to be hell-bent on duplicating the loop body in a basic strcmp implementation?
I think you’re misreading the opt pipeline output. It looks like loop rotation to me. (The output just isn’t showing the code copied outside the loop.)
OK, once I reluctantly enable the “Dump Full Module” option and find the corresponding LoopRotatePass, I can indeed see the change being made there.
So anyway, some quick googling tells me this is meant for vectorization? I have many questions about what’s even supposed to happen there, but primarily I’m wondering where one reports this to get it looked at. The bug reporting page seems to be only about crashes (no mention of performance issues).
We also track missed optimizations in the llvm/llvm-project issue tracker on GitHub if you have a reduced testcase, although they often remain unfixed…
Loop rotation generally makes analyzing a loop easier; it simplifies the control flow inside the loop, and means that more of the loop is executed every iteration. This helps vectorization, but also other optimizations like LICM. So LLVM tends to rotate loops pretty aggressively.
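To make that concrete, here is a rough C-level sketch of what rotation does to a strcmp-style loop. The names `cmp_while` and `cmp_rotated` are made up for illustration, and the second function is a hand-written approximation of the rotated shape, not actual compiler output.

```c
/* Original shape: the exit test lives in the loop header and runs
 * before every iteration, including the first. */
int cmp_while(const char *a, const char *b) {
    while (*a && *a == *b) {
        a++;
        b++;
    }
    return (unsigned char)*a - (unsigned char)*b;
}

/* Hand-rotated shape: the header test is copied once in front of the
 * loop as a guard, and the loop itself becomes a do-while whose test
 * sits at the bottom. */
int cmp_rotated(const char *a, const char *b) {
    if (*a && *a == *b) {            /* guard: the cloned header */
        do {
            a++;
            b++;
        } while (*a && *a == *b);    /* same test, now at the loop bottom */
    }
    return (unsigned char)*a - (unsigned char)*b;
}
```

Both functions return the same result. The only difference is that the comparison now appears twice, which is the duplication being discussed in this thread.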
The problem for your particular testcase is that all the logic ends up inside the loop header. Instead of cloning the header, we should just move the “add” instruction into the header.
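As a source-level illustration of that idea (my own reading of it, not something LLVM produces today), the increment can be placed at the top of the loop so that the comparison exists only once. The function name `cmp_inc_first` and the index-based formulation are assumptions made for this sketch.

```c
#include <stddef.h>

/* Increment-first shape: the "add" runs at the top of each iteration,
 * so a single copy of the comparison serves as the only exit test. */
int cmp_inc_first(const char *a, const char *b) {
    /* Start one step "before" index 0. size_t is unsigned, so the first
     * i++ wraps around to 0, which is well defined in C. */
    size_t i = (size_t)-1;
    do {
        i++;
    } while (a[i] && a[i] == b[i]);
    return (unsigned char)a[i] - (unsigned char)b[i];
}
```

The loop is already in bottom-tested form here, so no guard copy of the comparison is needed in front of it.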