I wonder what we can do about this in a general sense. As far as I can tell, the jump threading algorithm is really conservative, which is one reason this isn’t working as well as I’d hope; however, we don’t want to produce irreducible control flow that the other passes would work less effectively on. Thoughts?
I wonder what we can do about this in a general sense. As far as I can tell, the jump threading algorithm is *really* conservative, which is one reason this isn't working as well as I'd hope; however, we don't want to produce irreducible control flow that the other passes would work less effectively on. Thoughts?
Just brainstorming, really, but what about running a less conservative
version of JumpThreading later, _after_ loop passes have run?
I'm confused. In your godbold link you run it with O0 which disables
almost all transformations. If we take the IR, remove the optnone
attribute, run mem2reg and jump-threading we get what I think is
reasonable: Compiler Explorer
Please correct me if I misunderstand anything here.
Here’s a better example. https://godbolt.org/z/fpTyFS
I don’t know what exactly you would need to change in JumpThreading to improve this, but I’m open to ideas.
Well ideally it’d do the same thing as before (with two if statements). Something like:
ret i64 4
It’s not really about needing to be as good as gcc, it’s more that I wonder why the same concept with just two if-else blocks gets optimized yet more doesn’t.
An interesting point to make is that I don’t think it’s jump threading optimizing the original example fully: https://godbolt.org/z/rGvHJk
I am not convinced this is a jump-threading issue, but in the domain
of ScalarEvolution. ScEv would need to find out which of the exits are
taken or whether it loops infinitely. One can see that it exits after
10000 iterations, but ScalarEvolution cannot tell:
Ah, I see. I think there’s something else going on here too, though. https://godbolt.org/z/dCqdvv is optimized away but ScEv doesn’t know whether this loop terminates either.
Wait, you used the same example as I did. I’m confused then; if ScEv is having troubles but it still gets optimized away somewhere, what do you think is doing it?
Hm. I assumed that JumpThreading would be the primary factor in optimizing code like this. Guess not. I’ll need to look into SimplifyCFG to see what prevents it from doing the same thing to the other loop: https://godbolt.org/z/F6NjdG
I think SimplifyCFG could be abusing the undef value for this. If num is defined before the loop, it no longer becomes optimized: https://godbolt.org/z/woHRQf
Your code and the one you have seen in https://bugs.llvm.org/show_bug.cgi?id=44679 are structurally
different. In that latter, setting n to a constant value causes a
branch taken in the next iteration. Your's is a more classic for-loop
from 0 to 2. It is ScalarEvolution's job to find this out. SimplyCFG
job is just to normalize the IR into a form understood by
ScalarEvolution. That is, ScalarEvolution is the primary engine
enabling this.
Ok, thanks for the clarification. So looking into ScalarEvolution is what I’d look into if I wanted to optimize this a bit more? I’m mainly interested in how I’d go about “de-looping” this.
Ideally, with only 2 loop iteration with different code being executed
per execution, it is fully unrolled (-loop-unroll). But for this to
happen, LoopUnroll needs to know the iteration count from
ScalarEvolution. Your code is structured a bit differently than the
typical for-loop, making it a bit more difficult. Canonicalization
(-simplifycfg -loop-rotate -loop-simplify -indvarsimplify) should make
it recognizable.