Speculative phi elimination at the top of a loop?

I am working on heavily optimising unusually static C++ code, and have encountered a situation where I basically want an optimiser that would speculatively unroll a loop to see if the first round of the loop could be optimised further. (I happen to know that it is possible.) The previous optimisations that produce the loop in the first place already do a magical job (relying heavily on constant propagation), transforming cross-class tail recursion into the loop that I am now addressing. Hence, there is probably little than can be done in the previous optimisations.

So, has anyone worked on an optimisation where the optimiser unrolls a loop so a way that allows it to speculatively try if the first round can be further simplified?

In my case, the loop actually calls a C++ virtual method, but since the instance object is a *constant* on the first round of the loop, it is possible to resolve the method call into a static function, and then inline the static function, which in this case essentially eliminates the first round of the loop.

Here is an example. Let me have a simple loop bb.i, calling a C++ inner class virtual method in SSA:

Hi,

The combination of GVN-PRE and SCCVN in GCC will do this, and we
actually block it from doing this in a lot of cases.

Basically, it can discover some value is constant on the first run
through a loop, and will create additional phi nodes to represent
those values.

In most cases, this is not that valuable (IE it would discover i = i +
1 is 2 on the first run through the loop, and replace it with a phi of
(2, <new calculation>) , and in fact, was significantly confusing to
SCEV analysis. So we blocked it. In your case, it would be valuable
however.

--Dan

Hi Devang,

I am working on heavily optimising unusually static C++ code, and have encountered a situation where I basically want an optimiser that would speculatively unroll a loop to see if the first round of the loop could be optimised further. (I happen to know that it is possible.) The previous optimisations that produce the loop in the first place already do a magical job (relying heavily on constant propagation), transforming cross-class tail recursion into the loop that I am now addressing. Hence, there is probably little than can be done in the previous optimisations.

So, has anyone worked on an optimisation where the optimiser unrolls a loop so a way that allows it to speculatively try if the first round can be further simplified?

In my case, the loop actually calls a C++ virtual method, but since the instance object is a constant on the first round of the loop, it is possible to resolve the method call into a static function, and then inline the static function, which in this case essentially eliminates the first round of the loop.

The combination of GVN-PRE and SCCVN in GCC will do this, and we
actually block it from doing this in a lot of cases.

Basically, it can discover some value is constant on the first run
through a loop, and will create additional phi nodes to represent
those values.

In most cases, this is not that valuable (IE it would discover i = i +
1 is 2 on the first run through the loop, and replace it with a phi of
(2, ) , and in fact, was significantly confusing to
SCEV analysis. So we blocked it. In your case, it would be valuable
however.

I would suggest trying to do this on any pointer-typed const-expr. Thoughts?

Nick

Would the best way be to add an option to -loop-unroll, and hack away at lib/Transforms/Utils/LoopUnroll.cpp?

Instead, the better alternative is to write another pass similar to
LoopUnrollPass.cpp (say LoopPeelPass.cpp) and add new option
-loop-peel. The new pass could use llvm::UnrollLoop() utility
function. Feel free to adjust this utility function if required, but
the core utility function should be used by both passes

It took some time (mainly because of the rest of the project), but I have now an early version of LoopPeelPass.cpp implemented. It uses the llvm::UnrollPass() utility function; I added one argument, "bool Peeling", to the UnrollPass() function, with the default value of "false".

There still some problems related to the block merging in the end of llvm::UnrollPass(), but otherwise it seems to work.

I'm afraid it will take some more time before I have a good and clean patch ready, but in the mean time if anyone would like to review the changed, I'd be more than happy to send a diff off-list.

--Pekka