LoopStrengthReduce.cpp

Hi,

I am looking for a way to rewrite induction variables to use an addition of -1 whenever possible (and not otherwise unprofitable). This is needed to utilize hardware loop instructions, which are present on SystemZ (branch on count). Later in the backend, an 'add -1; compare w/ 0; jne 0'-sequence can be replaced with a brct instruction.

I could not find any way in the LSR pass to generate a formula for this (neither to express the lower target cost). I would like to generate a formula with a step of -1 and make the cost for it to be less than that of a similar formula with another constant.

Could someone please give me any clue as to if there is a way to currently do this, or if not, what would be the normal way?

thanks,

Jonas paulsson

Hi Jonas,

Are you talking specifically about the induction variable? You might look at what I did for PowerPC's counter-based loops (lib/Target/PowerPC/PPCCTRLoops.cpp, etc.).

-Hal

Hi Hal,

yes, it's all about the induction variable. SystemZ has a late pass (pre-emit) that looks for MI sequences that can be rewritten to 'branch on count'. Currently only about half the number of BRCTs are output compared to gcc on the same benchmarks. One reason for this is that when a loop gets unrolled, the loop gets a greater increment / decrement than 1, which makes the late peephole transformation not work.

Since LSR generates formula for induction variables and evaluates them, I thought this pass could somehow be made to generate and emit preferred forms of induction variables. This seems like a good idea since this should generally help all targets with similar h/w loop instructions that begins with the trip count in a register and decrements and loops until zero.

Could this be done somehow, or is it really so that all targets have to have their own passes to do this?

/Jonas

In the Hexagon backend we also have a separate pass that converts compare+branch loops into hardware loops. We recognize several different patterns of the controlling induction variable, including cases where the increment is not 1 or -1. If we decide to generate a hardware loop, we may add some instructions to calculate the iteration count, if it's not already available.

The general problem for us is that not every loop can be converted to a hardware loop. At the same time, making each loop iterate over a counter that decrements down to 0 may add extra code that may end up being unnecessary. It is easier for us to recognize cases where we want a hardware loop and do work to make it happen.

On that note, I think that in general it would be useful to have some target-independent (CodeGen) pass that would do the majority of the work for hardware loop generation. I have thought about it, but I won't be able to do anything in the short term.

-Krzysztof

On that note, I think that in general it would be useful to have some target-independent (CodeGen) pass that would do the majority of the work for hardware loop generation. I have thought about it, but I won't be able to do anything in the short term.

-Krzysztof

I think a first and useful step would be to let targets optionally have the loop induction variable which controls the back-branching be reformulated as a decrement towards zero with a -1 step. Should this be an extension of LSR, or should it be a simple beginning of a HW-loop pass running after it?

/Jonas

Hi Jonas,

Are you talking specifically about the induction variable? You might look at what I did for PowerPC's counter-based loops >(lib/Target/PowerPC/PPCCTRLoops.cpp, etc.).

-Hal

Hi Hal,

thanks for the pointer! I have made a pass based on PPCCTRLoops.cpp for review on http://reviews.llvm.org/D18900. I took most of the code straight, but simplified it a bit to use the L->getLatch() method, which made more sense to me. I would appreciate any suggestions / comments.

>On that note, I think that in general it would be useful to have some target-independent (CodeGen) pass that would do the majority of the work for >hardware loop generation. I have thought about it, but I won't be able to do anything in the short term.
>
>-Krzysztof

I guess either of PPCCTRLLoops or SystemZBRCTLoops pass could potentially easily form the beginning of such a loop pass.

LSR seemed to me to be too complex already to dare try and add a lot of more new functionality to it. It seems to only care about compare with zero right
now, and I couldn't see an obvious way to make it generate a formula for any compare with step -1 to 0. Or did I miss something here?

/Jonas