Lowering for loops to use an architecture's "loop" instruction

Hi,

I’m working on a project that involves writing a backend for a hypothetical architecture. I’m currently trying to figure out the best way to translate for loops into a specialized “loop” instruction that the architecture supports. The instruction is similar to X86’s loop instruction: a register is automatically decremented, and the condition is automatically checked to decide whether loop execution should continue.
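For reference, here is a small C++ model of what such an instruction does. It assumes x86-style semantics (decrement, then branch back while the counter is non-zero); the hypothetical architecture's exact condition may differ, and `loopInsn`/`runBody` are illustrative names only.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of an x86-style LOOP instruction: decrement the
// counter register, then report whether the branch back to the loop
// header is taken (counter still non-zero).
// Assumption: the target's LOOP behaves like x86's.
inline bool loopInsn(std::uint32_t &counter) {
  --counter;            // implicit decrement
  return counter != 0;  // implicit condition check
}

// A count-down loop written the way it would be lowered: the counter is
// initialised once before the loop (preheader), and a single LOOP at the
// bottom (latch) replaces the separate decrement, compare and branch.
inline unsigned runBody(std::uint32_t n) {
  unsigned iterations = 0;
  if (n == 0)                // guard: LOOP would wrap 0 to 0xFFFFFFFF
    return 0;
  std::uint32_t counter = n; // induction variable initialisation
  do {
    ++iterations;            // loop body
  } while (loopInsn(counter));
  return iterations;
}
```

The `n == 0` guard matters: with x86-style semantics, entering the loop with a zero counter would wrap around instead of skipping the body.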

I was wondering what the best way to go about implementing this is. I tried looking at how X86 implements its loop instruction, but couldn’t find where the IR instructions are lowered to it.

It seems like there is no way to express this loop instruction as a pattern, since it depends on a set of instructions that occur in different places (the induction variable initialization, the condition calculation, and the branch).

Right now, I’m thinking of using a loop pass to somehow mark the instructions that should be lowered to a loop instruction, and then translating them during instruction selection, but I’m not entirely sure how to do that, or whether it’s even the right strategy.

Thanks,
-Dilan Manatunga

This sounds like it's not something the TableGen patterns could
handle, but ought to be fairly easy during ISelDAGToDAG. You'd be
looking for something like "(br_cc SETGT, (sub $iter, 1), 0,
$GoAgain)". The C++ code is needed because your final LOOP instruction
also has an output (induction variable for the next iteration) that
needs to be propagated from the "(sub $iter, 1)".
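As a rough illustration of the "check whatever conditions" step, here is the shape test written against a toy node structure. `Node` and `matchLoopBranch` are made up for this sketch and are not LLVM's `SDNode` API; in particular, the toy `br_cc` keeps only (LHS, RHS, Dest) as operands, whereas the real `ISD::BR_CC` also carries the chain and the condition code as operands.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Toy stand-in for SelectionDAG nodes (NOT the LLVM API).
struct Node {
  std::string op;                 // "br_cc", "sub", "const", "reg", "bb"
  std::string cc;                 // condition code for br_cc, e.g. "setgt"
  std::int64_t value = 0;         // payload for "const"
  std::vector<const Node *> ops;  // operands
};

static bool isConst(const Node *N, std::int64_t V) {
  return N && N->op == "const" && N->value == V;
}

// Recognise (br_cc setgt, (sub $iter, 1), 0, $dest). On success, report
// the old induction variable and the branch destination, which are the
// inputs a LOOP instruction would need.
bool matchLoopBranch(const Node &Br, const Node *&OldIV, const Node *&Dest) {
  if (Br.op != "br_cc" || Br.cc != "setgt" || Br.ops.size() != 3)
    return false;
  const Node *Dec = Br.ops[0];
  if (!Dec || Dec->op != "sub" || Dec->ops.size() != 2 ||
      !isConst(Dec->ops[1], 1))
    return false;
  if (!isConst(Br.ops[1], 0))
    return false;
  OldIV = Dec->ops[0];
  Dest = Br.ops[2];
  return true;
}
```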

Some very vague and untested pseudo-code would be:

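    // Produces (NewIV, Chain) = LOOP OldIV, Dest, Chain.
    void SelectLOOPFromBR_CC(SDNode *N) {
      // TODO: check whatever conditions (sub 1, SETGT, ..).
      SDLoc DL(N);
      // ISD::BR_CC operands are: Chain, CondCode, LHS, RHS, Dest.
      SDValue Chain = N->getOperand(0);
      SDValue IV = N->getOperand(2);   // the (sub $iter, 1)
      SDValue Dest = N->getOperand(4);
      // Machine nodes take their input chain last and produce their
      // output chain after the ordinary results.
      SDValue Ops[] = { IV.getOperand(0), Dest, Chain };
      SDNode *LOOP = CurDAG->getMachineNode(XYZ::LOOP, DL,
                                            IV.getValueType(), MVT::Other,
                                            Ops);
      // Users of the decremented value now use LOOP's first result...
      CurDAG->ReplaceAllUsesWith(IV, SDValue(LOOP, 0));
      // ...and users of the branch's chain use LOOP's chain result.
      CurDAG->ReplaceAllUsesWith(SDValue(N, 0), SDValue(LOOP, 1));
      CurDAG->RemoveDeadNode(N);
    }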

where the key point is that you have to manually call
ReplaceAllUsesWith to move your induction variable onto the LOOP
instruction. The rest is just bog-standard "match a BR_CC" code.
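To see why that call matters, here is a toy version on a made-up value graph (these are not LLVM's classes). The decremented counter is also used by whatever feeds the next iteration (modelled here as a node named `copy_to_phi`), so those uses have to be moved onto the LOOP node's result; otherwise the old `sub` would stay alive and be selected as a separate decrement.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Toy value graph (NOT LLVM's SelectionDAG): a value is (node, result number).
struct ToyNode;
using Val = std::pair<ToyNode *, unsigned>;

struct ToyNode {
  std::string name;
  std::vector<Val> ops;
};

// Toy ReplaceAllUsesWith: every operand slot in `nodes` that refers to
// `from` is redirected to `to`.
void replaceAllUsesWith(const std::vector<ToyNode *> &nodes, Val from, Val to) {
  for (ToyNode *n : nodes)
    for (Val &op : n->ops)
      if (op == from)
        op = to;
}
```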

The LOOP instruction would output a GPR (new induction variable) and
take the old one as an input (tied via "let Constraints = ..." if
they're really the same register), as well as a destination basic
block.

Hope this is a little helpful.

Tim.

P.S. Beware the position of the Chain changes between
pre-isel/post-isel and I can never remember which way round it should
be. There's at least a 50% chance I've got it wrong here. Probably
closer to 75%.

I guess my area of confusion is how I know that the br_cc instruction is for a loop and not just an if statement. I’m still getting familiar with the backend process, so sorry for any dumb questions.

-Dilan

Do you need to know that? Does the CPU handle non-loop uses of this
instruction poorly, for example? Otherwise, it seems like as long as
the semantics match up you're good to go.
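One way to check whether the semantics match up is to compare the two branch conditions directly. This sketch assumes the hardware LOOP branches when the decremented counter is non-zero (as x86's does); `hwLoopTaken`, `brccTaken` and `countMismatches` are illustrative names. Under that assumption the hardware condition agrees with the SETGT pattern only when the counter starts at 1 or more.

```cpp
#include <cassert>

// Assumed hardware semantics (x86-style): decrement, branch if the new
// value is non-zero.
inline bool hwLoopTaken(int iter) { return iter - 1 != 0; }

// Semantics of the DAG pattern (br_cc setgt, (sub iter, 1), 0, dest).
inline bool brccTaken(int iter) { return iter - 1 > 0; }

// Count counter values in [lo, hi] where the two conditions disagree.
inline int countMismatches(int lo, int hi) {
  int mismatches = 0;
  for (int iter = lo; iter <= hi; ++iter)
    if (hwLoopTaken(iter) != brccTaken(iter))
      ++mismatches;
  return mismatches;
}
```

With x86-style hardware, the matcher would therefore need to look for SETNE rather than SETGT, or prove the counter is positive, before the rewrite is safe.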

If you *do* need to know, I think I'd probably go for a
MachineFunction pass later on that can use MachineLoopInfo to decide
when to combine the instructions. I don't think ISelDAGs have access
to loop information (at least I've never seen it used).
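If loop information is needed, the question "is this br_cc a loop branch?" becomes "is this a back edge?". The sketch below finds back edges in a toy CFG with a depth-first search. `findBackEdges` and the `CFG` type are illustrative; MachineLoopInfo itself is built from the dominator tree, and DFS back edges coincide with its loop back edges only for reducible control flow.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Toy CFG: succs[b] lists the successors of block b; block 0 is the entry.
using CFG = std::vector<std::vector<int>>;

// Iterative DFS from the entry. An edge u->v whose target v is still on
// the DFS stack is a back edge, i.e. a branch that closes a loop.
std::vector<std::pair<int, int>> findBackEdges(const CFG &succs) {
  enum State { Unvisited, OnStack, Done };
  std::vector<State> state(succs.size(), Unvisited);
  std::vector<std::pair<int, int>> backEdges;
  // Each frame is (block, index of the next successor to visit).
  std::vector<std::pair<int, std::size_t>> stack{{0, 0}};
  state[0] = OnStack;
  while (!stack.empty()) {
    auto &[block, idx] = stack.back();
    if (idx == succs[block].size()) {  // all successors explored
      state[block] = Done;
      stack.pop_back();
      continue;
    }
    int succ = succs[block][idx++];
    if (state[succ] == OnStack) {
      backEdges.emplace_back(block, succ);
    } else if (state[succ] == Unvisited) {
      state[succ] = OnStack;
      stack.push_back({succ, 0});      // block/idx are not used after this
    }
  }
  return backEdges;
}
```

For a CFG `{{1}, {2}, {3, 4}, {4}, {1, 5}, {}}`, the conditional branch in block 2 (an if) produces no back edge, while block 4 branches back to the header in block 1, so only block 4's branch would be a LOOP candidate.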

The annoyance with that approach is that dealing with MachineInstrs is
messy. But it's certainly viable.

Cheers.

Tim.

A better example might be PowerPC's CTR loop pass.
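Roughly, passes like this compute the trip count before the loop, load it into the count register in the preheader (`mtctr` on PowerPC), and turn the latch into a single decrement-and-branch-if-non-zero (`bdnz`). The sketch below mimics that rewrite in plain C++ for `for (i = start; i < end; i += step)` with a positive step. `tripCount`, `original` and `hardwareLoop` are illustrative names; a real pass would typically get the trip count from ScalarEvolution and would also have to handle overflow and unknown counts.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Trip count of `for (i = start; i < end; i += step)` with step > 0.
inline std::int64_t tripCount(std::int64_t start, std::int64_t end,
                              std::int64_t step) {
  return start < end ? (end - start + step - 1) / step : 0;
}

// Reference: the loop as written in the source.
inline std::vector<std::int64_t> original(std::int64_t start, std::int64_t end,
                                          std::int64_t step) {
  std::vector<std::int64_t> seen;
  for (std::int64_t i = start; i < end; i += step)
    seen.push_back(i);
  return seen;
}

// The same loop after a CTR-style rewrite: the counter is set once in the
// preheader, and the latch becomes "decrement and branch if non-zero",
// so the compare of i against end disappears.
inline std::vector<std::int64_t> hardwareLoop(std::int64_t start,
                                              std::int64_t end,
                                              std::int64_t step) {
  std::vector<std::int64_t> seen;
  std::int64_t counter = tripCount(start, end, step);  // preheader: mtctr-like
  if (counter == 0)                                    // guard skips the loop
    return seen;
  std::int64_t i = start;
  do {
    seen.push_back(i);                                 // loop body
    i += step;
  } while (--counter != 0);                            // latch: bdnz-like
  return seen;
}
```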

Joerg

Or Hexagon's hardware loops pass.

Roel