Branches which return values in SelectionDAG

Hi all,

I am working on modeling an instruction similar to SystemZ’s ‘BRCT’, which takes a register, decrements it, and branches if the register is nonzero. I saw that the SystemZ backend generates this instruction in a MachineFunctionPass whose purpose is to eliminate or combine compares.

I then looked at ARM, which runs the HardwareLoops pass first and then a DAG combine during ARM ISel. The combine replaces branch instructions with special ‘WLS’ and ‘LE’ nodes that are custom-selected into t2WhileLoopStart and t2LoopEnd pseudo instructions with isBranch and isTerminator set. These pseudo instructions are finalized in a later MachineFunctionPass.

I had originally intended to use the HardwareLoops pass to do most of the initial transformation and bookkeeping, so that I could use the generated intrinsics in my own pass to further transform and customize the loop.

However, I found that I don’t know enough about the SelectionDAG to tell whether this is possible.

Trying to combine the two concepts (value-returning branches and handling them in the SelectionDAG), I wrote my backend to generate:

header:
  %InitialVal = N

body:
  %IndVar = phi i32 [ %InitialVal, %header ], [ %DecVal, %body ]
  %MultipleReturns = call {i32, i1} @compare_and_maybe_decrement(i32 %IndVar, i32 1)
  %DecVal = extractvalue {i32, i1} %MultipleReturns, 0
  %Cond = extractvalue {i32, i1} %MultipleReturns, 1
  br i1 %Cond, label %body, label %exit

exit:

Then I attempted to combine the intrinsic, the extractions, and the branch into a single node in the SelectionDAG.

However, this construct, which seems fine in LLVM IR, does not work in the DAG.

Specifically, there is a CopyToReg in the DAG, between the intrinsic and the branch, that saves off %DecVal. I presume it’s there because the value leaves the DAG (it is read again in the next iteration). Once the branch node returns that value instead, there seems to be no legal place for this necessary CopyToReg. If it is ordered after the fused branch, I believe that is illegal because it is logically incorrect (we would only copy when terminating the loop?). If it is ordered before, I don’t think the DAG makes sense anymore:

t1 = CopyToReg %1, t2   ; Copying a value before it’s defined???
t2 = Target::BR_DEC …

Indeed, when I tried it, I got the abort “Operand not processed?” for the CopyToReg, which indicates that something was amiss.

I’m more than willing to provide more context, such as DAG dumps, if people have ideas; I just didn’t want to fill this email with debug output.

Is what I’m doing possible? Or does it make sense to keep compare_and_maybe_decrement as a separate operation until instruction selection is finished, so that I can fuse at the MachineInstr level instead?

Thanks for any help!

J.B. Nagurne

Code Generation

Texas Instruments

I don’t know about SystemZ’s instructions, but what you described sounds exactly like a hardware loop construct to me. That’s why I’m wondering why the hardware loop pass (and friends) isn’t working for you; that wasn’t entirely clear to me. After the HardwareLoops pass we have something like this:

  call @llvm.set.loop.iterations
hwloop:
  call @llvm.loop.decrement.reg
  icmp
  br hwloop

For ARM, after isel, we indeed then have something like this, using pseudos:

  t2DoLoopStart
hwloop:
  $lr = t2LoopDec $lr
  t2LoopEnd $lr, …
  tB %bb.2, …

These pseudos decrement the register holding the hardware loop counter, and the result is consumed by the branch instruction. This seems to match the semantics you described, “which takes a register, decrements it, and branches if the register is nonzero”, unless I’m missing something, of course. Very late in the optimisation pipeline, an ARM hardware loop pass converts this into a hardware loop, and we are left with something like this:

  $lr = DLS $r2
hwloop:
  $lr = LE $lr

And while I think the semantics of our LE instruction are slightly different, I don’t think that matters (again, unless I’m missing something).

Sorry for not answering your actual isel question. I can’t answer that without digging into it; perhaps someone else can.

Cheers,
Sjoerd.

I need to do fixups after HardwareLoops because of the order of operations.

Instead of “if (--x) goto body;”, my instruction does “if (x) { x--; goto body; }”.

Thus, I have to model this conditional decrement operation, and also fix up the starting value, because otherwise the loop iterates one extra time.

So, in the simplest terms, I’d have:

  %reg = phi(N, %nextreg)
  %cond = icmp ne %reg, 0
  %nextreg = loop.conditional.decrement.reg %cond, %reg, 1
  br %cond, body, exit

I could (and probably will) take ARM’s path here and match multiple pseudo instructions that get combined later, unless someone knows a way to avoid the CopyToReg problem caused by a terminator producing a value.

In summary, though, I took ARM’s implementation one step further and tried to combine everything into a single branch-like node, and that’s where the issues lie.

J.B. Nagurne
Code Generation
Texas Instruments