Terminators which define values

This is a follow-up to my original discussion regarding LLVM IR and branch instructions.

The main question, before I dive into my research, is: How are terminators that define values supported at the IR, DAG, and MI layers, if at all?

My target has a branch instruction which also writes to a register. In C: “if (Rx) { Rx--; goto destination; }”
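To make the semantics concrete, here is a minimal C++ sketch of the instruction as described by the C snippet above. The names branchDec and runLoop are hypothetical and exist only for illustration; the real instruction may differ in details such as the exact zero test.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the branch-with-decrement instruction.
// Returns true when the branch is taken. The counter is read and written by
// the same operation, which is what makes this a terminator that also
// defines a value.
bool branchDec(std::uint32_t &rx) {
  if (rx != 0) {
    --rx;         // the register write performed by the branch itself
    return true;  // goto destination
  }
  return false;   // fall through
}

// Runs an empty loop body until the branch falls through and reports how
// many times the body executed.
unsigned runLoop(std::uint32_t rx) {
  unsigned iterations = 0;
  do {
    ++iterations;           // loop body
  } while (branchDec(rx));  // backedge: decremented value is live around the loop
  return iterations;
}
```

Note that the decremented value must survive the backedge, which is exactly the live-out-of-a-terminator situation the rest of this post is about.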

My original implementation attempted to combine the elemental operations into a SelectionDAG node representing the whole operation, called BR_DEC (BRanch with DECrement).

This seemed to be completely unsupported.

The error, when I attempted to do this, took form in the following transformation from IR:

%2 = PHI(%1, body, N, preheader)

%1 = sub %2, 1

%3 = cmp ne %1, 0

br %3, body, exit

to SelectionDAG:

t2: i32, ch = CopyFromReg %2

t4: i32 = sub t2, 1

t6: ch = CopyToReg %1, t4 ; Required because the decremented value (%1) is live into the successor (body)

t8: i1 = setcc, t4, 0, setne

t10: ch = brcond t8, body

t12: ch = br exit

Post-combine (Combines sub, compare, branch together)

t6: ch = CopyToReg %1, t4

t4: i32, ch = TARGET::BR_DEC t2, 1, body ; If t2 is not 0, decrement t2 and goto body

This is illegal. Because of the combination, there now seems to be no legal location for the CopyToReg, as far as I know. How does a value defined by a terminator in SelectionDAG make it back around the backedge (i.e., the successor) of the loop?

The stopgap solution to just move forward was to follow ARM’s model, which keeps the operations separate in the form of “Loop End”, LE, and “Loop Decrement”, LOOP_DEC.

However, that ran into another problem, with another orthogonal set of prior art.

After selection, I had the following:

%6 = PHI %19, %bb.1, %27, %bb.4

%27 = LoopDec %6, 1

LoopEnd %27, %bb.4

I added a finalize step that took the LoopDec and LoopEnd pseudo-instructions, and formed the BR_DEC:

%6 = PHI %19, %bb.1, %27, %bb.4

%27 = BR_DEC %6(tied-def 0), 1, %bb.4, 0 ; if (%6 > 0) %6--, goto %bb.4

BR_DEC is a Terminator, and that fact is the issue.

The PHI Elimination pass attempts to find locations to insert COPY operations for tied defs. In this case, it fails to insert the COPY instruction in a legal location. Instead, I observe:

%6 = COPY killed %32

%32 = COPY killed %27

%27 = BR_DEC killed %6(tied-def 0), 1, %bb.5, 0

The default algorithm inserts the “PHISourceCopy” at the end of the basic block, even if the instruction that defines its source is a terminator. The comment in findPHICopyInsertPoint states: “This needs to be after any def of SrcReg, but before any subsequent point where control flow might jump out of the basic block.” When the def is itself the terminator, those two requirements cannot both be met. The error is only caught much later, in the verifier, as “Def does not dominate all uses”.
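The conflict between the two requirements in that comment can be shown with a small toy model. This is not LLVM code: Instr, firstTerminator, and legalCopyInsertPoint are hypothetical stand-ins, and the model only assumes what the quoted comment says (the copy goes after the def of the source register and before control can leave the block).

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Toy stand-in for a machine instruction; not an LLVM type.
struct Instr {
  std::string name;
  int def;            // virtual register defined, or -1 for none
  bool isTerminator;
};

// Index of the first terminator, or block.size() if there is none.
std::size_t firstTerminator(const std::vector<Instr> &block) {
  for (std::size_t i = 0; i < block.size(); ++i)
    if (block[i].isTerminator)
      return i;
  return block.size();
}

// Returns the index before which a copy of srcReg can be inserted so that it
// is (a) after the last def of srcReg and (b) not after the first terminator.
// Returns nullopt when no such position exists.
std::optional<std::size_t>
legalCopyInsertPoint(const std::vector<Instr> &block, int srcReg) {
  std::size_t earliest = 0;
  for (std::size_t i = 0; i < block.size(); ++i)
    if (block[i].def == srcReg)
      earliest = i + 1;  // must come after this def
  std::size_t latest = firstTerminator(block);
  if (earliest > latest)
    return std::nullopt;  // the def is at or after the first terminator
  return latest;
}
```

With the split LoopDec/LoopEnd form there is a legal slot between the two instructions. Once they are fused into a single BR_DEC terminator, the model reports that no slot exists, which matches the verifier failure above.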

The prior art I mentioned is the existence of the “createPHISourceCopy” target function, added for the AMDGPU target.

This target has, in particular, one instruction which is a terminator but which also defines a value, “SI_IF”. When creating a COPY to handle a PHI, the target function checks to see if the source register is the result of SI_IF, and then generates a curious looking terminator-move instruction after it, which is a wrapper for a real move instruction, but has the IsTerminator bit set.
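As I understand the AMDGPU approach, the idea is roughly the following. This is again a toy model and not the actual AMDGPU implementation: Instr, insertPhiSourceCopy, and the COPY_TERM name are hypothetical, and the real hook operates on MachineBasicBlock iterators rather than vector indices.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for a machine instruction; not an LLVM type.
struct Instr {
  std::string name;
  int def;            // virtual register defined, or -1 for none
  int use;            // virtual register read, or -1 for none
  bool isTerminator;
};

// Inserts "dst = copy src" into the block and returns its index.
// If src is defined by a terminator, the copy is placed right after that
// terminator and is itself flagged as a terminator (hypothetical COPY_TERM).
// Otherwise an ordinary COPY is placed before the first terminator.
std::size_t insertPhiSourceCopy(std::vector<Instr> &block, int src, int dst) {
  std::size_t firstTerm = block.size();
  std::size_t defIdx = block.size();
  for (std::size_t i = 0; i < block.size(); ++i) {
    if (block[i].isTerminator && firstTerm == block.size())
      firstTerm = i;
    if (block[i].def == src)
      defIdx = i;
  }
  bool defIsTerminator = defIdx < block.size() && block[defIdx].isTerminator;
  std::size_t pos = defIsTerminator ? defIdx + 1 : firstTerm;
  block.insert(block.begin() + static_cast<std::ptrdiff_t>(pos),
               Instr{defIsTerminator ? "COPY_TERM" : "COPY", dst, src,
                     defIsTerminator});
  return pos;
}
```

The model keeps the block's terminators contiguous at the end, but it does nothing about the fact that the copy now sits after a branch that may already have been taken.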

Can I override createPHISourceCopy and introduce a temporary move-like terminator that can be removed later? How do I reconcile the ordering, which makes it look as though I’m conditionally branching and then, only if I don’t branch, assigning the value?

There’s still the nagging thought that there must be an easier way. The simple description of what I want is: “This branch defines a value, which is live-out of the block”, and that doesn’t seem like a scenario that should be this difficult to comprehend and account for. Thus, I’m hoping I’m missing something fundamental in my understanding that can help me make sense of all of this taken together.

Regards,

J.B. Nagurne

Code Generation

Texas Instruments