MachinePipeliner: Representing operand latencies around the backedge

DragonDisciple · February 12, 2021, 8:40pm

One fundamental calculation in software pipelining is the recurrence-constrained minimum initiation interval (RecMII). The LLVM MachinePipeliner pass augments the DAG with edges to assist in this calculation in the function “updatePhiDependences”, which notes that it must modify the DAG because “ScheduleDAGInstrs no longer processes dependences for PHIs”.

In this function, we find instructions in the loop header that define a register used by a PHI at the top of the loop header. This indicates that the register is live across the backedge of the loop. When this happens, the function adds an anti-edge with latency 1 from the PHI to that instruction. The function does more than this, but I will leave the rest aside, since that anti-edge is the topic of my discussion/question.

Consider an instruction that writes to a register in 2 cycles. The minimum number of cycles that must elapse before that value can be used is, therefore, 2.

Now put this instruction near the bottom of a loop that is a candidate for software pipelining.

Header:

%0 = PHI %1, Preheader, %2, Header

…

%2 = two_cycle_op %0

… ; terminators

With the calculation as currently written, there will be an anti-edge of latency 1 between the PHI and two_cycle_op, leading to a RecMII of 1.

Now, unroll this loop once:

Header:

%0 = PHI %1, Preheader, %3, Header

…

%2 = two_cycle_op %0

…

%3 = two_cycle_op %2

… ; terminators

Since the latency between the def of %2 and its use is 2, and the anti-edge between the PHI and def of %3 is 1, we get a RecMII of 3.

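This shows that the algorithm treats the latency across the backedge differently from how the same def-to-use latency is treated in straight-line code: the two copies of two_cycle_op are identical, yet the def-to-use edge inside the loop body costs 2 cycles while the one that crosses the backedge costs only 1.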
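To make the arithmetic above concrete, here is a minimal sketch of the textbook bound, ceil(total latency / total distance), applied to a single recurrence cycle. It is a toy model, not the MachinePipeliner code: `Recurrence`, `recMII`, and `chainedLoop` are hypothetical names, and the PHI itself is assumed to contribute no latency.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical toy model: one recurrence cycle, described only by the
// latencies of its edges and the number of iterations it spans.
struct Recurrence {
  std::vector<int> EdgeLatencies; // latency of each edge on the cycle
  int Distance;                   // iterations crossed by the cycle
};

// Textbook bound for one cycle: ceil(total latency / total distance).
int recMII(const Recurrence &R) {
  assert(R.Distance > 0 && "a recurrence must cross at least one iteration");
  int Delay =
      std::accumulate(R.EdgeLatencies.begin(), R.EdgeLatencies.end(), 0);
  return (Delay + R.Distance - 1) / R.Distance;
}

// Cycle for a loop body holding `Copies` chained two_cycle_op-style
// instructions. Assumption: the PHI adds no latency of its own.
Recurrence chainedLoop(int Copies, int OpLatency, int BackedgeLatency) {
  assert(Copies >= 1 && "the loop body needs at least one instruction");
  Recurrence R{{}, 1};
  for (int I = 0; I < Copies - 1; ++I)
    R.EdgeLatencies.push_back(OpLatency);     // def -> use inside the body
  R.EdgeLatencies.push_back(BackedgeLatency); // last def -> PHI, via backedge
  return R;
}
```

With a backedge latency of 1, `recMII(chainedLoop(1, 2, 1))` gives 1 and `recMII(chainedLoop(2, 2, 1))` gives 3, matching the two cases above. If the backedge carried the operand latency of 2 instead, the same calls would give 2 and 4, which scales with the unroll factor the way straight-line code would.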

I have trialed a potentially upstreamable solution that utilizes computeOperandLatency and works as follows:

Consider the following loop where two iterations have been linearized together for clarity:

loop_header:

; Iteration n

%0 = PHI %1, outside_loop, %2, loop_header

1 |%2 = do_something %0

; Iteration n+1

2 |%0 = PHI %1, outside_loop, %2, loop_header

3 |%2 = do_something %0

%2 of instruction 1 is the original def. It is defined in iteration n and used in iteration n+1.

%0 of instruction 2 is the phi def. It represents the original def arriving from the previous iteration.

%0 of instruction 3 is the true use of the original def. We want to get the latency between the original def and this operand.

If there are multiple true (non-PHI) uses of the original def, take the maximum.
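The selection rule described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual patch: `Instr`, `LatencyFn`, and `backedgeLatency` are hypothetical stand-ins rather than LLVM types, the callback stands in for a query such as computeOperandLatency, and the fallback to the existing latency of 1 when the phi def has no non-PHI users is my own assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-in for a machine instruction; not an LLVM type.
struct Instr {
  int Id;
  bool IsPhi;
};

// Stand-in for a target query such as computeOperandLatency: cycles from
// the def in `Def` until `Use` can read the value.
using LatencyFn = std::function<int(const Instr &Def, const Instr &Use)>;

// Latency to place on the backedge for one loop-carried value: the maximum
// operand latency from the original def to any non-PHI user of the phi def.
// Assumption: fall back to `Default` when there are no non-PHI users.
int backedgeLatency(const Instr &OrigDef,
                    const std::vector<Instr> &PhiDefUsers,
                    const LatencyFn &OperandLatency, int Default = 1) {
  int Max = -1;
  for (const Instr &Use : PhiDefUsers) {
    if (Use.IsPhi)
      continue; // only true (non-PHI) uses are considered
    Max = std::max(Max, OperandLatency(OrigDef, Use));
  }
  return Max < 0 ? Default : Max;
}
```

In the linearized example, `OrigDef` would be instruction 1 and the only non-PHI user of the phi def would be instruction 3, so the result is simply the operand latency between those two instructions.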

If anyone has history or comments to add on the original approach, or would like to talk more about my approach/upstreaming the change in a review in some form, please let me know.

J.B. Nagurne

Code Generation

Texas Instruments
