Now I’d like to append to the MachinePipeliner
an algorithm that takes into account register pressure when searching Initiation Interval (II).
Background
Modulo scheduling of software pipelining tends to increase register pressure due to the overlaps of each iteration.
Large register pressure can result in additional register spill/fill, which can cause performance degradation.
Therefore, considering register pressure when scheduling may yield better results.
In fact, there are several scheduling algorithms that attempt to reduce the resister pressure, such as the one in this paper.
Algorithm
The approach in the above paper is based on Iterative Modulo Scheduling algorithm, which differs from current MachinePipeliner’s one (Swing Modulo Scheduling).
So I come up with a simple algorithm to reduce the register pressure.
The idea is to reject an II where the maximum register pressure (corresponds to the term Maxlives
in the paper) is so high that register spill/fill may occur.
Calculating the maximum register pressure is not so difficult: simply unroll the scheduled instructions, then checks the live intervals of each register.
Details
In current MachinePipeliner
’s implementation, searching II is done in SwingSchedulerDAG::schedulePipeline
.
Here is a code fragment of the function and what I’d like to add.
bool scheduleFound = false;
for (unsigned II = MII; II <= MAX_II && !scheduleFound; II++) {
// Try to schedule with II
...
// At this point, the `scheduleFound` is true if there are no problems (i.e. scheduling went well)
// Our proposal is to insert an additional check that calculates the maximum register pressure of the schedule and rejects the II if the pressure is too high, as shown below
if (Register pressure is too high)
scheduleFound = false
}
The following is a pseudo code that calculates the maximum register pressure.
function CalcMaxRegPress(ScheduledInstructions)
N <- Number of Pressure Set
CurRegPress <- [0, 0, ..., 0] // N elements array, indexed by Pressure Set ID
MaxRegPress <- [0, 0, ..., 0]
Unrolled <- Unrolled ScheduleInstructions for simulating the loop from prolog to kernel
for Instr in Unrolled
for Def in AllDefs(Instr)
Increase CurRegPress corresponds to Def
end for
for LastUse in AllLastUses(Instr)
Decrease CurRegPress corresponds to LastUse
end for
for i in 1 to N
MaxRegPress[i] <- Max(MaxRegPress[i], CurRegPress[i])
end for
end for
return MaxRegPress
end function
Does anyone have any advice or thoughts?
Thanks.