MachineScheduler: Latency of edges to call schedule boundary

Hi all, I have an example using a contrived target to help discuss an issue I’m seeing.

Specifically, I’m finding that the edges from VR-to-MR moves (copies of virtual registers into the physical registers that carry a call's arguments) are given non-zero latencies. This confuses me, because the call's reads of those registers aren’t really ‘uses’ in the traditional sense, so I'd expect these edges to have a latency of 0.

  • Consider a target with 4 registers X0-X3 and 2 register classes, X = {X0-X3} and SubX = {X0-X1}.
  • Consider a calling convention for this target such that X0 and X1 are used for passing arguments.
  • InstrItineraries are being used
  • And finally, consider a program where a SubX-class VR is live across a call which uses both X0 and X1.
%1:SubX = COPY %2:SubX
$x0 = ...
$x1 = ...
call ... implicit $x0, implicit $x1
; Use %1
  • The call implicitly uses X0 and X1 by calling convention
  • The call is a scheduling boundary (isSchedBoundary in MachineScheduler)
  • ScheduleDAGInstrs::addSchedBarrierDeps will add Uses with no associated MachineOperand because the Use SU is ExitSU
  • addPhysRegDataDeps will use SchedModel.computeOperandLatency with the VR-to-MR Copy as DefMI, and null as RegUse
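To make the edge construction in the bullets above concrete, here is a toy C++ model. The types and the function name are hypothetical and are not LLVM's ScheduleDAGInstrs. It assumes the boundary contributes only the list of physical registers it reads, and that each edge's latency comes solely from a def-side number, mirroring the null RegUse passed to computeOperandLatency. Under those assumptions the $x0 and $x1 defs get latency-1 edges, while the COPY defining %1 gets no edge at all, because %1 is only used below the call.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy types; none of these are LLVM classes.
struct ToyInstr {
  std::string text;     // printable form of the instruction
  std::string defReg;   // register written ("" if none)
  unsigned defLatency;  // def-side latency an itinerary lookup would report
};

struct ToyEdge {
  int pred;             // index of the defining instruction in the region
  std::string reg;      // register carried by the edge
  unsigned latency;
};

// Build the edges into the region boundary (the ExitSU stand-in). The
// boundary only supplies the physical registers it reads; with no use
// operand to consult, the latency can only come from the def side.
std::vector<ToyEdge> edgesToBoundary(const std::vector<ToyInstr> &region,
                                     const std::vector<std::string> &boundaryUses) {
  std::vector<ToyEdge> edges;
  for (const std::string &reg : boundaryUses) {
    // Walk upward from the boundary to the closest def of this register.
    for (std::size_t i = region.size(); i-- > 0;) {
      if (region[i].defReg == reg) {
        edges.push_back({static_cast<int>(i), reg, region[i].defLatency});
        break;
      }
    }
  }
  return edges;
}
```

For the example region (COPY, then the $x0 and $x1 defs) with boundary uses {"$x0", "$x1"}, this yields two latency-1 edges from the $x0 and $x1 defs, and nothing from the COPY, since %1 is not among the boundary's uses.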

The problem I’m seeing is that the DAG gives the edges from the defs of $x0 and $x1 to the call a latency of 1, while there is no edge at all from the def of %1 to the call. As a result, at bottom cycle 0 the ready queue contains only the def of %1, so it must be selected. This is bad news: $x0 and $x1 are then live across that instruction and occupy the entirety of the SubX register class. Unsurprisingly, the register allocator fails to allocate %2:
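A minimal bottom-up ready-queue sketch shows why only the COPY is selectable at cycle 0. This is a toy, not GenericScheduler: node 0 stands for the call (the boundary), treated as already placed at bottom cycle 0, and the sketch assumes an in-order model in which a node whose ready cycle is still in the future waits in a pending queue rather than the ready queue.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy edge: pred must be scheduled above succ, at least
// `latency` cycles away in bottom-up order.
struct ToyEdge {
  int pred;
  int succ;
  unsigned latency;
};

// Return the nodes that may be picked at bottom-up cycle `cycle`, assuming
// only node 0 (the boundary) has been scheduled, at cycle 0.
std::vector<std::string> readyAtBottomCycle(const std::vector<std::string> &names,
                                            const std::vector<ToyEdge> &edges,
                                            unsigned cycle) {
  std::vector<std::string> ready;
  for (int n = 1; n < static_cast<int>(names.size()); ++n) {
    bool isReady = true;
    for (const ToyEdge &e : edges) {
      if (e.pred != n)
        continue;
      if (e.succ != 0)
        isReady = false;  // a successor other than the boundary is unscheduled
      else if (cycle < e.latency)
        isReady = false;  // boundary is at cycle 0; latency not yet covered
    }
    if (isReady)
      ready.push_back(names[n]);
  }
  return ready;
}
```

With latency-1 edges from the $x0 and $x1 defs, only "%1 = COPY %2" is ready at cycle 0; with latency-0 edges all three defs are ready at once, so the register-pressure heuristics get to choose between them.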

$x0 = ...
$x1 = ...
%1:SubX = COPY %2:SubX ; No register for %2/%1, $x0 and $x1 must be live
call ... implicit $x0, implicit $x1
; Use %1

Indeed, if all edges are given 0 latency, the cost function for the def of %1 is negative, because it causes excess pressure on SubX, and the scheduler correctly picks at least one of the $x0 or $x1 definitions before %1’s def.

My question is this: there are a lot of moving parts here, and I’m not sure where the root of the problem lies.

  • computeOperandLatency will have difficulty identifying this case, as UseMI is null and there is no valid UseOperIdx, which makes it hard to find the call.
  • adjustSchedDependency likewise does not have the use available, but it does know that the UseSU is the boundary node. From there, you can infer that the use is a call.
  • DAG mutations are another option, but they seem somewhat analogous to adjustSchedDependency.
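One way to picture the adjustSchedDependency option is the toy hook below. The types and the function name are hypothetical and only mimic the information the real hook sees; the sketch assumes the boundary node can reach the boundary instruction (the call) through a pointer, and it follows this post's premise that edges into a call boundary should carry no latency.

```cpp
#include <cassert>

// Hypothetical toy types; they are not LLVM's SUnit/SDep/MachineInstr.
struct ToyInstr {
  bool isCall;
};

struct ToySUnit {
  bool isBoundary;         // true for the region's exit node
  const ToyInstr *instr;   // boundary instruction, or nullptr at block end
};

struct ToyDep {
  unsigned latency;
};

// Assumption (the premise of this post): an edge whose use side is a call
// boundary only orders argument setup before the call, so it gets zero
// latency and register pressure is left to decide the order.
void toyAdjustBoundaryDep(const ToySUnit &useSU, ToyDep &dep) {
  if (useSU.isBoundary && useSU.instr != nullptr && useSU.instr->isCall)
    dep.latency = 0;
}
```

Edges into a non-call boundary, or into an ordinary node, keep whatever latency the scheduling model assigned.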

Or is there some other fundamental misunderstanding that I’m not seeing?