Incorrect RecMII Calculations in MachinePipeliner

RecMII in the MachinePipeliner pass is calculated as follows:

  • Un-acyclify the DAG by reversing anti-edges
  • Construct an simple adjacency matrix from the DAG using a common, fairly straightforward algorithm
  • Perform Johnson’s circuit-finding algorithm. Each circuit is used to construct a ‘NodeSet’ object into a list
    • Each NodeSet calculates a field, ‘Latency’
  • The maximum Latency field in the list of NodeSets is the effective RecMII (the distance calculations found in the literature is currently ignored upstream)

My observation is in the calculation of ‘Latency’. I’ve found that given a particular circuit, the RecMII calculation is incorrect. I think this is a generic issue, and I’ll try to illustrate as best I can.

Consider a subgraph in a loop whose DAG nodes (indicated by their NodeNum field) are structured as the following, and whose edges are all weighted 1:

This subgraph, when passed through Johnson’s algorithm, results in 4 circuits:

  1. 3 → 5 → 9 → 13 → 17 → 3
  2. 3 → 5 → 6 → 9 → 13 → 17 → 3
  3. 3 → 5 → 9 → 13 → 14 → 17 → 3
  4. 3 → 5 → 6 → 9 → 13 → 14 → 17 → 3

Specifically, consider circuit 2)

The NodeSet constructor calculates the latency for this circuit. It does this through the following pseudocode:

map<SUnit *, unsigned> SuccessorLatencies
Latency = 0;
For each Node in Circuit
  SuccessorLatencies.clear()
  For each successor 'Succ' of Node
    continue if Succ is not in Circuit
    SuccessorLatencies[Succ] = max(Latency of all edges from Node to Succ)
  Latency += sum(SuccessorLatencies.values)

Given the circuit 3 → 5 → 6 → 9 → 13 → 17 → 3, I’d expect ‘Latency’ to be 6, as there are 6 jumps in the circuit.

However, the above algorithm will calculate Latency as 7. Node 5 has successors 6 and 9. Nodes 6 and 9 are both in the circuit, and will each therefore contribute 1 cycle to the final calculation, even though there is no direct path 5 → 9 in the circuit.

Adding some debug to the upstream NodeSet calculation confirms this:

Node 5:
  9
  9 contributes 1
  6
  6 contributes 1

This follows for circuits 3 (Expect Latency 6, get Latency 7):

Node 13:
  17
  17 contributes 1
  14
  14 contributes 1

and 4 (Expect Latency 7, get Latency 9), whose debug contains both of the above debug prints.

The solution seems straightforward. I understand that the maximum is being calculated because there may be multiple edges between Node and Succ, but there’s no reason to be looking at any successor that isn’t the immediate successor in the circuit.

In the example: Inspecting all edges from 5 → 6 is reasonable for circuit 2 and 4, but not 5 → 9. Similarly, inspecting all edges from 5 → 9 is reasonable for circuits 1 and 3, but not 5 → 6. The latter is solved via the ‘is not in Circuit’ check.

2 Likes