Suggestion for a more flexible DFA Packetizer

Hi All,
I would like to suggest (and implement) an improvement to LLVM’s DFA Packetizer.
I am currently implementing a backend for LLVM to compile towards a VLIW ISA.
One of the specificities of this ISA is that it is not a pure VLIW : it has 5 executions slots (2 ALUs, 1 MAU/FPU, 1 BCU and 1 LSU) but they are not totally independant. For example Some instructions use the MAU + part of the LSU, forbidding other instructions to be packed in the same bundle. This makes the architecture difficult to describe in the current state of Instruction itinerary description of LLVM.

To describe the instruction itineraries correctly I tried to used 0 cycles latency pipeline stages (multiple InstrStage<1, [FUs], 0> for a same itineraries but it seems the standard DFA based Packetizer only look at the first pipeline stages when selecting the units used by an instruction (for DFA state init or transition).

So I made the following changes:
→ exend the TableGen FuncUnit class towards
class FuncUnit<list subunits = []> {
list SubUnits = subunits;
}
→ duplicate the DFAPacketizerEmitter (one of TableGen backends) towards a new backend which works with unit state which are not only one bit set in the unsigned value but multiple bits (masks),
In fact an architecture is now described by simple units (SubUnits == []) and complex units (SubUnits != []) both are encoded by a single bit set in an unsigned mask when generated by the SubtargetEmitter, but when computing the Packetizer DFA we swap bit resource for mask resources to encode that a single units may in fact use several simple/basic resources.

This require the modificaiton of
addInsnClass and canAddInsnClass Function + DFA class + DFAPacketizerEmitter class

for example in canAddInsnClass rather than testing (~*SI & InsnClass) to see if an InsnClass is compatible with the current state we check than none of the multiple resource (bitmask) of the InsnClass is set in the state. It looks like this
for (unsigned j = 0; j < sizeof(unsigned) * 8; ++j) {
if ((1 << j) & InsnClass) { // the j-th is a possibility for InsnClass
InsnResource = unit_resource[j];
if (SI & InsnResource == 0) return true;
}
unit_reousrce is an unsigned array containing the bit mask of cumulated resource for every FuncUnits. It is computed when reading the FU list of the Itinerary list, and contains one bit for simple units and multiple bits for compound units.

the addInsnClass is modified in a similar way

  unsigned ResultingResourceState = thisState | (0x1 << j);
becomes 
  unsigned ResultingResourceState = thisState | unit_resources[j];

The modification implies some slowdown: in canAddInsnClass we change a bitwise test by a loop of tests, we must first compute unit_resources and transmit it from DFAPacketizerEmitter to DFA and State objects. Moreover, generally the architecture being described will be more complex with a more complex DFA. In my backend, I did not notice any visible increase in LLVM build time, but I will try to make some measures and try to improve the implementation if the community want to go forward with it.

The modifications are working in my version, they will need some clean-ups before an actual patch can be submitted. I want to know if those modifications have not been suggested (and rejected) before and if they might be of interest for someone before doing the work of generalizing the work I did for my backend to other.

best regards,
Nicolas

Glad to find this post, sad to see it didn’t get an answer. I bumped into the same problem.
I was surprised that DFA doesn’t seem to implement a ScheduleHazardRecognizer, and, conversely, that there isn’t a base Packetizer that uses one.
I would dearly have the two different languages (DFA & Scoreboard) on top of itineraries be harmonized, e.g. lift ComboFunctions to both the scoreboard and the DFA.
Can I find your implementation somewhere, and would it port easily to 15.3?

Kind regards
Martien