Mismatch between DFA and Scoreboard

Hi,

I see a discrepancy between the ScoreBoard and the DFA when deciding on available resources. I think I can trace that down to DFAPacketizerEmitter.cpp not heeding the NextCycles field of the raw itinerary data. In essence I have two itineraries that can be scheduled in the same cycle. In scoreboard form:

InsA:
001
000
000
100

InsB
010
100

The two empty slots of InsA are coded by NextCycles=2 in the first stage (actually 7, but this captures the essence). I run the PostRASched with ScoreboardHazardRecognizer on an example with just these two instructions and it schedules them in the same cycle, with the scoreboard in the debug output looking entirely sane. DFAPacketizer, run immediately after, says the reources are not available.

My questions

  • Am I right in assuming that the scoreboard and the DFA should always agree about resource conflicts?
  • Should I rewrite my itineraries to fully expanded form, i.e. without using NextCycles?
  • Am I in DFA state explosion territory with eight basic functional units, seven pipeline stages and variable timing as shown by the instructions above?

A less urgent, more philosophical question:

  • Is there a difference between NextCycles = -1 and NextCycles = 1?

Kind regards,
Martien

NextCycles is also known as TimeInc I guess. And answring my last question: -1 will take the default from cycles, 1 is just 1.

I am not an expert on the DFA, but I have used it. You’re correct that there are differences between the DFA and scoreboard. The DFA, as used by Hexagon, encodes information about issue slots, but doesn’t account for latency or instruction cycles. That is, it simply determines if a set of n instructions be legally issued together. When the DFA was added to LLVM, Hexagon instructions executed in a single cycle. That is no longer true and instructions can take multiple cycles to complete, but the resources used after the issue slot aren’t modelled by the DFA.

For example, say an instruction is issued at cycle 0 on functional unit A, but takes m cycles to complete. Only the issue slot is modeled by the DFA, at cycle 0. The resources used for the m-1 cycles is not modeled (for Hexagon). That means it’s possible to issue another instruction at cycle 1 on functional unit A because the prior instruction is at a different pipeline stage.

If you’re modelling pipeline stages with the DFA, then I wouldn’t be surprised if you see DFA state explosion, or even other issues as that type of functionality probably isn’t tested well.

That’s my recollection of the DFA, so I may be missing some details…

Thanks,
Brendon

Thanks @bcahoon, that seems to confirm my findings. My problem is to code a register write conflict which has instruction dependent timing. So a load writes in later stage than a mul, which writes in a later stage than an add. My scoreboard has 8 independent FuncUnits and is seven deep. It will need all the time-shifted states to keep track of the register write conflict.

For now, I have eliminated the need for the DFA by doing packetization based on the scoreboard. I would like to have the DFA available under the ScheduleHazardRecognizer interface, so that I can plug it into the MultiHazardRecognizer as a fast shortcut. As it is now, I would need two scheduler models to have that conjunction. ComboFunctions aren’t interpreted by the scoreboard hazard recognizer, whereas cycles and time steps aren’t recognized by the DFA. I may be flattered into contributing a DFA implementation of the Scoreboard that gives a quick ‘hasHazard’ check. before entering the full scoreboard query.

Regards,
Martien