VLIW Scheduling

VLIW (Very Long Instruction Word) is a long instruction format (called
"group" hereafter) contains several instructions. These instructions
are not dependent on each other and could be issued in a single cycle.

At this moment there is no correspondent class for VLIW. MachineInstr
object can only represent one instruction. Usually the number of
instructions in a group is fixed. The grouping could be implicitly
maintained by inserting dummy nops to the instruction sequence. For
example, there are at most two instructions in a VLIW, if items in
MachineBasicBlock::Insts is:

  MUL
  ADD
  ADD
  NOP
  MUL
  NOP

It implies the scheduling:

  MUL - ADD
  ADD - NOP
  MUL - NOP

Each pairs could be issued in parallel.

However, "one single MachineInstr" is the "unit" of many passes, e.g.
LiveVariables and LiveIntervals. Without modifications, some code-gen
result may be inefficient (or incorrect).

For example:

  mul %reg1025, %reg1024, 1
  add %reg1026, %reg1024, 2

Suppose the life interval of %reg1024 ends at ADD (no use after ADD).
These two instructions are not dependent and could be scheduled
together:

  mul %reg1024, %reg1025, 1 - add %reg1026, %reg1025, 2

The life intervals of vi%reg1024 and %reg1025 are actually _not_
interfered with each other,

  %reg1024 => r1
  %reg1025 => r1
  %reg1026 => r2

  mul r1, r1, 1 - add r2, r1, 2

Note the physical register r1 could be re-used. r1 is the source
register of both instructions, its valued is fetched and used as the
inputs to MUL and ADD, before the MUL updates r1. It's not an
anti-dependency.

But without modification, the live interval analysis and the register
allocator will consider %reg1024 and %reg1025, and allocate different
physical registers to each of them. What's the minimal changes I have
to make to "cheat" the LiveIntervals and LiveVariables?

What else I may have to change for VLIW scheduling?

VLIW (Very Long Instruction Word) is a long instruction format (called
"group" hereafter) contains several instructions. These instructions
are not dependent on each other and could be issued in a single cycle.

At this moment there is no correspondent class for VLIW. MachineInstr
object can only represent one instruction. Usually the number of
instructions in a group is fixed. The grouping could be implicitly
maintained by inserting dummy nops to the instruction sequence. For
example, there are at most two instructions in a VLIW, if items in
MachineBasicBlock::Insts is:

MUL
ADD
NOP
MUL
NOP

It implies the scheduling:

MUL - ADD
ADD - NOP
MUL - NOP

Each pairs could be issued in parallel.

Yup, this is currently the approach we're taking with IA64, for example. It has bundles which are similar in spirit to what a VLIW has (though not identical).

However, "one single MachineInstr" is the "unit" of many passes, e.g.
LiveVariables and LiveIntervals. Without modifications, some code-gen
result may be inefficient (or incorrect).

For example:

mul %reg1025, %reg1024, 1
add %reg1026, %reg1024, 2

Suppose the life interval of %reg1024 ends at ADD (no use after ADD).
These two instructions are not dependent and could be scheduled
together:

mul %reg1024, %reg1025, 1 - add %reg1026, %reg1025, 2

The life intervals of vi%reg1024 and %reg1025 are actually _not_
interfered with each other,

%reg1024 => r1
%reg1025 => r1
%reg1026 => r2

mul r1, r1, 1 - add r2, r1, 2

I see.

Note the physical register r1 could be re-used. r1 is the source
register of both instructions, its valued is fetched and used as the
inputs to MUL and ADD, before the MUL updates r1. It's not an
anti-dependency.

Right.

But without modification, the live interval analysis and the register
allocator will consider %reg1024 and %reg1025, and allocate different
physical registers to each of them. What's the minimal changes I have
to make to "cheat" the LiveIntervals and LiveVariables?
What else I may have to change for VLIW scheduling?

I think the best way to handle this is to make "macro instructions". The current pipeline we have is this:

1. Instruction Selection (LLVM code to a target DAG)
2. Pre-pass Scheduling (target DAG to machineinstrs)
3. Register allocation
4. ...

Given this setup, I think it would make the most sense for you, to have a target-specific phase #2.5, that takes the individual operations and makes the VLIW bundles you need for proper register allocation. In the case above, this would turn into:

  mul_add r1, r1, 1 , r2, r1, 2

The problem with this is that you'll obviously have an N^2 explosion of instructions. If this isn't an option for you, you could set up a few classes of macro ops, so you could have something like this:

  vliw_rri_rri "mul", r1, r1, 1 , "add", r2, r1, 2

where "mul" and "add" are immediates that are rendered as the appropriate opcodes by the asmprinter. If you did this, you'd only need one instruction for each "pattern" (in this case, a VLIW packet where the first takes a register/register/immediate and the second takes the same).

This should expose to the RA the property that all inputs are read before the outputs are produced.

If you have your own custom isel/scheduler, it might make sense to have the scheduler build these packets for you.

-Chris