How to partition registers into different RegisterClass?

Hi, everyone.

I' have three set of registers - read-only regs, general purpose regs
(read and write), and write-only regs. How should I partition them
into different RegisterClasses so that I can easy define the
instruction?

All RegisterClasses must be mutally exclusive. That is, a register can
only be in a RegisterClass. Otherwise TableGen will raise an error
message.

  def ReadOnlyRegClass : RegisterClass<...>;
  def GeneralPurposeRegClass : RegisterClass<...>;
  def WriteOnlyRegClass : RegisterClass<...>;

  def MOV : BinaryInst<2, (ops GeneralPurposeRegClass :$dest,
GeneralPurposeRegClass :$src), "mov $dest, $src">;

There can be only one RegisterClass defined for each instruction
operand, but actually the destition operand could be
'GeneralPurposeRegClass ' or 'WriteOnlyRegClass ', and the source
operand can be 'ReadOnlyRegClass' or 'GeneralPurposeRegClass'.

I' have three set of registers - read-only regs, general purpose regs
(read and write), and write-only regs. How should I partition them
into different RegisterClasses so that I can easy define the
instruction?

[snip]

  def MOV : BinaryInst<2, (ops GeneralPurposeRegClass :$dest,
  GeneralPurposeRegClass :$src), "mov $dest, $src">;

There can be only one RegisterClass defined for each instruction
operand, but actually the destition operand could be
'GeneralPurposeRegClass ' or 'WriteOnlyRegClass ', and the source
operand can be 'ReadOnlyRegClass' or 'GeneralPurposeRegClass'.

Presumably, when you write your instruction selector, you know when you
want to have a write-only vs. general purpose and read-only vs. general
purpose register. Some things are naturally read-only, such as status
registers that are only modified as a side effect of instructions (fp
errors, overflow, etc.) and some are write-only registers (I'm guessing
video and network cards have these for outputs).

So a solution that may work for you would be to define permutations of
the instructions that explicitly specify what type of operands your
instruction takes (because different instructions need different LLVM
opcodes) but the instructions would print out the same, so in your
example:

MOVgg : General, General
MOVwg : Write-only, General
MOVwr : Write-only, Read-only

Note that if they all have the same "mov $dest, $src" print string, they
will be printed out identically, but you need to separate them anyway.
If you look at the X86 backend, you might find many flavors of MOV
instructions as well, but for a different reason.

The best way is to lazily add permutations that you need as you go
along, so you don't have to convert your entire .td file(s) over to a
complete set of permutations immediately.

Good luck,

All registers in my hardware are 4-element vector registers (128-bit).
Some are floating point registers, and the others are integer
registers.

I typedef two packed classes: [4 x float] and [4 x int], and add an
enum 'packed' to MVT::ValueType (ValuesTypes.h).

I declared all 'RegisterClass'es to be 'packed' (first argument of
RegisterClass):

  def GeneralPurposeRC : RegisterClass<packed, 128, [R0, R1]>;
  def INT_ReadOnlyRC : RegisterClass<packed, 128, [I0, I1]>;
  def FP_ReadOnlyRC : RegisterClass<packed, 128, [F0, F1]>;
  
  def MOVgg : BinaryInst<0x51, (
    ops GeneralPurposeRC :$dest,
    ope GeneralPurposeRC :$src), "mov $dest, $src">;

  def MOVgi : BinaryInst<0x52, (
    ops GeneralPurposeRC :$dest,
    ope INT_ReadOnlyRC :$src), "mov $dest, $src">;

  def MOVgf : BinaryInst<0x52, (
    ops GeneralPurposeRC :$dest,
    ope FP_ReadOnlyRC :$src), "mov $dest, $src">;

In the instruction selector, SDOperand::getValueType() always returns
'MVT::packed' for all operands. I cannot distinguish between
GeneralPurposeRC, INT_ReadOnlyRC, FP_ReadOnlyRC. But a correct
register class is necessary for SSARegMap to create a virtual
register.

All registers in my hardware are 4-element vector registers (128-bit).
Some are floating point registers, and the others are integer
registers.

I typedef two packed classes: [4 x float] and [4 x int], and add an
enum 'packed' to MVT::ValueType (ValuesTypes.h).

I declared all 'RegisterClass'es to be 'packed' (first argument of
RegisterClass):

def GeneralPurposeRC : RegisterClass<packed, 128, [R0, R1]>;
def INT_ReadOnlyRC : RegisterClass<packed, 128, [I0, I1]>;
def FP_ReadOnlyRC : RegisterClass<packed, 128, [F0, F1]>;

...

In the instruction selector, SDOperand::getValueType() always returns
'MVT::packed' for all operands. I cannot distinguish between
GeneralPurposeRC, INT_ReadOnlyRC, FP_ReadOnlyRC. But a correct
register class is necessary for SSARegMap to create a virtual
register.

What does a 'read only' register mean? Is it a constant (e.g. returns 1.0)? Otherwise, how can it be a useful value?

-Chris

I' have three set of registers - read-only regs, general purpose regs
(read and write), and write-only regs. How should I partition them
into different RegisterClasses so that I can easy define the
instruction?

[snip]

  def MOV : BinaryInst<2, (ops GeneralPurposeRegClass :$dest,
  GeneralPurposeRegClass :$src), "mov $dest, $src">;

There can be only one RegisterClass defined for each instruction
operand, but actually the destition operand could be
'GeneralPurposeRegClass ' or 'WriteOnlyRegClass ', and the source
operand can be 'ReadOnlyRegClass' or 'GeneralPurposeRegClass'.

Presumably, when you write your instruction selector, you know when you
want to have a write-only vs. general purpose and read-only vs. general
purpose register. Some things are naturally read-only, such as status
registers that are only modified as a side effect of instructions (fp
errors, overflow, etc.) and some are write-only registers (I'm guessing
video and network cards have these for outputs).

So a solution that may work for you would be to define permutations of
the instructions that explicitly specify what type of operands your
instruction takes (because different instructions need different LLVM
opcodes) but the instructions would print out the same, so in your
example:

MOVgg : General, General
MOVwg : Write-only, General
MOVwr : Write-only, Read-only

Note that if they all have the same "mov $dest, $src" print string, they
will be printed out identically, but you need to separate them anyway.
If you look at the X86 backend, you might find many flavors of MOV
instructions as well, but for a different reason.

The best way is to lazily add permutations that you need as you go
along, so you don't have to convert your entire .td file(s) over to a
complete set of permutations immediately.

Good luck,
--
Misha Brukman :: http://misha.brukman.net :: http://llvm.cs.uiuc.edu

-Chris

Yes, it's a constant register.

Because the instruction cannot contain an immediate value, a constant
value may be stored in a constant register, and it's defined _before_
the program starts by API. For example:

  SetConstantValue( 5, Vector4( 1, 2, 3, 4 ); // C5 = <1,2,3,4>
  HANDLE handle = LoadCodeFromFile( filename );
  SetCode( handle ); // C5 is referenced here
  Execute();

Ah, ok. In that case, you want to put all of the registers in one register file, and not make the constant register allocatable (e.g. see X86RegisterInfo.td, and note how the register classes include EBP and ESP, but do not register allocate them (through the definition of allocation_order_end()).

-Chris

Thanks, I think it can solve my problem.

But please allow me to explain the hardware in detail. Hope there is
more elegant way to solve it.

The hardware is a "stream processor". That is, It processes samples
one by one. Each sample is associated with several 128-bit
four-element vector registers, namely:

* input registers - the attributes of the sample, the values of the
registers are different and initialized for each sample before
execution. READ-ONLY (can only be declared once by 'dcl' instruction).

* constant registers - sample-invariant. READ-ONLY (can only be
defined once by 'def' instruction). All samples shares the same set of
constant register values.

* general purpose registers - values are not initialized before the
execution and destroyed after execution. They can be read and written.

* output registers - WRITE-ONLY.

Sample program converted to pseudo-LLVM assembly (SSA):

  %Vec4 = type < 4 x float>

  // declare input registers and
  // define constant register values
  %v1 = dcl %Vec4 xyz
  %v2 = dcl %Vec4 color
  %c1 = def %Vec4 <1,2,3,4>

  // v1, v2, c1 are not allowed to be destination register
  // of any instruction hereafter.
  
  %r1 = add %Vec4 v1, c1
  %r2 = mul %Vec4 v1, c2
  %o1 = mul %Vec4 r2, v2 // write the output register 'o1'

I planed to partition the register into different RegisterClass:
input, output, general purpose, constant, etc.

  def GeneralPurposeRC : RegisterClass<packed, 128, [R0, R1]>;
  def InputRC : RegisterClass<packed, 128, [V0, V1]>;
  def ConstantRC : RegisterClass<packed, 128, [C0, C1]>;

def ADDgg : BinaryInst<0x51, (
   ops GeneralPurposeRC :$dest,
   ope GeneralPurposeRC :$src), "add $dest, $src">;

def ADDgi : BinaryInst<0x52, (
   ops GeneralPurposeRC :$dest,
   ope InputRC :$src), "add $dest, $src">;

def ADDgc : BinaryInst<0x52, (
   ops GeneralPurposeRC :$dest,
   ope ConstantRC :$src), "add $dest, $src">;

The problem is: SDOperand alwasy return the 'type' of the value (in
this case, 'packed', the first argument of RegisterClass<>), but not
the 'RegisterClass'. With two 'packed' operands, the instruction
selector doesn't know whether a ADDgg, ADDgi, or an ADDgc should be
generated (BuildMI() function).

The same problem exists when there are two types of costant registers,
floating point and integer, and each is declared 'packed' ([4xfloat]
and [4xint]). The instruction selector doesn't know which instruction
it should produce because the newly defined MVT type 'packed' is
always used for all operands (registers), even if it's acutally a
[4xfloat] or [4xint].

Hope I understand you correctly:

def C0 : ConstFpReg<0, "c0">;
...
def C200 : ConstFpReg<199, "c200">;

def I0 : ConstIntReg<0, "i0">;
...
def I100 : ConstIntReg<100, "i100">;

def R0 : TempReg<0, "r0">;
def R32 : TempReg<31, "r32">;

def V0 : InputReg<0, "v0">;
..
def V10 : InputReg<9, "v10">;

def O0 : OutputReg<0, "o0">;
..
def O4 : OutputReg<4, "o4">;

def FloatingPointRC : RegisterClass<packed, 128,
    [R0, R1, R2, ..., R32,
     C0, C1, ..., C200,
     V0, ..., V10,
     O1, O2, O3, O4]> {
  let Methods = [{
    iterator allocation_order_end(MachineFunction &MF) const {
      return end()-(4+10+200); // only TempReg can be allocated
  }];
}
     
def IntegerRC : RegisterClass<packed, 128,
    [I0, I1, ..., I100]>;

And linearly assigning the read-only registers for each definition of them?

Yes, this is exactly what I meant.

-Chris

But please allow me to explain the hardware in detail. Hope there is
more elegant way to solve it.

Sounds good!

The hardware is a "stream processor". That is, It processes samples
one by one. Each sample is associated with several 128-bit
four-element vector registers, namely:

* input registers - the attributes of the sample, the values of the
registers are different and initialized for each sample before
execution. READ-ONLY (can only be declared once by 'dcl' instruction).

Ok.

* constant registers - sample-invariant. READ-ONLY (can only be
defined once by 'def' instruction). All samples shares the same set of
constant register values.

Ok. I don't think the definition of these should be represented in your code. The code should just read them when needed.

* general purpose registers - values are not initialized before the
execution and destroyed after execution. They can be read and written.

Yup, these should be register allocated.

* output registers - WRITE-ONLY.

And these should be explicitly defined once, also not register allocated.

Sample program converted to pseudo-LLVM assembly (SSA):

%Vec4 = type < 4 x float>

// declare input registers and
// define constant register values
%v1 = dcl %Vec4 xyz
%v2 = dcl %Vec4 color
%c1 = def %Vec4 <1,2,3,4>

// v1, v2, c1 are not allowed to be destination register
// of any instruction hereafter.

%r1 = add %Vec4 v1, c1
%r2 = mul %Vec4 v1, c2
%o1 = mul %Vec4 r2, v2 // write the output register 'o1'

Here, the v1/v2/c1/o1 registers should be represented as explicit registers, and the GPRs should be virtual registers. This would give you code that looks something like this:

%reg1024 = add v1, c1
%reg1025 = mul v1, c2
%reg1026 = mul %reg1024, %v2
%o1 = mov %reg1026

The 'mov' register-to-register copy instruction will be coallesced and eliminated by the register allocator. The regalloc will eliminate the virtual registers, assigning physical GPRs. This is what the 'allocation order' is to cover.

I planed to partition the register into different RegisterClass:
input, output, general purpose, constant, etc.

def GeneralPurposeRC : RegisterClass<packed, 128, [R0, R1]>;
def InputRC : RegisterClass<packed, 128, [V0, V1]>;
def ConstantRC : RegisterClass<packed, 128, [C0, C1]>;

The way you want to partition these is based on how the instruction set works. If there is a single 'add' instruction that can operate on any of these registers, there should be a single register class. If there are two adds (as it looks like you have below, judging by the opcode) with different register constraints, then you should partition the registers so that each the register classes line up with the instruction operand requirements.

def ADDgg : BinaryInst<0x51, (
  ops GeneralPurposeRC :$dest,
  ope GeneralPurposeRC :$src), "add $dest, $src">;

def ADDgi : BinaryInst<0x52, (
  ops GeneralPurposeRC :$dest,
  ope InputRC :$src), "add $dest, $src">;

def ADDgc : BinaryInst<0x52, (
  ops GeneralPurposeRC :$dest,
  ope ConstantRC :$src), "add $dest, $src">;

The problem is: SDOperand alwasy return the 'type' of the value (in
this case, 'packed', the first argument of RegisterClass<>), but not
the 'RegisterClass'. With two 'packed' operands, the instruction
selector doesn't know whether a ADDgg, ADDgi, or an ADDgc should be
generated (BuildMI() function).

Right. You don't want to do this sort of partitioning. All of the 'computed' values should be virtual registers which will end up being assigned to GPRs. The register allocator will attempt to coallesce the GPR into an output or input register if possible. To allow this coallescing to happen, implement the TargetInstrInfo::isMoveInstr virtual method for your target.

The same problem exists when there are two types of costant registers,
floating point and integer, and each is declared 'packed' ([4xfloat]
and [4xint]). The instruction selector doesn't know which instruction
it should produce because the newly defined MVT type 'packed' is
always used for all operands (registers), even if it's acutally a
[4xfloat] or [4xint].

It might make sense to add two MVT enums: one for packed integers, and one for packed floats?

-Chris

I thought about that too, but what if:
* there are many packed types, 16 and 32-bit floating points, 16 and
32-bit integers, a lot of enums will be needed.
* there number of elements in a packed type could vary.

There could be a more general way to support packed type. The member
fucntion SequentialType::getElementType() returns the type of the
packed elements:

File: include/llvm/Type.h
<code>
  class SequentialType : public CompositeType {
  public:
    inline const Type *getElementType() const { return ContainedTys[0]; }
  };

  class PackedType : public SequentialType {
  public:
    inline unsigned getNumElements() const { return NumElements; }
  };
</code>

If SDOperand can return a "const Type *", the element type of the
packed type can be obtained, and only one enum value 'packed' needed
to be added to MVT::Type. Not only the element type can be available,
but also the number of elements.

Tzu-Chien Chiu wrote:

The same problem exists when there are two types of costant registers,
floating point and integer, and each is declared 'packed' ([4xfloat]
and [4xint]). The instruction selector doesn't know which instruction
it should produce because the newly defined MVT type 'packed' is
always used for all operands (registers), even if it's acutally a
[4xfloat] or [4xint].

It might make sense to add two MVT enums: one for packed integers, and one
for packed floats?

I thought about that too, but what if:
* there are many packed types, 16 and 32-bit floating points, 16 and
32-bit integers, a lot of enums will be needed.
* there number of elements in a packed type could vary.

You're right. However, I would still prefer that you add an enum value for each format that you need.

If SDOperand can return a "const Type *", the element type of the
packed type can be obtained, and only one enum value 'packed' needed
to be added to MVT::Type. Not only the element type can be available,
but also the number of elements.

The problem with this approach is that it will inflate the size of the MVT representation. The MVT is supposed to be an enum that can be used to index into tables. Adding pointers to other types would make that not work.

In the future, as the number of different formats grows, we can reevaluate this decision.

-Chris