RFC: Implement variable-sized register classes

Earlier I posted a patch that switches the API to one that supports this (as yet non-existent) functionality:
https://reviews.llvm.org/D24631

The comments from that were incorporated into the following RFC.

Motivation:

Certain targets feature "variable-sized" registers, i.e. a situation where the register size can be configured by a hardware switch. A common instruction set would then operate on these registers regardless of what size they have been configured to have. A specific example of that is the HVX coprocessor on Hexagon. HVX provides a set of vector registers, and can be configured in one of two modes: one in which vectors are 512 bits long, and one in which vectors are 1024 bits long. The size only determines the number of elements in the vector register, and so the semantics of each HVX instruction do not change: it performs a given operation on all vector elements. The encoding of the instruction does not change between modes either; in fact, it is possible to have a binary that runs in both modes.

Currently the register size (strictly speaking, "spill slot size") and related properties are fixed and immutable in a RegisterClass. In order to allow multiple possible register sizes, several RegisterClass objects may need to be defined, which then will require each instruction to be defined twice. This is what the HVX code does. Another approach may be to define several sets of physical registers corresponding to different sizes, and have a large RegisterClass which would be the union of all of them. This could avoid having to duplicate the instructions, but would lead to problems with getting the actual spill slot size or alignment.

Since the number of targets allowing this kind of variability is growing (besides Hexagon, there are RISC-V, MIPS, and out-of-tree targets, such as CHERI), LLVM should allow convenient handling of this kind of situation. See comments in https://reviews.llvm.org/D23561 for more details.

General approach:

1. Introduce a concept of a hardware "mode". This "mode" should be immutable, that is, it should be treated as a fixed property of the hardware throughout the execution of the program being compiled. This is different from, for example, floating point rounding mode, which can be changed at run-time. In LLVM, the mode would be determined by subtarget features (reflected in TargetSubtargetInfo).

2. Move the register/spill size and alignment information out of MCRegisterClass and into TargetRegisterInfo. This means that this data will no longer be available to the MC layer. Note that the size/alignment information will be provided by the TargetRegisterInfo object, and not by each individual TargetRegisterClass. A TargetRegisterInfo object would be created for a specific hardware mode, so that it would be able to provide the necessary information without having to consult TargetSubtargetInfo.

3. Introduce TableGen support for specifying instruction selection patterns involving data types depending on the hardware mode.

4. Require that the sub-/super-class relationships between register classes are the same across all hardware modes.

The largest impact of this change would be on TableGen, since it needs to be aware of the fact that value types under consideration would depend on a hardware mode. For example, when having an add-registers instruction defined to work on 64-bit registers, providing an additional selection pattern for 128-bit registers would present difficulties:

   def AddReg : Instruction {
     let OutOperandList = (outs GPR64:$Rd);
     let InOperandList = (ins GPR64:$Rs, GPR64:$Rt);
     let Pattern = [(set GPR64:$Rd, (add GPR64:$Rs, GPR64:$Rt))];
   }

the pattern

   def: Pat<(add GPR128:$Rs, GPR128:$Rt), (AddReg $Rs, $Rt)>;

would result in a type inference error from TableGen. If the class GPR64 were amended to also allow the value type i128, TableGen would no longer complain, but might generate invalid instruction selection code.

To solve this, TableGen would need to be aware of the association between value types and hardware modes. The rest of this proposal describes the programming interface to provide necessary information to TableGen.

1. Define a mode class. It will be recognized by TableGen as having a special meaning.

   class HwMode<list<Predicate> Ps> {
     // List of Predicate objects that determine whether this mode
     // applies. This is used for situations where the code generated by
     // TableGen needs to determine this, as opposed to TableGen itself,
     // for example in the isel pattern-matching code.
     list<Predicate> ModeDef = Ps;
   }

From the point of view of the code generated by TableGen, HwMode is equivalent to a list of Predicate objects. The difference is in how TableGen itself treats it: TableGen will distinguish two objects of class HwMode if they have different names, regardless of what sets of predicates they contain. One way to think of it is that the name of the object would serve as a tag denoting the hardware mode.

In the example of the AddReg instruction, we could define two modes:

   def Mode64: HwMode<[...]>;
   def Mode128: HwMode<[...]>;

but so far there would not be much more that we could do.

2. To make use of the mode information, provide a class to associate a HwMode object with a particular value. This will be done by having two lists: one with HwMode objects and another with the corresponding values. Since TableGen does not provide a way to define class templates (in the same sense as C++ does), the actual interface will be split in two parts. First is the "mode selection" base class:

   class HwModeSelect<list<HwMode> Ms> {
     list<HwMode> Modes = Ms; // List of unique hw modes.
   }

This will be a "built-in" class for TableGen. It will be a base class, and treated as "abstract" since it only contains half of the information. Each derived class would then need to define a member "Values", which is a list of corresponding values, of the same length as the list of modes. The following definitions will be useful for defining register classes and selection patterns:

   class IntSelect<list<HwMode> Ms, list<int> Is>
       : HwModeSelect<Ms> {
     // Select an integer literal.
     list<int> Values = Is;
   }

   class ValueTypeSelect<list<HwMode> Ms, list<ValueType> Ts>
       : HwModeSelect<Ms> {
     // Select a value type.
     list<ValueType> Values = Ts;
   }

   class ValueTypeListSelect<list<HwMode> Ms, list<list<ValueType>> Ls>
       : HwModeSelect<Ms> {
     // Select a list of value types.
     list<list<ValueType>> Values = Ls;
   }
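
To illustrate how the two parallel lists are meant to be read, here is a minimal Python sketch of the lookup. It is a model only, neither TableGen nor part of the patch: the class name `HwModeSelectModel` and the method `value_for` are hypothetical, and modes are represented purely by their names, in line with the rule that TableGen distinguishes modes by name.

```python
class HwModeSelectModel:
    """Toy model of HwModeSelect together with a derived class's Values list."""

    def __init__(self, modes, values):
        # Modes must be unique, and Values must have the same length as Modes.
        if len(set(modes)) != len(modes):
            raise ValueError("hw modes must be unique")
        if len(modes) != len(values):
            raise ValueError("Values must have the same length as Modes")
        self.modes = list(modes)
        self.values = list(values)

    def value_for(self, mode):
        # Corresponding elements of the two lists are taken together.
        return self.values[self.modes.index(mode)]


# Counterparts of IntSelect and ValueTypeListSelect for a two-mode target.
reg_size = HwModeSelectModel(["Mode64", "Mode128"], [64, 128])
reg_types = HwModeSelectModel(
    ["Mode64", "Mode128"],
    [["i64", "v2i32", "v4i16", "v8i8"],
     ["i128", "v2i64", "v4i32", "v8i16", "v16i8"]],
)
```

Asking `reg_size.value_for("Mode128")` picks the second entry of the value list, since "Mode128" is the second entry of the mode list.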

3. The class RegisterClass would get new members to hold the configurable size/alignment information. If defined, they would take precedence over the existing members RegTypes/Size/Alignment.

   class RegisterClass {
     ...
     ValueTypeListSelect VarRegTypes; // The names of these members
     IntSelect VarRegSize; // could likely be improved...
     IntSelect VarSpillSize; //
     IntSelect VarSpillAlignment; //
   }

To fully implement the AddReg instruction, the target would then define the register class:

   class MyRegisterClass : RegisterClass<...> {
     let VarRegTypes = ValueTypeListSelect<[Mode64, Mode128],
             [[i64, v2i32, v4i16, v8i8], // Mode64
              [i128, v2i64, v4i32, v8i16, v16i8]]>; // Mode128
     let VarRegSize = IntSelect<[Mode64, Mode128], [64, 128]>;
     let VarSpillSize = IntSelect<[Mode64, Mode128], [64, 128]>;
     let VarSpillAlignment = IntSelect<[Mode64, Mode128], [64, 128]>;
   }

   def MyIntReg: MyRegisterClass { ... };

And following that, the instruction:

   def AddReg: Instruction {
     let OutOperandList = (outs MyIntReg:$Rd);
     let InOperandList = (ins MyIntReg:$Rs, MyIntReg:$Rt);
     let AsmString = "add $Rd, $Rs, $Rt";
     let Pattern = [(set MyIntReg:$Rd, (add MyIntReg:$Rs,
                                            MyIntReg:$Rt))];
   }
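
The intended effect on size and alignment queries can be sketched in Python. This is a model under assumptions, not LLVM code: `RegClassModel`, `RegInfoModel` and their methods are hypothetical names. It shows a register-info object created for one hardware mode answering the queries itself, with the per-mode (Var*) values taking precedence over the fixed members when they are defined.

```python
from dataclasses import dataclass, field


@dataclass
class RegClassModel:
    name: str
    spill_size: int        # fixed member, used when no per-mode value exists
    spill_alignment: int   # fixed member, used when no per-mode value exists
    # Optional per-mode overrides, keyed by hardware mode name.
    var_spill_size: dict = field(default_factory=dict)
    var_spill_alignment: dict = field(default_factory=dict)


class RegInfoModel:
    """Stands in for a TargetRegisterInfo created for one hardware mode."""

    def __init__(self, hw_mode):
        self.hw_mode = hw_mode

    def spill_size(self, rc):
        # A per-mode value, if defined, takes precedence over the fixed one.
        return rc.var_spill_size.get(self.hw_mode, rc.spill_size)

    def spill_alignment(self, rc):
        return rc.var_spill_alignment.get(self.hw_mode, rc.spill_alignment)


my_int_reg = RegClassModel(
    "MyIntReg", spill_size=64, spill_alignment=64,
    var_spill_size={"Mode64": 64, "Mode128": 128},
    var_spill_alignment={"Mode64": 64, "Mode128": 128},
)
# A class without per-mode data keeps answering with its fixed members.
fixed_reg = RegClassModel("FixedReg", spill_size=32, spill_alignment=32)
```

The same `my_int_reg` object yields different answers depending on which mode the `RegInfoModel` was created for, which is the behaviour the proposal asks of TargetRegisterInfo.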

-Krzysztof

> Since the number of targets allowing this kind of variability is growing
> (besides Hexagon, there are RISC-V, MIPS, and out-of-tree targets, such as
> CHERI), LLVM should allow convenient handling of this kind of situation.
> See comments in https://reviews.llvm.org/D23561 for more details.

ARM SVE sounds like it will have similar issues:
https://community.arm.com/groups/processors/blog/2016/08/22/technology-update-the-scalable-vector-extension-sve-for-the-armv8-a-architecture

-- Sean Silva

From glancing over the slides, it seems like SVE has dynamically sized (i.e. you don’t know yet at compile time) registers which would be a step further than this. Of course the stuff in here wouldn’t hurt for that as it pushes the code into a direction to rely less on well-known/fixed register sizes.

- Matthias

>> ARM SVE sounds like it will have similar issues:
>> https://community.arm.com/groups/processors/blog/2016/08/22/technology-update-the-scalable-vector-extension-sve-for-the-armv8-a-architecture
>
> From glancing over the slides, it seems like SVE has dynamically sized
> (i.e. you don't know yet at compile time) registers which would be a step
> further than this.

From what Krzysztof wrote, it sounds like HVX has a similar situation ("it
is possible to have a binary that runs in both modes").

-- Sean Silva

Yes. The instruction and register encodings are identical between the modes. The mode is controlled by a bit in some system configuration register, otherwise the application does not know what mode it works in. Vector loads and stores are indexed in a similar way as in VLA, i.e.
   vmem(r0+#2) = v0
will store vector register v0 at the address r0 + 2*VL.

In practice, HVX programs are usually compiled for one of the modes. I think that the biggest complication in writing dual-mode programs is that the application does not have a good way of finding out what mode it runs in by querying the hardware (IIRC you need to run in the supervisor mode to examine the configuration bit). Another thing is that HVX users generally have a specific mode in mind when developing programs and being able to run in a different mode is not a high priority for them. At least for now...

-Krzysztof

Though I wouldn't expect the spilling (particularly the stackframe layout) code to handle dynamically sized registers properly.

- Matthias

> I have posted a patch that switches the API to one that supports this (yet
> non-existent functionality) earlier:
> https://reviews.llvm.org/D24631
>
> The comments from that were incorporated into the following RFC.

Thank you for writing this up. Your proposal is now much clearer to me.

> 1. Introduce a concept of a hardware "mode". This "mode" should be
> immutable, that is, it should be treated as a fixed property of the hardware
> throughout the execution of the program being compiled. This is different
> from, for example, floating point rounding mode, which can be changed at
> run-time. In LLVM, the mode would be determined by subtarget features
> (reflected in TargetSubtargetInfo).
>
> 2. Move the register/spill size and alignment information from
> MCRegisterClass, and into TargetRegisterInfo. This means that this data will
> no longer be available to the MC layer. Note that the size/alignment
> information will be provided by the TargetRegisterInfo object, and not by
> each individual TargetRegisterClass. A TargetRegisterInfo object would be
> created for a specific hardware mode, so that it would be able to provide
> the necessary information without having to consult TargetSubtargetInfo.

Having thought about it somewhat, the ways that come to my mind of
approaching this problem are:

* Put up with the code duplication and duplicate everything for
different register classes (current approach taken by in-tree
backends)
* Make use of a multiclass to define multiple instructions with
minimal duplication. I trialled this, but only on a RISC-V
InstrInfo.td that doesn't yet support codegen
https://reviews.llvm.org/P7637
* Use a for loop in tablegen and some !cast<> magic to do something
with a similar effect to the multiclass approach
* Extend TableGen with some sort of AST macro support that would,
again, allow you to generate a second (and third...) version of each
instruction with a different RegisterClass substituted
* Add support for implicit parameterisation. e.g. allowing def MyRC :
Predicated<Is32Bit, GPR32, GPR64>. Invasive and complex, but still an
option.
* Adding support for variable-sized register classes, as you've done
here. This definitely feels like the least invasive and is potentially
less fiddly than using multiclasses.

> 1. Define a mode class. It will be recognized by TableGen as having a
> special meaning.
>
>   class HwMode<list<Predicate> Ps> {
>     // List of Predicate objects that determine whether this mode
>     // applies. This is used for situations where the code generated by
>     // TableGen needs to determine this, as opposed to TableGen itself,
>     // for example in the isel pattern-matching code.
>     list<Predicate> ModeDef = Ps;
>   }

<snip>

> 2. To make use of the mode information, provide a class to associate a
> HwMode object with a particular value. This will be done by having two
> lists: one with HwMode objects and another with the corresponding values.
> Since TableGen does not provide a way to define class templates (in the same
> sense as C++ does), the actual interface will be split in two parts. First
> is the "mode selection" base class:
>
>   class HwModeSelect<list<HwMode> Ms> {
>     list<HwMode> Modes = Ms; // List of unique hw modes.
>   }
>
> This will be a "built-in" class for TableGen. It will be a base class, and
> treated as "abstract" since it only contains half of the information.

<snip>

> 3. The class RegisterClass would get new members to hold the configurable
> size/alignment information. If defined, they would take precedence over the
> existing members RegTypes/Size/Alignment.
>
>   class RegisterClass {
>     ...
>     ValueTypeListSelect VarRegTypes; // The names of these members
>     IntSelect VarRegSize; // could likely be improved...
>     IntSelect VarSpillSize; //
>     IntSelect VarSpillAlignment; //
>   }
>
> To fully implement the AddReg instruction, the target would then define the
> register class:
>
>   class MyRegisterClass : RegisterClass<...> {
>     let VarRegTypes = ValueTypeListSelect<[Mode64, Mode128],
>             [[i64, v2i32, v4i16, v8i8], // Mode64
>              [i128, v2i64, v4i32, v8i16, v16i8]]>; // Mode128
>     let VarRegSize = IntSelect<[Mode64, Mode128], [64, 128]>;
>     let VarSpillSize = IntSelect<[Mode64, Mode128], [64, 128]>;
>     let VarSpillAlignment = IntSelect<[Mode64, Mode128], [64, 128]>;
>   }
>
>   def MyIntReg: MyRegisterClass { ... };

My concern is that all of the above adds yet more complexity to what
is already (in my view) a fairly difficult part of LLVM to understand.
The definition of MyRegisterClass is not so bad though, and perhaps it
doesn't matter how it works under the hood to the average backend
writer.

What if RegisterClass contained a `list<RCInfo>`? Each RCInfo contains
RegTypes, RegSize, SpillSize, and SpillAlignment as well as a
Predicate that determines whether this individual RCInfo is the one
that should apply. To my taste this seems easier to understand than
the {Int,ValueType,ValueTypeList}Select mechanism.

def Is64Bit : Predicate<"Subtarget->is64Bit()">;
def RCInfo64 : RCInfo<Is64Bit> {
  let RegTypes = [i64, v2i32, v4i16, v8i8];
  .....
}

class MyRegisterClass : RegisterClass<...> {
  let RCInfos = [RCInfo32, RCInfo64];
}

Then for e.g. RISC-V I might end up with one GPR RegisterClass that
contains RCInfo for 32-bit and 64-bit, which is used in the definition
of all instructions. I might also want to define an explicit GPR32
RegisterClass for use with instructions like ADDW where the two input
operands will always come from the 32-bit subregisters.

Alex

> My concern is that all of the above adds yet more complexity to what
> is already (in my view) a fairly difficult part of LLVM to understand.
> The definition of MyRegisterClass is not so bad though, and perhaps it
> doesn't matter how it works under the hood to the average backend
> writer.

I agree with the complexity, but I would hope that more documentation, examples and explanations would clarify it.

> What if RegisterClass contained a `list<RCInfo>`? Each RCInfo contains
> RegTypes, RegSize, SpillSize, and SpillAlignment as well as a
> Predicate that determines whether this individual RCInfo is the one
> that should apply. To my taste this seems easier to understand than
> the {Int,ValueType,ValueTypeList}Select mechanism.

The "select" mechanism was intended to be extendable to be able to select any object of any type based on the predefined mode. It is entirely possible to use it in a similar way to what you describe below.

> def Is64Bit : Predicate<"Subtarget->is64Bit()">;
> def RCInfo64 : RCInfo<Is64Bit> {
>   let RegTypes = [i64, v2i32, v4i16, v8i8];
>   .....
> }
>
> class MyRegisterClass : RegisterClass<...> {
>   let RCInfos = [RCInfo32, RCInfo64];
> }

With the RCInfo data, the new register class definition would be something like

class MyRegisterClass : RegisterClass<...> {
   let RCInfos = HwModeSelect<[Is32Bit, Is64Bit, Is128Bit],
                              [RCInfo32, RCInfo64, RCInfo128]>;
}

In either case, aggregating the info in a RCInfo class would require additional changes in TableGen so that it picks up the size/alignment/type data from the RCInfos list, instead of from individual members. This is doable and there are no technical barriers to do it. It may actually be a good idea, since it would isolate the part of the register class definition into a single object.

On a side note---there is a distinction between "mode" and "predicate": modes are distinguished by name, which is necessary because they need to be distinguishable during the run-time of TableGen. Predicates are evaluated after TableGen is done, during the run-time of the code generated by it. I didn't want to differentiate predicates based on their names, since that would go against expectations of how predicates have behaved so far.
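
The distinction can be modelled in a few lines of Python. This is only an illustration under assumptions: `HwModeModel`, `is_64bit` and the dictionary used as a stand-in for a subtarget are all hypothetical. Mode identity depends on the name alone, while the predicates are ordinary functions that run later against a concrete subtarget.

```python
class HwModeModel:
    """Toy hardware mode: identity is the name, not the predicate list."""

    def __init__(self, name, predicates):
        self.name = name
        self.predicates = list(predicates)

    def __eq__(self, other):
        return isinstance(other, HwModeModel) and self.name == other.name

    def __hash__(self):
        return hash(self.name)

    def applies(self, subtarget):
        # Predicates are evaluated later, against a concrete subtarget.
        return all(pred(subtarget) for pred in self.predicates)


def is_64bit(subtarget):
    return subtarget["is64Bit"]


# Same predicates, different names: still two distinct modes.
mode_a = HwModeModel("ModeA", [is_64bit])
mode_b = HwModeModel("ModeB", [is_64bit])
```

Here `mode_a` and `mode_b` compare unequal even though both carry the same predicate, and `applies` only produces an answer once a subtarget description is supplied.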

-Krzysztof

>> My concern is that all of the above adds yet more complexity to what
>> is already (in my view) a fairly difficult part of LLVM to understand.
>> The definition of MyRegisterClass is not so bad though, and perhaps it
>> doesn't matter how it works under the hood to the average backend
>> writer.
>
> I agree with the complexity, but I would hope that more documentation,
> examples and explanations would clarify it.

Agreed.

>> What if RegisterClass contained a `list<RCInfo>`? Each RCInfo contains
>> RegTypes, RegSize, SpillSize, and SpillAlignment as well as a
>> Predicate that determines whether this individual RCInfo is the one
>> that should apply. To my taste this seems easier to understand than
>> the {Int,ValueType,ValueTypeList}Select mechanism.
>
> The "select" mechanism was intended to be extendable to be able to select
> any object of any type based on the predefined mode. It is entirely possible
> to use it in a similar way to what you describe below.

<snip>

> class MyRegisterClass : RegisterClass<...> {
>   let RCInfos = HwModeSelect<[Is32Bit, Is64Bit, Is128Bit],
>                              [RCInfo32, RCInfo64, RCInfo128]>;
> }

I think what I'm really suggesting is that rather than adding this
special HwModeSelect mechanism where both HwMode and HwModeSelect are
treated specially by TableGen, we instead make the RegisterClass
itself (specifically its RCInfos field) be treated specially by
TableGen.

> On a side note---there is a distinction between "mode" and "predicate":
> modes are distinguished by name, which is necessary because they need to be
> distinguishable during the run-time of TableGen. Predicates are evaluated
> after TableGen is done, during the run-time of the code generated by it. I
> didn't want to differentiate predicates based on their names, since that
> would go against expectations of how predicates have behaved so far.

I think I don't fully understand the design limitations here. How
exactly are HwModes used at tblgen execution time? As I understand it,
the chosen HwMode couldn't be selected at tblgen time (after all,
that's a subtarget property that will be known only when the compiler
is invoked) but from what you say, there's a point where different
HwModes must be differentiated?

Also how will the generated output be different? e.g. right now in
MIPS for OR in MipsGenInstrInfo we have:
  { 1754, 3, 1, 4, 232,
    0|(1ULL<<MCID::Commutable)|(1ULL<<MCID::Rematerializable), 0x1ULL,
    nullptr, nullptr, OperandInfo25, -1 ,nullptr }, // Inst #1754 = OR
  { 1757, 3, 1, 4, 232,
    0|(1ULL<<MCID::Commutable)|(1ULL<<MCID::Rematerializable), 0x1ULL,
    nullptr, nullptr, OperandInfo43, -1 ,nullptr }, // Inst #1757 = OR64

Where OperandInfo25 and OperandInfo43 obviously differ in terms of
register class. As I understand it, with this proposal only one entry
would be generated and OperandInfoNN would be defined in terms of our
variable-sized register class. But for MipsGenDAGISel.inc, would
multiple patterns be implicitly generated (one for each HwMode)?

Thanks,

Alex

> I think what I'm really suggesting is that rather than adding this
> special HwModeSelect mechanism where both HwMode and HwModeSelect are
> treated specially by TableGen, we instead make the RegisterClass
> itself (specifically its RCInfos field) be treated specially by
> TableGen.

The mode/select approach is general---you can make just about anything be specific to a particular hw mode. Changing TableGen to treat RCInfos specially is going to accomplish only that, nothing more.

> On a side note---there is a distinction between "mode" and "predicate":
> modes are distinguished by name, which is necessary because they need to be
> distinguishable during the run-time of TableGen. Predicates are evaluated
> after TableGen is done, during the run-time of the code generated by it. I
> didn't want to differentiate predicates based on their names, since that
> would go against expectations of how predicates have behaved so far.

> I think I don't fully understand the design limitations here. How
> exactly are HwModes used at tblgen execution time? As I understand it,
> the chosen HwMode couldn't be selected at tblgen time (after all,
> that's a subtarget property that will be known only when the compiler
> is invoked) but from what you say, there's a point where different
> HwModes must be differentiated?

Type inference in TableGen relies on knowing the exact set of types allowed for a particular expression. This is exactly why this HwMode is needed: if a register class MyRegClass can hold i32 in one mode and i64 in another mode, TableGen must know that the list of allowable types is either [i32] or [i64], and it cannot be [i32, i64]. Tagging each type with a mode would instead make it look like [i32:Mode32, i64:Mode64], which is equivalent to saying "Mode32 -> [i32], Mode64 -> [i64]", or "[Mode32, Mode64], [i32, i64]" with the understanding that corresponding list elements are to be taken together.
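
A small Python sketch of that idea follows. It assumes a much simplified notion of inference in which the type set of an expression is the intersection of the type sets of its operands; the function names are hypothetical, and real TableGen type inference is considerably more involved.

```python
def infer_untagged(*operand_types):
    """Intersect plain type sets, ignoring hardware modes."""
    result = set(operand_types[0])
    for types in operand_types[1:]:
        result &= set(types)
    return result


def infer_by_mode(*operand_types):
    """Intersect type sets separately for each hardware mode.

    Each operand is a dict mapping a mode name to its allowed types.
    """
    modes = set(operand_types[0])
    for types in operand_types[1:]:
        modes &= set(types)
    return {m: infer_untagged(*(t[m] for t in operand_types)) for m in modes}


# MyRegClass holds i32 in Mode32 and i64 in Mode64.
untagged = ["i32", "i64"]
tagged = {"Mode32": ["i32"], "Mode64": ["i64"]}

# Without modes the result stays ambiguous: two candidate types remain.
ambiguous = infer_untagged(untagged, untagged)

# With modes, each mode resolves to exactly one type.
resolved = infer_by_mode(tagged, tagged)
```

For an expression such as an add of two MyRegClass operands, the untagged form leaves both i32 and i64 as candidates, whereas the tagged form leaves a single type per mode.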

This is really only needed for selection patterns. If you just want to be able to define instructions (via def xxx : Instruction<...>), then the mode/select is not necessary.

> Also how will the generated output be different? e.g. right now in
> MIPS for OR in MipsGenInstrInfo we have:
>   { 1754, 3, 1, 4, 232,
>     0|(1ULL<<MCID::Commutable)|(1ULL<<MCID::Rematerializable), 0x1ULL,
>     nullptr, nullptr, OperandInfo25, -1 ,nullptr }, // Inst #1754 = OR
>   { 1757, 3, 1, 4, 232,
>     0|(1ULL<<MCID::Commutable)|(1ULL<<MCID::Rematerializable), 0x1ULL,
>     nullptr, nullptr, OperandInfo43, -1 ,nullptr }, // Inst #1757 = OR64
>
> Where OperandInfo25 and OperandInfo43 obviously differ in terms of
> register class. As I understand it, with this proposal only one entry
> would be generated and OperandInfoNN would be defined in terms of our
> variable-sized register class. But for MipsGenDAGISel.inc, would
> multiple patterns be implicitly generated (one for each HwMode)?

From the point of view of instruction descriptors nothing would really change. A ShortIntegerRegisterClass would still be different from LongIntegerRegisterClass. The difference would be that when you query TargetRegisterInfo about their spill slot sizes, etc., you could get different answers for different subtargets.

-Krzysztof

>> I think what I'm really suggesting is that rather than adding this
>> special HwModeSelect mechanism where both HwMode and HwModeSelect are
>> treated specially by TableGen, we instead make the RegisterClass
>> itself (specifically its RCInfos field) be treated specially by
>> TableGen.
>
> The mode/select approach is general---you can make just about anything be
> specific to a particular hw mode. Changing TableGen to treat RCInfos
> specially is going to accomplish only that, nothing more.

I agree it's definitely more general, I'm just wondering whether there
are really additional areas you'd want to use mode/select. Though on
balance I think I agree - treating RCInfo specially wouldn't reduce
the TableGen changes all that much so probably isn't worth the loss in
generality.

> Type inference in TableGen relies on knowing the exact set of types allowed
> for a particular expression. This is exactly why this HwMode is needed: if a
> register class MyRegClass can hold i32 in one mode and i64 in another mode,
> TableGen must know that the list of allowable types is either [i32] or
> [i64], and it cannot be [i32, i64]. Tagging each type with a mode would
> instead make it look like [i32:Mode32, i64:Mode64], which is equivalent to
> saying "Mode32 -> [i32], Mode64 -> [i64]", or "[Mode32, Mode64], [i32, i64]"
> with the understanding that corresponding list elements are to be taken
> together.

Thank you, that makes things much clearer in my mind.

I feel the duplication that will be removed by variable-sized register
classes is very valuable, and I'm certainly stumped for ideas on how
to achieve the same goal in a simpler way. I'd need to try out the
multiclass approach on an InstrInfo.td with instruction patterns for a
full comparison, but suspect this variable-sized register class
proposal will end up being much cleaner.

Alex

>>>> [...]
>>>> ARM SVE sounds like it will have similar issues:
>>>> https://community.arm.com/groups/processors/blog/2016/08/22/technology-update-the-scalable-vector-extension-sve-for-the-armv8-a-architecture
>>>
>>> From glancing over the slides, it seems like SVE has dynamically
>>> sized (i.e. you don't know yet at compile time) registers which
>>> would be a step further than this.
>>
>> From what Krzysztof wrote, it sounds like HVX has a similar situation
>> ("it is possible to have a binary that runs in both modes").
>
> Yes. The instruction and register encodings are identical between the
> modes. The mode is controlled by a bit in some system configuration
> register, otherwise the application does not know what mode it works in.
> Vector loads and stores are indexed in a similar way as in VLA, i.e.
>   vmem(r0+#2) = v0
> will store vector register v0 at the address r0 + 2*VL.
>
> In practice, HVX programs are usually compiled for one of the modes. I
> think that the biggest complication in writing dual-mode programs is that
> the application does not have a good way of finding out what mode it runs
> in by querying the hardware (IIRC you need to run in the supervisor mode to
> examine the configuration bit).

Although it is sort of hacky, one could zero a block of memory and then see
where `vmem(r0+#2) = v0` ends up (making sure that v0 is all 1's for
example).
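
That probing idea can be modelled in a few lines of Python. This is a simulation only, not Hexagon code: `store_vector` and `probe_vector_length` are hypothetical helpers, memory is a plain bytearray, and VL is assumed to be the vector length in bytes (64 for the 512-bit mode, 128 for the 1024-bit mode).

```python
def store_vector(memory, base, index, vl):
    """Model `vmem(r0+#index) = v0` with v0 all ones: write VL bytes of
    0xFF at address base + index * VL."""
    start = base + index * vl
    memory[start:start + vl] = b"\xff" * vl


def probe_vector_length(hardware_vl, index=2):
    """Zero a block, do one indexed store, and infer VL from where it landed."""
    memory = bytearray(1024)        # zeroed block, large enough for either mode
    store_vector(memory, 0, index, hardware_vl)
    first_set = memory.index(0xFF)  # offset of the first byte that changed
    return first_set // index
```

Dividing the offset of the first changed byte by the index used in the store recovers VL, which is how a program could tell the two modes apart without reading the configuration bit.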

> Another thing is that HVX users generally have a specific mode in mind
> when developing programs and being able to run in a different mode is not a
> high priority for them. At least for now...

At least in the ARM SVE case, it seems like VLA could maybe be useful for
big.LITTLE arrangements where the big cores may have a wider vector width.
That may not actually be possible in practice: the program would need
restrictions on how it relies on the vector length, and on a little<->big
core migration you would have to paper over the VL change (e.g. maybe the
big core could temporarily run in a "narrow" mode?). That's a bit
hand-wavy... not sure it would actually work.

-- Sean Silva

<snip>

One thing I'll note is that the RISC-V "V" (Vector) extension is likely
to work this feature very hard indeed - see the following papers/slides/
talks:

"A Case for MVPs: Mixed-Precision Vector Processors"
http://hwacha.org/papers/hwacha-mvp-prism2014.pdf

"2nd RISC-V Workshop: Vector Extension Proposal"
http://riscv.wpengine.com/wp-content/uploads/2015/06/riscv-vector-workshop-june2015.pdf

In such a design, it's very likely that the width of the registers in the
vector processor may change between individual stripmine loops - that is,
in fact, rather the point.
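
For readers unfamiliar with the term, a stripmine loop can be sketched in scalar C++ roughly as below. This is only a model: `maxVL` stands for the number of elements the hardware would grant per vector operation, which on such a design is a run-time value, and the function name is invented for this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Strip-mined a[i] += b[i]. Each outer iteration models one vector
// operation of `vl` elements; `vl` is only known at run time.
// Returns the number of strips executed.
std::size_t stripmineAdd(std::vector<int> &a, const std::vector<int> &b,
                         std::size_t maxVL) {
  if (maxVL == 0)
    return 0; // no progress would be possible
  const std::size_t n = std::min(a.size(), b.size());
  std::size_t strips = 0;
  for (std::size_t i = 0; i < n;) {
    const std::size_t vl = std::min(maxVL, n - i); // last strip may be short
    for (std::size_t j = 0; j < vl; ++j)
      a[i + j] += b[i + j];
    i += vl;
    ++strips;
  }
  return strips;
}
```

Nothing in the loop body depends on a particular value of `maxVL`, which is what allows the register width to differ from one loop to the next.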

This proposal only deals with situations where the register size remains constant (and is known) at compile-time. It aims at reducing duplication of instruction definitions and selection patterns.
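
A minimal sketch of what "constant and known at compile-time" could look like as data, assuming a single register class that carries one size record per hardware mode. All names here (`HwMode`, `RegSizeInfo`, `VarSizeRegClass`, `HvxVR`) are hypothetical and chosen for illustration, and the spill alignment values assume natural alignment; this is not the API of the patch.

```cpp
#include <cstddef>

// Hypothetical hardware modes for the HVX example.
enum class HwMode : std::size_t { Hvx64B = 0, Hvx128B = 1 };

// Sizes in bits. The alignment values are assumptions for the example.
struct RegSizeInfo {
  unsigned RegSize;
  unsigned SpillSize;
  unsigned SpillAlign;
};

// One register class, several size records: the mode picks the record,
// so instructions need to be defined only once.
struct VarSizeRegClass {
  RegSizeInfo PerMode[2];
  constexpr const RegSizeInfo &get(HwMode M) const {
    return PerMode[static_cast<std::size_t>(M)];
  }
};

constexpr VarSizeRegClass HvxVR = {{{512, 512, 512}, {1024, 1024, 1024}}};
```

The mode is fixed for a given compilation, so a query such as `HvxVR.get(HwMode::Hvx128B).SpillSize` is an ordinary table lookup.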

The case with the V extension for RISC-V and the VLA for ARM is different in that the register size is neither a compile-time constant nor known at compile time. Handling that would require a different set of changes in the compiler.

-Krzysztof

If there are no objections, I'd like to start working on this soon...

For the AMDGPU target this implies that RC->getSize will no longer be available in the MC layer.

-Krzysztof

Another advantage of this work that hasn't been mentioned yet is it
will reduce the number of uses of isCodeGenOnly. The comment in
Target.td indicates the long-term plan is to remove the distinction
between isPseudo and isCodeGenOnly.

A closely related issue to variable-sized register classes is the case where
you have multiple registers with the same AsmName. This crops up in
the same kind of cases where you have multiple instructions with the
same encoding. Without a workaround, an assert is tripped in
llvm-tblgen when trying to produce a StringSwitch for
MatchRegisterName. The solution in Mips, PPC and others seems to
involve the generation of MatchRegisterName. What has been discussed
so far with regards to HwMode and variable-size register classes
points to a solution, but I don't think it's quite enough. Options
include:

1. Only have one set of register definitions, and have the variable
sized register class determine the bit width. The problem is there are
often some instructions where I think you need to have registers
modelled as subregisters. e.g. SLLW, ADDW etc in 64-bit RISC-V. These
operate on 32-bit values and write the results sign-extended to the
target 64-bit register.

2. Define both the 64-bit registers and the 32-bit subregisters, but
make MatchRegisterName's behaviour change based on the HwMode. This
works around the fact there are multiple registers with the same
AsmName. Although I doubt this would actually cause problems, this
still isn't quite right. For an `SLLIW x1, x2, 5` I think the correct
interpretation would have x1 as a 64-bit target register and x2 as the
32-bit subregister that happens to have the same AsmName as the 64-bit
x2 register.

Have you thought about how the HwMode/variable-sized register class
proposal might interact with register AsmNames at all?

This old patch that never landed
<http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20141201/246835.html>
is also, I think, related. Backends like Mips and PPC end up defining
RegisterOperand with a ParserMatchClass (in the Mips case, this
specifies the 'parseAnyRegister' ParserMethod). Adding a
ParserMatchClass field to RegisterClass would be a minor
simplification.

Best,

Alex

I've thought about this some more. In a future world supporting
variable-sized register classes, you'd define one main set of
registers with AltNames and an AsmName. The auto-generated
MatchRegisterName and MatchRegisterAltName can be used for these. You
might define 32-bit subregisters as well as an associated GPR32 reg
class for use in instructions that need it, but these have no AltNames
or AsmName. The AsmParser can convert parsed register numbers when
desired (e.g. for SLLIW).
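
The arrangement described above could be sketched roughly as follows. The enum, the four-register subset and both function names are invented for the example; a real backend would use the tblgen-generated MatchRegisterName rather than a hand-written matcher.

```cpp
#include <string>

// Hypothetical register numbering: only the full-width registers carry an
// AsmName; the 32-bit subregisters are nameless as far as the parser goes.
enum Reg : unsigned {
  NoRegister = 0,
  X0, X1, X2, X3,            // main set, matched by name
  X0_32, X1_32, X2_32, X3_32 // subregisters, never matched by name
};

// Stand-in for the auto-generated matcher: accepts "x0".."x3" only.
Reg matchRegisterName(const std::string &Name) {
  if (Name.size() == 2 && Name[0] == 'x' && Name[1] >= '0' && Name[1] <= '3')
    return static_cast<Reg>(X0 + (Name[1] - '0'));
  return NoRegister;
}

// Conversion the AsmParser could apply to operands of instructions that
// want a GPR32 register (e.g. SLLIW in this discussion).
Reg toGPR32(Reg R) {
  if (R >= X0 && R <= X3)
    return static_cast<Reg>(X0_32 + (R - X0));
  return NoRegister;
}
```

Because each name maps to exactly one register, the duplicate-AsmName problem does not arise; the choice between `X2` and `X2_32` is made per operand after parsing.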

In my case, I would define Xnn_XLEN registers for RISC-V. These are
included in the 'GPR' variable-sized register class which makes use of
HwMode. I would also need to define these as having Xnn_32
subregisters and define a GPR32 regclass. I think something similar
could be done for MIPS.

Apologies for thinking out loud, I'm just trying to work through how
everything would fit together should we go with this approach.

Alex

I just had some time to think about it. The issue with this is that the register names are of interest to the MC layer, while the variable register size is handled on the Target level (i.e. TargetRegisterInfo).

Instructions like ADDW and SLLIW still take 64-bit registers in 64-bit mode, but they only access the low 32 bits. In your example, "SLLIW x1, x2, 5", both x1 and x2 would be 64-bit registers, but only the low 32 bits of x2 would be used. In the assembly source, the names of the 64-bit registers would be used, and the instruction semantics (ADD vs ADDW) would determine whether the whole register or only a part of it is used (at least this is how I read the RISC-V spec).
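
To make that reading concrete, here is a small model of the two instructions on 64-bit register values. The function names are just labels for this sketch; the behaviour modelled is the one described in this thread (operate on the low 32 bits, write the result sign-extended to 64 bits).

```cpp
#include <cstdint>

// Sign-extends the low 32 bits of V to 64 bits using only unsigned
// arithmetic, so the result does not rely on implementation-defined
// narrowing conversions.
constexpr std::uint64_t sext32(std::uint64_t V) {
  return ((V & 0xFFFFFFFFULL) ^ 0x80000000ULL) - 0x80000000ULL;
}

// ADDW: 32-bit add of the low halves, result sign-extended.
constexpr std::uint64_t addw(std::uint64_t Rs1, std::uint64_t Rs2) {
  return sext32(Rs1 + Rs2);
}

// SLLIW: shift of the low 32 bits by a 5-bit amount, result sign-extended.
constexpr std::uint64_t slliw(std::uint64_t Rs1, unsigned Shamt) {
  return sext32(Rs1 << (Shamt & 31u));
}
```

Both operands are full 64-bit registers; the upper 32 bits of the sources simply do not influence the result.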

-Krzysztof

Everything you say makes sense, though the way this situation is
modelled by current in-tree architectures is to have the GPR32
register class contain registers that are subregisters of the GPR64
regs. See the instructions in Mips64InstrInfo.td that take a
GPR32Opnd, or in PPCInstr64Bit.td that take a gprc. It's possible that
it's not really essential to model the distinction between the 64-bit
register with AsmName 'x4' and its 32-bit subregister that also has
AsmName 'x4'. If the distinction isn't important, then obviously
solely relying on the 'HwMode' is sufficient.

Alex