64 bit special purpose registers

On MIPS32 there is traditionally a 64 bit HI/LO register that holds the result of multiplying two 32 bit numbers.

There are corresponding instructions to move the LO and HI parts into individual 32 bit registers.

On MIPS with the DSP ASE (an application specific extension), there are actually 4 such pairs of
registers.

Is there a way to have special purpose 64 bit registers without actually having to tell LLVM that you have a 64 bit processor?

It should still be possible to use the individual parts of the 64 bit register as temporaries.

The only true 64 bit operation is multiplying two 32 bit numbers.
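
To make the HI/LO split concrete, here is a minimal sketch in plain C++ (not LLVM code) of a 32 x 32 -> 64 bit multiply whose result is then read back as two 32 bit halves. The struct HiLo and the helper names multu and mult are hypothetical; they are only named after the MIPS mnemonics they loosely model.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of one HI/LO pair: two 32 bit halves of a 64 bit product.
struct HiLo {
  uint32_t hi; // upper 32 bits, what a move-from-HI would read
  uint32_t lo; // lower 32 bits, what a move-from-LO would read
};

// Unsigned 32 x 32 -> 64 multiply, split into its two halves.
HiLo multu(uint32_t a, uint32_t b) {
  uint64_t product = static_cast<uint64_t>(a) * b;
  return HiLo{static_cast<uint32_t>(product >> 32),
              static_cast<uint32_t>(product)};
}

// Signed variant: sign-extend to 64 bits before multiplying.
HiLo mult(int32_t a, int32_t b) {
  int64_t product = static_cast<int64_t>(a) * b;
  uint64_t bits = static_cast<uint64_t>(product);
  return HiLo{static_cast<uint32_t>(bits >> 32), static_cast<uint32_t>(bits)};
}

// Small self-check showing how the halves relate to the full product.
void checkHiLoExamples() {
  assert(multu(0xFFFFFFFFu, 2u).hi == 1u);          // carry lands in HI
  assert(multu(0xFFFFFFFFu, 2u).lo == 0xFFFFFFFEu); // low word lands in LO
  assert(mult(-1, 2).hi == 0xFFFFFFFFu);            // sign bits fill HI
}
```

The multiply is the only step that needs the 64 bit value as a whole; everything after it works on two independent 32 bit values.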

This can be done by declaring a register class that contains these registers and using that register class as an operand only in the instructions where it is legal.
You then declare the hi and lo halves as sub-registers of those 64 bit registers.

So something like this:
def lo_comp : SubRegIndex;
def hi_comp : SubRegIndex;
def R1 : Register<1>;
def R2 : Register<2>;
def R3 : Register<3>;
def R4 : Register<4>;
def D1 : RegisterWithSubRegs<1, [R1, R2], [lo_comp, hi_comp]>;

This says that D1 is a register with two components, lo and hi. When you allocate D1, you also use R1/R2.
def GPR32 : RegisterClass<..., [i32], [32], (add (sequence "R%u", 1, 4))> ...;
def GPR64 : RegisterClass<..., [i64], [64], (add D1)> ...;

So in your instruction it would be something like:
def mul : Inst<(dst GPR64:$dst), (src GPR32:$src0, GPR32:$src1), ...>;

This means the instruction takes two 32 bit inputs and produces a 64 bit output. When D1 is not in use, R1 and R2 can be allocated to instructions that use the GPR32 register class; otherwise they are seen as used and are not allocated.
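
The overlap between D1 and R1/R2 can be illustrated with a toy model in plain C++. This is only a sketch of the aliasing idea, not how the LLVM register allocator is implemented; the Reg enum and the ToyAllocator type are hypothetical names made up for the illustration.

```cpp
#include <cassert>
#include <cstdint>

// Each 32 bit register is one bit; D1 covers the bits of both R1 and R2.
enum Reg : uint8_t {
  R1 = 1u << 0,
  R2 = 1u << 1,
  R3 = 1u << 2,
  R4 = 1u << 3,
  D1 = R1 | R2, // 64 bit pair: lo_comp = R1, hi_comp = R2
};

// Hypothetical allocator state: the union of the bits of all allocated registers.
struct ToyAllocator {
  uint8_t inUse = 0;

  // A register is free only if none of the parts it overlaps are taken.
  bool isFree(Reg r) const { return (inUse & r) == 0; }

  // Marks the register (and therefore its parts) as used; fails on any overlap.
  bool allocate(Reg r) {
    if (!isFree(r))
      return false;
    inUse |= r;
    return true;
  }

  void release(Reg r) { inUse &= static_cast<uint8_t>(~r); }
};

// Walks through the scenario described above.
void checkAliasing() {
  ToyAllocator ra;
  assert(ra.allocate(D1));  // taking D1 also takes R1 and R2
  assert(!ra.allocate(R1)); // so R1 is no longer available
  assert(ra.allocate(R3));  // R3 does not overlap D1
  ra.release(D1);
  assert(ra.allocate(R1));  // once D1 is free, R1 is an ordinary GPR32 again
  assert(!ra.allocate(D1)); // and now D1 is blocked by R1
}
```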

Hope this helps,
Micah

Micah,

Do you mean we should make GPR64 available to the register allocator by calling addRegisterClass?

addRegisterClass(MVT::i64, &GPR64RegClass);

If we add register class GPR64, type legalization will stop expanding i64 operations because i64 is now a legal type.
Then we will probably have to write lots of code to custom-lower unsupported 64-bit operations during legalization. Note that mips32/16 lacks support for most of the basic 64-bit instructions (add, sub, etc.).

I don’t think setting the operation action by calling setOperationAction(..., MVT::i64, Expand) would work either. Judging from the code I see in Legalize.cpp, operation legalization doesn’t seem to do much to expand unsupported i64 operations.

[Villmow, Micah] You'll have to set everything that you don't support to 'Expand' and mark everything you do support as 'Legal'.

Forgot one last part: set everything to Expand and then fill in the holes in Legalize.

I really don’t see another way of doing it.

Micah

Hi Akira, Micah,

I have a related question to this thread. Does the RA use target lowering information?
Because if it doesn’t, you don’t need to register your i64 reg class.

Ivan

Here is the problem explained more.

Normally there is a 64 bit register that holds the result of certain multiply and divide instructions.
It's really two 32 bit registers.

This is like HI[0]/LO[0].

In fact there are four such pairs, with only the 0th pair available to basic multiply and divide.

But DSP instructions have access to all 4: HI[i]/LO[i], i = 0..3.

We want the register allocator to allocate them for us, but we also need to have them paired,
i.e. HI[1] with LO[1].

So in principle if you have a 64 bit register you can have two 32 bit registers inside.

If you tell the register allocator that you have 64 bit registers, then it wants to assume that 64 bit
is a legal operand type, and LLVM then assumes that you have native instructions for all operations on
64 bit types, which we don't have in mips32, for example. So you would have to lower them all yourself.

This sounds exactly like how ARM supports double registers (a pairing of
2 float registers). You may want to look at the ARM backend for details.

You can explicitly mark all the unsupported 64-bit operations as 'Expand' so that
LLVM will expand each 64-bit operation into 32-bit ones.

- Michael

I suspect the code in SelectionDAGLegalize won’t expand 64-bit operations to 32-bit ones. For example, I see this code in SelectionDAGLegalize::ExpandNode (near line 3090):

  case ISD::SUB: {
    EVT VT = Node->getValueType(0);
    assert(TLI.isOperationLegalOrCustom(ISD::ADD, VT) &&
           TLI.isOperationLegalOrCustom(ISD::XOR, VT) &&
           "Don't know how to expand this subtraction!");

If we mark the action of SUB, ADD and XOR as ‘Expand’, the code will assert.

So you have to either make i64 illegal or mark the nodes as ‘Custom’ and write code to lower them.
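
The assert suggests that the generic expansion rewrites SUB in terms of XOR and ADD on the same value type, which is why those two have to be Legal or Custom for that type. A minimal sketch of that identity on plain integers (not SelectionDAG nodes) follows; the function name subViaAddXor is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// a - b rewritten using only XOR and ADD: a + ((b ^ all-ones) + 1).
// Unsigned arithmetic wraps modulo 2^64, so the identity holds for all inputs.
uint64_t subViaAddXor(uint64_t a, uint64_t b) {
  uint64_t notB = b ^ ~UINT64_C(0); // XOR with all-ones is bitwise NOT
  uint64_t negB = notB + 1;         // two's-complement negation of b
  return a + negB;
}

// Compares the rewritten form against the built-in subtraction.
void checkSubIdentity() {
  assert(subViaAddXor(5, 3) == 2);
  assert(subViaAddXor(0, 1) == UINT64_MAX); // wraps around like 0 - 1
}
```

Note that the rewrite stays at the same width: a 64-bit SUB becomes 64-bit XOR and ADD, so it does not by itself split anything into 32-bit pieces.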

If no i64 reg classes are registered, then type-legalization will expand a 32b x 32b = 64b multiply node into a 32-bit mult node with two i32 results (for example, SMUL_LOHI). The problem is that there isn’t an easy way to have RA assign two consecutive hi/lo registers to the two i32 registers, once the 64-bit result is split into two 32-bit results.

Is there a constraint I can use (something like register hints) to force RA to allocate consecutive registers?

No. RA has no such constraints. I once hacked around a similar issue (i.e. some data type has very limited support or special usage at the processor level) by registering the register class after computeRegisterProperties(). This way you don't tell SelectionDAG that i64 is a legal type, only that it is an available type at the machine level. Of course, you need a very special code emitter to emit SMUL_LOHI as your MUL followed by subreg extractions. Anyway, it once worked for me, but it may not be a desirable approach.

- Michael

[Villmow, Micah] Or, instead of marking the nodes as 'Custom', an alternate solution is to implement the subtract as a sequence of smaller but legal subtractions.
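
As a rough illustration of what "a sequence of smaller but legal subtractions" could look like, here is a sketch in plain C++ that subtracts two 64 bit values using only 32 bit operations and an explicit borrow. Pair32, split, join and sub64ViaSub32 are hypothetical names for this example; this is not the lowering code itself.

```cpp
#include <cassert>
#include <cstdint>

// A 64 bit value held as two 32 bit halves.
struct Pair32 {
  uint32_t lo;
  uint32_t hi;
};

Pair32 split(uint64_t v) {
  return Pair32{static_cast<uint32_t>(v), static_cast<uint32_t>(v >> 32)};
}

uint64_t join(Pair32 p) {
  return (static_cast<uint64_t>(p.hi) << 32) | p.lo;
}

// 64 bit subtract built from 32 bit subtracts plus an unsigned compare.
Pair32 sub64ViaSub32(Pair32 a, Pair32 b) {
  uint32_t lo = a.lo - b.lo;               // low halves, wraps modulo 2^32
  uint32_t borrow = a.lo < b.lo ? 1u : 0u; // set when the low subtract wrapped
  uint32_t hi = a.hi - b.hi - borrow;      // high halves, minus the borrow
  return Pair32{lo, hi};
}

// Checks the borrow case against native 64 bit arithmetic.
void checkSub64() {
  uint64_t a = UINT64_C(0x100000000), b = 1;
  assert(join(sub64ViaSub32(split(a), split(b))) == a - b);
}
```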

Look at the Hexagon backend, which also has 64 bit register pairs; there, only successive even-odd 32 bit registers can be paired together.

Even in Hexagon, not all operations are legal on i64. So, like Michael said, you should call setOperationAction for those operations (with i64) and set them to “Expand”.

However, I am guessing that in your case more operations are illegal than legal for i64 types?

Pranav

Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, hosted by The Linux Foundation