How to allocate consecutive registers?

Hi,

The GPU target I am working on requires the 'value' and 'address' operands of a memory store instruction to be in consecutive registers. Does anybody have a suggestion?

  • Ruiling

Hi Ruiling,

Make the store instruction take only one operand, a tuple register.
You have examples of tuple registers in the ARM backend.

Cheers,
-Quentin

There are other CPUs with similar restrictions. You could look at how they handle it. An example which springs to mind is ARM A32 LDRD and STRD (load/store two consecutive registers). I think some other architectures do the same for operations which return two results, such as div/mod or NxN->2N multiply.

The difficult bit will be if there are loads with the same property. I
don't think you can easily encode the fact that one half of a register
is read and the other written.

Tim.

It seems the ARM target uses reg_sequence to form a register tuple and lets the store instruction accept that register tuple.
Did I understand that correctly? What if the address is 64-bit while the value is 32-bit? Is there a simple way to handle that? reg_sequence looks like it only accepts sub-registers of the same type.

But the real difficulty for me is that I have already run out of lanemask bits.
I gave a brief introduction to the Intel GPU registers in this thread:
http://lists.llvm.org/pipermail/llvm-dev/2016-August/103953.html

And in a later attempt, I hit the issue of running out of lanemask bits:
http://lists.llvm.org/pipermail/llvm-dev/2016-August/104017.html

Later I chose to define all register tuples using only Rw0~2047 and subw0~31, and with that I could reach RegQ_SIMD8 at most!
Some pieces of RegisterInfo.td are listed below:

foreach Index = 0-31 in {
  def subw#Index : SubRegIndex<16, !shl(Index, 4)>;
}

class IntelGPUReg<string n, bits<13> regIdx> : Register<n> {
  bits<1> regFile;

  let Namespace = "IntelGPU";
  let HWEncoding{12-0} = regIdx;
  let HWEncoding{15} = regFile;
}
foreach Index = 0-2047 in {
  def Rw#Index : IntelGPUReg<"Rw"#Index, !shl(Index, 1)> {
    let regFile = 0;
  }
}

// b-->byte w-->word d-->dword q-->qword

def gpr_w : RegisterClass<"IntelGPU", [i16], 16,
                          (sequence "Rw%u", 0, 2047)> {
  let AllocationPriority = 1;
}

def gpr_q_simd8 : RegisterTuples<[subw0, subw1, subw2, subw3, subw4, subw5, subw6, subw7,
                                  subw8, subw9, subw10, subw11, subw12, subw13, subw14, subw15,
                                  subw16, subw17, subw18, subw19, subw20, subw21, subw22, subw23,
                                  subw24, subw25, subw26, subw27, subw28, subw29, subw30, subw31],
                                 [(add (decimate gpr_w, 16)),
                                  (add (decimate (shl gpr_w, 1), 16)),
                                  (add (decimate (shl gpr_w, 2), 16)),
                                  (add (decimate (shl gpr_w, 3), 16)),
                                  (add (decimate (shl gpr_w, 4), 16)),
                                  (add (decimate (shl gpr_w, 5), 16)),
                                  (add (decimate (shl gpr_w, 6), 16)),
                                  ...
                                  (add (decimate (shl gpr_w, 30), 16)),
                                  (add (decimate (shl gpr_w, 31), 16))]>;

def RegQ_SIMD8 : RegisterClass<"IntelGPU", [i64, f64], 64, (add gpr_q_simd8)>;
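
To make it easier to see which tuples the listing above generates, here is a rough Python model of the TableGen set operators it uses. The helper names (shl, decimate, register_tuples) are illustrative stand-ins written for this sketch, not LLVM APIs, and it assumes the elided entries follow the same pattern as the visible ones (shl by 7 through 29).

```python
# Toy model of the TableGen set operators used in the listing.
# These helpers are illustrative stand-ins, not LLVM APIs.

def shl(regs, n):
    """(shl S, n): drop the first n elements of the set."""
    return regs[n:]

def decimate(regs, n):
    """(decimate S, n): keep every n-th element, starting with the first."""
    return regs[::n]

def register_tuples(sub_lists):
    """RegisterTuples zips the lists; the shortest list bounds the count."""
    return list(zip(*sub_lists))

# gpr_w holds Rw0 .. Rw2047, as in the register class definition.
gpr_w = [f"Rw{i}" for i in range(2048)]

# One list per sub-register index subw0 .. subw31.
# Assumption: entries 7..29, elided in the post, follow the same pattern.
sub_lists = [decimate(shl(gpr_w, k), 16) for k in range(32)]
gpr_q_simd8 = register_tuples(sub_lists)

print(len(gpr_q_simd8))                        # number of tuples
print(gpr_q_simd8[0][0], gpr_q_simd8[0][-1])   # first tuple: first/last word
print(gpr_q_simd8[1][0], gpr_q_simd8[1][-1])   # second tuple: first/last word
```

Under these assumptions the model yields 127 tuples of 32 words each. The first covers Rw0 through Rw31 and the second starts at Rw16, so neighbouring tuples overlap by 16 words.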

If I introduce larger register tuples, then I need more lanemask bits.
Maybe I need to find some other way, or greatly increase the number of lanemask bits.
But for now that is hard for me, as I am not very familiar with the LLVM register allocator. Any suggestions?
If I have not stated the problem clearly, please feel free to send me a mail.
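
As a back-of-the-envelope check on the lanemask budget, the small Python sketch below counts how many 16-bit leaf sub-registers a tuple needs. It assumes that each leaf sub-register index takes one lane mask bit and that 32 bits are available, which is consistent with subw0~31 being the limit described above. The tuple kinds other than q_simd8 are hypothetical examples, not taken from the actual RegisterInfo.td.

```python
# Assumption: one lane mask bit per leaf sub-register index, 32 bits total.
LANE_MASK_BITS = 32
LEAF_BITS = 16  # size of one subw sub-register (a word)

def leaf_lanes(element_bits, simd_width, leaf_bits=LEAF_BITS):
    """Number of leaf sub-registers covered by one tuple."""
    return element_bits * simd_width // leaf_bits

# (element size in bits, SIMD width); names other than q_simd8 are hypothetical.
candidates = {
    "q_simd8": (64, 8),
    "d_simd16": (32, 16),
    "q_simd16": (64, 16),
}

for name, (element_bits, simd_width) in candidates.items():
    lanes = leaf_lanes(element_bits, simd_width)
    verdict = "fits" if lanes <= LANE_MASK_BITS else "does not fit"
    print(f"{name}: {lanes} lanes, {verdict} in {LANE_MASK_BITS} bits")
```

With these numbers a qword SIMD8 tuple uses exactly 32 leaves, and any tuple twice that size would need 64.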

  • Ruiling