Register Allocation and Scheduling Issues

Hello,

I have defined 8 registers in the registerinfo.td file, in the following order:
R_0, R_1, R_2, R_3, R_4, R_5, R_6, R_7

But the generated assembly code uses only 2 of them. How can I make it use all 8? Also, can I control the allocation order, for example use R_5 right after R_0, without changing registerinfo.td?

What changes are required here? Do they belong in the scheduling phase or in the register allocation phase? This is the code currently generated:

P_2048B_LOAD_DWORD R_0, Pword ptr [rip + b]
P_2048B_LOAD_DWORD R_1, Pword ptr [rip + c]
P_2048B_VADD R_0, R_1, R_0
P_2048B_STORE_DWORD Pword ptr [rip + a], R_0
P_2048B_LOAD_DWORD R_0, Pword ptr [rip + b+2048]
P_2048B_LOAD_DWORD R_1, Pword ptr [rip + c+2048]
P_2048B_VADD R_0, R_1, R_0
P_2048B_STORE_DWORD Pword ptr [rip + a+2048], R_0
P_2048B_LOAD_DWORD R_0, Pword ptr [rip + b+4096]
P_2048B_LOAD_DWORD R_1, Pword ptr [rip + c+4096]
P_2048B_VADD R_0, R_1, R_0
P_2048B_STORE_DWORD Pword ptr [rip + a+4096], R_0
P_2048B_LOAD_DWORD R_0, Pword ptr [rip + b+6144]
P_2048B_LOAD_DWORD R_1, Pword ptr [rip + c+6144]
P_2048B_VADD R_0, R_1, R_0
P_2048B_STORE_DWORD Pword ptr [rip + a+6144], R_0

Please help. I am stuck here.

Thank you.

What are your thoughts on what might be the issue? Have you considered
the advantages and disadvantages of using multiple registers for the
code you're testing?
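
As a starting point for thinking about it, here is a minimal sketch of why an allocator can get away with two registers for this code. It is a toy model in Python, not LLVM's actual allocator: the helper names (`vector_sum_intervals`, `allocate`) and the live ranges are assumptions made up for illustration, and it never spills. Each value simply gets the first free register in allocation order.

```python
# Toy model only: this is NOT LLVM's register allocator. It assigns each value
# the first free physical register in allocation order and never spills.

def vector_sum_intervals(chunks):
    """Hypothetical live ranges [start, end) for the unrolled vector sum.

    Chunk k occupies instructions 4k..4k+3: load b, load c, add, store.
    """
    intervals = []
    for k in range(chunks):
        base = 4 * k
        intervals.append((f"b{k}", base, base + 2))        # loaded, last used by the add
        intervals.append((f"c{k}", base + 1, base + 2))    # loaded, last used by the add
        intervals.append((f"sum{k}", base + 2, base + 3))  # defined by the add, used by the store
    return intervals


def allocate(intervals, phys_regs):
    """Give each interval the first register in phys_regs that is not busy."""
    assignment = {}
    active = []  # (end, phys_reg) for values that are still live
    for name, start, end in sorted(intervals, key=lambda iv: iv[1]):
        # A register becomes free once the value it holds has had its last use.
        active = [(e, p) for (e, p) in active if e > start]
        busy = {p for (_, p) in active}
        phys = next(p for p in phys_regs if p not in busy)
        assignment[name] = phys
        active.append((end, phys))
    return assignment


regs = [f"R_{i}" for i in range(8)]
assignment = allocate(vector_sum_intervals(4), regs)
used = sorted(set(assignment.values()))
print(used)  # prints ['R_0', 'R_1']
```

In this model at most two values are live at any point, and every value is dead by the time the next chunk starts, so the first two registers in the order are always free again.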

Cheers.

Tim.

Actually, my hardware has 32 lanes, each with 8 registers, and the emitted assembly code should reflect this.
I defined the registers in the .td file in the following order:
L_0_R_0,
L_0_R_1,
L_0_R_2,
L_0_R_3,
L_0_R_4,
L_0_R_5,
L_0_R_6,
L_0_R_7,

L_1_R_0,
L_1_R_1,
L_1_R_2,
L_1_R_3,
L_1_R_4,
L_1_R_5,
L_1_R_6,
L_1_R_7,

...

L_31_R_0,
L_31_R_1,
L_31_R_2,
L_31_R_3,
L_31_R_4,
L_31_R_5,
L_31_R_6,
L_31_R_7,

Now, when I generate assembly for the vector sum code using my own instructions together with the default x86 scheduling and register allocation, the output uses only L_0. It should use all the lanes. How can I achieve this?

Currently it emits the following:

P_2048B_LOAD_DWORD L_0_R_0, Pword ptr [rip + b]
P_2048B_LOAD_DWORD L_0_R_1, Pword ptr [rip + c]
P_2048B_VADD L_0_R_0, L_0_R_1, L_0_R_0
P_2048B_STORE_DWORD Pword ptr [rip + a], L_0_R_0
P_2048B_LOAD_DWORD L_0_R_0, Pword ptr [rip + b+2048]
P_2048B_LOAD_DWORD L_0_R_1, Pword ptr [rip + c+2048]
P_2048B_VADD L_0_R_0, L_0_R_1, L_0_R_0
P_2048B_STORE_DWORD Pword ptr [rip + a+2048], L_0_R_0
P_2048B_LOAD_DWORD L_0_R_0, Pword ptr [rip + b+4096]
P_2048B_LOAD_DWORD L_0_R_1, Pword ptr [rip + c+4096]
P_2048B_VADD L_0_R_0, L_0_R_1, L_0_R_0
P_2048B_STORE_DWORD Pword ptr [rip + a+4096], L_0_R_0
P_2048B_LOAD_DWORD L_0_R_0, Pword ptr [rip + b+6144]
P_2048B_LOAD_DWORD L_0_R_1, Pword ptr [rip + c+6144]
P_2048B_VADD L_0_R_0, L_0_R_1, L_0_R_0
P_2048B_STORE_DWORD Pword ptr [rip + a+6144], L_0_R_0

It should instead emit the following:

P_2048B_LOAD_DWORD L_0_R_0, Pword ptr [rip + b]
P_2048B_LOAD_DWORD L_0_R_1, Pword ptr [rip + c]
P_2048B_VADD L_0_R_0, L_0_R_1, L_0_R_0
P_2048B_STORE_DWORD Pword ptr [rip + a], L_0_R_0
P_2048B_LOAD_DWORD L_1_R_0, Pword ptr [rip + b+2048]
P_2048B_LOAD_DWORD L_1_R_1, Pword ptr [rip + c+2048]
P_2048B_VADD L_1_R_0, L_1_R_1, L_1_R_0
P_2048B_STORE_DWORD Pword ptr [rip + a+2048], L_1_R_0
P_2048B_LOAD_DWORD L_2_R_0, Pword ptr [rip + b+4096]
P_2048B_LOAD_DWORD L_2_R_1, Pword ptr [rip + c+4096]
P_2048B_VADD L_2_R_0, L_2_R_1, L_2_R_0
P_2048B_STORE_DWORD Pword ptr [rip + a+4096], L_2_R_0
P_2048B_LOAD_DWORD L_3_R_0, Pword ptr [rip + b+6144]
P_2048B_LOAD_DWORD L_3_R_1, Pword ptr [rip + c+6144]
P_2048B_VADD L_3_R_0, L_3_R_1, L_3_R_0
P_2048B_STORE_DWORD Pword ptr [rip + a+6144], L_3_R_0
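
The intended pattern is that chunk k of the data uses lane k. A minimal Python sketch of that pattern follows. It only generates the desired text and says nothing about how LLVM would produce it. The function name `expected_listing` is hypothetical, and wrapping around after lane 31 is an assumption, since the listing above only shows four chunks.

```python
# Illustration of the intended naming pattern only; it does not involve LLVM.
NUM_LANES = 32      # number of lanes in the hardware described above
CHUNK_BYTES = 2048  # stride between consecutive chunks in the listing


def expected_listing(chunks):
    """Return the desired assembly lines, one lane per 2048-byte chunk."""
    lines = []
    for k in range(chunks):
        lane = k % NUM_LANES  # assumption: wrap around after the last lane
        off = "" if k == 0 else f"+{k * CHUNK_BYTES}"
        r0 = f"L_{lane}_R_0"
        r1 = f"L_{lane}_R_1"
        lines.append(f"P_2048B_LOAD_DWORD {r0}, Pword ptr [rip + b{off}]")
        lines.append(f"P_2048B_LOAD_DWORD {r1}, Pword ptr [rip + c{off}]")
        lines.append(f"P_2048B_VADD {r0}, {r1}, {r0}")
        lines.append(f"P_2048B_STORE_DWORD Pword ptr [rip + a{off}], {r0}")
    return lines


listing = expected_listing(4)
print("\n".join(listing))
```

With four chunks this reproduces the 16 lines above, using lanes L_0 through L_3.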

Does this involve changing the register live intervals, or the scheduling?

Please help. I have been trying hard but am unable to solve this.