wide memory accesses

Hi,

I am trying to take 16 bit memory reads and combine them into a single 32 bit read. I am having trouble making the code simply read 32 bits and then use the subregisters accordingly, without unnecessary copying.
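The following is a host-level sketch of the equivalence this transformation relies on. It is plain C++, not LLVM code. One 32 bit read of two adjacent 16 bit slots contains both narrow values, and which half (lo16 or hi16) holds the lower-addressed one depends on byte order. Here the host's byte order stands in for the target's. The helper names `load32`, `split` and `fromWideLoad` are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// The two halves of a 32 bit value, named after the sub-register indices.
struct Halves {
  std::uint16_t lo16;  // bits 0..15
  std::uint16_t hi16;  // bits 16..31
};

// The two narrow values in memory order: first = p[0], second = p[1].
struct Pair {
  std::uint16_t first, second;
};

// A single 32 bit read covering two adjacent 16 bit slots.
// memcpy sidesteps alignment and strict-aliasing issues.
std::uint32_t load32(const std::uint16_t *p) {
  assert(p != nullptr);
  std::uint32_t v;
  std::memcpy(&v, p, sizeof v);
  return v;
}

Halves split(std::uint32_t v) {
  return Halves{static_cast<std::uint16_t>(v & 0xFFFFu),
                static_cast<std::uint16_t>(v >> 16)};
}

// True if the lowest-addressed byte of a multi-byte value is its least significant byte.
bool hostIsLittleEndian() {
  const std::uint16_t probe = 1;
  unsigned char firstByte;
  std::memcpy(&firstByte, &probe, 1);
  return firstByte == 1;
}

// Recover what two separate 16 bit loads of p[0] and p[1] would have produced.
// Little-endian: p[0] lands in lo16. Big-endian: p[0] lands in hi16.
Pair fromWideLoad(const std::uint16_t *p) {
  const Halves h = split(load32(p));
  if (hostIsLittleEndian())
    return Pair{h.lo16, h.hi16};
  return Pair{h.hi16, h.lo16};
}
```

For example, with `{0x1234, 0xABCD}` in memory, `fromWideLoad` returns `first == 0x1234` and `second == 0xABCD` on either byte order. The only thing byte order changes is which sub-register each value comes from.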

I have tried two techniques in the MachineFunction:

  1. Replace the MachineOperands in the users of the data with the new register/subregister index. This yields an assert failure during VirtRegRewriter, in substPhysReg: “Invalid SubReg for physical register”, after the two-address pass rewrote this:

%reg16445<def> = add %reg16507:hi16, %reg16510:hi16 ; 32bit:16507,16510, 16bit: 16445
  prepend: %reg16445<def> = COPY %reg16507;
  rewrite to: %reg16445<def> = addh_1_8 %reg16445:hi16, %reg16510:hi16

In my view, %reg16445 should not have been given a ‘hi16’ subregister index, because it is a 16 bit register. I expected %reg16507:hi16 to be interpreted as the corresponding subregister, so that the COPY would have been generated with the :hi16 index. It is correct that %reg16445 is 16 bits, but it is then used incorrectly with a :hi16 subregister index. Any ideas?

I am trying to use register classes as follows: I have one register class for the 32 bit registers, and another one for their 16 bit subregisters. Intuitively, I would like instructions to have operands of the 16 bit register class, and then be able to pass 32 bit registers with a subregister index to them. Is this possible, or do I need to create a new register class that mixes the 16 and 32 bit registers, and use that for the instruction instead?

I have defined the 16 bit registers as subregisters of the 32 bit registers. I have also defined the subregister classes, as in let SubRegClasses =…

  2. Insert COPYs. These did not get coalesced away, so instead of saving instructions I ended up with one load and two moves. How can I get the wide load to be used efficiently by COPYing it to the old virtual registers?

I would appreciate any advice on how to get good code for SIMD combinations like the one above. What would the general layout of the register classes typically look like? I imagine the first approach is the best one, where the operands simply get updated without introducing any COPYs.

Thanks,

Jonas

> Hi,
>
> I am trying to take 16 bit memory reads and combine them into a single 32 bit read. I am having trouble making the code simply read 32 bits and then use the subregisters accordingly, without unnecessary copying.
>
> I have tried two techniques in the MachineFunction:
>
> 1. Replace the MachineOperands in the users of the data with the new register/subregister index. This yields an assert failure during VirtRegRewriter, in substPhysReg: "Invalid SubReg for physical register", after the two-address pass rewrote this:
>
> %reg16445<def> = add %reg16507:hi16, %reg16510:hi16 ; 32bit:16507,16510, 16bit: 16445
>   prepend: %reg16445<def> = COPY %reg16507;
>   rewrite to: %reg16445<def> = addh_1_8 %reg16445:hi16, %reg16510:hi16
>
> In my view, %reg16445 should not have been given a 'hi16' subregister index, because it is a 16 bit register. I expected %reg16507:hi16 to be interpreted as the corresponding subregister, so that the COPY would have been generated with the :hi16 index. It is correct that %reg16445 is 16 bits, but it is then used incorrectly with a :hi16 subregister index. Any ideas?

If you are doing this before the register allocator passes, you must make sure that the code preserves SSA form. That can be difficult to do when dealing with sub-registers. Other targets just emit EXTRACT_SUBREG / INSERT_SUBREG and let the coalescer deal with it.
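To make the SSA requirement concrete, here is a toy single-block check in C++. The `Instr` struct and `isSSA` are hypothetical and are not LLVM's MachineInstr API. It tests only necessary conditions for straight-line code: each register is defined at most once and is never used before its definition. The copy/add pair that the two-address pass produced in the quoted example defines %reg16445 twice, so it would fail this check.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Toy machine instruction: an optional def plus any number of uses.
// Operands use the thread's notation, e.g. "%reg16507:hi16".
struct Instr {
  std::string def;  // empty if nothing is defined
  std::string opcode;
  std::vector<std::string> uses;
};

// "%reg16507:hi16" -> "%reg16507"; operands without an index are returned unchanged.
std::string regOf(const std::string &operand) {
  return operand.substr(0, operand.find(':'));
}

// Necessary SSA conditions for one basic block: every register is defined at
// most once, and no instruction reads a register defined at or after it.
// Registers never defined in the block (e.g. the physical %r4) count as live-in.
bool isSSA(const std::vector<Instr> &block) {
  std::map<std::string, std::size_t> defAt;
  for (std::size_t i = 0; i < block.size(); ++i) {
    if (block[i].def.empty()) continue;
    if (!defAt.emplace(regOf(block[i].def), i).second)
      return false;  // a second definition of the same register
  }
  for (std::size_t i = 0; i < block.size(); ++i)
    for (const std::string &use : block[i].uses) {
      auto it = defAt.find(regOf(use));
      if (it != defAt.end() && it->second >= i)
        return false;  // use not preceded by its definition
    }
  return true;
}
```

The copy sequence in Jonas's follow-up message below has one definition per register, so it passes this check.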

It looks like TwoAddressInstructionPass is not ready to deal with subreg indexes either; it is creating wrong code in your example.
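Below is a toy model of the two-address rewrite under discussion. It is plain C++, and `Instr` and `lowerTwoAddress` are hypothetical; they do not reproduce TwoAddressInstructionPass. For an instruction whose def is tied to its first source, the lowering inserts `def = COPY src0` and makes the tied operand read `def`. The sub-register index stays on the COPY and is dropped from the rewritten operand. That gives the output Jonas expected: `%reg16445 = COPY %reg16507:hi16` followed by `%reg16445 = addh_1_8 %reg16445, %reg16510:hi16`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy machine instruction in the thread's notation ("%reg16507:hi16").
struct Instr {
  std::string def, opcode;
  std::vector<std::string> uses;
};

// Two-address lowering for an instruction whose def is tied to uses[0]:
//   def = op src0:idx, src1   ==>   def = COPY src0:idx
//                                   def = op def, src1
// The COPY is what reads the sub-register, so it keeps the index; the tied
// operand then reads the whole (narrow) def register and carries no index.
std::vector<Instr> lowerTwoAddress(const Instr &mi) {
  assert(!mi.def.empty() && !mi.uses.empty() && "needs a def tied to uses[0]");
  Instr copy{mi.def, "COPY", {mi.uses[0]}};
  Instr tied = mi;
  tied.uses[0] = mi.def;
  return {copy, tied};
}
```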

I recommend:

> 2. Insert COPYs. These did not get coalesced away, so instead of saving instructions I ended up with one load and two moves. How can I get the wide load to be used efficiently by COPYing it to the old virtual registers?

You need to implement getMatchingSuperRegClass in your register info class. That will give the coalescer the needed information to join subreg copies.

/jakob

Great answer, now it works. However, I immediately ran into the next problem:

I insert the copies from the 32 bit load like this:

%reg16507 = ld %r4
%reg16457 = COPY %reg16507:lo16
%reg16443 = COPY %reg16507:hi16
%reg16510 = ld %r5
%reg16458 = COPY %reg16510:lo16
%reg16444 = COPY %reg16510:hi16
%reg16468 = add %reg16457, %reg16458 ; 16457 is regclass high-part

I make two loads and copy the high and low parts of each to 16 bit registers for use, as you recommended.
The final instruction in the list is an addition, which, however, is constrained to use the high subregister (%reg16457). The low subregister is copied to a virtual register of the hi16 register class, but this copy gets coalesced into the wrong register class.
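Below is a toy model of what joining one of these sub-register copies amounts to. It is plain C++, and `Instr` and `joinSubRegCopy` are hypothetical, not the coalescer's code. The copy `%dst = COPY %src:idx` is deleted, and every use of `%dst` reads `%src:idx` instead. The sketch models only this mechanical rewrite. It deliberately leaves out two questions: whether the join is legal, and which register class the merged register ends up in. Those are what getMatchingSuperRegClass is consulted for.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Instr {
  std::string def, opcode;
  std::vector<std::string> uses;
};

// Join the sub-register copy at position copyPos ("%dst = COPY %src:idx"):
// delete it and make every use of %dst read %src:idx instead.
// Register-class legality is NOT checked here.
std::vector<Instr> joinSubRegCopy(std::vector<Instr> block, std::size_t copyPos) {
  const Instr copy = block.at(copyPos);
  assert(copy.opcode == "COPY" && copy.uses.size() == 1);
  const std::string dst = copy.def;
  const std::string src = copy.uses[0];  // e.g. "%reg16510:lo16"
  block.erase(block.begin() + static_cast<std::ptrdiff_t>(copyPos));
  for (Instr &mi : block)
    for (std::string &use : mi.uses) {
      // A use like "%dst:idx2" would need index composition; not modelled.
      assert(use.compare(0, dst.size() + 1, dst + ":") != 0);
      if (use == dst) use = src;
    }
  return block;
}
```

Applied to the listing above, joining the %reg16458 copy turns the add into `%reg16468 = add %reg16457, %reg16510:lo16`.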

The problem is that when I follow the comment for getMatchingSuperRegClass() as best I can, I simply return A, since it is a proper register class containing all registers in B; B is the register class containing only the high-part subregisters of the registers in A. So if A has Reg32_1, B has Reg16_1_hi, and so on.

So when I return A, the COPY gets coalesced, but the register class of the new interval becomes the 32 bit class, which causes an error for the add instruction.

I would like to ask what this method should actually do, in case I have misunderstood it. If I have not, which register classes should I use to make this work?

My understanding is this: A = {32 bit regs}, B = {high parts of the registers in A}, so when called with (A, B, :hi16), return A.
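Below is a toy model of the contract as it is read in this message. It is plain C++. The register names follow the thread's Reg32_1 / Reg16_1_hi example, and the lo16 names are hypothetical. The model collects the registers of A whose `idx` sub-register lies in B, and an empty result stands for a null return. The real hook returns a `const TargetRegisterClass *`, so its answer has to be an existing class (or null), not an arbitrary subset. Under this reading, (A, B, hi16) gives back all of A. The same query with lo16, which is the index on `%reg16457 = COPY %reg16507:lo16` in the listing above, has no matching registers.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>

using RegClass = std::set<std::string>;

// Hypothetical sub-register table: (super-register, index) -> sub-register.
const std::map<std::pair<std::string, std::string>, std::string> kSubRegs = {
    {{"Reg32_1", "hi16"}, "Reg16_1_hi"}, {{"Reg32_1", "lo16"}, "Reg16_1_lo"},
    {{"Reg32_2", "hi16"}, "Reg16_2_hi"}, {{"Reg32_2", "lo16"}, "Reg16_2_lo"},
};

// The registers of A whose `idx` sub-register is a member of B.
// An empty result models "no matching class" (returning null).
RegClass matchingSuperRegs(const RegClass &a, const RegClass &b,
                           const std::string &idx) {
  RegClass result;
  for (const std::string &reg : a) {
    auto it = kSubRegs.find({reg, idx});
    if (it != kSubRegs.end() && b.count(it->second) != 0)
      result.insert(reg);
  }
  return result;
}
```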

Is there something else missing here?

Thanks,

Jonas