Simpler subreg ops in machine code IR

I am considering adding a new target independent codegen-only COPY instruction to our MachineInstr representation. It would be used to replace INSERT_SUBREG, EXTRACT_SUBREG, and virtual register copies after instruction selection. Selection DAG still needs {INSERT,EXTRACT}_SUBREG, but they would not appear as MachineInstrs any longer.

The COPY instruction handles subreg operations with less redundancy:

  %reg1045<def> = EXTRACT_SUBREG %reg1044<kill>, 4
  %reg1045<def> = COPY %reg1044:sub_32bit<kill>

  %reg1045<def> = INSERT_SUBREG %reg1045, %reg1044<kill>, 4
  %reg1045:sub_32bit<def> = COPY %reg1044<kill>

  %reg1050:ssub_0<def> = EXTRACT_SUBREG %reg1060:dsub_1<kill>, ssub_0
  %reg1050:ssub_0<def> = COPY %reg1060:ssub_2<kill>

It will also replace the TargetInstrInfo::copyRegToReg hook when copying virtual registers:

  %reg1050 = COPY %reg1044<kill>

It will be lowered with a TII.copyRegToReg() call in LowerSubregsInstructionPass (which may need renaming).

Why?

1. The new function CoalescerPair::isMoveInstr() can correctly determine if a MachineInstr is a (partial) register copy with source and destination registers and subreg indices. I think that is the only place it is done correctly currently. Weird stuff like subreg indices on EXTRACT_SUBREG operands is pretty hard to figure out. The COPY instruction is simpler.

2. Many more copies are created than are eventually output - the coalescer removes most of the copies inserted by phi elimination and 2-addr pass. The TII.copyRegToReg() hook is relatively expensive because it has to do register class comparisons to pick the right copy instruction. Similarly the TII.isMoveInstr() hook is more expensive than just checking for a COPY instruction. By calling copyRegToReg() late, and by avoiding isMoveInstr() entirely, the overhead is avoided.

3. The register class arguments to TII.copyRegToReg() can be eliminated - it will only ever be called for physical registers. This means that the implementation can be simpler, and sometimes better code can be generated. A register may be allocated to a more conveniently than the register class specifies. It also means that most of the annoying getMinimalPhysRegClass() calls go away.

4. The COPY instruction does not impose register class constraints on its operands, native copies do. This is important when we implement live interval splitting in the register allocator. After splitting an interval the register class can be recomputed. Without the copy constraints a larger register class may be valid, and spilling might be avoided.

5. copyRegToReg() can insert multiple instructions, avoiding fun pseudo-instructions like VMOVQQ and VMOVQQQQ.

Why Not?

1. copyRegToReg() won't be able to use register classes to pick a copy opcode. For instance, an XMM register will no longer be copied by MOVSS or MOVSD. Given just the physical register, MOVAPS will be used. Is that a problem?

2. What else?

1. copyRegToReg() won't be able to use register classes to pick a copy opcode. For instance, an XMM register will no longer be copied by MOVSS or MOVSD. Given just the physical register, MOVAPS will be used. Is that a problem?

I haven't had time to really look into it, but have been playing
around with the idea that instead of two register classes copyRegToReg
and some of the load and store callbacks should take a MVT. This
should be enough, no? Anyway, I like anything that makes the backend
interface simpler :slight_smile:

Thanks a lot for working on this.

Cheers,

Well, MVTs are even harder to get at.

MVTs disappear when the SelectionDAG IR is translated to MachineInstr IR.
Register classes disappear with register allocation.

I am considering adding a new target independent codegen-only COPY instruction to our MachineInstr representation. It would be used to replace INSERT_SUBREG, EXTRACT_SUBREG, and virtual register copies after instruction selection. Selection DAG still needs {INSERT,EXTRACT}_SUBREG, but they would not appear as MachineInstrs any longer.

The COPY instruction handles subreg operations with less redundancy:

  %reg1045<def> = EXTRACT_SUBREG %reg1044<kill>, 4
  %reg1045<def> = COPY %reg1044:sub_32bit<kill>

  %reg1045<def> = INSERT_SUBREG %reg1045, %reg1044<kill>, 4
  %reg1045:sub_32bit<def> = COPY %reg1044<kill>

  %reg1050:ssub_0<def> = EXTRACT_SUBREG %reg1060:dsub_1<kill>, ssub_0
  %reg1050:ssub_0<def> = COPY %reg1060:ssub_2<kill>

It will also replace the TargetInstrInfo::copyRegToReg hook when copying virtual registers:

  %reg1050 = COPY %reg1044<kill>

It will be lowered with a TII.copyRegToReg() call in LowerSubregsInstructionPass (which may need renaming).

Ok. An immediate concern is this will make it harder to understand machine instructions. For example, it's hard for me to read
%reg1045:sub_32bit<def> = COPY %reg1044<kill>
as updating part of reg1045. It's solvable by pretty printing COPY instructions that include register class information.

Why?

1. The new function CoalescerPair::isMoveInstr() can correctly determine if a MachineInstr is a (partial) register copy with source and destination registers and subreg indices. I think that is the only place it is done correctly currently. Weird stuff like subreg indices on EXTRACT_SUBREG operands is pretty hard to figure out. The COPY instruction is simpler.

Agreed.

2. Many more copies are created than are eventually output - the coalescer removes most of the copies inserted by phi elimination and 2-addr pass. The TII.copyRegToReg() hook is relatively expensive because it has to do register class comparisons to pick the right copy instruction. Similarly the TII.isMoveInstr() hook is more expensive than just checking for a COPY instruction. By calling copyRegToReg() late, and by avoiding isMoveInstr() entirely, the overhead is avoided.

Ok. It seems like a nice secondary benefit.

3. The register class arguments to TII.copyRegToReg() can be eliminated - it will only ever be called for physical registers. This

I am not sure I follow this. Why would the regclass arguments be eliminated?

means that the implementation can be simpler, and sometimes better code can be generated. A register may be allocated to a more conveniently than the register class specifies. It also means that most of the annoying getMinimalPhysRegClass() calls go away.

True.

4. The COPY instruction does not impose register class constraints on its operands, native copies do. This is important when we implement live interval splitting in the register allocator. After splitting an interval the register class can be recomputed. Without the copy constraints a larger register class may be valid, and spilling might be avoided.

Ok.

5. copyRegToReg() can insert multiple instructions, avoiding fun pseudo-instructions like VMOVQQ and VMOVQQQQ.

The reason for VMOVQQ etc. is actually to keep scavenger happy by pretending the whole QQ register is updated / referenced. With sub-register indices on the operands, it's possible for current copyRegToReg to expand to multiple instructions. But then the scavenger can't determine a register is partially defined.

Why Not?

1. copyRegToReg() won't be able to use register classes to pick a copy opcode. For instance, an XMM register will no longer be copied by MOVSS or MOVSD. Given just the physical register, MOVAPS will be used. Is that a problem?

I don't see this being a big problem. That said, I am not sure I totally follow this. Do you propose we eliminate register classes arguments even when virtual register copies are being created?

2. What else?

Every corner case that we can't anticipate right now. :slight_smile:

Evan

I am considering adding a new target independent codegen-only COPY instruction to our MachineInstr representation. It would be used to replace INSERT_SUBREG, EXTRACT_SUBREG, and virtual register copies after instruction selection. Selection DAG still needs {INSERT,EXTRACT}_SUBREG, but they would not appear as MachineInstrs any longer.

The COPY instruction handles subreg operations with less redundancy:

  %reg1045<def> = EXTRACT_SUBREG %reg1044<kill>, 4
  %reg1045<def> = COPY %reg1044:sub_32bit<kill>

  %reg1045<def> = INSERT_SUBREG %reg1045, %reg1044<kill>, 4
  %reg1045:sub_32bit<def> = COPY %reg1044<kill>

  %reg1050:ssub_0<def> = EXTRACT_SUBREG %reg1060:dsub_1<kill>, ssub_0
  %reg1050:ssub_0<def> = COPY %reg1060:ssub_2<kill>

It will also replace the TargetInstrInfo::copyRegToReg hook when copying virtual registers:

  %reg1050 = COPY %reg1044<kill>

It will be lowered with a TII.copyRegToReg() call in LowerSubregsInstructionPass (which may need renaming).

Ok. An immediate concern is this will make it harder to understand machine instructions. For example, it's hard for me to read
%reg1045:sub_32bit<def> = COPY %reg1044<kill>
as updating part of reg1045. It's solvable by pretty printing COPY instructions that include register class information.

It is safe to assume that a <def> operand with a subreg index represents a partial update to a subregister on any instruction. You can't have a noop subreg index, for instance TRI.getSubReg(%EAX, sub_32bit) returns 0. X86 has some weird idempotent subreg indices, TRI.getSubReg(%XMM0, sub_ss) returns %XMM0, but that is an exception.

So this is an insert_subreg:

  %reg1045:sub_32bit<def> = COPY %reg1044<kill>

and this is a normal copy:

  %reg1045<def> = COPY %reg1044<kill>

But decorating with register classes is a good idea. The question is how to do it without too much clutter. I thought about listing them after the operands:

  %reg1045:sub_32bit<def> = COPY %reg1044<kill> ; GR64 GR32_ABCD

Inline is tricky:

  (GR64)%reg1045:sub_32bit<def> = COPY (GR32_ABCD)%reg1044<kill>

[...]

3. The register class arguments to TII.copyRegToReg() can be eliminated - it will only ever be called for physical registers. This

I am not sure I follow this. Why would the regclass arguments be eliminated?

If you look in LowerSubRegs.cpp, copying physical registers goes like this:

    const TargetRegisterClass *TRCS = TRI->getPhysicalRegisterRegClass(DstReg);
    const TargetRegisterClass *TRCD = TRI->getPhysicalRegisterRegClass(SrcReg);
    bool Emitted = TII->copyRegToReg(*MBB, MI, DstReg, SrcReg, TRCD, TRCS,
                                     MI->getDebugLoc());
    (void)Emitted;
    assert(Emitted && "Subreg and Dst must be of compatible register class");

The register class arguments provide no added value in this case. If copyRegToReg wants register classes, it can call getPhysicalRegisterRegClass() itself.

When copying virtual registers, the regclass arguments are essential, but copyRegToReg won't be used for copying virtual registers any longer.

5. copyRegToReg() can insert multiple instructions, avoiding fun pseudo-instructions like VMOVQQ and VMOVQQQQ.

The reason for VMOVQQ etc. is actually to keep scavenger happy by pretending the whole QQ register is updated / referenced. With sub-register indices on the operands, it's possible for current copyRegToReg to expand to multiple instructions. But then the scavenger can't determine a register is partially defined.

Good point.

1. copyRegToReg() won't be able to use register classes to pick a copy opcode. For instance, an XMM register will no longer be copied by MOVSS or MOVSD. Given just the physical register, MOVAPS will be used. Is that a problem?

I don't see this being a big problem. That said, I am not sure I totally follow this. Do you propose we eliminate register classes arguments even when virtual register copies are being created?

No, copying of virtual registers is done with the generic COPY instruction. It would be an error to pass a virtual register to copyRegToReg.

COPY is lowered to native instructions with copyRegToReg after register allocation where all registers are physical.

A transitional implementation would look like this:

  virtual void copyRegToReg(MachineBasicBlock &MBB,
                            MachineBasicBlock::iterator MI,
                            unsigned DstReg, unsigned SrcReg,
                            DebugLoc DL) const {
    assert(TargetRegisterInfo::isPhysicalRegister(DstReg));
    assert(TargetRegisterInfo::isPhysicalRegister(SrcReg));
    const TargetRegisterInfo *TRI = MBB.getParent()->getTarget().getRegisterInfo();
    const TargetRegisterClass *TRCS = TRI->getPhysicalRegisterRegClass(DstReg);
    const TargetRegisterClass *TRCD = TRI->getPhysicalRegisterRegClass(SrcReg);
    bool Emitted = copyRegToReg(*MBB, MI, DstReg, SrcReg, TRCD, TRCS, DL));
    (void)Emitted;
    assert(Emitted && "Nobody uses the copyRegToReg return value");
  }

2. What else?

Every corner case that we can't anticipate right now. :slight_smile:

I suspect that our handling of partial redefines in not yet complete enough that we can do without the explicit use operand on INSERT_SUBREG. We'll see.