pre-RA scheduling/live register analysis optimization (handle move) forcing spill of registers

Hi,

I have a question about pre-RA scheduling and register spilling.

I’m writing a backend for a two-operand instruction set, so the result of an FPU operation has an implicit destination.

For example, the result of FMUL_A_oo is implicitly the register FA_ROUTMUL.

I have defined FPUaROUTMULRegisterClass containing only FA_ROUTMUL.

During instruction lowering, to avoid frequent spilling of FA_ROUTMUL, I systematically copy the result of FMUL_A_oo to a virtual register through a COPY_TO_REGCLASS.

def : Pat<(fdiv f32:$OffsetA, f32:$OffsetB), (COPY_TO_REGCLASS (FDIV_A_oo FPUaOffsetOperand:$OffsetA,FPUaOffsetOperand:$OffsetB),FPUaOffsetClass)>;

Instruction lowering goes as expected: every instance of FMUL_A_oo is followed by a COPY, which frees FPUaROUTMULRegisterClass.

These COPYs are at positions 64B and 112B in the example below. So far, so good.

My problem arises in a pre-RA instruction scheduling optimization that moves the first COPY from 64B to the later position 104B (the second one stays at 112B).

The new code sequence leaves two FMUL_A_oo results live at the same time, with no COPY between the two multiplies, so it requires two registers from FPUaROUTMULRegisterClass (which only contains FA_ROUTMUL).

So a spill has to be inserted exactly where I tried to avoid one by inserting the COPY. :-/
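
To make the pressure argument concrete, here is a small toy model (plain Python, not LLVM code). It treats each virtual register in FPUaROUTMULRegisterClass as a (class, def slot, last-use slot) triple, with the slot numbers read off the two dumps below, and counts how many of them are live at once. The function name `max_pressure` and the data layout are assumptions made for illustration only.

```python
# Toy model, not LLVM code: each vreg maps to (class, def_slot, last_use_slot).
# Slot numbers are taken from the two dumps in this message.

def max_pressure(live_ranges, reg_class):
    """Return the maximum number of simultaneously live vregs in reg_class."""
    events = []
    for cls, start, end in live_ranges.values():
        if cls == reg_class:
            events.append((start, 1))   # value becomes live at its def
            events.append((end, -1))    # value dies at its last use
    # Sort by slot; at equal slots, process the release (-1) before the def (+1).
    events.sort(key=lambda e: (e[0], e[1]))
    live = peak = 0
    for _, delta in events:
        live += delta
        peak = max(peak, live)
    return peak

MUL = "FPUaROUTMULRegisterClass"

# Before scheduling: each FMUL_A_oo result is copied out right away.
before = {"vreg3": (MUL, 48, 64), "vreg6": (MUL, 96, 112)}
# After scheduling: the COPY of %vreg3 sits at 104B, after the second FMUL_A_oo.
after = {"vreg3": (MUL, 48, 104), "vreg6": (MUL, 96, 112)}

print(max_pressure(before, MUL))  # 1
print(max_pressure(after, MUL))   # 2
```

With only FA_ROUTMUL in the class, a peak of 2 cannot be satisfied without a spill or an extra copy.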

This ‘handleMove’ is reported by LiveIntervalAnalysis, but I don’t understand why it is triggered or how to avoid this counterproductive optimization.

TIA, Dominique Torette.

*** IR Dump After MachineFunction Printer ***:

Machine code for function addproddivConst: Post SSA

Function Live Ins: %FA_ROFF1 in %vreg0

0B BB#0: derived from LLVM BB %entry

Live Ins: %FA_ROFF1

16B %vreg0 = COPY %FA_ROFF1; FPUaOffsetClass:%vreg0

32B %vreg2 = MOVSUTO_A_iSLo 1077936128; FPUaOffsetClass:%vreg2

48B %vreg3 = FMUL_A_oo %vreg0, %vreg2, %RFLAGA<imp-def,dead>; FPUaROUTMULRegisterClass:%vreg3 FPUaOffsetClass:%vreg0,%vreg2

64B %vreg4 = COPY %vreg3; FPUaOffsetClass:%vreg4 FPUaROUTMULRegisterClass:%vreg3

80B %vreg5 = MOVSUTO_A_iSLo 1056964608; FPUaOffsetClass:%vreg5

96B %vreg6 = FMUL_A_oo %vreg0, %vreg5, %RFLAGA<imp-def,dead>; FPUaROUTMULRegisterClass:%vreg6 FPUaOffsetClass:%vreg0,%vreg5

112B %vreg7 = COPY %vreg6; FPUaOffsetClass:%vreg7 FPUaROUTMULRegisterClass:%vreg6

128B %vreg8 = FADD_A_oo %vreg4, %vreg7, %RFLAGA<imp-def,dead>; FPUaROUTADDRegisterClass:%vreg8 FPUaOffsetClass:%vreg4,%vreg7

144B %FA_ROFF0 = COPY %vreg8; FPUaROUTADDRegisterClass:%vreg8

176B MOVSUTO_SU_os_rpc %SU_ROFF0, %RPC<imp-def,dead>

192B NOP

End machine code for function addproddivConst.

handleMove 64B → 104B: %vreg4 = COPY %vreg3; FPUaOffsetClass:%vreg4 FPUaROUTMULRegisterClass:%vreg3

%vreg4: [64r,128r:0) 0@64r

→ [104r,128r:0) 0@104r

%vreg3: [48r,64r:0) 0@48r

→ [48r,104r:0) 0@48r

*** IR Dump After Machine Instruction Scheduler ***:

Machine code for function addproddivConst: Post SSA

Function Live Ins: %FA_ROFF1 in %vreg0

0B BB#0: derived from LLVM BB %entry

Live Ins: %FA_ROFF1

16B %vreg0 = COPY %FA_ROFF1; FPUaOffsetClass:%vreg0

32B %vreg2 = MOVSUTO_A_iSLo 1077936128; FPUaOffsetClass:%vreg2

48B %vreg3 = FMUL_A_oo %vreg0, %vreg2, %RFLAGA<imp-def,dead>; FPUaROUTMULRegisterClass:%vreg3 FPUaOffsetClass:%vreg0,%vreg2

80B %vreg5 = MOVSUTO_A_iSLo 1056964608; FPUaOffsetClass:%vreg5

96B %vreg6 = FMUL_A_oo %vreg0, %vreg5, %RFLAGA<imp-def,dead>; FPUaROUTMULRegisterClass:%vreg6 FPUaOffsetClass:%vreg0,%vreg5

104B %vreg4 = COPY %vreg3; FPUaOffsetClass:%vreg4 FPUaROUTMULRegisterClass:%vreg3

112B %vreg7 = COPY %vreg6; FPUaOffsetClass:%vreg7 FPUaROUTMULRegisterClass:%vreg6

128B %vreg8 = FADD_A_oo %vreg4, %vreg7, %RFLAGA<imp-def,dead>; FPUaROUTADDRegisterClass:%vreg8 FPUaOffsetClass:%vreg4,%vreg7

144B %FA_ROFF0 = COPY %vreg8; FPUaROUTADDRegisterClass:%vreg8

176B MOVSUTO_SU_os_rpc %SU_ROFF0, %RPC<imp-def,dead>

192B NOP

End machine code for function addproddivConst.

Hi Dominique,

Not commenting on the scheduling part as I don’t know how the register pressure tracking is done there.
Unless you constrain your copies to stay next to the MUL_A (e.g., using a bundle), there is a non-zero chance that something is going to mess with them.

That said, the splitting mechanism should just insert the desired copies to avoid spilling for you.
Could you check why this is not happening? (-debug-only=regalloc and check why it spills or why the splitting failed.)

One possible problem is that you don’t have a bigger regclass that contains FPUaROUTMULRegisterClass, that could be used when relaxing the constraint with splitting.

Cheers,
-Quentin

‘handleMove’ updates LiveIntervals when a virtual register read/write is moved. The scheduler has a heuristic called biasPhysRegCopy that tries to avoid creating any interference on physregs. You might check -debug-only=machine-scheduler to see why a copy was moved, or just step through the scheduler for a very small test case.
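
As a rough illustration of that bookkeeping, the sketch below replays the interval update from the debug output in plain Python. It is a toy model, not LLVM's LiveIntervals implementation: a live range is just a (start, end) pair of slot numbers, and the helper names `move_copy` and `overlaps` are made up for this example.

```python
# Toy model of the live-range update, not LLVM's LiveIntervals implementation.
# A live range is a (start_slot, end_slot) pair; slots come from the debug output.

def move_copy(ranges, dst, src, old_slot, new_slot):
    """Update ranges after moving `dst = COPY src` from old_slot to new_slot.

    Assumes the COPY is the only def of dst and the last use of src,
    which holds for %vreg4 = COPY %vreg3 in this example.
    """
    updated = dict(ranges)
    dst_start, dst_end = ranges[dst]
    src_start, src_end = ranges[src]
    assert dst_start == old_slot and src_end == old_slot
    updated[dst] = (new_slot, dst_end)    # the def of dst moves with the COPY
    updated[src] = (src_start, new_slot)  # src must stay live until the COPY
    return updated

def overlaps(a, b):
    """True if two half-open ranges [start, end) share at least one slot."""
    return a[0] < b[1] and b[0] < a[1]

ranges = {"vreg3": (48, 64), "vreg4": (64, 128), "vreg6": (96, 112)}
moved = move_copy(ranges, "vreg4", "vreg3", 64, 104)

print(moved["vreg4"])  # (104, 128)
print(moved["vreg3"])  # (48, 104)
print(overlaps(ranges["vreg3"], ranges["vreg6"]))  # False
print(overlaps(moved["vreg3"], moved["vreg6"]))    # True
```

The last two lines show the interference: once %vreg3 is extended to 104, it overlaps %vreg6, and both are constrained to the single-register class.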

-Andy