Hi,
I have a question about pre-RA scheduling and register spilling.
I’m writing a backend for a two-operand instruction set, in which the results of FPU operations have an implicit destination.
For example, the result of FMUL_A_oo is implicitly the register FA_ROUTMUL.
I have defined FPUaROUTMULRegisterClass containing only FA_ROUTMUL.
During instruction lowering, to avoid frequent spilling of FA_ROUTMUL, I systematically copy the result of FMUL_A_oo into a virtual register of another class through a COPY_TO_REGCLASS. Here is one such pattern (the one for fdiv):
def : Pat<(fdiv f32:$OffsetA, f32:$OffsetB),
          (COPY_TO_REGCLASS (FDIV_A_oo FPUaOffsetOperand:$OffsetA, FPUaOffsetOperand:$OffsetB), FPUaOffsetClass)>;
Instruction lowering goes as expected: every FMUL_A_oo is followed by a COPY, which frees FPUaROUTMULRegisterClass for the next operation.
These COPYs are at positions 64B and 112B in the example below. So far, so good.
My problem arises when the pre-RA machine instruction scheduler moves the first of these COPYs from 64B down to 104B, so that the two COPYs now sit at 104B and 112B.
In the new sequence both FMUL_A_oo results are live at the same time, which requires two registers from FPUaROUTMULRegisterClass (which only contains FA_ROUTMUL).
So a spill has to be inserted exactly where I tried to avoid one by inserting the COPY. :-/
The ‘handleMove’ shown below is reported by LiveIntervalAnalysis, but I don’t understand why the move happens or how to avoid this counterproductive optimization.
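To make the conflict concrete, here is a minimal Python sketch (an illustration only, not LLVM code). It counts how many FPUaROUTMULRegisterClass values are live at once before and after the move. The ranges for %vreg3 are taken from the handleMove output below; the range for %vreg6 is my reading of its def at 96B and its single use at 112B. The names BEFORE, AFTER, CLASS_SIZE and max_pressure are made up for this sketch.

```python
# Live ranges as half-open (start, end) slot indexes, read off the dumps.
# %vreg3 comes from the handleMove output; %vreg6 is inferred from its
# def at 96B and its only use at 112B (an assumption of this sketch).
BEFORE = {"vreg3": (48, 64), "vreg6": (96, 112)}
AFTER = {"vreg3": (48, 104), "vreg6": (96, 112)}

CLASS_SIZE = 1  # FPUaROUTMULRegisterClass contains only FA_ROUTMUL


def max_pressure(ranges):
    """Return the largest number of ranges that are live at the same point."""
    events = []
    for start, end in ranges.values():
        events.append((start, 1))   # value becomes live
        events.append((end, -1))    # value dies
    # At equal indexes, process deaths first so [a,b) and [b,c) do not overlap.
    events.sort(key=lambda e: (e[0], e[1]))
    live = peak = 0
    for _, delta in events:
        live += delta
        peak = max(peak, live)
    return peak


for name, ranges in (("before", BEFORE), ("after", AFTER)):
    pressure = max_pressure(ranges)
    verdict = "fits" if pressure <= CLASS_SIZE else "exceeds the class"
    print(name, "scheduling: pressure", pressure, verdict)
```

With these ranges the pressure on the one-register class is 1 before scheduling and 2 after it, which is the situation I was trying to prevent with the COPY.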
TIA, Dominique Torette.
*** IR Dump After MachineFunction Printer ***:
Machine code for function addproddivConst: Post SSA
Function Live Ins: %FA_ROFF1 in %vreg0
0B BB#0: derived from LLVM BB %entry
Live Ins: %FA_ROFF1
16B %vreg0 = COPY %FA_ROFF1; FPUaOffsetClass:%vreg0
32B %vreg2 = MOVSUTO_A_iSLo 1077936128; FPUaOffsetClass:%vreg2
48B %vreg3 = FMUL_A_oo %vreg0, %vreg2, %RFLAGA<imp-def,dead>; FPUaROUTMULRegisterClass:%vreg3 FPUaOffsetClass:%vreg0,%vreg2
64B %vreg4 = COPY %vreg3; FPUaOffsetClass:%vreg4 FPUaROUTMULRegisterClass:%vreg3
80B %vreg5 = MOVSUTO_A_iSLo 1056964608; FPUaOffsetClass:%vreg5
96B %vreg6 = FMUL_A_oo %vreg0, %vreg5, %RFLAGA<imp-def,dead>; FPUaROUTMULRegisterClass:%vreg6 FPUaOffsetClass:%vreg0,%vreg5
112B %vreg7 = COPY %vreg6; FPUaOffsetClass:%vreg7 FPUaROUTMULRegisterClass:%vreg6
128B %vreg8 = FADD_A_oo %vreg4, %vreg7, %RFLAGA<imp-def,dead>; FPUaROUTADDRegisterClass:%vreg8 FPUaOffsetClass:%vreg4,%vreg7
144B %FA_ROFF0 = COPY %vreg8; FPUaROUTADDRegisterClass:%vreg8
176B MOVSUTO_SU_os_rpc %SU_ROFF0, %RPC<imp-def,dead>
192B NOP
End machine code for function addproddivConst.
handleMove 64B → 104B: %vreg4 = COPY %vreg3; FPUaOffsetClass:%vreg4 FPUaROUTMULRegisterClass:%vreg3
%vreg4: [64r,128r:0) 0@64r
→ [104r,128r:0) 0@104r
%vreg3: [48r,64r:0) 0@48r
→ [48r,104r:0) 0@48r
*** IR Dump After Machine Instruction Scheduler ***:
Machine code for function addproddivConst: Post SSA
Function Live Ins: %FA_ROFF1 in %vreg0
0B BB#0: derived from LLVM BB %entry
Live Ins: %FA_ROFF1
16B %vreg0 = COPY %FA_ROFF1; FPUaOffsetClass:%vreg0
32B %vreg2 = MOVSUTO_A_iSLo 1077936128; FPUaOffsetClass:%vreg2
48B %vreg3 = FMUL_A_oo %vreg0, %vreg2, %RFLAGA<imp-def,dead>; FPUaROUTMULRegisterClass:%vreg3 FPUaOffsetClass:%vreg0,%vreg2
80B %vreg5 = MOVSUTO_A_iSLo 1056964608; FPUaOffsetClass:%vreg5
96B %vreg6 = FMUL_A_oo %vreg0, %vreg5, %RFLAGA<imp-def,dead>; FPUaROUTMULRegisterClass:%vreg6 FPUaOffsetClass:%vreg0,%vreg5
104B %vreg4 = COPY %vreg3; FPUaOffsetClass:%vreg4 FPUaROUTMULRegisterClass:%vreg3
112B %vreg7 = COPY %vreg6; FPUaOffsetClass:%vreg7 FPUaROUTMULRegisterClass:%vreg6
128B %vreg8 = FADD_A_oo %vreg4, %vreg7, %RFLAGA<imp-def,dead>; FPUaROUTADDRegisterClass:%vreg8 FPUaOffsetClass:%vreg4,%vreg7
144B %FA_ROFF0 = COPY %vreg8; FPUaROUTADDRegisterClass:%vreg8
176B MOVSUTO_SU_os_rpc %SU_ROFF0, %RPC<imp-def,dead>
192B NOP