How to spill a register in fast RA when the spill itself creates another temporary register

Hello,

I ran into this problem while working on FP16 emulation on SSE2.

SSE2 has no single instruction that stores a 16-bit element of an XMM register to memory. Instead, we have to go through a 32-bit GPR as a temporary register, e.g.,

movd %xmm0, %eax
movw %ax, (%esp)

However, in fast RA, once we spill an FR16 value, there is no opportunity to allocate a physical register for the newly created temporary. For example, take this simple IR:

define dso_local half @foo(ptr %0) {
  %2 = load half, ptr %0
  %3 = call half @foo(ptr %0)
  ret half %2
}

Compiling it with D107082 applied and llc -O0 -mtriple=i386 -mattr=sse2 < foo.ll -debug gives the following MIR before fast RA:

Allocating bb.0 (%ir-block.1):
  %0:gr32 = MOV32rm %fixed-stack.0, 1, $noreg, 0, $noreg :: (load (s32) from %fixed-stack.0)
  %4:vr128 = IMPLICIT_DEF
  %3:vr128 = COPY %4:vr128
  %3:vr128 = PINSRWrm %3:vr128(tied-def 0), %0:gr32, 1, $noreg, 0, $noreg, 0 :: (load (s16) from %ir.0)
  %5:fr16 = COPY %3:vr128
  %1:fr16 = COPY %5:fr16
  ADJCALLSTACKDOWN32 4, 0, 0, implicit-def $esp, implicit-def $eflags, implicit-def $ssp, implicit $esp, implicit $ssp
  MOV32mr $esp, 1, $noreg, 0, $noreg, %0:gr32 :: (store (s32) into stack)
  CALLpcrel32 @foo, <regmask $bh $bl $bp $bph $bpl $bx $di $dih $dil $ebp $ebx $edi $esi $hbp $hbx $hdi $hsi $si $sih $sil>, implicit $esp, implicit $ssp, implicit-def $xmm0
  ADJCALLSTACKUP32 4, 0, implicit-def $esp, implicit-def $eflags, implicit-def $ssp, implicit $esp, implicit $ssp
  %2:fr16 = COPY $xmm0
  $xmm0 = COPY %1:fr16
  RET32 implicit $xmm0

Then %1:fr16 gets spilled:

>> %1:fr16 = COPY %5:fr16
Regs: AL=%0 HAX=%0
Search register for %1 in class FR16 with hint $noreg
        Register: $xmm0 Cost: 0 BestCost: 4294967295
Assigning %1 to $xmm0
Spill Reason: LO: 0 RL: 1
Spilling %1 in $xmm0 to stack slot #0
Freeing $xmm0: %1
Search register for %5 in class FR16 with hint $xmm0
        Preferred Register 1: $xmm0
Assigning %5 to $xmm0

In the end, every virtual register is allocated except the temporary one:

Begin Regs:
Loading live registers at begin of block.
bb.0 (%ir-block.1):
  renamable $eax = MOV32rm %fixed-stack.0, 1, $noreg, 0, $noreg :: (load (s32) from %fixed-stack.0)
  renamable $xmm0 = IMPLICIT_DEF
  renamable $xmm0 = PINSRWrm renamable $xmm0(tied-def 0), renamable $eax, 1, $noreg, 0, $noreg, 0 :: (load (s16) from %ir.0)
  %6:gr32_nosp = MOVPDI2DIrr $xmm0
  MOV16mr %stack.0, 1, $noreg, 0, $noreg, %6.sub_16bit:gr32_nosp :: (store (s16) into %stack.0)
  ADJCALLSTACKDOWN32 4, 0, 0, implicit-def $esp, implicit-def dead $eflags, implicit-def $ssp, implicit $esp, implicit $ssp
  MOV32mr $esp, 1, $noreg, 0, $noreg, killed renamable $eax :: (store (s32) into stack)
  CALLpcrel32 @foo, <regmask $bh $bl $bp $bph $bpl $bx $di $dih $dil $ebp $ebx $edi $esi $hbp $hbx $hdi $hsi $si $sih $sil>, implicit $esp, implicit $ssp, implicit-def $xmm0
  ADJCALLSTACKUP32 4, 0, implicit-def $esp, implicit-def dead $eflags, implicit-def $ssp, implicit $esp, implicit $ssp
  dead renamable $xmm1 = COPY $xmm0
  $xmm0 = PINSRWrm undef $xmm0(tied-def 0), %stack.0, 1, $noreg, 0, $noreg, 0 :: (load (s16) from %stack.0)
  RET32 implicit killed $xmm0
Remaining virtual register operands
UNREACHABLE executed at /export/users2/pengfeiw/llvm-project/llvm/lib/CodeGen/MachineRegisterInfo.cpp:207!

Notice that %6:gr32_nosp is left as a virtual register, hence the “Remaining virtual register operands” error.

I’m not sure whether other targets have a similar case. Is there any precedent for this? Thanks for any pointers!

In this particular case, can’t you just allocate a 32-bit spill slot, and use movss to spill/restore?
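
To illustrate why a 32-bit slot is enough here, the following is a minimal Python model, not LLVM code. It assumes the half value sits in the low 16 bits of the XMM register and models movss as "store the low 32 bits" and "load 32 bits, zero the rest"; the function names are made up for the sketch.

```python
import struct

def spill_low32(xmm: bytes) -> bytes:
    """Model of `movss %xmm0, slot`: store the low 32 bits of a 128-bit register."""
    assert len(xmm) == 16
    return xmm[:4]

def reload_low32(slot: bytes) -> bytes:
    """Model of `movss slot, %xmm0`: load 32 bits and zero the upper 96 bits."""
    assert len(slot) == 4
    return slot + bytes(12)

def low_half_bits(xmm: bytes) -> int:
    """Read the 16-bit half payload from the low 16 bits (little-endian)."""
    return struct.unpack_from("<H", xmm, 0)[0]

# 0x3C00 is the IEEE half encoding of 1.0; the other 14 bytes are arbitrary.
xmm0 = struct.pack("<H", 0x3C00) + bytes(range(1, 15))
slot = spill_low32(xmm0)
restored = reload_low32(slot)
```

The 16 bits above the payload travel along for free, so no GPR is needed for either the spill or the reload.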

General solutions here get complicated, if you actually have to use an extra register. The register allocator can’t really handle spills that clobber other registers, so you need to work around it some other way. For example, do register allocation in multiple stages (see the commit “RegAlloc: Allow targets to split register allocation”, llvm/llvm-project@eebe841). Or use a pseudo-instruction to spill values, and scavenge a register after register allocation.
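
The pseudo-instruction approach can be sketched roughly as follows. This is a toy Python model under stated assumptions, not the LLVM register scavenger: the SPILL16 opcode, the tuple-based instruction format, and the per-instruction set of live GPRs are all hypothetical and exist only for illustration.

```python
GPRS = ["eax", "ecx", "edx"]                     # candidate scratch registers
SUB16 = {"eax": "ax", "ecx": "cx", "edx": "dx"}  # their 16-bit sub-registers

def scavenge(live_gprs):
    """Return the first GPR that is not live at this point."""
    for reg in GPRS:
        if reg not in live_gprs:
            return reg
    # A real scavenger would fall back to an emergency spill slot here.
    raise RuntimeError("no free GPR available")

def expand_pseudo_spills(instrs):
    """Expand the hypothetical SPILL16 pseudo after allocation.

    instrs: list of (opcode, operands, live_gprs) tuples, where live_gprs
    is the set of GPRs live across that instruction.
    """
    out = []
    for opcode, operands, live_gprs in instrs:
        if opcode == "SPILL16":
            xmm, slot = operands
            tmp = scavenge(live_gprs)
            out.append(("movd", (xmm, tmp)))          # low 32 bits -> GPR
            out.append(("movw", (SUB16[tmp], slot)))  # low 16 bits -> memory
        else:
            out.append((opcode, operands))
    return out

allocated = [
    ("movl", ("4(%esp)", "eax"), {"eax"}),
    ("SPILL16", ("xmm0", "slot0"), {"eax"}),  # eax is still live here
    ("calll", ("foo",), set()),
]
expanded = expand_pseudo_spills(allocated)
```

Because the allocator only ever sees the pseudo, it never has to deal with a new virtual register; the scratch register is chosen once liveness of the physical registers is known.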

I have the same problem when spilling AMX registers in fast RA. Spilling an AMX register creates another 64-bit GPR virtual register. This works with Greedy RA, because the new virtual register is enqueued and allocated later. In fast RA, the spill instruction is placed right after the def instruction that is currently being scanned, so we miss the chance to allocate the new virtual register. I have a patch (D125602 [X86][AMX][fastalloc] Allocate tile register separately.) that lets fast RA allocate only a specific register class, so that I can allocate AMX registers in a separate pass, and the following RA pass can then allocate the new virtual register that is created when spilling the AMX register.
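
The difference between the two allocators can be shown with a toy Python model. It is a deliberately simplified sketch, not how either LLVM allocator is implemented: instructions are (def, uses) tuples, "allocating" just records a vreg as assigned, and spilling any vreg in must_spill is assumed to need a fresh temporary.

```python
from collections import deque

def vregs_of(instr):
    dst, srcs = instr
    return ([dst] if dst else []) + list(srcs)

def single_pass(instrs, must_spill):
    """Scan bottom-up once; spill code is inserted below the cursor."""
    out, assigned, fresh = list(instrs), set(), 0
    i = len(out) - 1
    while i >= 0:
        dst, _ = out[i]
        assigned.update(vregs_of(out[i]))
        if dst in must_spill:
            tmp = f"%tmp{fresh}"
            fresh += 1
            # The spill goes right after the def, a position already scanned.
            out.insert(i + 1, (None, (dst, tmp)))
        i -= 1
    remaining = {v for ins in out for v in vregs_of(ins)} - assigned
    return out, remaining

def worklist(instrs, must_spill):
    """Process a queue of vregs; temporaries created by spills are enqueued."""
    out, assigned, fresh = list(instrs), set(), 0
    queue = deque(dict.fromkeys(v for ins in out for v in vregs_of(ins)))
    while queue:
        v = queue.popleft()
        assigned.add(v)
        if v in must_spill:
            tmp = f"%tmp{fresh}"
            fresh += 1
            idx = next(i for i, (dst, _) in enumerate(out) if dst == v)
            out.insert(idx + 1, (None, (v, tmp)))
            queue.append(tmp)  # the new vreg gets its turn later
    remaining = {v for ins in out for v in vregs_of(ins)} - assigned
    return out, remaining

program = [("%5", ()), ("%1", ("%5",)), (None, ("%1",))]
_, left_fast = single_pass(program, {"%1"})
_, left_greedy = worklist(program, {"%1"})
```

In this model the single-pass scan leaves the spill temporary unassigned, which mirrors the “Remaining virtual register operands” failure above, while the worklist version assigns everything.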

Thanks a lot for your suggestions! @efriedma-quic
Yeah, using a 32-bit slot seems easier, and it’s also more efficient than pinsrw/vpextrw. Updated accordingly.