[AArch64] Which version should be expected from the simple-register-coalescing pass?

Dear all,

Based on the following MIR after liveintervals:

# *** IR Dump After Live Interval Analysis (liveintervals) ***:
# Machine code for function foo: NoPHIs, TracksLiveness, TiedOpsRewritten

0B	bb.0.entry:
16B	  %0:fpr64 = MOVIv2i32 79, 24
32B	  %1:fpr32 = COPY %0.ssub:fpr64
48B	  $s0 = COPY %1:fpr32
64B	  RET_ReallyLR implicit killed $s0

I get two different dumps from simple-register-coalescing, but I don’t know which one should be better for performance.

– 1st version

0B	bb.0.entry:
16B	  %0:fpr64 = MOVIv2i32 79, 24
48B	  $s0 = COPY %0.ssub:fpr64
64B	  RET_ReallyLR implicit killed $s0

– 2nd version

0B	bb.0.entry:
48B	  dead $d0 = MOVIv2i32 79, 24, implicit-def $s0
64B	  RET_ReallyLR implicit killed $s0

More detailed output from the SIMPLE REGISTER COALESCING pass:
– 2nd version

********** SIMPLE REGISTER COALESCING **********
********** Function: foo
********** JOINING INTERVALS ***********
entry:
48B	$s0 = COPY %1:fpr32
	Considering merging %1 with $s0
	Can only merge into reserved registers.
32B	%1:fpr32 = COPY %0.ssub:fpr64
	Considering merging to FPR64 with %1 in %0:ssub
		RHS = %1 [32r,48r:0) 0@32r  weight:0.000000e+00
		LHS = %0 [16r,32r:0) 0@16r  weight:0.000000e+00
		merge %1:0@32r into %0:0@16r --> @16r
		erased:	32r	%1:fpr32 = COPY %0.ssub:fpr64
AllocationOrder(FPR64) = [ $d0 $d1 $d2 $d3 $d4 $d5 $d6 $d7 $d16 $d17 $d18 $d19 $d20 $d21 $d22 $d23 $d24 $d25 $d26 $d27 $d28 $d29 $d30 $d31 $d8 $d9 $d10 $d11 $d12 $d13 $d14 $d15 ]
		updated: 48B	$s0 = COPY %0.ssub:fpr64
	Success: %1:ssub -> %0
	Result = %0 [16r,48r:0) 0@16r  weight:0.000000e+00
48B	$s0 = COPY %0.ssub:fpr64
	Considering merging %0 with $d0
	Can only merge into reserved registers.
Remat: dead $d0 = MOVIv2i32 79, 24, implicit-def $s0
Shrink: %0 [16r,48r:0) 0@16r  weight:0.000000e+00
All defs dead: 16r	dead %0:fpr64 = MOVIv2i32 79, 24
Shrunk: %0 [16r,16d:0) 0@16r  weight:0.000000e+00
Deleting dead def 16r	dead %0:fpr64 = MOVIv2i32 79, 24
Trying to inflate 0 regs.
********** INTERVALS **********
RegMasks:
********** MACHINEINSTRS **********
# Machine code for function foo: NoPHIs, TracksLiveness, TiedOpsRewritten

0B	bb.0.entry:
48B	  dead $d0 = MOVIv2i32 79, 24, implicit-def $s0
64B	  RET_ReallyLR implicit killed $s0

– 1st version

********** SIMPLE REGISTER COALESCING **********
********** Function: foo
********** JOINING INTERVALS ***********
entry:
48B	$s0 = COPY %1:fpr32
	Considering merging %1 with $s0
	Can only merge into reserved registers.
32B	%1:fpr32 = COPY %0.ssub:fpr64
	Considering merging to FPR64 with %1 in %0:ssub
		RHS = %1 [32r,48r:0) 0@32r  weight:0.000000e+00
		LHS = %0 [16r,32r:0) 0@16r  weight:0.000000e+00
		merge %1:0@32r into %0:0@16r --> @16r
		erased:	32r	%1:fpr32 = COPY %0.ssub:fpr64
AllocationOrder(FPR64) = [ $d0 $d1 $d2 $d3 $d4 $d5 $d6 $d7 $d16 $d17 $d18 $d19 $d20 $d21 $d22 $d23 $d24 $d25 $d26 $d27 $d28 $d29 $d30 $d31 $d8 $d9 $d10 $d11 $d12 $d13 $d14 $d15 ]
		updated: 48B	$s0 = COPY %0.ssub:fpr64
	Success: %1:ssub -> %0
	Result = %0 [16r,48r:0) 0@16r  weight:0.000000e+00
48B	$s0 = COPY %0.ssub:fpr64
	Considering merging %0 with $d0
	Can only merge into reserved registers.
Trying to inflate 0 regs.
********** INTERVALS **********
%0 [16r,48r:0) 0@16r  weight:0.000000e+00
RegMasks:
********** MACHINEINSTRS **********
# Machine code for function foo: NoPHIs, TracksLiveness, TiedOpsRewritten

0B	bb.0.entry:
16B	  %0:fpr64 = MOVIv2i32 79, 24
48B	  $s0 = COPY %0.ssub:fpr64
64B	  RET_ReallyLR implicit killed $s0

# End machine code for function foo.

Can you explain the context? In your particular example, the two versions should produce identical code after register allocation.
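
To illustrate that claim, here is a toy Python model (not LLVM code) of rewriting the 1st version once %0 has been given a physical register. It assumes the allocator picks $d0 for %0, which seems plausible because $d0 is first in the AllocationOrder(FPR64) printed in the log and the `$s0 = COPY %0.ssub` gives it a reason to prefer the D register whose ssub half is $s0. The `ssub` and `rewrite` helpers are hypothetical.

```python
def ssub(reg):
    """Hypothetical helper: the 32-bit ssub sub-register of a D register ($dN -> $sN)."""
    assert reg.startswith("$d"), reg
    return "$s" + reg[2:]


def rewrite(instrs, assignment):
    """Toy post-allocation rewrite: map vregs to physical registers, drop identity COPYs.

    instrs: list of (dst, opcode, srcs) tuples; operands may carry a ".ssub" suffix.
    assignment: virtual register -> physical D register, e.g. {"%0": "$d0"}.
    """
    def phys(operand):
        reg, _, sub = operand.partition(".")
        reg = assignment.get(reg, reg)
        return ssub(reg) if sub == "ssub" else reg

    out = []
    for dst, op, srcs in instrs:
        dst = phys(dst) if dst else None
        srcs = [phys(s) for s in srcs]
        if op == "COPY" and dst == srcs[0]:
            continue  # identity copy such as "$s0 = COPY $s0" is removed
        out.append((dst, op, srcs))
    return out


# The 1st version from the post, as (dst, opcode, srcs) tuples.
FIRST_VERSION = [
    ("%0", "MOVIv2i32", ["79", "24"]),
    ("$s0", "COPY", ["%0.ssub"]),
    (None, "RET_ReallyLR", ["$s0"]),
]

for assignment in ({"%0": "$d0"}, {"%0": "$d1"}):
    print(assignment, rewrite(FIRST_VERSION, assignment))
```

With $d0 the copy becomes `$s0 = COPY $s0` and disappears, leaving the same `$d0 = MOVIv2i32 79, 24` that the 2nd version's rematerialization produced (apart from the `dead`/`implicit-def` flags). With any other register, e.g. $d1, a real `$s0 = COPY $s1` would remain.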

Thanks for your attention.

Yes, they’ll eventually produce identical code, since this case is very small (it has only one instruction), but I’m curious which one is a better fit for register allocation.

The 1st version seems to use two registers, %0 and $s0 (I also don’t know why %0 isn’t allocated to a physical register), and the 2nd version seems to use two registers, $s0 and $d0.

For more general context, I think there are a couple of presentations on YouTube about LLVM register allocation. See also “The LLVM Target-Independent Code Generator” in the LLVM 16.0.0git documentation.

%0 is a virtual register; this is before register allocation, so the location hasn’t been decided.

$s0 and $d0 are different names for the same register.
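
Because $s0 is the low 32 bits of $d0 (its `ssub` sub-register), writing $d0 also writes $s0. That is what `dead $d0 = MOVIv2i32 79, 24, implicit-def $s0` expresses: the full $d0 is written, but only its $s0 half is used afterwards. A small Python model of the bits, assuming the MIR operands `79, 24` are MOVI's 8-bit immediate and its LSL amount (so each 32-bit lane receives 79 << 24). The helper names are hypothetical, and this models values only, not LLVM:

```python
import struct


def movi_2s(imm8, shift):
    """Model MOVI Vd.2S, #imm8, LSL #shift as a 64-bit D-register value.

    Each of the two 32-bit lanes receives imm8 << shift (assumed operand meaning).
    """
    if not (0 <= imm8 <= 0xFF and shift in (0, 8, 16, 24)):
        raise ValueError("unencodable MOVI immediate")
    lane = imm8 << shift
    return (lane << 32) | lane


def s_of_d(d_bits):
    """$sN aliases the low 32 bits of $dN (the ssub sub-register)."""
    return d_bits & 0xFFFF_FFFF


def f32(bits):
    """Reinterpret 32 raw bits as an IEEE-754 single-precision float."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


d0 = movi_2s(79, 24)   # dead $d0 = MOVIv2i32 79, 24, implicit-def $s0
s0 = s_of_d(d0)        # the value RET_ReallyLR returns in $s0
print(hex(d0), hex(s0), f32(s0))
```

This prints `0x4f0000004f000000 0x4f000000 2147483648.0`. Under that operand assumption, foo returns 2^31 as a float, and it does not matter whether the value is named through $s0 or $d0.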

Basically, register coalescing is a pre-pass to assist register allocation: it removes dimensions from the problem of register allocation so the allocator is less likely to make bad decisions. But we don’t usually like to coalesce virtual registers with physical registers. If we need that specific register for something else, the result might be impossible, or cause extra spilling.
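
That policy can be sketched with a toy coalescer in Python. This is a hypothetical simplification, not LLVM's RegisterCoalescer. It ignores live-interval interference, sub-register indices such as %0.ssub, and the rematerialization step ("Remat: ...") seen in one of the logs. It only shows the rule of freely merging virtual-to-virtual copies while refusing to merge a virtual register into a non-reserved physical register.

```python
def is_virtual(reg):
    return reg.startswith("%")


def coalesce(instrs, reserved=frozenset()):
    """Toy copy coalescer over (dst, opcode, srcs) tuples.

    - COPY between two virtual registers: rename dst to src and delete the COPY.
    - COPY between a virtual and a physical register: merged only if the
      physical register is reserved ("Can only merge into reserved registers.").
    """
    alias = {}

    def find(reg):
        while reg in alias:
            reg = alias[reg]
        return reg

    kept = []
    for dst, op, srcs in instrs:
        if op == "COPY":
            d, s = find(dst), find(srcs[0])
            if d == s:
                continue                      # identity copy
            if is_virtual(d) and (is_virtual(s) or s in reserved):
                alias[d] = s                  # uses of d now read s
                continue
            if is_virtual(s) and d in reserved:
                alias[s] = d                  # s now lives in the reserved register
                continue
        kept.append((dst, op, srcs))

    # Rename operands of the surviving instructions.
    return [(find(d) if d else None, op, [find(s) for s in srcs]) for d, op, srcs in kept]


FOO = [
    ("%0", "MOVIv2i32", ["79", "24"]),
    ("%1", "COPY", ["%0"]),          # really %0.ssub; sub-registers not modelled
    ("$s0", "COPY", ["%1"]),
    (None, "RET_ReallyLR", ["$s0"]),
]
print(coalesce(FOO))
```

On the example this keeps `$s0 = COPY %0`, matching the 1st version. Passing `reserved=frozenset({"$s0"})` would instead merge %0 into $s0, which is what the real pass declines to do for an allocatable register like $s0.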
