Possible missed optimization?

Hello, I’ve noticed the following issue while running some codegen tests, and I would like to know whether it’s a missed optimization or whether I missed something. This is for an out-of-tree backend I’m writing. I managed to reduce it to the following C function:

void foo(int *a) // int here is 16 bits
{
    *a &= 0xFF;
}
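
For anyone who wants to try the reduced case on a target where int is not 16 bits, here is a minimal sketch with the width spelled out. The name foo16 and the use of int16_t are my own assumptions for illustration; the original test uses plain int on a 16-bit target.

```c
#include <stdint.h>

/* Same read-modify-write as foo() above, but with an explicit 16-bit type
   so the load and store stay 2 bytes wide on any host. */
void foo16(int16_t *a)
{
    *a &= 0xFF; /* keep the low byte, clear the high byte */
}
```

The operation loads a 16-bit value through the pointer, masks it, and stores it back through the same pointer, which is why the pointer is needed twice in the generated code.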

This is the code before regalloc:
Live Ins: %R25R24
%vreg0 = COPY %R25R24; DREGS:%vreg0
%vreg2 = COPY %vreg0; PTRREGS:%vreg2 DREGS:%vreg0
%vreg1 = LDWRd %vreg2; mem:LD2%a(tbaa=!"int") DLDREGS:%vreg1 PTRREGS:%vreg2
%vreg3 = ANDIWRdK %vreg1, 255; DLDREGS:%vreg3,%vreg1
%vreg5 = COPY %vreg0; PTRREGS:%vreg5 DREGS:%vreg0
STWRr %vreg5, %vreg3; mem:ST2%a(tbaa=!"int") PTRREGS:%vreg5 DLDREGS:%vreg3
RET

From the above, the third COPY instruction is redundant, since it does exactly the same thing as the second COPY, so the STWRr (store) instruction should take %vreg2 instead of %vreg5. After regalloc we get this code:
Live Ins: %R25R24
%R27R26 = COPY %R25R24
%R19R18 = LDWRd %R27R26; mem:LD2%a(tbaa=!"int") // <---------- why is R27:R26 killed?
%R19R18 = ANDIWRdK %R19R18, 255
%R27R26 = COPY %R25R24 // <------------------ why is this emitted?
STWRr %R27R26, %R19R18; mem:ST2%a(tbaa=!"int")
RET

The last COPY instruction should be removed, as pointed out above, but since R27R26 is killed by the load instruction it has to be emitted. About the insane number of regclasses there: the load/store and ANDI instructions take subsets of the registers in the main register class; they can’t work with all registers. That’s why STW and LDW need R27R26, which belongs to the pointer register class, rather than R25R24, where the “a” pointer arrives. As a test I made the load/store instructions work with DREGS, which is the main class, and the problem went away, but of course that is illegal code :)

Thanks

The coalescer cannot join copies with disjoint register classes.

You need to make sure that there is a register class representing the intersection. Currently, such register classes cannot be inferred automatically by TableGen.

/jakob

Hello Jakob, thanks for the reply. The three regclasses involved here are nested subsets of one another; they aren’t disjoint. These are the basic descriptions of the regclasses involved, to show what I mean:

DREGS: R31R30, R29R28 down to R1R0 (16 regs)
DLDREGS: R31R30, R29R28 down to R17R16 (8 regs)
PTRREGS: R31R30, R29R28, R27R26 (3 regs)

Intersecting any two of these classes yields the smaller one: DREGS x DLDREGS = DLDREGS, DLDREGS x PTRREGS = PTRREGS, etc. That’s why I think the coalescer should work, since each smaller regclass is fully contained in the larger one.
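
The nesting claim can be checked with a small sketch that models each class as a bitmask, one bit per register pair. The masks are transcribed from the class descriptions above; the type and function names (regmask_t, pair_bit, class_intersection, is_subclass, class_size) are made up for this illustration and are not LLVM or TableGen APIs.

```c
#include <stdint.h>

/* One bit per 16-bit register pair: bit i stands for R(2i+1):R(2i). */
typedef uint16_t regmask_t;

/* Masks transcribed from the class descriptions in the post. */
static const regmask_t DREGS   = 0xFFFF; /* R1R0 .. R31R30, 16 pairs  */
static const regmask_t DLDREGS = 0xFF00; /* R17R16 .. R31R30, 8 pairs */
static const regmask_t PTRREGS = 0xE000; /* R27R26, R29R28, R31R30    */

/* Bit for the pair whose low register is R<lo> (lo even, 0..30). */
regmask_t pair_bit(unsigned lo)
{
    return (regmask_t)(1u << (lo / 2));
}

/* Registers common to both classes. */
regmask_t class_intersection(regmask_t a, regmask_t b)
{
    return (regmask_t)(a & b);
}

/* Nonzero when every register of sub is also in super. */
int is_subclass(regmask_t sub, regmask_t super)
{
    return class_intersection(sub, super) == sub;
}

/* Number of register pairs in a class. */
int class_size(regmask_t m)
{
    int n = 0;
    for (; m != 0; m >>= 1)
        n += m & 1;
    return n;
}
```

With these masks, class_intersection(DLDREGS, PTRREGS) equals PTRREGS, while class_intersection(PTRREGS, pair_bit(24)) is 0: R25R24 is in DREGS but not in PTRREGS, so the incoming pointer has to be copied before the load and store can use it.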

Cross class coalescing also has some heuristics to prevent it from creating very small register classes. It is possible that it doesn’t want to use PTRREGS because it only has 3 registers.

You can look at the output of -debug-only=regcoalescing to see what is going on.

/jakob

> You can look at the output of -debug-only=regcoalescing to see what is going on.

This is the debug output I got. Some of the information is a bit cryptic to me, so what I understood from it follows after the dump:

********** SIMPLE REGISTER COALESCING **********
********** Function: foo
********** JOINING INTERVALS ***********
entry:
16L %vreg0 = COPY %R25R24; DREGS:%vreg0
Considering merging %vreg0 with physreg %R25R24
RHS = %vreg0 = [16d,96d:0) 0@16d
LHS = %R25R24,inf = [0L,16d:0) 0@0L-phidef
updated: 96L %vreg8 = COPY %R25R24; PTRREGS:%vreg8
updated: 32L %vreg5 = COPY %R25R24; PTRREGS:%vreg5
Joined. Result = %R25R24,inf = [0L,96d:0) 0@0L-phidef
32L %vreg5 = COPY %R25R24; PTRREGS:%vreg5
Not coalescable.
64L %vreg6 = COPY %vreg4; DLDREGS:%vreg6,%vreg4
Considering merging %vreg4 with %vreg6 to DLDREGS
RHS = %vreg4 = [48d,64d:0) 0@48d
LHS = %vreg6 = [64d,80d:1)[80d,112d:0) 0@80d 1@64d
updated: 48L %vreg6 = LDWRd %vreg5; mem:LD2%a(tbaa=!"int") DLDREGS:%vreg6 PTRREGS:%vreg5
Joined. Result = %vreg6 = [48d,80d:1)[80d,112d:0) 0@80d 1@48d
96L %vreg8 = COPY %R25R24; PTRREGS:%vreg8
Not coalescable.
********** INTERVALS POST JOINING **********
%R24,inf = [0L,16d:0) 0@0L-phidef
%vreg6 = [48d,80d:1)[80d,112d:0) 0@80d 1@48d
%R25R24,inf = [0L,96d:0) 0@0L-phidef
%vreg8 = [96d,112d:0) 0@96d
%vreg5 = [32d,48d:0) 0@32d
%R25,inf = [0L,16d:0) 0@0L-phidef
********** INTERVALS **********
%R24,inf = [0L,16d:0) 0@0L-phidef
%vreg6 = [48d,80d:1)[80d,112d:0) 0@80d 1@48d
%R25R24,inf = [0L,96d:0) 0@0L-phidef
%vreg8 = [96d,112d:0) 0@96d
%vreg5 = [32d,48d:0) 0@32d
%R25,inf = [0L,16d:0) 0@0L-phidef
********** MACHINEINSTRS **********

Machine code for function foo:

Function Live Ins: %R25R24 in reg%2147483648

0L BB#0: derived from LLVM BB %entry
Live Ins: %R25R24
32L %vreg5 = COPY %R25R24; PTRREGS:%vreg5
48L %vreg6 = LDWRd %vreg5; mem:LD2%a(tbaa=!"int") DLDREGS:%vreg6 PTRREGS:%vreg5
80L %vreg6 = ANDIWRdK %vreg6, 255; DLDREGS:%vreg6
96L %vreg8 = COPY %R25R24; PTRREGS:%vreg8
112L STWRr %vreg8, %vreg6; mem:ST2%a(tbaa=!"int") PTRREGS:%vreg8 DLDREGS:%vreg6
128L RET

What I see is the first copy getting coalesced, so vreg0 goes away. When it then tries, and succeeds, to coalesce vreg4 with vreg6, vreg5 ends up killed at the load; I don’t know why. Because of the first coalesce, R25R24 gets copied again, and the last COPY is reported as not coalescable, I guess because it is now trying to coalesce with a physreg. If the source were vreg5 instead, it would coalesce.

> Cross class coalescing also has some heuristics to prevent it from creating very small register classes

I’ve seen isWinToJoinCrossClass in SimpleRegisterCoalescing.cpp, which does exactly what you describe here. It has a check with this comment:

// This heuristics is good enough in practice, but it’s obviously not right.
// 4 is a magic number that works well enough for x86, ARM, etc.

However, this piece of code is not getting executed, so in this specific case the problem seems to be somewhere else. I would also like to ask whether this could be parametrized: on small CPUs, register classes aren’t as big as on x86 or other beasts, so 4, the number used in this specific heuristic, seems high for these CPUs.
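
To make the parametrization idea concrete, here is a rough sketch of a per-target threshold. It is not the logic of isWinToJoinCrossClass, and I am not claiming this is how the magic number 4 is actually used there; the function name and both parameters are hypothetical.

```c
/* Hypothetical check: allow a cross-class join only if the resulting
   register class has at least min_class_size registers.
   min_class_size is an assumed per-target knob, not an existing LLVM option. */
int join_class_large_enough(int joined_class_size, int min_class_size)
{
    return joined_class_size >= min_class_size;
}
```

Under this sketch, a fixed limit of 4 would reject a join into a 3-register class such as PTRREGS, while a target that lowers the limit to 2 or 3 would accept it.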

I am planning on disabling physreg coalescing in the near future (-disable-physical-join). That will change the behavior in this special case. Otherwise, I don’t know how to help you. Coalescing is hard with very constrained register classes.

/jakob

Ahh, I see. I’ll wait for that change to happen and see how it goes and how it affects other cases. I remember you commented that you were going to take a look at the coalescer after a bug report I filed back in September. Indeed, adding -disable-physical-join produced the expected code. Thanks for the help, Jakob.