Unnecessary reload of register

printafterallOutput.txt (35.2 KB)

Hi Markus,

Hi all,
I’m trying to figure out why LLVM (version 3.1; I can’t easily try another version) inserts an unnecessary load for one register.

The following is the code before instruction selection:

%call = tail call i32 @_Z7zahlIntv()
%0 = inttoptr i32 %call to i32*
%1 = load i32* %0, align 4, !tbaa !0
%arrayidx1 = getelementptr inbounds i32* %0, i32 1
%2 = load i32* %arrayidx1, align 4, !tbaa !0
%mul = mul nsw i32 %2, %1
ret i32 %mul

After instruction selection I get the following:


BB#0: derived from LLVM BB %entry
ADJCALLSTACKDOWN 8, %SP<imp-def,dead>, %SP
JSUB ga:_Z7zahlIntv, 1, <fi#-2>, 0, <fi#0>, 0, …
ADJCALLSTACKUP 8, 0, %SP<imp-def,dead>, %SP
%vreg0 = LDrid <fi#-2>, 4; mem:LD4[FixedStack-2] AkkuDRegs:%vreg0
%vreg2 = COPY %vreg0; PointerAdrRegs:%vreg2 AkkuDRegs:%vreg0
%vreg1 = LDridAddr %vreg2, 4; mem:LD4%arrayidx1 AkkuDRegs:%vreg1 PointerAdrRegs:%vreg2
%vreg4 = COPY %vreg0; PointerAdrRegs:%vreg4 AkkuDRegs:%vreg0
%vreg3 = MULINTDLDridAddr %vreg1, %vreg4, 0; mem:LD4%0 AkkuDRegs:%vreg3,%vreg1 PointerAdrRegs:%vreg4
STrid <fi#-1>, 0, %vreg3; mem:ST4[FixedStack-1] AkkuDRegs:%vreg3

So far everything seems to look fine. But I don’t understand why there is a %vreg4 as it has the same value as %vreg2.

I’d suggest that you check how the lowering is actually done for your target to end up with those two copies of the same value.

At the end I get the following:


JSUB ga:_Z7zahlIntv, 1, %SP, 24, %SP, 0, %AKKU1D<imp-def,dead>, %AR2<imp-def,dead>
%AKKU1D = LDrid %SP, 28; mem:LD4[FixedStack-2]
STrid %SP, 8, %AKKU1D
%AKKU1D = LDrid %SP, 8
%AR2 = LAR2d %AKKU1D
%AKKU1D = LDridAddr %AR2, 4; mem:LD4%arrayidx1
STrid %SP, 12, %AKKU1D
%AKKU1D = LDrid %SP, 8
%AR2 = LAR2d %AKKU1D
%AKKU1D = LDrid %SP, 12
%AKKU1D = MULINTDLDridAddr %AKKU1D, %AR2, 0; mem:LD4%0
STrid %SP, 0, %AKKU1D; mem:ST4[FixedStack-1]

It should be obvious that the second "%AR2 = LAR2d %AKKU1D" is unnecessary. I’ve no idea why LLVM thinks it needs to fill the register with the proper value again.
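
To make the redundancy concrete, here is a small Python toy (not LLVM code) that replays the final listing and tracks which symbolic value each register and stack slot holds. The opcode meanings are assumptions inferred from the listing: LDrid with %SP loads a stack slot, STrid stores one, LAR2d copies %AKKU1D into %AR2, and every other instruction is treated as defining an unknown new value. The tuple encoding and the name find_redundant are made up for this sketch.

```python
from itertools import count


def find_redundant(instrs):
    """Return indices of instructions that write a value the register already holds."""
    fresh = count()   # generator of new symbolic value ids
    regs = {}         # register name -> symbolic value id
    slots = {}        # stack offset  -> symbolic value id
    redundant = []
    for i, (op, *args) in enumerate(instrs):
        if op == "store_slot":          # STrid %SP, off, reg (assumed meaning)
            off, reg = args
            slots[off] = regs[reg]
            continue
        if op == "load_slot":           # reg = LDrid %SP, off (assumed meaning)
            reg, off = args
            if off not in slots:
                slots[off] = next(fresh)
            new = slots[off]
        elif op == "copy":              # dst = LAR2d src (assumed meaning)
            reg, src = args
            new = regs[src]
        else:                           # any other def: unknown new value
            reg = args[0]
            new = next(fresh)
        if regs.get(reg) == new:        # register already holds this value
            redundant.append(i)
        regs[reg] = new
    return redundant


# Hand-encoded version of the final listing (JSUB omitted).
final_code = [
    ("load_slot", "AKKU1D", 28),    # 0: %AKKU1D = LDrid %SP, 28
    ("store_slot", 8, "AKKU1D"),    # 1: STrid %SP, 8, %AKKU1D
    ("load_slot", "AKKU1D", 8),     # 2: %AKKU1D = LDrid %SP, 8
    ("copy", "AR2", "AKKU1D"),      # 3: %AR2 = LAR2d %AKKU1D
    ("other", "AKKU1D"),            # 4: %AKKU1D = LDridAddr %AR2, 4
    ("store_slot", 12, "AKKU1D"),   # 5: STrid %SP, 12, %AKKU1D
    ("load_slot", "AKKU1D", 8),     # 6: %AKKU1D = LDrid %SP, 8
    ("copy", "AR2", "AKKU1D"),      # 7: %AR2 = LAR2d %AKKU1D
    ("load_slot", "AKKU1D", 12),    # 8: %AKKU1D = LDrid %SP, 12
    ("other", "AKKU1D"),            # 9: %AKKU1D = MULINTDLDridAddr ...
    ("store_slot", 0, "AKKU1D"),    # 10: STrid %SP, 0, %AKKU1D
]

redundant = find_redundant(final_code)
```

Under these assumptions the toy flags index 7, the second LAR2d, and also index 2, the reload that immediately follows the spill. The load at index 6 is not flagged by itself, but it exists only to feed the second LAR2d.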

The thing is that LLVM does not keep track of the values. It sees three virtual registers, vreg0, vreg2, and vreg4, and the only thing special about them is that vreg2 and vreg4 are each copy-related to vreg0. Apart from the register coalescer, nothing attempts to merge these values. Perhaps you could check the debug output of the register coalescer (-debug-only=regalloc) to see why it is not merging them, though I suspect that it is because PointerAdrRegs and AkkuDRegs are not coalescable.
If that is the case, we do have an optimization in the peephole optimizer that rewrites COPYs to avoid cross-register-file copies. It wouldn’t catch this case[1], but it could be taught to.
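
As a rough illustration of "copy-related", the following Python toy (not LLVM code) groups the virtual registers from the selected code by following COPYs with a small union-find, then lists registers that hold the same value and sit in the same register class. The function names copy_classes and same_class_duplicates are made up for this sketch; the register-class names are taken from the listing.

```python
def copy_classes(copies):
    """Group registers connected by COPYs; copies is a list of (dst, src) pairs."""
    parent = {}

    def find(r):
        parent.setdefault(r, r)
        while parent[r] != r:
            parent[r] = parent[parent[r]]   # path halving
            r = parent[r]
        return r

    for dst, src in copies:
        parent[find(dst)] = find(src)

    groups = {}
    for r in list(parent):
        groups.setdefault(find(r), set()).add(r)
    return list(groups.values())


def same_class_duplicates(classes, regclass):
    """Within each copy class, report registers that share a register class."""
    out = []
    for group in classes:
        by_class = {}
        for r in sorted(group):
            by_class.setdefault(regclass[r], []).append(r)
        out += [rs for rs in by_class.values() if len(rs) > 1]
    return out


# COPYs and register classes as they appear after instruction selection.
copies = [("vreg2", "vreg0"), ("vreg4", "vreg0")]
regclass = {
    "vreg0": "AkkuDRegs",
    "vreg2": "PointerAdrRegs",
    "vreg4": "PointerAdrRegs",
}

classes = copy_classes(copies)
duplicates = same_class_duplicates(classes, regclass)
```

Here all three registers end up in one copy class, and vreg2 and vreg4 are reported as same-class duplicates. Coalescing either of them with vreg0 would need a common register class, which is the suspected obstacle; merging vreg2 with vreg4 is the step that nothing currently attempts.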

Another option would be to check why MachineCSE does not catch this.

Anyway, the best way to avoid these redundant copies is not to emit them in the first place :).

[1] The copy rewriting works bottom-up:
A = b
c = A
=>
A = b
c = b
but what you want here is a bit different (you look for all the alternative sources):
A = b
C = b
=>
A = b
C = A
Note: Uppercase and lowercase registers are in different register files.
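
A minimal Python sketch of the "alternative sources" idea, using the footnote's convention that uppercase and lowercase names live in different register files. This is a toy, not the peephole optimizer's actual code: it assumes every register is defined exactly once, and the names reg_file and rewrite_cross_file_copies are made up for the illustration.

```python
def reg_file(reg):
    # Footnote convention: uppercase and lowercase names are different files.
    return "upper" if reg.isupper() else "lower"


def rewrite_cross_file_copies(copies):
    """Rewrite each cross-file copy to read from a same-file register if one
    already holds the value. copies is a list of (dst, src) pairs in program
    order; every register is assumed to be defined at most once."""
    value_of = {}   # register -> original register whose value it carries
    holders = {}    # original register -> {file: first register holding the value}
    out = []
    for dst, src in copies:
        val = value_of.get(src, src)
        value_of[dst] = val
        avail = holders.setdefault(val, {reg_file(val): val})
        dst_file = reg_file(dst)
        if reg_file(src) != dst_file and dst_file in avail:
            src = avail[dst_file]          # use the same-file alternative source
        avail.setdefault(dst_file, dst)    # dst is now a holder in its file
        out.append((dst, src))
    return out


wanted = rewrite_cross_file_copies([("A", "b"), ("C", "b")])
existing = rewrite_cross_file_copies([("A", "b"), ("c", "A")])
```

With the second pattern from the footnote as input, the second copy becomes C = A, so only one cross-file copy remains. The same toy also reproduces the existing bottom-up case, turning c = A into c = b.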

Thanks,
-Quentin