XMMs unused

Hi,

I see that with the Greedy register allocator, not all of the XMM registers are used (even when they are needed) if a function call crosses the live range.
This produces spills that could be avoided simply by using those registers.

The reason, as far as I can see, is that the C calling convention (CCC) declares the XMM registers as not callee-saved. That means they are caller-saved; correct me if I am wrong.
Is the greedy RA intentionally avoiding the XMMs so that it does not have to save and restore registers at the call site?

Can you provide an example?

Hi,

It can be reproduced with the following command:

clang -S -I spec17/benchspec/CPU/538.imagick_r/src/ -O3 -mavx spec17/benchspec/CPU/538.imagick_r/src/magick/feature.c -DMAGICKCORE_HDRI_ENABLE
I don't see any register above **xmm9** being used in any function.

Hi Craig,

The test case is from SPEC CPU 2017.

Attached is the assembly file for the function MeanShiftImage from spec17/538.imagick_r/src/magick/feature.c.
As I was saying, registers XMM10-15 are not used.

meanshift.s (27.2 KB)

At first glance I see nothing obviously wrong with the assembly, but it is a big file. If you have a specific part in mind, please copy it into the e-mail discussion.

I assume you are compiling for a Mac or Linux system? In that case none of the XMM registers are callee-saved (as you already explained), so the register allocator has to spill them if they are live across a call. So I don’t see proof yet that using xmm10-xmm15 would have helped in this function…

  • Matthias

Hi

Yes, I am compiling for a Linux system.
So the RA will not consider assigning a scratch register to a live range that crosses a function call, even though doing so might reduce spills?

Well, it has to spill the register – otherwise it could be clobbered by a call.

Maybe I haven’t conveyed it properly. What I meant was: does the RA compare the cost of using a scratch register that is spilled and restored around the call site against the cost of not using the scratch register at all?
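
To make the two alternatives in that question concrete, here is a rough sketch in C. It is a hypothetical counting model invented for illustration (the function names and the one-unit-per-memory-access cost are assumptions), and it is not how LLVM's Greedy allocator actually computes spill weights.

```c
/* Hypothetical counting model, NOT LLVM's actual spill-weight heuristic.
   Every store to or load from the stack counts as one unit. */

/* Option A: keep the value in a caller-saved XMM register and
   save/restore it around every call that the live range crosses. */
unsigned save_restore_cost(unsigned calls_crossed) {
    return 2u * calls_crossed;   /* one store before + one reload after each call */
}

/* Option B: give the value no register at all: store it once after
   its definition and reload it before every use. */
unsigned full_spill_cost(unsigned uses) {
    return 1u + uses;            /* one store + one reload per use */
}
```

Under these made-up costs, a value that crosses one call and is used five times would cost 2 units with option A and 6 units with option B. Whether the real allocator weighs the two this way is exactly what is being asked.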

As far as I read the calling conventions for Linux/Mac, there isn’t a single callee-saved XMM register, so there is no XMM register for which this would work:

%XMMx = …
callq … # <= this may change the value of XMMx
use %XMMx
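
The same pattern can be sketched at the source level. This is a minimal illustration and not code from feature.c: the names `scale` and `sum_across_call` are hypothetical, and `noinline` is only there so that the call really stays in the generated code.

```c
/* Hypothetical callee; noinline keeps the call in the generated code. */
__attribute__((noinline)) double scale(double x) {
    return x * 0.5;
}

double sum_across_call(double a, double b) {
    double kept = a * b;   /* computed into some XMM register */
    double r = scale(a);   /* under the default Linux/Mac x86-64 convention,
                              this call may clobber every XMM register */
    return kept + r;       /* so 'kept' has to survive the call on the stack */
}
```

Compiling something like this with `clang -O3 -S` should show `kept` being stored before the `callq` and reloaded after it, no matter which XMM register it was computed in.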

Right, this is definitely the case with the system calling conventions. LLVM can do this if you declare the called function as using a different calling convention, for example with __attribute__((preserve_all)).
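
A minimal sketch of what that could look like in C, assuming clang on x86-64. The function names and the `PRESERVE_ALL` macro are hypothetical; the `__has_attribute` guard is an addition so the snippet still compiles (with the default convention) on compilers that do not know the attribute.

```c
/* Use preserve_all only where the compiler knows it (an assumption:
   clang on x86-64); otherwise fall back to the default convention. */
#if defined(__has_attribute)
#  if __has_attribute(preserve_all)
#    define PRESERVE_ALL __attribute__((preserve_all))
#  endif
#endif
#ifndef PRESERVE_ALL
#  define PRESERVE_ALL
#endif

/* Hypothetical callee: with preserve_all it saves and restores the
   registers it touches, so the caller's XMM values survive the call. */
PRESERVE_ALL __attribute__((noinline)) double halve(double x) {
    return x * 0.5;
}

double sum_across_preserving_call(double a, double b) {
    double kept = a * b;   /* may now stay in an XMM register across the call */
    double r = halve(a);
    return kept + r;
}
```

The trade-off is that the save/restore work moves into the callee, so this only pays off for callees that are called often but touch few registers themselves.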

– Steve