[PATCH] Unwanted r11 in push/pop on Cortex-M.

To recap, this is what I was trying to solve:

This C code:

int bar(int a, int b, int c, int d, int e, int f);

int foo(int a, int b, int c, int d, int e )
{
int x = 3*a;
return bar3(a,b,c,d,e,x);
}

Produced the following assembly output:

foo:
push {r11, lr}
sub sp, #8
bl bar
add sp, #8
pop {r11, pc}

The part I didn’t like is that push/pop become 4 bytes instructions because the register that we picked to “align” the stack pointer does not belong to the low register set (r0-r7).
After a bit of digging, the following patch seems to take care of it.

Basically what happens is:

  • the C function in the example doesn’t have any local to save

  • the only spilled register is LR

  • the Prolog/Epilog Inserted calls the target specific ARMFrameLowering::processFunctionBeforeCalleeSavedScan

  • at this point the we need to fix the fact that SP would not be EABI compliant because only 1 register has been spilled so far

  • inside processFunctionBeforeCalleeSavedScan, there a check: if TargetAlign == 8 && (NumGPRSpills & 1) then we align the stack by pushing one of the unspilled registers.

The problem is that this “unspilled” register list starts with r11 (for reasons I’m not clear about), so I decided to skip all the registers r8-r15.

The part I’m not sure about is: what if r4-r7 are all spilled already, is the SP going to be aligned by some other routine later on (maybe by adding a “sub sp, sp, #4”)?

Any comment?

Thanks,
Andrea

diff -aruN llvm-source/lib/Target/ARM/ARMFrameLowering.cpp llvm-source-new/lib/Target/ARM/ARMFrameLowering.cpp
— llvm-source/lib/Target/ARM/ARMFrameLowering.cpp 2013-07-29 10:50:07.000000000 -0700
+++ llvm-source-new/lib/Target/ARM/ARMFrameLowering.cpp 2013-10-20 16:33:59.905251552 -0700
@@ -1313,7 +1313,7 @@
for (unsigned i = 0, e = UnspilledCS1GPRs.size(); i != e; ++i) {
unsigned Reg = UnspilledCS1GPRs[i];
// Don’t spill high register if the function is thumb1

  • if (!AFI->isThumb1OnlyFunction() ||
  • if (!AFI->isThumbFunction() ||
    isARMLowRegister(Reg) || Reg == ARM::LR) {
    MRI.setPhysRegUsed(Reg);
    if (!MRI.isReserved(Reg))

The part I didn't like is that push/pop become 4 bytes instructions
because the register that we picked to "align" the stack pointer does not
belong to the low register set (r0-r7).
After a bit of digging, the following patch seems to take care of it.

Hi Andrea,

This seems a valid intention, but, since you're in Thumb2 mode, it
shouldn't make much of a difference, except for code size. And even for
code size, you'd save four bytes per function on the binary, which is
normally not enough to warrant a change. I also believe that this was not
accidental (though I can't confirm), since using R7 for the stack pointer
would split the register bank and reduce the ability, for instance, to use
sequential registers to shuffle GPRs into Ds and Qs and vice-versa.

That said, if register pressure is low, or if you're not using VFP or NEON,
you could use R7 as the stack frame, even on Thumb2. This could be an
option, and even enabled by default when in -Oz mode, for instance.

The problem is that this "unspilled" register list starts with r11 (for

reasons I'm not clear about)

I guess that's because it's always good to have the frame pointer saved on
each stack, to make it easier to unwind the stack while debugging, even
with high optimization levels. Someone with more contact with that part of
the code can chime in, but that seems pretty intentional, and it makes
sense, too.

The part I'm not sure about is: what if r4-r7 are all spilled already, is
the SP going to be aligned by some other routine later on (maybe by adding
a "sub sp, sp, #4")?

I think you'd have to come up with a few tests on each possible problematic
case you can think of. If you end up implementing this, I think you'll need
a good set of test cases, so that we make sure the behaviour you want
doesn't leak to unwanted cases.

cheers,
--renato