Issue with shrink wrapping

Hello,

So, I have this testcase:

void f(int n, int x) {
if (n < 0)
return;

int a[n];

for (int i = 0; i < n; i++)
a[i] = x[n - i - 1];

for (int i = 0; i < n; i++)
x[i] = a[i] + 1;
}

that, compiled with -O1/-Os for AArch64 and X86, generates machine code, which
fails to properly restore the stack pointer upon function return.

The testcase allocates a VLA, thus clang generates calls to llvm.stacksave and
llvm.stackrestore. The latter call is lowered to mov sp, x2, however this
move does not affect decisions made by the shrink wrapping pass, as the
instruction does not use or define a callee-saved register.

The end effect is that placement of register save/restore code is such that
along a certain path, the callee-saved registers and the stack pointer are
restored, but then the stack pointer is overwritten with an incorrect value.

I’ve “fixed” this by modifying ShrinkWrap::useOrDefCSROrFI to explicitly check
for the stack pointer register (using
TLI.getStackPointerRegisterToSaveRestore) and also to ignore tall call
instructions (isCall() && isReturn()), since they implictly use SP (for
AArch{32,64} at least).

Does this look correct? Are there alternatives?

Shouldn’t ShrinkWrap::useOrDefCSROrFI also check whether or not
MachineInstr::Frame{Setup,Destroy} flags are set?

In that case, I suppose an alternative slution would be to ensure that
llvm.{stacksave,stackrestore} are ultimately lowered to MachineInstr`s, which
have these flags set, perhaps by custom lowering of
ISD::{STACKSAVE,STACKRESTORE} to pseudo-instructions? That’d involve changes to
each backend, where shrink wrapping is enabled, although it looks least hackish.

Comments?

Thanks and best regards,
Momchil Velikov

Hello Momchil,

(CC’ing more people that could correct me if I’m wrong)

Thanks for looking into this. More answers below:

Hello,

So, I have this testcase:

void f(int n, int x) {
if (n < 0)
return;

int a[n];

for (int i = 0; i < n; i++)
a[i] = x[n - i - 1];

for (int i = 0; i < n; i++)
x[i] = a[i] + 1;
}

that, compiled with -O1/-Os for AArch64 and X86, generates machine code, which
fails to properly restore the stack pointer upon function return.

The testcase allocates a VLA, thus clang generates calls to llvm.stacksave and
llvm.stackrestore. The latter call is lowered to mov sp, x2, however this
move does not affect decisions made by the shrink wrapping pass, as the
instruction does not use or define a callee-saved register.

I was afraid that this would happen at some point, but never came across a miscompile due to this. I guess the assumption here was that all the code using the stack pointer is generated after/during PEI, so tracking FrameIndex operands would be sufficient.

The end effect is that placement of register save/restore code is such that
along a certain path, the callee-saved registers and the stack pointer are
restored, but then the stack pointer is overwritten with an incorrect value.

I’ve “fixed” this by modifying ShrinkWrap::useOrDefCSROrFI to explicitly check
for the stack pointer register (using
TLI.getStackPointerRegisterToSaveRestore)

This part sounds ok to me. Can you put up a patch please?

and also to ignore tall call
instructions (isCall() && isReturn()), since they implictly use SP (for
AArch{32,64} at least).

Calls are handled through the regmask check, and return blocks should be handled by the common-dominator call. Did you run into any issues with this? Marking blocks containing isReturn instructions as used will basically make shrink-wrapping to fail all the time.

Does this look correct? Are there alternatives?

Another thing we could do is to add SP (through TLI.getStackPointerRegisterToSaveRestore) to the CurrentCSRs set (and probably rename the set).

Shouldn’t ShrinkWrap::useOrDefCSROrFI also check whether or not
MachineInstr::Frame{Setup,Destroy} flags are set?

It’s not perfectly clear what the flag is used for and whether all targets use it for the same purpose. There has been some discussion here: http://lists.llvm.org/pipermail/llvm-dev/2017-February/110281.html, but I wouldn’t rely on that flag, especially if we want to be able to do some more advanced shrink-wrapping, we are better knowing why the instruction is considered as a “FrameSetup/Destroy” instruction.

In that case, I suppose an alternative slution would be to ensure that
llvm.{stacksave,stackrestore} are ultimately lowered to MachineInstr`s, which
have these flags set, perhaps by custom lowering of
ISD::{STACKSAVE,STACKRESTORE} to pseudo-instructions? That’d involve changes to
each backend, where shrink wrapping is enabled, although it looks least hackish.

I think the SP approach is correct, but yes, this would work as well.

Thanks,

I've "fixed" this by modifying `ShrinkWrap::useOrDefCSROrFI` to explicitly
check
for the stack pointer register (using
`TLI.getStackPointerRegisterToSaveRestore`)

This part sounds ok to me. Can you put up a patch please?

​Yes, working on it...​

and also to ignore tall call
instructions (`isCall() && isReturn()`), since they implictly use SP (for
AArch{32,64} at least).

Calls are handled through the regmask check, and return blocks should be
handled by the common-dominator call. Did you run into any issues with
this? Marking blocks containing `isReturn` instructions as used will
basically make shrink-wrapping to fail all the time.

​Actually, I do just the opposite, make sure tail calls do not prevent a
block from being used as a save/restore point, regardless of the fact that
a tail call may use SP.

Does this look correct? Are there alternatives?

Another thing we could do is to add SP (through TLI.
getStackPointerRegisterToSaveRestore) to the CurrentCSRs set
(and probably rename the set).

​Right, make sense.

Thanks a lot,
Momchil Velikov​