Unaligned load/store for callee-saved 128-bit registers

On my (out-of-tree) target I have 16 128-bit registers.
Unaligned load/store are illegal. (must 16-bytes aligned)

8 of those registers are defined as callee-saved and 8 caller-saved.
The default stack size is 4 bytes.
The target implements dynamic stack realign to make sure the stack will always be aligned correctly when necessary.

Yet I am still getting unaligned load/store when running this test case: http://pastie.org/8490604

The problem is in PEI::calculateCalleeSavedRegisters:

// We may not be able to satisfy the desired alignment specification of
// the TargetRegisterClass if the stack alignment is smaller. Use the
// min.
Align = std::min(Align, StackAlign);
FrameIdx = MFI->CreateStackObject(RC->getSize(), Align, true);

This will create unaligned load/store for a callee-saved 128-bit register on the frame slot because StackAlign is 4.

Adding a check for stack realignable or putting all the 128-bit registers as caller-save will fix the problem.

if (!TFI->isStackRealignable()) <— new line
Align = std::min(Align, StackAlign);

Is this a bug or am I missing something?

Thanks,
Francois Pichet, Octasic.

From: "Francois Pichet" <pichet2000@gmail.com>
To: "LLVM Developers Mailing List" <llvmdev@cs.uiuc.edu>
Sent: Monday, November 18, 2013 2:26:30 PM
Subject: [LLVMdev] Unaligned load/store for callee-saved 128-bit registers

On my (out-of-tree) target I have 16 128-bit registers.
Unaligned load/store are illegal. (must 16-bytes aligned)

8 of those registers are defined as callee-saved and 8 caller-saved.
The default stack size is 4 bytes.
The target implements dynamic stack realign to make sure the stack
will always be aligned correctly when necessary.

Yet I am still getting unaligned load/store when running this test
case: http://pastie.org/8490604

The problem is in PEI::calculateCalleeSavedRegisters:

// We may not be able to satisfy the desired alignment specification
of
// the TargetRegisterClass if the stack alignment is smaller. Use the
// min.
Align = std::min(Align, StackAlign);
FrameIdx = MFI->CreateStackObject(RC->getSize(), Align, true);

This will create unaligned load/store for a callee-saved 128-bit
register on the frame slot because StackAlign is 4.

Adding a check for stack realignable or putting all the 128-bit
registers as caller-save will fix the problem.

if (!TFI->isStackRealignable()) <--- new line
Align = std::min(Align, StackAlign);

Is this a bug or am I missing something?

This looks like a bug. By default, isStackRealignable() always returns true (this default comes from the TargetFrameLowering constructor). I wonder, however, is this is not correctly implemented in some backends (X86RegisterInfo::canRealignStack, for example, is not completely trivial). Nadav, do you know how this works?

-Hal

From: "Hal Finkel" <hfinkel@anl.gov>
To: "Francois Pichet" <pichet2000@gmail.com>
Cc: "LLVM Developers Mailing List" <llvmdev@cs.uiuc.edu>
Sent: Monday, November 18, 2013 2:45:53 PM
Subject: Re: [LLVMdev] Unaligned load/store for callee-saved 128-bit registers

> From: "Francois Pichet" <pichet2000@gmail.com>
> To: "LLVM Developers Mailing List" <llvmdev@cs.uiuc.edu>
> Sent: Monday, November 18, 2013 2:26:30 PM
> Subject: [LLVMdev] Unaligned load/store for callee-saved 128-bit
> registers
>
>
>
> On my (out-of-tree) target I have 16 128-bit registers.
> Unaligned load/store are illegal. (must 16-bytes aligned)
>
>
>
> 8 of those registers are defined as callee-saved and 8
> caller-saved.
> The default stack size is 4 bytes.
> The target implements dynamic stack realign to make sure the stack
> will always be aligned correctly when necessary.
>
>
> Yet I am still getting unaligned load/store when running this test
> case: http://pastie.org/8490604
>
>
> The problem is in PEI::calculateCalleeSavedRegisters:
>
>
>
> // We may not be able to satisfy the desired alignment
> specification
> of
> // the TargetRegisterClass if the stack alignment is smaller. Use
> the
> // min.
> Align = std::min(Align, StackAlign);
> FrameIdx = MFI->CreateStackObject(RC->getSize(), Align, true);
>
>
> This will create unaligned load/store for a callee-saved 128-bit
> register on the frame slot because StackAlign is 4.
>
>
> Adding a check for stack realignable or putting all the 128-bit
> registers as caller-save will fix the problem.
>
> if (!TFI->isStackRealignable()) <--- new line
> Align = std::min(Align, StackAlign);
>
> Is this a bug or am I missing something?
>

This looks like a bug. By default, isStackRealignable() always
returns true (this default comes from the TargetFrameLowering
constructor). I wonder, however, is this is not correctly
implemented in some backends (X86RegisterInfo::canRealignStack, for
example, is not completely trivial). Nadav, do you know how this
works?

[Trying some other relevant people...]

Chad, Jakob: thoughts?

-Hal

BTW I managed to get around this problem by flagging all the 128-bit registers as caller saved only.

On my system, vector registers are more likely to be used on leaf functions anyway.

From: "Francois Pichet" <pichet2000@gmail.com>
To: "Hal Finkel" <hfinkel@anl.gov>
Cc: "Chad Rosier" <mcrosier@codeaurora.org>, "Jakob Stoklund Olesen" <jolesen@apple.com>, "LLVM Developers Mailing
List" <llvmdev@cs.uiuc.edu>
Sent: Thursday, November 21, 2013 2:36:06 PM
Subject: Re: [LLVMdev] Unaligned load/store for callee-saved 128-bit registers

BTW I managed to get around this problem by flagging all the 128-bit
registers as caller saved only.

On my system, vector registers are more likely to be used on leaf
functions anyway.

Sounds good; however, unless I'm wrong, this looks like a general problem that needs to be fixed.

-Hal

Ping.

Jakob, can you please take a look at this? Francois's suggested fix looks reasonable to me, but as I mentioned below, I wonder if we'd need to better tie together TFI->isStackRealignable() and X86RegisterInfo::canRealignStack (etc.) to avoid breaking things (or at least so that everything will actually work as expected).

-Hal

Jakob, can you please take a look at this? Francois's suggested fix looks reasonable to me, but as I mentioned below, I wonder if we'd need to better tie together TFI->isStackRealignable() and X86RegisterInfo::canRealignStack (etc.) to avoid breaking things (or at least so that everything will actually work as expected).

This is a really complicated problem, and I don't have all the details paged in at the moment.

You need to determine fairly early on if you are going to dynamically realign the stack because it affects whether you'll need to reserve a frame pointer and/or a base pointer. SelectionDAGIsel.cpp has a call to MRI.freezeReservedRegs() after instruction selection is complete. That's when you need to make the call.

Note that this is long before you know if the register allocator is going to spill any vectors. X86 and ARM will both fall back to unaligned vector spill instructions if the stack is insufficiently aligned, see their respective storeRegToStackSlot implementations. Other targets may have to conservatively assume that vectors will spill if there are any present in the function.

It is also possible to reserve the frame pointer if the stack might need realignment. You can then avoid inserting the actual realignment code in PEI if it turns out it wasn't necessary after all. But you're not getting your frame pointer register back, that ship has sailed. This all gets triply hairy if the function contains dynamic allocas as well, and you may need a base pointer register.

I don't think there is an easy target-independent fix for this.

Thanks,
/jakob