Question about callee saved registers in x86

Hi llvmdev,

I'm trying to figure out how llvm remembers stack slots allotted to callee
saved registers on x86. In particular, llvm pushes registers in
decreasing order of FrameIdxs [1], so the offsets they get (as
returned by MFI->getObjectOffset) don't directly correspond to their
actual stack locations. In X86FrameLowering's
emitCalleeSavedFrameMoves, when emitting DWARF information, this
discrepancy gets fixed up by subtracting the offset reported by
MFI->getObjectOffset from the minimum offset for any CSR (this is done
by the "Offset = MaxOffset - Offset + saveAreaOffset;" line). Is
there a reason why llvm doesn't keep the offsets in the right order
from the very beginning, by pushing the CSRs in increasing order of
FrameIdxs?

[1]: in fact, the way X86FrameLowering's spillCalleeSavedRegisters and
PEI's calculateCalleeSavedRegisters are set up, I don't see a reason
why the FrameIdxs and the generated push instructions have any
relation at all. It seems that the code relies on
MFI->CreateStackObject returning sequential integers.

Thanks!
-- Sanjoy
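
To make the bookkeeping described above concrete, here is a minimal
stand-alone sketch in C++. It is a toy model, not LLVM code. It assumes
x86-64 with 8-byte slots, no frame pointer, three callee-saved GPRs whose
frame indices 0, 1, 2 were given object offsets -16, -24, -32, and a
saveAreaOffset of -16. The function names are hypothetical; only the
fix-up expression is taken from the line quoted in the message.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Where each CSR really lands relative to the CFA when registers are pushed
// in decreasing frame-index order. Result is indexed by frame index.
// Assumption: the return address occupies [CFA-8, CFA), slots are 8 bytes.
std::vector<int64_t> actualCfaOffsets(int numCsrs) {
  std::vector<int64_t> actual(numCsrs);
  int64_t offset = -8;  // just below the CFA sits the return address
  for (int fi = numCsrs - 1; fi >= 0; --fi) {
    offset -= 8;        // each push moves the stack pointer down one slot
    actual[fi] = offset;
  }
  return actual;
}

// The fix-up from the message: Offset = MaxOffset - Offset + saveAreaOffset,
// where MaxOffset is the minimum (most negative) object offset of any CSR.
std::vector<int64_t> fixedUpOffsets(const std::vector<int64_t> &objectOffsets,
                                    int64_t saveAreaOffset) {
  int64_t maxOffset = 0;
  for (int64_t o : objectOffsets) maxOffset = std::min(maxOffset, o);
  std::vector<int64_t> fixed;
  for (int64_t o : objectOffsets)
    fixed.push_back(maxOffset - o + saveAreaOffset);
  return fixed;
}

// Small self-check under the assumptions stated above.
void checkGprOnlyExample() {
  const std::vector<int64_t> objectOffsets = {-16, -24, -32};
  assert(objectOffsets != actualCfaOffsets(3));  // raw offsets are mirrored
  assert(fixedUpOffsets(objectOffsets, -16) == actualCfaOffsets(3));
}
```

In this GPR-only case the fix-up mirrors the offsets within the save area
and recovers the real locations, which is why the discrepancy normally
goes unnoticed.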

Now that you mention it, I remember going down the same rabbit hole. With certain calling conventions (coldcc, I think, is one that reliably exposes this on x86), it is possible to generate invalid CFI directives for the registers in a frame, especially when XMM registers must be preserved along with general purpose registers. The cause is the offset fix-up logic within emitCalleeSavedFrameMoves, which breaks when it is applied to XMM registers.

To fix this disparity, I concluded that one could reverse the definition order of the general purpose registers within X86CallingConv.td for all calling conventions, since llvm prefers the push/pop model for saving GPRs on x86. With this change, stack slots and registers would have a 1:1 mapping without extra offset calculations, emitCalleeSavedFrameMoves could be simplified by removing the extra magic that determines slots, and correct CFI directives would be generated in the unusual cases too.

Hi Pasi,

Do you have a broken test case lying around? If you do, I'll start work on a fix for this using that as the test case.

Thanks,
-- Sanjoy

Having looked again at some old experimental code for this, it seems the fix also requires reversing the order of the XMM/YMM registers within the calling convention definitions in X86CallingConv.td, plus some other minor changes to account for that.

Hi,
I have a related question. The spilled x86 GPRs are assigned positive frame indices, which seems problematic when the stack needs to be realigned: they are pushed before the stack is realigned, so they cannot be addressed relative to the stack pointer (because SP realignment leaves a gap whose size is only known at run time).

Shouldn’t they be assigned negative indices, i.e. be fixed objects, in which case they can be addressed relative to BP?

Vadim
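
The realignment gap can be illustrated with a small stand-alone C++
sketch. It is a toy simulation, not LLVM code. The prologue sequence
(push rbp; mov rbp, rsp; push rbx; and rsp, -align) and all names here
are assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cstdint>

struct Frame {
  uint64_t bp;        // frame pointer after "push rbp; mov rbp, rsp"
  uint64_t savedReg;  // address where the callee-saved GPR was pushed
  uint64_t sp;        // stack pointer after realignment
};

// Simulates the hypothetical prologue for a given stack pointer at entry.
// 'align' must be a power of two.
Frame simulatePrologue(uint64_t entrySp, uint64_t align) {
  uint64_t sp = entrySp;      // points at the return address
  sp -= 8;                    // push rbp
  const uint64_t bp = sp;     // mov rbp, rsp
  sp -= 8;                    // push rbx
  const uint64_t saved = sp;  // rbx now lives here
  sp &= ~(align - 1);         // and rsp, -align
  return {bp, saved, sp};
}

// Offset of the saved register relative to BP: fixed by the prologue shape.
int64_t bpRelative(const Frame &f) {
  return static_cast<int64_t>(f.savedReg) - static_cast<int64_t>(f.bp);
}

// Offset of the saved register relative to SP: depends on the entry SP.
int64_t spRelative(const Frame &f) {
  return static_cast<int64_t>(f.savedReg) - static_cast<int64_t>(f.sp);
}
```

With a 32-byte alignment, entry stack pointers 0x1008 and 0x1018 give
the same BP-relative offset but different SP-relative offsets, so only
the BP-relative form can be encoded as a compile-time constant.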

Hi!

Does llvm actually emit code to directly access the callee saved registers? As long as all llvm does is push them on entry and pop them on exit, you don't really care very much about offsets.

As far as DWARF information is concerned, I'd expect the locations of the callee saved register spills to be expressed in terms of the CFA (Canonical Frame Address), and as long as llvm emits a correct expression to calculate the CFA at all points of the function body, DWARF readers should do fine.
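
As a rough illustration of why CFA-relative locations are robust, here is
a minimal C++ sketch (hypothetical helper name, x86-64 with 8-byte pushes
assumed). A reader computes the CFA as rsp plus the current
.cfi_def_cfa_offset value, then finds a saved register at the CFA plus
that register's .cfi_offset value.

```cpp
#include <cassert>
#include <cstdint>

// Address of a saved register as a DWARF reader would compute it:
// CFA = rsp + cfaOffset, saved slot = CFA + regOffset (regOffset < 0).
uint64_t savedRegAddress(uint64_t rsp, int64_t cfaOffset, int64_t regOffset) {
  return rsp + cfaOffset + regOffset;
}

// Example: rsp is 0x1008 at entry (pointing at the return address).
// Each pushq lowers rsp by 8 and raises the CFA offset by 8, so a slot
// recorded as CFA-16 resolves to the same address after every push.
void checkCfaExample() {
  assert(savedRegAddress(0x1000, 16, -16) == 0x1000);  // after 1st push
  assert(savedRegAddress(0x0ff8, 24, -16) == 0x1000);  // after 2nd push
  assert(savedRegAddress(0x0ff0, 32, -16) == 0x1000);  // after 3rd push
}
```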

My initial point of confusion was that llvm seems to have some internal discrepancies about what slots CSRs get mapped to -- the actual code seems to push them in a different order than the one in which they are assigned slots on the stack. This gets corrected later on, and we do get accurate .cfi directives, so I figured that this may be some convention that I had missed.

However, (thanks to Pasi's hint) there are cases where llvm generates incorrect .cfi directives, and this leads me to think that the issue is, in fact, the lack of co-ordination between the code that emits the PUSH instructions and the code that assigns stack slots to the CSR frame indices.

I think I have a reproducible test case now, and I've filed a bug at http://llvm.org/bugs/show_bug.cgi?id=19905. Unless someone else wants to tackle this, I'll have a go at it next week.

-- Sanjoy

If CSR indices corresponded to the actual locations where the registers are
saved, it might be possible to use CSR info and getFrameIndexReference() to
generate offsets for CFI directives, and everything would be in sync pretty
much automatically, no?

Anyhow, I am interested in this because I am trying to implement SEH unwind
info emission for 64 bit Windows. Currently, the XMM registers are being
spilled into non-fixed frame slots, and if the stack is realigned, they end
up on the wrong side of the "realignment gap". This makes generation of
SEH info (which is far less expressive than DWARF CFI) very challenging.
Things would be much easier if XMMs were saved above the gap, and thus
addressable via the base pointer.

Does anybody know if it is possible to assign callee-saved registers to
fixed frame slots? Any ideas about how to achieve that? Would it be ok
to simply re-write CSR info inside spillCalleeSavedRegisters()?

Vadim

Yes, I do have a broken test case, and it is attached. With the current state of affairs, case 'foo' works as a good control: only the output order of the cfi_offset directives would change with the proposed changes. For case 'bar', no cfi_offset directives are generated at all at the moment, and that should be fixed too. But with case 'foobar' everything currently breaks down and you get nonsensical output like:

foobar: # @foobar
         .cfi_startproc
# BB#0:
         pushq %rbp
.Ltmp8:
         .cfi_def_cfa_offset 16
         pushq %r9
.Ltmp9:
         .cfi_def_cfa_offset 24
         pushq %rbx
.Ltmp10:
         .cfi_def_cfa_offset 32
.Ltmp11:
         .cfi_offset %rbx, -80
.Ltmp12:
         .cfi_offset %r9, -72
.Ltmp13:
         .cfi_offset %rbp, -64
.Ltmp14:
         .cfi_offset %xmm0, -48
.Ltmp15:
         .cfi_offset %xmm7, -32
.Ltmp16:
         .cfi_offset %xmm15, -16
         movaps %xmm15, -48(%rsp) # 16-byte Spill
         movaps %xmm7, -32(%rsp) # 16-byte Spill
         movaps %xmm0, -16(%rsp) # 16-byte Spill
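
The numbers in the foobar listing can be reproduced with a small
stand-alone C++ sketch of the fix-up (a toy model, not LLVM code). The
object offsets below are not stated anywhere in the listing: they are
inferred by inverting "Offset = MaxOffset - Offset + saveAreaOffset"
from the emitted directives, with saveAreaOffset assumed to be -16. The
actual locations are read off the pushes and the movaps operands, with
rsp = CFA - 32 after the three pushes.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

struct Slot {
  std::string reg;
  int64_t objectOffset;  // inferred frame-object offset (assumption)
  int64_t actualOffset;  // CFA-relative location read off the listing
};

// Slots in the order the directives appear in the listing.
std::vector<Slot> foobarSlots() {
  return {{"rbx", -16, -32}, {"r9", -24, -24},   {"rbp", -32, -16},
          {"xmm0", -48, -48}, {"xmm7", -64, -64}, {"xmm15", -80, -80}};
}

// Applies the fix-up to every slot, mirroring across the whole CSR range.
std::vector<int64_t> emittedOffsets(const std::vector<Slot> &slots,
                                    int64_t saveAreaOffset) {
  int64_t maxOffset = 0;  // minimum (most negative) object offset
  for (const Slot &s : slots) maxOffset = std::min(maxOffset, s.objectOffset);
  std::vector<int64_t> out;
  for (const Slot &s : slots)
    out.push_back(maxOffset - s.objectOffset + saveAreaOffset);
  return out;
}

// Counts directives that disagree with where the register was stored.
int countWrong(const std::vector<Slot> &slots, int64_t saveAreaOffset) {
  const std::vector<int64_t> emitted = emittedOffsets(slots, saveAreaOffset);
  int wrong = 0;
  for (size_t i = 0; i < slots.size(); ++i)
    if (emitted[i] != slots[i].actualOffset) ++wrong;
  return wrong;
}
```

Under these assumptions the mirroring is applied across the GPR and XMM
slots together instead of across the pushed GPRs only, so five of the six
directives point at the wrong slot and only %xmm0, which sits in the
middle of the range, comes out right by coincidence.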

As the foobar listing shows, the emitted offsets are in reverse order. With the proposed change, the output would look like this:

; CHECK: .cfi_offset %rbp, -16
; CHECK: .cfi_offset %r9, -24
; CHECK: .cfi_offset %rbx, -32
; CHECK: .cfi_offset %xmm15, -48
; CHECK: .cfi_offset %xmm7, -64
; CHECK: .cfi_offset %xmm0, -80

As these CHECK lines show, the physical locations of the xmm registers would change with the proposed change (reversing the gpr and ymm/xmm orders within X86CallingConv.td). That means some tweaks to existing test cases. But what matters here is the correctness of the directives, and one less dirty hack (as the comment states) in X86FrameLowering::emitCalleeSavedFrameMoves.

Pasi

frame-moves.ll (814 Bytes)