Stack alignment in JIT compiled code

Hi all,

I am trying to call an aligned function in my host application from
JIT-compiled bitcode. The call itself is done using an absolute pointer
to the function. The host application's functions make heavy use of SSE
instructions and thus operate on a stack aligned to 16 bytes.

If I call an aligned function in the host application from a frame
running JIT-compiled code, the stack alignment is broken, which causes
segfaults.

So far, I have found no way to mark calls to the host function as aligned
or to maintain stack alignment throughout the stack frame of the
JIT-compiled function. Further, the GCC front end (llvm-g++ (GCC) 4.2.1)
seems to ignore options such as -mpreferred-stack-boundary.

Has anybody got a clue?

Simon Moll

Sounds like you might be on a platform that requires 16-byte alignment of the stack, and the JIT isn't respecting that. Is that true? If so, the bug is that CodeGen/JIT needs to ensure 16-byte alignment. You left out important details, such as which system you're on.

Hello, Simon

So far, I have found no way to mark calls to the host function as aligned
or to maintain stack alignment throughout the stack frame of the
JIT-compiled function. Further, the GCC front end (llvm-g++ (GCC) 4.2.1)
seems to ignore options such as -mpreferred-stack-boundary.

Mike is right. It depends on your subtarget:

1. If you're running on Darwin, which has a 16-byte aligned stack,
then this is a JIT bug.
2. If you're running on Linux/Windows, which have a 4-byte aligned
stack, then it is a bug in your host code. It should not assume any
"extra" stack alignment beyond what the platform ABI defines. If
it does require such extra alignment, it should realign the stack
itself (for example, LLVM does so if a function does vector math
that requires SSE2 code).
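
The realignment mentioned in point 2 amounts to rounding the stack pointer down to a multiple of the required alignment in the function prologue. A minimal sketch of that arithmetic in C, working on plain integers rather than the real stack pointer; the names align_down and realign_padding are made up for illustration and are not LLVM or GCC APIs:

```c
#include <assert.h>
#include <stdint.h>

/* Round sp down to a multiple of align; align must be a power of two.
   The x86 stack grows downwards, so rounding down only reserves a few
   extra bytes and never touches the caller's frame. */
static uintptr_t align_down(uintptr_t sp, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return sp & ~(align - 1);
}

/* Number of bytes skipped by realigning sp (always less than align). */
static uintptr_t realign_padding(uintptr_t sp, uintptr_t align)
{
    return sp - align_down(sp, align);
}
```

For a hypothetical stack pointer 0x7ffc1234 that is only 4-byte aligned, align_down(0x7ffc1234, 16) gives 0x7ffc1230 at a cost of 4 bytes of padding. A real prologue also has to save the original stack pointer so it can be restored on return, which this sketch leaves out.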

(http://www.x86-64.org/viewvc/trunk/x86-64-ABI/low-level-sys-info.tex?revision=84&content-type=text%2Fplain):

>>
The end of the input argument area shall be aligned on a 16 byte boundary.
In other words, the value (%rsp - 8) is always a multiple of 16 when control is
transferred to the function entry point. The stack pointer, %rsp, always points to
the end of the latest allocated stack frame.
<<

libc itself relies on this and does no stack realignment.
Look, for example, at the disassembly of snprintf, which dumps the SSE
registers onto the stack:
0x00002b4b13522b60 <snprintf+0>: sub $0xd8,%rsp
0x00002b4b13522b67 <snprintf+7>: mov %rcx,0x38(%rsp)
0x00002b4b13522b6c <snprintf+12>: movzbl %al,%ecx
0x00002b4b13522b6f <snprintf+15>: mov %r8,0x40(%rsp)
0x00002b4b13522b74 <snprintf+20>: lea 0x0(,%rcx,4),%rax
0x00002b4b13522b7c <snprintf+28>: lea 50(%rip),%rcx # 0x2b4b13522bb5 <snprintf+85>
0x00002b4b13522b83 <snprintf+35>: mov %r9,0x48(%rsp)
0x00002b4b13522b88 <snprintf+40>: sub %rax,%rcx
0x00002b4b13522b8b <snprintf+43>: lea 0xcf(%rsp),%rax
0x00002b4b13522b93 <snprintf+51>: jmpq *%rcx
0x00002b4b13522b95 <snprintf+53>: movaps %xmm7,0xfffffffffffffff1(%rax)
0x00002b4b13522b99 <snprintf+57>: movaps %xmm6,0xffffffffffffffe1(%rax)
0x00002b4b13522b9d <snprintf+61>: movaps %xmm5,0xffffffffffffffd1(%rax)
0x00002b4b13522ba1 <snprintf+65>: movaps %xmm4,0xffffffffffffffc1(%rax)
0x00002b4b13522ba5 <snprintf+69>: movaps %xmm3,0xffffffffffffffb1(%rax)
0x00002b4b13522ba9 <snprintf+73>: movaps %xmm2,0xffffffffffffffa1(%rax)
0x00002b4b13522bad <snprintf+77>: movaps %xmm1,0xffffffffffffff91(%rax)
0x00002b4b13522bb1 <snprintf+81>: movaps %xmm0,0xffffffffffffff81(%rax)
0x00002b4b13522bb5 <snprintf+85>: lea 0xe0(%rsp),%rax
0x00002b4b13522bbd <snprintf+93>: mov %rsp,%rcx
0x00002b4b13522bc0 <snprintf+96>: movl $0x18,(%rsp)
0x00002b4b13522bc7 <snprintf+103>: movl $0x30,0x4(%rsp)
0x00002b4b13522bcf <snprintf+111>: mov %rax,0x8(%rsp)
0x00002b4b13522bd4 <snprintf+116>: lea 0x20(%rsp),%rax
0x00002b4b13522bd9 <snprintf+121>: mov %rax,0x10(%rsp)
0x00002b4b13522bde <snprintf+126>: callq 0x2b4b1353eb60 <vsnprintf>
0x00002b4b13522be3 <snprintf+131>: add $0xd8,%rsp
0x00002b4b13522bea <snprintf+138>: retq
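
The listing can be checked against the ABI rule with a little arithmetic. At entry %rsp is 8 modulo 16, because the call pushed an 8-byte return address onto a 16-byte aligned stack. `sub $0xd8,%rsp` brings it back to a multiple of 16, and with %rax = %rsp + 0xcf the movaps displacements (-0x0f for %xmm7 down to -0x7f for %xmm0) land on %rsp + 0xc0, 0xb0, ..., 0x50. A small C sketch of that calculation; the function names and sample stack pointer values are invented for illustration, only the offsets come from the disassembly:

```c
#include <stdint.h>

/* Address written by the movaps that saves %xmmN (n = 0..7) in the
   snprintf prologue above, given the value of %rsp at function entry. */
static uint64_t xmm_save_slot(uint64_t rsp_at_entry, unsigned n)
{
    uint64_t rsp = rsp_at_entry - 0xd8;   /* sub $0xd8,%rsp */
    uint64_t rax = rsp + 0xcf;            /* lea 0xcf(%rsp),%rax */
    /* displacements run from -0x7f (%xmm0) to -0x0f (%xmm7), 16 apart */
    return rax - 0x7f + 16u * n;
}

/* Returns 1 if all eight save slots are 16-byte aligned, as movaps
   requires for a memory operand, and 0 otherwise. */
static int all_slots_aligned(uint64_t rsp_at_entry)
{
    for (unsigned n = 0; n < 8; n++)
        if (xmm_save_slot(rsp_at_entry, n) % 16 != 0)
            return 0;
    return 1;
}
```

With an ABI-conforming entry value such as 0x7fff0008, every slot is aligned. If the caller is off by 4 or 8 bytes, none of them are, and any movaps that executes faults. Which ones execute depends on %al, since the computed jump skips the saves for vector registers the caller did not use.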

Corrado

Corrado Zoccolo wrote:

Hello, Corrado

The end of the input argument area shall be aligned on a 16 byte boundary.
In other words, the value (%rsp - 8) is always a multiple of 16 when control is
transferred to the function entry point. The stack pointer, %rsp, always points to
the end of the latest allocated stack frame.

That's correct. x86-64/Linux has a stack alignment of 16 specified in
the ABI. That's why no stack realignment is needed at all. As was already
mentioned, without extra information about the particular subtarget it's
hard to say anything definite here.

Hello, Andrew

That's right. If you want to be able to call any code produced by gcc,
you have to preserve 16-byte alignment. gcc-generated code does not
realign the stack pointer.

This was the case for gcc < 4.4, where stack alignment handling was really
messy. The stack-realignment branch was merged, AFAIR, into gcc 4.4 and
allows automatic realignment of the stack when necessary.

Anton Korobeynikov wrote:

That's right. If you want to be able to call any code produced by gcc,
you have to preserve 16-byte alignment. gcc-generated code does not
realign the stack pointer.

This was the case for gcc < 4.4, where stack alignment handling was really
messy. The stack-realignment branch was merged, AFAIR, into gcc 4.4 and
allows automatic realignment of the stack when necessary.

I don't think that matters. gcc 4.4.0 has been released for all of ten days.
gcc < 4.4 is going to be in use for many years, so you have to be able to
cope with it.

Andrew.