variable length argument functions in AMD64 arch

Hi, all

I am trying to use clang to compile a small OS kernel. Variable-length
argument functions are giving me a headache. For example, a printk
function is defined in my kernel as:

int printk (const char *format, ...)
{
        return 0;
}

This printk will be compiled into (objdumped, AT&T syntax):
4011e2b0 <printk>:
4011e2b0: 55 push %ebp
4011e2b1: 48 dec %eax
4011e2b2: 89 e5 mov %esp,%ebp
4011e2b4: 48 dec %eax
4011e2b5: 81 ec b0 00 00 00 sub $0xb0,%esp
4011e2bb: 0f 29 7d f0 movaps %xmm7,-0x10(%ebp)
4011e2bf: 0f 29 75 e0 movaps %xmm6,-0x20(%ebp)
4011e2c3: 0f 29 6d d0 movaps %xmm5,-0x30(%ebp)
4011e2c7: 0f 29 65 c0 movaps %xmm4,-0x40(%ebp)
4011e2cb: 0f 29 5d b0 movaps %xmm3,-0x50(%ebp)
4011e2cf: 0f 29 55 a0 movaps %xmm2,-0x60(%ebp)
4011e2d3: 0f 29 4d 90 movaps %xmm1,-0x70(%ebp)
4011e2d7: 0f 29 45 80 movaps %xmm0,-0x80(%ebp)
4011e2db: 4c dec %esp
4011e2dc: 89 8d 78 ff ff ff mov %ecx,-0x88(%ebp)
4011e2e2: 4c dec %esp
4011e2e3: 89 85 70 ff ff ff mov %eax,-0x90(%ebp)
4011e2e9: 48 dec %eax
4011e2ea: 89 8d 68 ff ff ff mov %ecx,-0x98(%ebp)
4011e2f0: 48 dec %eax
4011e2f1: 89 95 60 ff ff ff mov %edx,-0xa0(%ebp)
4011e2f7: 48 dec %eax
4011e2f8: 89 b5 58 ff ff ff mov %esi,-0xa8(%ebp)
4011e2fe: 31 c0 xor %eax,%eax
4011e300: 48 dec %eax
4011e301: 81 c4 b0 00 00 00 add $0xb0,%esp
4011e307: 5d pop %ebp
4011e308: c3 ret

It seems clang generates code to handle variable-length arguments
whether or not the va_xxx macros (va_start, va_end) are used. (gcc only
generates that code when va_start is used.)
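
To make the comparison concrete, here is a minimal sketch (not code from
the kernel above) of the two cases: a variadic function that calls
va_start, and a stub like printk that never touches its variable
arguments. The names sum_ints and printk_stub are made up for
illustration. Only the first one actually needs the register-passed
arguments spilled to where a va_list can find them.

```c
#include <assert.h>
#include <stdarg.h>

/* Hypothetical helper: sums `count` int arguments. Because it calls
 * va_start, the compiler has to make the register-passed arguments
 * reachable through the va_list. */
int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;

    assert(count >= 0);
    va_start(ap, count);
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}

/* Same shape as the printk stub: variadic, but va_start is never
 * called, so spilling the argument registers is not strictly needed. */
int printk_stub(const char *format, ...)
{
    (void)format;
    return 0;
}
```

For reference, the 0xb0 (176) bytes reserved in the listing equal the
size of the System V x86-64 register save area: 6 general-purpose
registers x 8 bytes plus 8 XMM registers x 16 bytes.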

My biggest issue with this code is that movaps is used. According to
Intel's manual, if the destination memory isn't 16-byte aligned, a #GP
(general protection fault) will occur. It seems that using movaps is
wrong unless we can guarantee that ebp is always 16-byte aligned, which
may not be true. I manually edited the generated binary to use movups
instead (the same instruction as movaps except that it does not check
alignment), and everything is fine.
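
The alignment argument can be written out as a small sketch. The XMM
save slots in the listing sit at offsets 0x10 through 0x80 below the
frame pointer, all multiples of 16, so every movaps destination is
aligned exactly when the frame pointer itself is. The function names
here are hypothetical, chosen only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when addr is a multiple of 16, which movaps requires of a
 * memory operand. */
bool is_aligned16(uintptr_t addr)
{
    return (addr & 0xF) == 0;
}

/* Checks the eight XMM save slots from the listing:
 * frame_ptr - 0x10, frame_ptr - 0x20, ..., frame_ptr - 0x80. */
bool xmm_slots_aligned(uintptr_t frame_ptr)
{
    assert(frame_ptr >= 0x80); /* avoid wrapping below address 0 */
    for (uintptr_t off = 0x10; off <= 0x80; off += 0x10) {
        if (!is_aligned16(frame_ptr - off))
            return false;
    }
    return true;
}
```

A frame pointer such as 0x7000 passes the check, while 0x7008 fails for
every slot, which is the situation where movaps faults and movups does
not.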

Any comments?

--Zhi

> It seems clang generates code to handle variable-length arguments
> whether or not the va_xxx macros (va_start, va_end) are used. (gcc only
> generates that code when va_start is used.)

I suppose that's a quality-of-implementation issue; it's really more
of a backend issue, though, so I'd suggest asking on llvmdev.

> My biggest issue with this code is that movaps is used. According to
> Intel's manual, if the destination memory isn't 16-byte aligned, a #GP
> (general protection fault) will occur. It seems that using movaps is
> wrong unless we can guarantee that ebp is always 16-byte aligned, which
> may not be true. I manually edited the generated binary to use movups
> instead (the same instruction as movaps except that it does not check
> alignment), and everything is fine.

The stack is supposed to be 16-byte aligned on x86-64; if it isn't,
there's probably a bug somewhere. But again, better to discuss on
llvmdev.

-Eli

> The stack is supposed to be 16-byte aligned on x86-64; if it isn't,
> there's probably a bug somewhere. But again, better to discuss on
> llvmdev.

The generated code looks like x86-32. Does it not need to be aligned on
a 16-byte boundary?

Thanks, Eli. It was my fault that the stack wasn't 16-byte aligned. Since I am compiling
an OS kernel with clang, I have to set rsp myself, and I made a mistake in one piece of the
assembly code. It has been fixed now and everything is fine.
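
A minimal sketch of the kind of fix involved, assuming the initial
stack top is computed from a base address and a size (the helper names
and that layout are assumptions, not the actual kernel code). The System
V x86-64 ABI expects rsp to be 16-byte aligned immediately before a call
instruction, so the value loaded into rsp ahead of the first call should
be rounded down to a multiple of 16.

```c
#include <assert.h>
#include <stdint.h>

/* Rounds a candidate stack top down to a 16-byte boundary. The stack
 * grows downward on x86-64, so rounding down stays inside the region. */
uintptr_t align_stack_top(uintptr_t top)
{
    return top & ~(uintptr_t)0xF;
}

/* Hypothetical helper: aligned top of a stack region described by its
 * base address and size in bytes. */
uintptr_t stack_top_for(uintptr_t base, uintptr_t size)
{
    assert(size >= 16);
    return align_stack_top(base + size);
}
```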

BTW, it seems clang can't handle the "=rm" inline assembly code as in the following:

static inline struct pcpu_gs *percpu ()
{
        struct pcpu_gs *gs;

        asm volatile ("movq %%gs:gs_self, %0" : "=rm" (gs));
        return gs;
}

clang will generate code like:

movq %gs:gs_self, -8(%rbp)

leading gas to complain: Error: too many memory references for `movq'

--Zhi

2009/8/13 Eli Friedman <eli.friedman@gmail.com>:

> The stack is supposed to be 16-byte aligned on x86-64; if it isn't,
> there's probably a bug somewhere. But again, better to discuss on
> llvmdev.
>
> The generated code looks like x86-32. Does it not need to be aligned on
> a 16-byte boundary?

It is 64-bit code. I may have confused objdump by setting the ELF format to elf-x86-32.
I aligned the stack to 16 bytes and everything is fine now. Thanks.

--Zhi

The code is wrong... the constraint should be "=r", not "=rm".
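
A sketch of why "=r" is the right constraint here: x86 mov accepts at
most one memory operand, and the source in percpu() is already memory,
so the destination must be a register. The %gs-relative load cannot run
outside the kernel, so this illustration substitutes an ordinary global,
fake_gs_self, which is a made-up stand-in; the plain C fallback is only
there so the sketch also builds on other targets.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the %gs-relative gs_self slot. */
uint64_t fake_gs_self = 0x1234;

uint64_t load_self(void)
{
    uint64_t value;
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    /* The source is a memory operand ("m"), so the destination is
     * constrained to a register ("=r"). With "=rm" the compiler may
     * pick memory for %0 and emit a memory-to-memory movq. */
    __asm__ volatile ("movq %1, %0" : "=r" (value) : "m" (fake_gs_self));
#else
    value = fake_gs_self; /* portable fallback for non-x86-64 builds */
#endif
    assert(value == fake_gs_self); /* sanity check for this sketch only */
    return value;
}
```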

-Eli

Indeed, thanks again.

--Zhi