Win64 bugs

Hello!

I've just tried generating Win64 code and the result is not that good.

First of all, XMM registers are saved for no apparent reason. Not only
does this hurt performance, it also leads to random crashes. XMMs are
stored to the stack with the MOVAPS instruction, which requires a 16-byte
aligned address, and the stack slots are not always aligned that way.
lli.exe (built in debug mode) randomly crashes on some simple
hello-world-like tests due to this misalignment.
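
To make the alignment requirement concrete, here is a minimal sketch of the check that MOVAPS effectively performs on its memory operand. The helper names and the example stack pointer value are illustrative assumptions, not anything taken from LLVM.

```cpp
#include <cstdint>

// MOVAPS faults unless its memory operand address is a multiple of 16.
constexpr bool isAligned16(std::uint64_t addr) {
  return (addr & 0xF) == 0;
}

// Rounds an address down to the previous 16-byte boundary
// (down, because the stack grows toward lower addresses).
constexpr std::uint64_t alignDown16(std::uint64_t addr) {
  return addr & ~std::uint64_t{0xF};
}

// Illustrative value only: on function entry RSP is 16n + 8, because the
// CALL instruction pushed an 8-byte return address onto an aligned stack.
constexpr std::uint64_t kRspOnEntry = 0x7FF0 - 8;

static_assert(!isAligned16(kRspOnEntry),
              "a spill slot placed directly at the entry RSP is misaligned");
static_assert(isAligned16(alignDown16(kRspOnEntry)),
              "rounding the slot address down makes it usable with MOVAPS");
```

A spill slot whose address fails `isAligned16` would need either an unaligned store (MOVUPS) or a realigned frame; which of the two is appropriate here is not something this sketch tries to decide.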

The most problematic part, though, is the lack of support for the
Win64 ABI's 'shadow zone'. Or maybe I haven't figured out how to turn
it on. In Win64, any function can treat 32 bytes of stack
(RSP+08h..RSP+28h just after the call instruction) as scratch space.
The VC++ compiler stores arguments passed in registers there, and in
debug builds these stores don't get optimized away.
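
The layout described above can be sketched as a pair of offset calculations. This is only an illustration of the Win64 convention as seen by the callee on entry; the function and constant names are made up for the example.

```cpp
#include <cstdint>

// Size of the area the caller must reserve for the callee's use.
constexpr std::uint64_t kShadowBytes = 32;
// Size of the return address pushed by CALL.
constexpr std::uint64_t kReturnAddressBytes = 8;

// Offset from RSP (at callee entry) of the home slot for register argument
// `index`, where 0 = RCX, 1 = RDX, 2 = R8, 3 = R9.
constexpr std::uint64_t homeSlotOffset(unsigned index) {
  return kReturnAddressBytes + 8 * index;
}

// Offset from RSP (at callee entry) of a stack-passed argument.
// `index` is the zero-based argument number and must be 4 or greater.
constexpr std::uint64_t stackArgOffset(unsigned index) {
  return kReturnAddressBytes + kShadowBytes + 8 * (index - 4);
}

static_assert(homeSlotOffset(0) == 0x08, "RCX home slot");
static_assert(homeSlotOffset(1) == 0x10, "RDX home slot");
static_assert(homeSlotOffset(3) == 0x20, "R9 home slot");
static_assert(stackArgOffset(4) == 0x28, "first stack argument follows the shadow zone");
```

The last home slot ends at RSP+28h, which is exactly where the first stack-passed argument begins.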

Consider this C++ code:

#include <stdio.h>

int main () {
  for ( int i=0; i<5; i++ )
    printf ( "%d\n", 0 );
  return 0;
}

Compile it to LLVM bytecode with the -O0 flag. Then run a debug build of
64-bit lli.exe (with the -mtriple=x86_64-pc-windows argument). For me it
prints 0's forever.

The reason is that printf uses the shadow zone to store its arguments.
The second argument goes to the stack at address RSP+10h and overwrites
the 'i' variable, resetting it to zero on every call.
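
The effect can be reproduced in a toy model of the stack. This is a simulation under stated assumptions, not real generated code: the stack is an array of 8-byte slots indexed from the callee's entry RSP, the callee stands in for an unoptimized printf that homes its register arguments, and the loop is capped so the model terminates.

```cpp
#include <array>
#include <cstdint>

// Toy stack as seen from the callee: slot k models [RSP + 8*k] on entry.
using Stack = std::array<std::int64_t, 8>;

// Stand-in for an unoptimized VC++-compiled callee that homes its four
// register arguments into [RSP+08h], [RSP+10h], [RSP+18h] and [RSP+20h].
void calleeHomesArgs(Stack &stack, std::int64_t rcx, std::int64_t rdx,
                     std::int64_t r8, std::int64_t r9) {
  stack[1] = rcx;
  stack[2] = rdx;
  stack[3] = r8;
  stack[4] = r9;
}

// Runs the loop from the report with the caller's `i` kept in slot `iSlot`.
// Returns the number of calls made, capped at `maxCalls`.
int runLoop(unsigned iSlot, int maxCalls) {
  Stack stack{};
  int calls = 0;
  for (stack[iSlot] = 0; stack[iSlot] < 5 && calls < maxCalls; ++stack[iSlot]) {
    // printf("%d\n", 0): RCX = format pointer (placeholder), RDX = 0.
    calleeHomesArgs(stack, /*rcx=*/0x1234, /*rdx=*/0, /*r8=*/0, /*r9=*/0);
    ++calls;
  }
  return calls;
}
```

With `i` in slot 2 (RSP+10h, inside the shadow zone) `runLoop(2, 100)` hits the cap of 100 calls, because every call resets `i` to zero. With `i` in slot 5 (RSP+28h, just past the shadow zone) `runLoop(5, 100)` makes the expected 5 calls.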

Is anyone aware of the second bug? If I have some time I'll try to fix
it myself, but it would be much better if someone could hint at where
to start.

> Hello!
>
> I've just tried generating Win64 code and the result is not that good.
>
> First of all, XMM registers are saved without reason to do so. Not only
> this slows the performance but leads to random crashes too. XMMs are
> stored to the stack with MOVAPS instruction which requires 16-byte
> alignment which is not always the case. lli.exe (built in debug mode)
> randomly crashes on some simple hello-world-alike tests due to misalignment.

http://llvm.org/bugs/show_bug.cgi?id=3739 is about the extra XMM
stores; I thought the alignment was working, though...

> Though the most problematic stuff is the lack of 'shadow zone' support
> in Win64 ABI. Or maybe I haven't figured out how to turn this on. In
> Win64 any function can treat 32 bytes of stack (RSP+08h..RSP+28h just
> after the call instruction) as scratch data. VC++ compiler stores
> arguments passed in registers there. In debug builds this doesn't get
> optimized away.

Wow, that's really strange... I'm pretty sure that simply isn't implemented.

-Eli

>> Though the most problematic stuff is the lack of 'shadow zone' support
>> in Win64 ABI. Or maybe I haven't figured out how to turn this on. In
>> Win64 any function can treat 32 bytes of stack (RSP+08h..RSP+28h just
>> after the call instruction) as scratch data. VC++ compiler stores
>> arguments passed in registers there. In debug builds this doesn't get
>> optimized away.
>
> Wow, that's really strange... I'm pretty sure that simply isn't implemented.

Another side effect concerns functions with more than four arguments.
They won't work if LLVM and VC++ code is mixed. That's again because of
the absence of the 32-byte gap between the top of the stack and the
arguments.

E.g.:
#include <stdio.h>

int main () {
  printf ( "%d %d %d %d\n", 1, 2, 3, 4 );
  return 0;
}

Output:
1 2 3 0
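
A small model shows why the fifth argument goes missing. It rests on assumptions: slots are 8 bytes, indexed from the callee's entry RSP, and the unwritten slots are zero-filled for the sake of the model (on a real stack the callee would read whatever happened to be there). The names are illustrative.

```cpp
#include <array>
#include <cstdint>

// Toy stack as seen from the callee: slot k models [RSP + 8*k] on entry.
// Slot 0 stands for the return address.
using Stack = std::array<std::int64_t, 8>;

// Caller side: stores the first stack-passed (fifth) argument.
// A caller that reserves the 32-byte shadow zone puts it at [RSP+28h];
// one that does not puts it directly above the return address, at [RSP+08h].
Stack callerStoresFifthArg(std::int64_t value, bool reservesShadowZone) {
  Stack stack{}; // zero-filled for the model
  const unsigned slot = reservesShadowZone ? 5 : 1;
  stack[slot] = value;
  return stack;
}

// Callee side: a Win64 callee always looks for the fifth argument at [RSP+28h].
std::int64_t calleeReadsFifthArg(const Stack &stack) {
  return stack[5];
}
```

In this model `calleeReadsFifthArg(callerStoresFifthArg(4, false))` yields 0 instead of 4, while passing `true` yields 4. The first four arguments travel in registers, so they are unaffected, which is consistent with the "1 2 3" part of the output above.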

Any ideas on how hard it would be to fix?

Hi Peter,

The attached patch is a workaround for the XMM misalignment issue. Basically
it uses the fallback method of saving and restoring registers on the stack,
which does work correctly with alignment. If I recall correctly it also
doesn't save any registers unnecessarily, but I could be wrong about that.

Anyway, it's a hack, but if all you want for now is to be able to work with
Win64 and use SSE, this might offer a solution.

I wasn't aware of the second bug you're describing, but the one in your
latest e-mail about not being able to have more than four arguments I'm
experiencing as well. I'm afraid I haven't found any workaround for that
yet.

Cheers,

Nicolas

Win64workaround.patch (905 Bytes)

Anton K has a patch for this that we have been using successfully internally. He is still working out issues with certain parameter types, in particular MMX/SSE params, which I believe he was going to fix before committing it. I'm sure he'll weigh in on this discussion.

Stefanus

Stefanus Du Toit wrote:

> The attached patch is a workaround for the XMM misalignment issue. Basically
> it uses the fallback method of saving and restoring registers on the stack,
> which does work correctly with alignment. If I recall correctly it also
> doesn't save any registers unnecessarily, but I could be wrong about that.
>
> Anyway, it's hack, but if all you want for now is to be able to work with
> Win64 and use SSE this might offer a solution.

Thanks a lot! I'll try it when I'm back from the vacation.

Hello, Peter

> Working patch even if incomplete would be great. I'll be out of
> discussion for a week or so but since this is a "stopper" bug for me I'd
> appreciate posting work in progress here.

There is a version of the patch that seems to implement most of the
weird Win64 stuff. I will clean it up and commit it as soon as I
return from vacation (within the next few days).

Hello, Nicolas

> The attached patch is a workaround for the XMM misalignment issue. Basically
> it uses the fallback method of saving and restoring registers on the stack,
> which does work correctly with alignment. If I recall correctly it also
> doesn't save any registers unnecessarily, but I could be wrong about that.

Please don't use this patch, it's completely wrong. The problem is
that the prologue / epilogue emission code is not prepared for such a
'fallback' solution and will emit improper stack update code. You can
easily hit this problem when other callee-saved registers are
spilled as well (not only the high XMM ones).

I have a patch which should complete the Win64 CC support in LLVM
(modulo varargs functions). I hope to commit it within the next few days.

Hi Anton,

Thanks a lot for the heads up. I hadn't run into any problems yet with my
hack because I haven't used other callee-saved registers so far. Anyway, I'm
looking forward to your fix!

Kind regards,

Nicolas

Hello, Nicolas

> Thanks a lot for the heads up. I hadn't run into any problems yet with my
> hack because I haven't used other callee-saved registers so far. Anyway, I'm
> looking forward to your fix!

I've committed the first series of patches to ToT to unbreak Win64, basically:

1. Honour register save area
2. Enable proper passing of __m128 and __m64 arguments
3. Minor cleanups here and there

The callee-saved problem is still unfixed; I'm working on a general solution.

Thanks!

What revision is your commit? I'd like to have a closer look at your patch
in an attempt to understand the issue better, and maybe try fixing the
callee-saved problem.

Cheers,

Nicolas

Hi Nicolas

> What revision is your commit? I'd like to have a closer look at your patch
> in an attempt to understand the issue better, and maybe try fixing the
> callee-saved problem.

See r77962, r77964 and nearby revisions. But in fact there is nothing there
which points to the issue :) It existed before my patches and still
exists now. We need some generic way to mark instructions
belonging to "frame setup code".