Invalid or unaligned stack exception on Windows

Hi,

I wrote some time ago about this issue (see links below).

http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-March/084089.html

http://lists.cs.uiuc.edu/pipermail/llvmdev/2015-April/084193.html

At the time I thought the problem was due to a bug in my code, and
that I had fixed it. But sadly it seems that the issue is still
present - it just got hidden by whatever change I made.

The error only occurs when longjmp() is called - but not on every
call - many longjmp() calls appear to be fine while one of them
fails. Depending on the compilation option (/O1 or /O2) the failure
occurs at different places, so I cannot spot an obvious pattern.

The scenario is:
C code calls setjmp() and eventually calls JITed code.
JITed code calls a C function
C function calls other C functions eventually leading to longjmp() call.
Note that the longjmp call is not directly from JITed code.
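
For readers who want to reproduce the shape of this call chain without LLVM, here is a minimal sketch in plain C. All names are hypothetical, and the "JITed" function is an ordinary C function standing in for generated code, so this shows only the control flow and will not reproduce the Win64 failure by itself.

```c
#include <assert.h>
#include <setjmp.h>

/* Hypothetical stand-ins; the real program keeps this state elsewhere. */
static jmp_buf recover_point;
static volatile int thrown_code;

/* C function several calls below the "JITed" frame that bails out. */
static void c_helper_that_throws(int code) {
    thrown_code = code;
    longjmp(recover_point, 1);   /* never returns */
}

/* Plays the role of the JIT-compiled function: it only calls back into C. */
static void fake_jitted_function(int code) {
    if (code != 0)
        c_helper_that_throws(code);
}

typedef void (*jit_entry)(int);

/* C code that calls setjmp() and then enters the "JITed" code. */
static int run_protected(jit_entry entry, int code) {
    if (setjmp(recover_point) == 0) {
        entry(code);             /* may longjmp back to the setjmp above */
        return 0;                /* normal return, nothing was thrown */
    }
    return thrown_code;          /* reached only via longjmp */
}
```

Calling run_protected(fake_jitted_function, 7) returns 7 via the longjmp path, while passing 0 returns normally. In the failing case the middle frame is generated by MCJIT instead of the C compiler.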

The error only occurs on Windows. I am using Visual Studio 2013
64-bit. However, it does not occur in a debug build (MSVC optimization
mode /Od) - but occurs when I use /O1 or /O2.

I do not get this error on Mac OS X Yosemite (using clang) or on Ubuntu
(using gcc).
On Ubuntu I am running gcc with -fsanitize=address to detect any memory issues.

LLVM JIT optimization does not make a difference - i.e. the error
occurs regardless of LLVM optimization settings.

I am really at a loss as to how to find the root cause.

How can I check whether there are stack alignment issues in JITed code?
Can I enable address sanitizer in MCJIT so that any memory errors can
be trapped?
Should I build LLVM and my project using mingw-w64 to see if the same
error occurs - is this supported on Windows?

I would really appreciate any input on this issue.

Thanks and Regards

Dibyendu

I have an example test program (smallest I could construct) that
triggers the problem. The relevant line is commented in the code.

https://github.com/dibyendumajumdar/ravi/tree/master/ravi-tests/longjmp_issue

I have also dumped the IR for the three functions that are compiled.

The C code that does the longjmp and setjmp is at:

https://github.com/dibyendumajumdar/ravi/blob/master/src/ldo.c

See functions luaD_throw() and luaD_rawrunprotected()

Are you using split stacks of some kind? Are you sure these actually work as intended on Win64? Based on the source code, it looks like you are allocating stack manually, but I could be wrong.

What triple are you using with LLVM to generate code?

There isn’t much other information here, but you can try to zero in on the problem by checking the stack alignment manually with a helper like:

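#include <assert.h>
#include <intrin.h>   /* _AddressOfReturnAddress (MSVC intrinsic) */
#include <stdint.h>
void CheckAlignment() {
  assert((((uintptr_t)_AddressOfReturnAddress() + 8) & 15) == 0);
}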

Run this near where LLVM calls back into C code. If it fails, disassemble the calling LLVM function and look at that to see if there’s something wrong with the prologue. Sending that along with any followups would be helpful.

Hi there,

I had a similar problem around LLVM 3.5, and I’m almost certain it is specific to 64-bit Windows. In my case longjmp crashed from time to time, and a debugger trap was triggered inside longjmp every time (it could be ignored). I never found a proper solution, so I switched from the library longjmp to the builtin one (llvm.eh.sjlj.longjmp). Moving the longjmp call from C++ compiled by MSVC into LLVM IR may also help.

My guess is that longjmp does not work on 64-bit Windows because it needs correct stack unwinding information and LLVM does not provide it. Back in the 3.5 days, exception handling on Windows was quite poor.

- Paweł

Hi,
Thank you - that's very helpful to know. If I compiled using gcc or
clang would this still be the case? Presumably I can still compile
LLVM using MSVC.

Regards
Dibyendu

Are you using split stacks of some kind? Are you sure these actually work as
intended on Win64? Based on the source code, it looks like you are
allocating stack manually, but I could be wrong.

Hi,
Lua uses its own stack (which is just an array of value objects), and
Lua functions basically manipulate this data structure.

What triple are you using with LLVM to generate code?

x86_64-pc-windows-msvc-elf

There isn't much other information here, but you can try to zero in on the
problem by checking the stack alignment manually with a helper like:
void CheckAlignment() {
  assert((((uintptr_t)_AddressOfReturnAddress() + 8) & 15) == 0);
}

Run this near where LLVM calls back into C code. If it fails, disassemble
the calling LLVM function and look at that to see if there's something wrong
with the prologue. Sending that along with any followups would be helpful.

Thank you - I will try this, although Paweł's reply on this issue
seems like a plausible explanation.
I am trying to figure out how to dump the disassembly of the
compiled code - it seems not as easy as dumping IR.
I will also try compiling the code using clang or gcc to see if that
makes the problem go away.

Thanks and Regards
Dibyendu

I think Paweł identified the problem. The frames on the stack between the setjmp and longjmp must have valid unwind information, which is described here:
https://msdn.microsoft.com/en-us/library/ft9x1kdx.aspx?f=255&MSPPError=-2147217396

In particular, it has this line about JITed code:
“For dynamically generated functions [JIT compilers], the runtime to support these functions must either use RtlInstallFunctionTableCallback or RtlAddFunctionTable to provide this information to the operating system. Failure to do so will result in unreliable exception handling and debugging of processes.”

LLVM does not contain any references to these functions, so I must conclude that unwinding through LLVM JITed frames on Win64 is not supported. Sorry. :(

You can try implementing your own setjmp / longjmp pair that bypasses the libc versions. That might work.
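
One way to sketch a setjmp / longjmp pair that avoids the libc versions is with the GCC/Clang builtins __builtin_setjmp and __builtin_longjmp, which restore the saved frame and stack pointers and jump directly, without consulting unwind tables. This assumes the C side is built with clang or gcc (MSVC does not provide these builtins), and whether it actually avoids the Win64 failure described above is untested here. The function and variable names are hypothetical.

```c
#include <assert.h>

/* The builtins use a buffer of five pointer-sized words, not a jmp_buf. */
static void *jump_buffer[5];
static volatile int error_code;

/* __builtin_longjmp must not be in the same function as __builtin_setjmp,
   so keep the throwing side out of line. */
__attribute__((noinline)) static void throw_error(int code) {
    error_code = code;
    __builtin_longjmp(jump_buffer, 1);   /* second argument must be 1 */
}

/* Stand-in for the frames (JITed or not) between the two ends. */
static void do_work(int code) {
    if (code != 0)
        throw_error(code);
}

__attribute__((noinline)) static int run_guarded(int code) {
    if (__builtin_setjmp(jump_buffer) == 0) {
        do_work(code);
        return 0;            /* completed without a jump */
    }
    return error_code;       /* reached only via __builtin_longjmp */
}
```

Here run_guarded(42) returns 42 through the jump path and run_guarded(0) returns 0 normally. Because nothing is unwound, no cleanup runs in the skipped frames, which matches how Lua already uses longjmp for error handling.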