Invalid or unaligned stack

Hi,

I am encountering a problem that I do not know how to debug. I would
greatly appreciate any guidance on this issue.

On Windows, when I run the Lua test cases as JITed code, I get the
following error:

Unhandled exception at 0x00007FFCEEEAC500 (ntdll.dll) in lua.exe:
0xC0000028: An invalid or unaligned stack was encountered during an
unwind operation.

This happens when the Lua runtime attempts to call longjmp().

The Lua test case that triggers this is a recursive call as shown below.

function err_on_n (n)
  if n==0 then error(); exit(1);
  else err_on_n (n-1); exit(1);
  end
end

do
  function dummy (n)
    if n > 0 then
      assert(not pcall(err_on_n, n))
      dummy(n-1)
    end
  end
end

dummy(10)

I have a struct that is created on the stack in the JIT compiler, and
this error is triggered when I add a field to the struct - if I remove
the field the error stops. The struct is not very large in size - it
is only 392 bytes with the new field.

If I allocate this struct on the heap the error goes away.

Note that just adding the field triggers the error even if I have no
other code changes.

The error occurs in Release build but not in Debug build.
I am using Visual C++ 2013 - 64-bit, and LLVM 3.6.0 on Windows.
I see no other unexpected behaviour - all the other tests pass.

All tests pass on Ubuntu, using LLVM 3.5.1 and gcc 4.8.2.

Of course, my current assumption is that my program is corrupting
memory somewhere. However, I see no other signs of memory corruption,
so the issue may be something else.

Any tips on what the problem might be would be gratefully received.

Thanks and Regards
Dibyendu

See r227426 from Clang. setjmp on Win64 is really weird. There is a hidden second parameter that you may need to fill in with @llvm.frameaddress.
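
For readers unfamiliar with the term, here is a small sketch of what a "frame address" is at the C level. It uses `__builtin_frame_address`, a GCC/Clang builtin that Clang lowers to the `@llvm.frameaddress` intrinsic mentioned above. It does not reproduce the Win64 setjmp convention, and the function names `callee_frame` and `frames_are_distinct` are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Returns the frame address of this function's own activation record.
   __builtin_frame_address is a GCC/Clang extension, not standard C. */
__attribute__((noinline))
static uintptr_t callee_frame(void) {
    return (uintptr_t)__builtin_frame_address(0);
}

/* Compares this function's frame address with that of a function it calls.
   Each live, non-inlined call has its own frame, so the two values differ. */
__attribute__((noinline))
static int frames_are_distinct(void) {
    uintptr_t mine = (uintptr_t)__builtin_frame_address(0);
    uintptr_t inner = callee_frame();
    assert(mine != 0 && inner != 0);
    return mine != inner;
}
```

A JIT that emits its own call to setjmp would presumably need to pass the equivalent value for the function that contains the call, which is what the intrinsic provides at the IR level.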

Hi,

I constructed a minimal Lua program that reproduces the problem.
Essentially the problem occurs if a JITed function is recursively
called - and there is a longjmp from the inner call. Example:

function rais(n)
  if n == 0 then error()
  else rais(n-1)
  end
end
ravi.compile(rais)

function caller(n)
  pcall(rais,n)
end
ravi.compile(caller)

caller(1)

Here the call to error() triggers a longjmp. The pcall() calls setjmp.
The error only occurs on Windows as reported earlier.
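
To make the control flow concrete, here is a minimal C sketch of the pattern described above: a protected call that saves a context with setjmp, and an error raised several recursive frames deeper that returns to it with longjmp. This is only an illustration under simplifying assumptions, not Lua's actual implementation; `recovery_point`, `raise_error`, `raise_at_depth`, `no_error` and `protected_call` are hypothetical names.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical stand-in for a chain of error-recovery points. */
struct recovery_point {
    jmp_buf env;                  /* context saved by setjmp */
    struct recovery_point *prev;  /* enclosing protected call, if any */
};

static struct recovery_point *current_point = NULL;

/* Plays the role of error(): jump back to the innermost protected call. */
static void raise_error(void) {
    assert(current_point != NULL && "error raised outside protected_call");
    longjmp(current_point->env, 1);
}

/* Mirrors rais(n): recurse n levels, then raise from the innermost frame. */
static void raise_at_depth(int n) {
    if (n == 0)
        raise_error();
    else
        raise_at_depth(n - 1);
}

/* A callee that returns normally, for comparison. */
static void no_error(int n) {
    (void)n;
}

/* Plays the role of pcall(): returns 0 on success, 1 if an error was raised. */
static int protected_call(void (*fn)(int), int arg) {
    struct recovery_point point;
    int status = 0;

    point.prev = current_point;
    current_point = &point;
    if (setjmp(point.env) == 0)
        fn(arg);       /* normal path */
    else
        status = 1;    /* reached via longjmp from raise_error */
    current_point = point.prev;
    return status;
}
```

In this sketch, `protected_call(raise_at_depth, 1)` corresponds to `caller(1)`: the longjmp discards the frames of both recursive calls in one step. On Win64 that step goes through the system unwinder, which is where the reported exception was raised.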

I ran valgrind on Ubuntu to see if I could detect any memory issues.
Valgrind reports 6 errors of the following type. I am not sure whether
this is an issue or not.

==66154==
==66154== HEAP SUMMARY:
==66154== in use at exit: 115,521 bytes in 720 blocks
==66154== total heap usage: 9,357 allocs, 8,637 frees, 3,887,031 bytes allocated
==66154==
==66154== 152 bytes in 1 blocks are possibly lost in loss record 378 of 423
==66154== at 0x4C2AB80: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==66154== by 0x516AD4D: llvm::MDNode::getMDNode(llvm::LLVMContext&, llvm::ArrayRef<llvm::Value*>, llvm::MDNode::FunctionLocalness, bool) (in /home/dylan/github/ravi/build/libravi.so)
==66154== by 0x516A2B8: llvm::MDBuilder::createTBAAStructTagNode(llvm::MDNode*, llvm::MDNode*, unsigned long) (in /home/dylan/github/ravi/build/libravi.so)
==66154== by 0x5068961: ravi::LuaLLVMTypes::LuaLLVMTypes(llvm::LLVMContext&) (in /home/dylan/github/ravi/build/libravi.so)
==66154== by 0x5063EF6: ravi::RaviJITStateImpl::RaviJITStateImpl() (in /home/dylan/github/ravi/build/libravi.so)
==66154== by 0x50644FD: raviV_initjit (in /home/dylan/github/ravi/build/libravi.so)
==66154== by 0x5057DF5: lua_newstate (in /home/dylan/github/ravi/build/libravi.so)
==66154== by 0x503641E: luaL_newstate (in /home/dylan/github/ravi/build/libravi.so)
==66154== by 0x401674: main (in /home/dylan/github/ravi/build/lua)

This isn't enough info to solve the problem. Pasting the LLVM IR or C code
that calls setjmp and a stack trace of the crash might help figure it out,
though.

These memory leaks look unrelated.

Yes, I am working on isolating the issue. The reproduction still
depends on all the Lua infrastructure - I don't know if I can
eliminate that ...

I compiled Lua / JIT compiler on Ubuntu using -fsanitize=address and
ran the original test. No errors reported.

I will be checking a number of other things, and report back if I find
anything material.

Regards

It looks like the error reported by MSVC was misleading - the actual
issue was caused by a part of Lua (the debug API) that I had not fully
tested in the JITed environment. With help from AddressSanitizer I was
able to find the real issue and have applied fixes.

Thanks and Regards
Dibyendu