Possible stack corruption during call to JITSymbol::getAddress()

Firstly, apologies if this is not the right place to be asking this question; feel free to point me in the correct direction. I could be doing something wrong here, but Stack Overflow didn’t feel like the correct place for this since there’s so little there about LLVM ORC.

Basically, I have a reproduction case (below) where if I throw an exception before I call JITSymbol::getAddress() everything works properly but throwing the same exception afterward will result in a SIGSEGV during stack unwinding. This suggests to me that somehow the stack is getting corrupted during the JITSymbol::getAddress() call.

This problem was initially discovered while working on my own project. While troubleshooting this I’ve discovered that when LLVM is built with -DLLVM_USE_SANITIZER:STRING=Address the problem happens at different points during execution, perhaps having something to do with the padding around the stack variables added by the sanitizer? See the note after the call to runTest() in main().

I’m running this under an up-to-date Antergos Linux with clang version 3.9.1 (I also tried compiling LLVM and the example program below with gcc 6.3.1, and the result is the same). clang is set as the default compiler via the following environment variables:

CC=/usr/bin/clang
CXX=/usr/bin/clang++

Commands used to build LLVM:

git clone https://github.com/llvm-mirror/llvm.git
cd llvm
git checkout release_40
mkdir build
cd build
cmake .. -DLLVM_BUILD_LLVM_DYLIB:BOOL=ON -DLLVM_ENABLE_RTTI:BOOL=ON -DLLVM_ENABLE_EH:BOOL=ON -DLLVM_USE_SANITIZER:STRING=Address -DLLVM_PARALLEL_COMPILE_JOBS:STRING=8 -DLLVM_ENABLE_ASSERTIONS:BOOL=ON
cmake --build . -- -j 8
sudo cmake --build . --target install

Command used to build test case executable:

clang test.cpp -std=c++14 -lstdc++ -lLLVM-4.0 -Wall -pedantic -Wextra -fstack-protector-all -fsanitize=address -fexceptions

Then of course:

./a.out

Output from a.out:

ASAN:DEADLYSIGNAL

==6582==ERROR: AddressSanitizer: SEGV on unknown address 0x7f59eeb06020 (pc 0x7f59f1b20930 bp 0x000000000001 sp 0x7ffc5e546218 T0)
==6582==The signal is caused by a READ memory access.

The result of running backtrace in GDB while execution is paused after the SIGSEGV occurs:

#0 read_encoded_value_with_base (encoding=encoding@entry=28 '\034', base=base@entry=0, p=p@entry=0x7fffe8a06020 <error: Cannot access memory at address 0x7fffe8a06020>, val=val@entry=0x7fffffffd6d8) at /build/gcc/src/gcc/libgcc/unwind-pe.h:252
#1 0x00007fffeba05a61 in binary_search_single_encoding_fdes (pc=0x7fffeba04426 <_Unwind_Resume+54>, ob=0x0) at /build/gcc/src/gcc/libgcc/unwind-dw2-fde.c:908
#2 search_object (ob=ob@entry=0x60400001d9d0, pc=pc@entry=0x7fffeba04426 <_Unwind_Resume+54>) at /build/gcc/src/gcc/libgcc/unwind-dw2-fde.c:977
#3 0x00007fffeba05fdd in _Unwind_Find_registered_FDE (bases=0x7fffffffda78, pc=0x7fffeba04426 <_Unwind_Resume+54>) at /build/gcc/src/gcc/libgcc/unwind-dw2-fde.c:1013
#4 _Unwind_Find_FDE (pc=0x7fffeba04426 <_Unwind_Resume+54>, bases=bases@entry=0x7fffffffda78) at /build/gcc/src/gcc/libgcc/unwind-dw2-fde-dip.c:454
#5 0x00007fffeba02b23 in uw_frame_state_for (context=context@entry=0x7fffffffd9d0, fs=fs@entry=0x7fffffffd820) at /build/gcc/src/gcc/libgcc/unwind-dw2.c:1241
#6 0x00007fffeba03d40 in uw_init_context_1 (context=context@entry=0x7fffffffd9d0, outer_cfa=outer_cfa@entry=0x7fffffffdc00, outer_ra=0x5110fc) at /build/gcc/src/gcc/libgcc/unwind-dw2.c:1562
#7 0x00007fffeba04427 in _Unwind_Resume (exc=0x60d00000c7b0) at /build/gcc/src/gcc/libgcc/unwind.inc:224
#8 0x00000000005110fc in runTest () at /home/dave/projects/untitled/test.cpp:124
#9 0x0000000000511138 in main (argc=1, argv=0x7fffffffe698) at /home/dave/projects/untitled/test.cpp:132

My test-case is below. In runTest(), note the commented out throw statement before symbol.getAddress() and the uncommented one after it. Also note the comments after the call to runTest() in main().

Thanks.

Hi David,

This is definitely the right place to ask.

Let me see if I can reproduce this locally…

Cheers,
Lang.

Hi David,

This looks like bad eh-frame data due to a failure to fix up the frame descriptor entries:

<debug: adding frame> EHFrameAddr: 0x7feae5827000, EHFrameLoadAddr: 0x00000000e5827000, EHFrameSize: 60

==64588==ERROR: AddressSanitizer: SEGV on unknown address 0x7feae5827020 (pc 0x7feae886d970 bp 0x000000000001 sp 0x7ffca10e75f8 T0)

Eyeballing the code in RuntimeDyldELF (vs RuntimeDyldMachO, which is doing the right thing) I see it lacks the necessary fixups. If you’re feeling game you can try to port RuntimeDyldMachO’s solution to RuntimeDyldELF (where MachO uses a template argument, you’ll need to switch over the RuntimeDyldImpl Arch member to determine the pointer size for the fixup). Otherwise you should file a bug on bugs.llvm.org and CC me, and then I can CC some of the ELF devs and see if anyone has time.

In the meantime, turning off exception support should fix this, though I’m not sure whether that’s a viable option for your use case.

Cheers,

Lang.

Thanks Lang. I think I’ll go the bug creation route. I have an email out to llvm-admin requesting an account on bugs.llvm.org. I’ll let you know when I’ve filed the bug.

Hi David,

Thanks very much for that. I’ll continue to dig in as time permits, and I’ll update the bug report with my progress once it’s filed.

Cheers,
Lang.

Well, 3 days later and so far nobody has responded to my request for an account on bugs.llvm.org… so it doesn’t look like I’m going to be able to create that bug on my own, unfortunately.

FYI: this is the third person I know of who has run into problems getting a bugzilla account created. This needs to be addressed ASAP.

Philip


Hi David,

Sorry to hear. Has anyone followed up with you yet?

I’ve continued to dig in to this in my spare time and I’ve found the issue. It’s a use-after-free, rather than any sort of memory smashing. ORC is currently failing to deregister the EH-frame section when the JIT is torn down (but is deallocating the memory for it). Normally that’s not disastrous (though it does leave bogus EH frames in memory), but in your case the thrown exception tears down the JIT itself, so the unwinder ends up reading eh-frames that have been deallocated. I’m working on a fix to have the JIT properly deregister EH frames, but one workaround for now would be to make sure you catch the exception before it causes the JIT stack to be destructed.
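To make the ordering issue concrete, here is a minimal sketch of the workaround. It does not use LLVM at all: FakeJit is a hypothetical stand-in for the ORC JIT stack that only logs its construction and destruction, and the function names are made up for illustration. The point is only where the catch sits relative to the JIT's scope.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for an ORC JIT stack; this is NOT an LLVM class.
// It only records when it is created and destroyed.
struct FakeJit {
    std::vector<std::string>& log;
    explicit FakeJit(std::vector<std::string>& l) : log(l) { log.push_back("jit created"); }
    ~FakeJit() { log.push_back("jit destroyed"); }
    // Models code that throws after the symbol lookup has happened.
    void lookupAndRun() { throw std::runtime_error("failure after lookup"); }
};

// Problem pattern: the exception leaves the JIT's scope, so the JIT is
// destroyed while the unwinder is still walking the stack.
std::vector<std::string> unwindThroughJit() {
    std::vector<std::string> log;
    try {
        FakeJit jit(log);
        jit.lookupAndRun();
    } catch (const std::exception&) {
        log.push_back("exception caught");
    }
    return log;
}

// Workaround pattern: the exception is caught while the JIT is still alive,
// so unwinding has finished before the JIT is torn down.
std::vector<std::string> catchInsideJitScope() {
    std::vector<std::string> log;
    {
        FakeJit jit(log);
        try {
            jit.lookupAndRun();
        } catch (const std::exception&) {
            log.push_back("exception caught");
        }
        assert(log.back() == "exception caught");  // the JIT has not been destroyed yet
    }
    return log;
}
```

In the first function the log ends with the catch, after the JIT is already gone; in the second the JIT is destroyed last. With the real JIT, the second ordering is the one that should keep the unwinder away from freed eh-frame memory.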

Cheers,
Lang.

Nobody has responded yet and at this point I’m not holding my breath. I sent a second email asking if I had the correct procedure for requesting an account, etc, but there was no response to that either.

I think I can apply that workaround to my project. I have a bunch of tests which will have to share a statically allocated JIT. Not ideal, but not too awful, I don’t think.
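Roughly what I have in mind, as a sketch only: SharedFakeJit is a hypothetical placeholder for my real JIT class, and sharedJit(), failingTest() and runOne() are made-up names. The JIT lives in a function-local static, so it is only destroyed at program exit and no exception thrown by a test can unwind through its destructor.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical placeholder for the real JIT class; this is NOT an LLVM type.
struct SharedFakeJit {
    int lookups = 0;
    void lookup() { ++lookups; }
};

// Function-local static: constructed on first use, destroyed at program exit.
SharedFakeJit& sharedJit() {
    static SharedFakeJit jit;
    return jit;
}

// Hypothetical test body that fails after using the JIT.
void failingTest() {
    sharedJit().lookup();
    throw std::runtime_error("test failed");
}

// Runs one test and reports whether it passed. The exception is handled
// here, while the shared JIT is still alive.
bool runOne(void (*test)()) {
    assert(test != nullptr);
    try {
        test();
        return true;
    } catch (const std::exception&) {
        return false;
    }
}
```

The downside is that state can leak between tests through the shared instance, which is why I call it not ideal.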

Per Tanya’s email from yesterday:

Philip,

Unfortunately there was a week where the admins were not available for creating accounts. We now have a team of 5 people who are set up to create bugzilla accounts so I don’t expect this to be an issue going forward.

Thanks,
Tanya