Problem Using libLLVM-3.3.so

We’re using the LLVM 3.3 AArch64 disassembler in the following way. We have built LLVM 3.3 on Linux as a shared library, and we have a main program that dynamically loads shared objects (.so libraries). The program is a simulator (though that shouldn’t be relevant to this question), and the shared objects it loads are electronic components that participate in the simulation. If the electronic component happens to be an ARM processor, it references the LLVM 3.3 shared library, specifically the AArch64 disassembler.

The problem is this. For some simulations, the LLVM shared library seems to take a segfault on exit. It runs correctly, but when the simulator finishes, it crashes on exit.

I’ve traced this back to the LLVM library with the following experiment. First, I ran a known “good” simulator build without any references to LLVM and observed that it runs correctly. Then I rebuilt the known “good” shared objects (the electronic components in the simulation) and linked them against the LLVM shared library. There are still no references to LLVM in the code; the components are merely linked against the LLVM shared library. This causes the LLVM shared library to be loaded when the simulation is run, and that alone causes the failure on exit.

Does anybody have any ideas as to why this might be happening?

Rick Sullivan

Carbon Design Systems

125 Nagog Park Rd, Acton, MA 01720

O: +1 (978) 264-7370 | M: +1 (508) 479-3845


Rick Sullivan <ricks@carbondesignsystems.com> writes:

[snip]

> The problem is this. For some simulations, the LLVM shared library
> seems to take a segfault on exit. It runs correctly, but when the
> simulator finishes, it crashes on exit.

[snip]

> Does anybody have any ideas as to why this might be happening?

Can you run the application under gdb and obtain a backtrace?

Using an LLVM Debug build may (or may not) help in this endeavour.

Unfortunately, I haven't been able to get the failure to occur in gdb. Our crash handler generates a backtrace, but it doesn't supply much information:

0: /o/release/SoCD/mainline/PRODUCT/Linux/bin/Linux//release/socdesigner(_ZN12CrashHandler12GetBacktraceEPc+0x2b) [0x8209adb] CrashHandler::GetBacktrace(char*)
1: /o/release/SoCD/mainline/PRODUCT/Linux/bin/Linux//release/socdesigner(_ZN12CrashHandler14GenerateReportEv+0x204) [0x820a434] CrashHandler::GenerateReport()
2: /o/release/SoCD/mainline/PRODUCT/Linux/bin/Linux//release/socdesigner(_ZN12CrashHandler21DoAllReportGenerationEv+0x1c) [0x820a51c] CrashHandler::DoAllReportGeneration()
3: /o/release/SoCD/mainline/PRODUCT/Linux/bin/Linux//release/socdesigner(_ZN12CrashHandler9GotSignalEv+0x6a6) [0x820ac86] CrashHandler::GotSignal()
4: /o/release/SoCD/mainline/PRODUCT/Linux/bin/Linux//release/socdesigner(_Z14CSignalHandleri+0x6c) [0x820b05c] CSignalHandler(int)
5: /lib/tls/libc.so.6 [0xc249b8]

I've also tried running simulations without using libLLVM-3.3.so, instead statically linking the required LLVM .a libraries into the component shared objects. This is not an ideal solution, because it effectively bundles LLVM into each component, more than doubling the size of each component on disk. However, this also produces a crash, this time with a more meaningful stack trace. Whether this is related to the failures I'm seeing with the LLVM shared object, I don't know. Here is the valgrind report:

      ==15093== Jump to the invalid address stated on the next line
      ==15093== at 0x0: ???
      ==15093== by 0x285C2321: llvm::cl::parser<(anonymous namespace)::DefaultOnOff>::~parser() (CommandLine.h:629)
      ==15093== by 0x285C23D9: llvm::cl::opt<(anonymous namespace)::DefaultOnOff, false, llvm::cl::parser<(anonymous namespace)::DefaultOnOff>::~opt() (CommandLine.h:1
      ==15093== by 0x285A3BD1: __tcf_5 (DwarfDebug.cpp:85)
      ==15093== by 0x441867: __cxa_finalize (in /lib/tls/libc-2.3.4.so)
      ==15093== by 0x282212F2: (within libCORTEXA9MP.mx_DBG.so)
      ==15093== by 0x28EEBB05: (within libCORTEXA9MP.mx_DBG.so)
      ==15093== by 0x518C41: _dl_close (in /lib/tls/libc-2.3.4.so)
      ==15093== by 0x56DD59: dlclose_doit (in /lib/libdl-2.3.4.so)
      ==15093== by 0x40966D: _dl_catch_error (in /lib/ld-2.3.4.so)
      ==15093== by 0x56E2BA: _dlerror_run (in /lib/libdl-2.3.4.so)
      ==15093== by 0x56DD89: dlclose (in /lib/libdl-2.3.4.so)

Rick Sullivan <ricks@carbondesignsystems.com> writes:

[snip]

It is crashing while destroying a static instance of cl::opt. The `this'
pointer is null. My guess is memory corruption. Does Valgrind report
anything about that? LLVM has a lot of shared state. If you load
several shared libraries, each containing its own copy of LLVM, and they
interact with each other, bad things may happen.
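
For readers following the backtrace: `__tcf_5` is the thunk the compiler
registered to destroy a static object defined in DwarfDebug.cpp, and
`__cxa_finalize` is the runtime routine that runs such thunks when
`dlclose` unloads the component. The sketch below is only a toy model of
that bookkeeping. `ExitHandlerTable`, `registerHandler`, `finalize`, and
`demoFinalizeOrder` are hypothetical names for illustration, not the
glibc or LLVM implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy model of the runtime's list of exit handlers. Hypothetical names;
// the real mechanism is __cxa_atexit / __cxa_finalize.
class ExitHandlerTable {
public:
    // Models __cxa_atexit: remember a destructor thunk and its owning library.
    void registerHandler(const std::string& library, std::function<void()> thunk) {
        entries_.push_back({library, std::move(thunk), false});
    }

    // Models __cxa_finalize for one library: run that library's pending
    // thunks in reverse order of registration, each at most once.
    void finalize(const std::string& library) {
        for (std::size_t i = entries_.size(); i-- > 0;) {
            Entry& e = entries_[i];
            if (e.library == library && !e.done) {
                e.done = true;
                e.thunk();
            }
        }
    }

private:
    struct Entry {
        std::string library;
        std::function<void()> thunk;
        bool done;
    };
    std::vector<Entry> entries_;
};

// Registers thunks for two libraries, "unloads" libA, and returns the
// order in which the thunks fired.
std::vector<std::string> demoFinalizeOrder() {
    ExitHandlerTable table;
    std::vector<std::string> log;
    table.registerHandler("libA", [&log] { log.push_back("A:first"); });
    table.registerHandler("libB", [&log] { log.push_back("B:only"); });
    table.registerHandler("libA", [&log] { log.push_back("A:second"); });
    table.finalize("libA");  // plays the role of dlclose on libA
    return log;
}
```

In this model only libA's thunks run, newest first. Each thunk assumes the
object it destroys is still intact at that point, which is why corruption
or a second copy of the same state can surface only at unload time.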

Also, check that you are initializing LLVM in a sane way, and that you
are terminating it the right way too (make sure that the LLVM objects
you destroy are owned by you, and that those you don't destroy are owned
by LLVM). Try to bisect the problematic area by removing functionality
from your code.

If you can replicate the crash by loading just one shared library that
uses LLVM, and the valgrind report contains nothing about memory
corruption, file a bug report.
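
A single-library reproduction of that kind could be sketched as a tiny
harness that loads one shared object and unloads it again, using only the
standard `dlopen`/`dlclose` calls. `loadAndUnload` is a hypothetical
helper name, and the flags chosen here are an assumption; the simulator
may load its components differently.

```cpp
#include <cassert>
#include <cstdio>
#include <dlfcn.h>

// Loads one shared object and immediately unloads it.
// Returns true when both dlopen and dlclose succeed.
// Passing nullptr asks for a handle to the main program itself.
bool loadAndUnload(const char* path) {
    void* handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (handle == nullptr) {
        std::fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return false;
    }
    // When the last reference is dropped, dlclose unloads the library,
    // which is when its static destructors run.
    if (dlclose(handle) != 0) {
        std::fprintf(stderr, "dlclose failed: %s\n", dlerror());
        return false;
    }
    return true;
}
```

Calling `loadAndUnload` from a small `main` with the path of one component
that links against LLVM, and running that program under valgrind, would
show whether the crash needs more than one component. On older glibc
versions the program has to be linked with `-ldl`.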