Segfault when using llvm-3.6 and OpenGL at the same time on Linux (with mesa, which uses llvm-3.4)

Hello there,

tl;dr:
Is it known that using llvm-3.4 and llvm-3.6 at the same time in the same process can segfault? Here llvm-3.6 is used from a linked shared library and llvm-3.4 is dlopen'ed. One strange detail: the crash shows up especially if -rdynamic is used when linking the program.

If so, is there a workaround?
If not, can it be fixed in llvm-3.6?

Long story:
I am writing a program that creates binaries using llvm-3.6 and has a Qt frontend.
I noticed that, when running it on Ubuntu Linux (reproduced on 32 and 64 bit), the program crashes when Qt initializes. The stack trace reveals that it happens when initializing OpenGL, inside the dynamic library initialization of the driver.

I reproduced this with Ubuntu Linux 14.04 32 bit and 64 bit in VirtualBox (video driver "vboxvideo") and VMware Player (using some VMware video driver, I guess) and on my bare machine with an NVIDIA mobile graphics card (GeForce GTX 780M) using the nouveau driver. However, as I suspected, when installing the proprietary NVIDIA driver, the segfault does not happen - my guess is that it does not use the parts of mesa that use llvm-3.4.

Here is the stack trace:

Program received signal SIGSEGV, Segmentation fault.
0xb606fe4c in llvm::LayoutAlignElem::get(llvm::AlignTypeEnum, unsigned int, unsigned int, unsigned int) () from /usr/lib/i386-linux-gnu/libLLVM-3.4.so.1
(gdb) bt
#0 0xb606fe4c in llvm::LayoutAlignElem::get(llvm::AlignTypeEnum, unsigned int, unsigned int, unsigned int) () from /usr/lib/i386-linux-gnu/libLLVM-3.4.so.1
#1 0xb5ae8590 in ?? () from /usr/lib/i386-linux-gnu/libLLVM-3.4.so.1
#2 0xb7fecd77 in call_init (l=, argc=argc@entry=1, argv=argv@entry=0xbfffefb4, env=env@entry=0xbfffefbc) at dl-init.c:78
#3 0xb7fece64 in call_init (env=0xbfffefbc, argv=0xbfffefb4, argc=1, l=) at dl-init.c:36
#4 _dl_init (main_map=main_map@entry=0x8267e20, argc=1, argv=0xbfffefb4, env=0xbfffefbc) at dl-init.c:126
#5 0xb7ff0e8e in dl_open_worker (a=0xbfffe59c) at dl-open.c:577
#6 0xb7fecc26 in _dl_catch_error (objname=objname@entry=0xbfffe594, errstring=errstring@entry=0xbfffe598, mallocedp=mallocedp@entry=0xbfffe593, operate=operate@entry=0xb7ff0b90 <dl_open_worker>, args=args@entry=0xbfffe59c) at dl-error.c:187
#7 0xb7ff0684 in _dl_open (file=0xbfffe7b4 "/usr/lib/i386-linux-gnu/dri/swrast_dri.so", mode=-2147483390, caller_dlopen=0xb7c22e38, nsid=, argc=1, argv=0xbfffefb4, env=0xbfffefbc) at dl-open.c:661
#8 0xb7f65cbc in dlopen_doit (a=0xbfffe750) at dlopen.c:66
#9 0xb7fecc26 in _dl_catch_error (objname=0x825bb84, errstring=0x825bb88, mallocedp=0x825bb80, operate=0xb7f65c30 <dlopen_doit>, args=0xbfffe750) at dl-error.c:187
#10 0xb7f6637c in _dlerror_run (operate=operate@entry=0xb7f65c30 <dlopen_doit>, args=args@entry=0xbfffe750) at dlerror.c:163
#11 0xb7f65d71 in __dlopen (file=0xbfffe7b4 "/usr/lib/i386-linux-gnu/dri/swrast_dri.so", mode=258) at dlopen.c:87
#12 0xb7c22e38 in ?? () from /usr/lib/i386-linux-gnu/mesa/libGL.so.1
#13 0xb7c22488 in ?? () from /usr/lib/i386-linux-gnu/mesa/libGL.so.1
#14 0xb7bffd6e in ?? () from /usr/lib/i386-linux-gnu/mesa/libGL.so.1
#15 0xb7bfbd62 in glXGetFBConfigs () from /usr/lib/i386-linux-gnu/mesa/libGL.so.1
#16 0xb7bfc8de in glXChooseFBConfigSGIX () from /usr/lib/i386-linux-gnu/mesa/libGL.so.1
#17 0xb7fa1f76 in fgChooseFBConfig () from /usr/lib/i386-linux-gnu/libglut.so.3
#18 0xb7fa2225 in fgOpenWindow () from /usr/lib/i386-linux-gnu/libglut.so.3
#19 0xb7fa0daa in fgCreateWindow () from /usr/lib/i386-linux-gnu/libglut.so.3
#20 0xb7fa2943 in glutCreateWindow () from /usr/lib/i386-linux-gnu/libglut.so.3
#21 0x0808b6c2 in main ()

I could drill it down to a testcase like this:

(You can view and clone the source from my github repository:
https://github.com/daniel-kun/llvmcrash )

daniel@ubuntu32-dev:~/projects/llvmcrash/app$ cat crashme.cpp
// Shared library that is linked to the program "app"
#include <llvm/IR/LLVMContext.h>

llvm::LLVMContext c;

void foo () {
}
daniel@ubuntu32-dev:~/projects/llvmcrash/app$ cat app.cpp
// Main program that links "libcrashme.so" and initializes OpenGL
// - which in turn uses mesa, which uses llvm-3.4
#include "GL/freeglut.h"
#include "GL/gl.h"

void foo(); // Defined in libcrashme.so

// Dummy OpenGL display func
void display ()
{
}

int main(int argc, char** argv)
{
    foo(); // Call our dummy llvm function, which actually does nothing. This is required for the crash
    glutInit(&argc, argv);
    glutInitDisplayMode(GLUT_SINGLE);
    glutInitWindowSize(500, 500);
    glutInitWindowPosition(100, 100);
    // Crash happens here (frame #20 of the stack trace):
    glutCreateWindow("This crashes when using an llvm lib");
    glutDisplayFunc(display);
    glutMainLoop();
}

daniel@ubuntu32-dev:~/projects/llvmcrash/app$ cat compile_and_run.sh
rm -rvf libcrashme.so app
g++ -shared -o libcrashme.so crashme.cpp `llvm-config-3.6 --cxxflags` &&
g++ -rdynamic -o app app.cpp -L. -lcrashme -lGL -lglut `llvm-config-3.6 --ldflags --libs` -lpthread -ldl -lncurses &&
LD_LIBRARY_PATH=. ./app
daniel@ubuntu32-dev:~/projects/llvmcrash/app$ ./compile_and_run.sh
removed ‘libcrashme.so’
removed ‘app’
./compile_and_run.sh: line 4: 6160 Segmentation fault (core dumped) LD_LIBRARY_PATH=. ./app

Any ideas on how to fix this would be much appreciated, since it is a show-stopper for my application.

(Somehow, -rdynamic seems to make the crash appear reliably, but in my real app the crash persists even without -rdynamic, so dropping the flag did not fix it for me.)

Kind regards!

Daniel Albuschat

Are you using builds with or without exceptions? There is a known
"feature" in libstdc++ that exceptions are unified by class name, not by
object. That breaks dlopen even if all users have RTLD_LOCAL.

The second question would be whether any of the DSOs involved for
LLVM/Clang have different sonames for the different versions.

Joerg

> I reproduced this with Ubuntu Linux 14.04 32 bit and 64 bit in VirtualBox
> (video driver "vboxvideo") and VMware Player (using some VMware video
> driver, I guess) and on my bare machine with a NVIDIA mobile graphics card
> (GeForce GTX 780M) using the nouveau driver. However, as I suspected, when
> installing the proprietary NVIDIA driver, the segfault does not happen - my
> guess is because it does not use the parts of mesa that uses llvm-3.4.

The NVIDIA proprietary driver does not use mesa at all. Also based on
your backtrace, you are using llvmpipe and not nouveau.

I'm not a linking expert, but I have seen and fixed a few of these issues
with Mesa. In most cases the fix is to make sure that the llvm symbols
in the mesa driver and also the llvm symbols in your application aren't
being exported as global symbols.

-Tom
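
One way to check this from inside the running process is to ask the dynamic linker which loaded object provides a given symbol in the global scope. The sketch below uses glibc's dlsym and dladdr for that. The helper name providerOf is made up for illustration, and whether the application itself exports llvm symbols is an assumption to verify, not something established in this thread.

```cpp
// Sketch: report which loaded object provides a symbol in the global scope.
// Assumes Linux with glibc; g++ defines _GNU_SOURCE, which RTLD_DEFAULT needs.
#include <dlfcn.h>   // dlsym, dladdr, Dl_info, RTLD_DEFAULT
#include <string>

// Hypothetical helper (not part of any library): returns the path of the
// object that defines `symbol` in the global lookup scope, or an empty
// string if the symbol cannot be resolved there.
std::string providerOf(const char* symbol) {
    // RTLD_DEFAULT searches the global scope in the default lookup order,
    // which is the order that decides who "wins" when two objects define
    // the same name.
    void* addr = dlsym(RTLD_DEFAULT, symbol);
    if (!addr) {
        return std::string();
    }
    Dl_info info;
    // dladdr maps an address back to the shared object that contains it.
    if (dladdr(addr, &info) == 0 || !info.dli_fname) {
        return std::string();
    }
    return info.dli_fname;
}
```

Calling providerOf("_ZN4llvm11LLVMContextC1Ev") early in main would show where the LLVMContext constructor resolves from. If the answer is the application binary rather than a libLLVM shared object, the application is exporting that symbol globally, which is the situation described above. On older glibc versions the program needs to be linked with -ldl.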

> The NVIDIA proprietary driver does not use mesa at all. Also based on
> your backtrace, you are using llvmpipe and not nouveau.

Yeah, the backtrace was created from within VirtualBox. That driver, and
the nouveau driver, cause the same issue. The VMWare driver, too.

> I'm not a linking expert, but I have seen and fixed a few of these issues
> with Mesa. In most cases the fix is to make sure that the llvm symbols
> in the mesa driver and also the llvm symbols in your application aren't
> being exported as global symbols.

How can I control how llvm symbols are exported?

nm reports no exported symbols matching llvm:

daniel@ubuntu32-dev:~/projects/llvmcrash$ nm libcrashme.so | grep llvm
         U _ZN4llvm11LLVMContextC1Ev
         U _ZN4llvm11LLVMContextD1Ev

Thanks!
Daniel

> Are you using builds with or without exceptions? There is a known
> "feature" in libstdc++ that exceptions are unified by class name, not by
> object. That breaks dlopen even if all users have RTLD_LOCAL.

llvm-3.4 is from the official Ubuntu 14.04 repo. llvm-3.6 is from
http://llvm.org/apt.
How do I know / find out, whether they are compiled with or without
exceptions?

My guess is that they are compiled without exceptions, since it is the
default.
My app however, does use exceptions. Is this even possible?

> The second question would be whether any of the DSOs involved for
> LLVM/Clang have different sonames for the different versions.

Clang is not involved here. I am compiling with g++ and only use llvm,
not Clang.
llvm does use some common libraries like pthreads, dl and ncurses. Are
those related, or do you mean libraries actually involved in code
generation?

Thanks!
Daniel Albuschat

> How can I control how llvm symbols are exported?

Mesa uses linker scripts to control this.

> nm reports no exported symbols matching llvm:
>
> daniel@ubuntu32-dev:~/projects/llvmcrash$ nm libcrashme.so | grep llvm
>          U _ZN4llvm11LLVMContextC1Ev
>          U _ZN4llvm11LLVMContextD1Ev

What about swrast_dri.so from Mesa?

-Tom

> llvm-3.4 is from the official Ubuntu 14.04 repo. llvm-3.6 is from
> http://llvm.org/apt.
> How do I know / find out, whether they are compiled with or without
> exceptions?

nm -DC $lib | grep "typeinfo for" would be a start (RTTI, not
exceptions).

> My guess is that they are compiled without exceptions, since it is the
> default.
> My app however, does use exceptions. Is this even possible?

If exceptions don't cross library boundaries, yes.

> Clang is not involved here. I am compiling with g++ and only use llvm,
> not Clang.
> llvm does use some common libraries like pthreads, dl and ncurses. Are
> those related, or do you mean libraries actually involved in code
> generation?

No, only the LLVM libraries. System libraries don't count.

Joerg

No llvm symbols, either:

daniel@ubuntu32-dev:~/projects/omni2/cbuild$ nm -DC /usr/lib/i386-linux-gnu/dri/swrast_dri.so | grep llvm
         U draw_create_no_llvm
         U draw_get_shader_param_no_llvm

I'm out of ideas now... can I somehow log the paths of libraries loaded in
a process? Maybe I can find out whether llvm-3.4 loads some llvm-3.6
libraries or so. On the other hand, why would that only happen when
llvm-3.6 is actually used? Shouldn't it solely depend on PATH /
LD_LIBRARY_PATH which library is used?

It seems that dynamic libraries are far more complicated / far less
straightforward on Linux than I thought...

Greetings,
Daniel Albuschat

Do you have a test program you can share?

> No llvm symbols, either:
>
> daniel@ubuntu32-dev:~/projects/omni2/cbuild$ nm -DC /usr/lib/i386-linux-gnu/dri/swrast_dri.so | grep llvm
>          U draw_create_no_llvm
>          U draw_get_shader_param_no_llvm

> I'm out of ideas now... can I somehow log the paths of libraries loaded in
> a process? Maybe I can find out whether llvm-3.4 loads some llvm-3.6
> libraries or so. On the other hand, why would that only happen when
> llvm-3.6 is actually used? Shouldn't it solely depend on PATH /
> LD_LIBRARY_PATH which library is used?

Did you build a single libLLVM-3.x.so, or do you have one shared object
per llvm component?

You can use ldd to see which libraries a shared object depends on.

-Tom
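
ldd reports the link-time dependencies of a binary, so objects that only arrive later through dlopen (such as swrast_dri.so in the stack trace) would not be listed for the application itself. To log what is actually loaded in a running process, glibc offers dl_iterate_phdr. The following is a minimal sketch under that assumption; the names collectObject and loadedObjects are invented for illustration.

```cpp
// Sketch: list the shared objects currently loaded into this process.
// Assumes Linux with glibc, which provides dl_iterate_phdr in <link.h>.
#include <link.h>    // dl_iterate_phdr, struct dl_phdr_info
#include <cstddef>
#include <string>
#include <vector>

// Callback invoked by dl_iterate_phdr once per loaded object. It appends
// the object's path to the vector passed in through `data`.
static int collectObject(struct dl_phdr_info* info, std::size_t, void* data) {
    std::vector<std::string>* out =
        static_cast<std::vector<std::string>*>(data);
    const char* name = info->dlpi_name;
    // The main program (and, on some systems, the vDSO) has an empty name.
    out->push_back((name && *name) ? name : "[unnamed: main program or vDSO]");
    return 0;  // returning 0 tells dl_iterate_phdr to keep iterating
}

// Hypothetical helper: returns the paths of all objects loaded right now.
std::vector<std::string> loadedObjects() {
    std::vector<std::string> result;
    dl_iterate_phdr(collectObject, &result);
    return result;
}
```

Printing the result of loadedObjects() right after foo() shows which LLVM libraries, if any, are already mapped before OpenGL is initialized. Without changing the program, running it with the environment variable LD_DEBUG=libs makes the glibc loader print each library it searches for and loads, including the ones opened via dlopen.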