Segmentation faults when mixing versions of LLVM

ManDay · January 14, 2023, 10:47am

Is it a known, or possibly accepted, problem that when different versions of LLVM have been used to compile a program and the libraries it links against, runtime errors occur that seem to be related to mixing different versions of LLVM’s libraries?

Specifically, we are trying to find the right approach to solve this bug on Blender (Cycles renderer), Mesa and OSL (all three of which can be compiled with LLVM).

I could not find any previous discussion of the subject, so I would like to know what the correct solution to this problem would be from LLVM’s perspective.

Thank you!

tschuett · January 14, 2023, 10:54am

There was a similar issue, but with several LLVM libraries with the same version:

ManDay · January 14, 2023, 11:01am

Thank you. The thread seems to concentrate on linking to the same version, but mentions the problem of this thread under Related topics; I think LLVM as a shared library - #31 by _sean_silva comes very close.

From that, I understand that LLVM is currently not designed to work as a shared library, implying that canonically, it has to be linked statically. Is that correct in your opinion?

tschuett · January 14, 2023, 11:10am

I believe that it is totally fine to link against LLVM as a shared library. I did that today several times. The issue is to link against LLVM several times.

ManDay · January 14, 2023, 11:14am

Then I’m afraid I do not understand where you suggest the original mistake lies. What are Blender and Mesa (in this example) supposed to do differently?

DarkDefender · January 14, 2023, 11:23am

I think the issue is that there are symbol clashes if you, for example, build Mesa with llvm15 and a graphical program (like Blender) pulls in something built with llvm14.

Then there will be symbol clashes and the program will crash because of the ambiguity.

For static linking you can solve this by hiding the llvm symbols.

However, it would be really nice if dynamic linking also worked.
For that, I think the symbols need to be easy to namespace.

So at compile time, every llvm15 function would be in the “llvm15” namespace.
I.e., you would have llvm15::func_A and llvm14::func_A, so there would be no ambiguity.
This would work for forks of LLVM as well. So ROCm could use the “rocm_llvm15” namespace, for example.

This probably means however that the build system both for llvm and other projects would need to change so that the namespace magic happens at compile time. At least that is what I think should happen as I don’t think it is a good idea for people to manually specify the llvm library namespace in the code. It should happen in the configure step in CMake or Meson where the user either can just let it be automatically chosen for them or manually specify the namespace and library they want to use.
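As a rough illustration of the idea, here is a minimal C++ sketch in which two "versions" of a library coexist because each lives in its own namespace, and a consumer selects one through a single build-time macro. Everything in it is hypothetical: `DEFINE_TOY_LIBRARY`, `TOYLLVM_NAMESPACE`, `toyllvm` and `consumerVersion` are made-up names, and LLVM offers no such switch today as far as this thread establishes.

```cpp
// Toy model only: none of these names exist in LLVM.
#include <cassert>
#include <string>

// Stamps out one "version" of a tiny library inside its own namespace.
#define DEFINE_TOY_LIBRARY(NS, VERSION)                                        \
  namespace NS {                                                               \
  inline int func_A() { return VERSION; }                                      \
  inline std::string name() { return #NS; }                                    \
  }

DEFINE_TOY_LIBRARY(llvm14, 14)
DEFINE_TOY_LIBRARY(llvm15, 15)
DEFINE_TOY_LIBRARY(rocm_llvm15, 15) // a fork can pick its own namespace

// Assumption: the configure step (CMake, Meson, ...) would pass something
// like -DTOYLLVM_NAMESPACE=llvm14; here we fall back to llvm15.
#ifndef TOYLLVM_NAMESPACE
#define TOYLLVM_NAMESPACE llvm15
#endif
namespace toyllvm = TOYLLVM_NAMESPACE;

// Consumer code never spells out the versioned namespace by hand.
inline int consumerVersion() { return toyllvm::func_A(); }

// Usage: both versions are callable side by side without ambiguity.
inline void demoCoexistence() {
  assert(llvm14::func_A() == 14);
  assert(llvm15::func_A() == 15);
  assert(rocm_llvm15::name() == "rocm_llvm15");
}
```

Because the namespace is part of the mangled C++ name, `llvm14::func_A` and `llvm15::func_A` end up as distinct symbols, which is what would remove the ambiguity at dynamic link time.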

The question is then whether a solution like this is something the LLVM project would like to implement, to make dynamic linking less error-prone for larger projects and ecosystems.

tschuett · January 14, 2023, 11:29am

I remembered that BoringSSL did something in that area. Indeed, they support prefixing all symbols, but it is written in C.

https://boringssl.googlesource.com/boringssl/+/HEAD/BUILDING.md#building-with-prefixed-symbols
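For readers unfamiliar with the technique, the following is a minimal sketch of build-time symbol prefixing for a C-style API using the preprocessor. It is not BoringSSL's actual header layout; `TOY_PREFIX`, `TOY_SYM`, `toy_init` and the `vendorA`/`vendorB` prefixes are hypothetical names chosen for illustration.

```cpp
// Sketch of symbol prefixing; all identifiers here are made up.
#include <cassert>
#include <string>

// Two-level concatenation so that TOY_PREFIX is expanded before pasting.
#define TOY_CONCAT_INNER(a, b) a##_##b
#define TOY_CONCAT(a, b) TOY_CONCAT_INNER(a, b)
#define TOY_STR_INNER(x) #x
#define TOY_STR(x) TOY_STR_INNER(x)

// Assumption: the build system would supply this, e.g. -DTOY_PREFIX=vendorA.
#ifndef TOY_PREFIX
#define TOY_PREFIX vendorA
#endif
#define TOY_SYM(name) TOY_CONCAT(TOY_PREFIX, name)

// Redirect the public name to the prefixed one before it is declared.
#define toy_init TOY_SYM(toy_init)

// With the default prefix this defines the C symbol vendorA_toy_init.
extern "C" int toy_init(void) { return 1; }

// Stands in for a second copy of the library built with another prefix.
extern "C" int vendorB_toy_init(void) { return 2; }

// Reports the symbol name that source-level `toy_init` really refers to.
inline std::string toyInitSymbolName() { return TOY_STR(toy_init); }

// Usage: callers keep writing toy_init(); the two copies do not clash.
inline void demoPrefixing() {
  assert(toy_init() == 1);
  assert(vendorB_toy_init() == 2);
}
```

The source code keeps using the unprefixed name, while the object file only exports the prefixed symbol, so two differently prefixed copies can be loaded into one process.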

Except for the C APIs, everything is probably hidden in llvm, clang, mlir, lld, lldb? namespaces. Long term there is a need to customise the namespaces with support from the build system.

DarkDefender · January 14, 2023, 11:33am

Agh!
I edited my message, but now it seems like it got deleted?

EDIT: Ah, seems like I triggered the spam filter. It is hidden for the moment.

DarkDefender · January 14, 2023, 11:49am

Hidden in namespaces in what way?
I would think that the same issue would happen when using C++ APIs if you have multiple dynamically linked LLVM versions pulled in at runtime, right?

tschuett · January 14, 2023, 11:51am

This is more in the dream area than reality, but one library uses llvmFoo and the other uses llvmBar. There will be no symbol clashes.

labre · January 14, 2023, 12:01pm

I’d agree that my RFC is not related to mixing LLVM versions.

My issue turned out to be something of a copy/paste error in ROCm. It happens because ROCm explicitly registers an option, and if that option is already registered, the missing CommandLine isolation causes LLVM to abort the program due to the duplicated option.

github.com/RadeonOpenCompute/ROCm-CompilerSupport

comgr-objdump.cpp causes crash if loaded alongside mesa

opened 07:11PM - 20 Dec 22 UTC

xytovl

When using both libamd_comgr.so and mesa with libvulkan_radeon, application (ble…nder-3.3 in my case) crashes with

```
mesa: CommandLine Error: Option 'h' registered more than once!
LLVM ERROR: inconsistency in registered CommandLine options
Abandon (core dumped)
```

the following patch fixes this for me:

```patch
diff -ru ROCm-CompilerSupport-rocm-5.3.3.orig/lib/comgr/src/comgr-objdump.cpp ROCm-CompilerSupport-rocm-5.3.3/lib/comgr/src/comgr-objdump.cpp
--- ROCm-CompilerSupport-rocm-5.3.3.orig/lib/comgr/src/comgr-objdump.cpp 2022-07-28 17:06:14.000000000 +0200
+++ ROCm-CompilerSupport-rocm-5.3.3/lib/comgr/src/comgr-objdump.cpp 2022-12-20 20:03:53.357398486 +0100
@@ -175,9 +175,6 @@
 static cl::alias SectionHeadersShort("headers",
                                      cl::desc("Alias for --section-headers"),
                                      cl::aliasopt(SectionHeaders));
-static cl::alias SectionHeadersShorter("h",
-                                       cl::desc("Alias for --section-headers"),
-                                       cl::aliasopt(SectionHeaders));
 cl::list<std::string> FilterSections("section",
```

If I understand correctly, none of the command line options and aliases defined here are used, but this one alone fixes the issue.

Fortunately, comgr-objdump is a copy of llvm-objdump and does not actually need to register that option, so it can be removed. On the other hand, CommandLine isolation is being worked on for LLVM 16, so this shortcoming could be solved soon.
https://reviews.llvm.org/D129129
https://reviews.llvm.org/D129134
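To show why a duplicated option is fatal when the registry is shared, here is a toy model of a process-wide option registry. It is not LLVM's CommandLine implementation: `ToyRegistry` and `ToyOption` are hypothetical, and the sketch throws an exception where LLVM aborts the process, purely so the behaviour can be observed.

```cpp
// Toy model only; not LLVM's CommandLine code.
#include <cassert>
#include <set>
#include <stdexcept>
#include <string>

class ToyRegistry {
public:
  // One registry per process, shared by every component that uses it.
  static ToyRegistry &global() {
    static ToyRegistry instance;
    return instance;
  }

  // Rejects a name that some other component has already registered.
  void add(const std::string &name) {
    if (!names_.insert(name).second)
      throw std::runtime_error("Option '" + name +
                               "' registered more than once!");
  }

private:
  std::set<std::string> names_;
};

// Registers itself on construction, the way a static option object would.
struct ToyOption {
  explicit ToyOption(const std::string &name,
                     ToyRegistry &registry = ToyRegistry::global()) {
    registry.add(name);
  }
};

// Returns true if registering `name` twice in `registry` is rejected.
inline bool clashesWhenRegisteredTwice(ToyRegistry &registry,
                                       const std::string &name) {
  ToyOption first(name, registry);
  try {
    ToyOption second(name, registry);
  } catch (const std::runtime_error &) {
    return true;
  }
  return false;
}
```

With one shared registry, two components that both register `h` collide; giving each component its own `ToyRegistry` makes the same two registrations succeed, which is roughly the kind of isolation the text refers to.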

jrtc27 · January 15, 2023, 12:18am

Symbol versioning is the better alternative to that if you really want to go down that road.

ManDay · January 15, 2023, 7:44am

I was thinking that this problem can perhaps better be solved and actually should be solved in the linker itself.

The symbols in a unit (e.g. Blender) are unambiguous from compile time on. Or at least that can be demanded from the author of the unit, as they must be aware of the includes and libraries which they are using directly and can be asked to make sure there are no clashes. On the other hand they shouldn’t be asked to make sure there are no clashes in indirect dependencies from the includes (e.g. Mesa), because the latter are supposed to be black boxes to them.

The information to resolve the symbols unambiguously at runtime is there. Every time a symbol is resolved, it can be known:

- which unit wants the symbol resolved
- therefore which set of SOs (with unique symbols) are candidate resolutions

It just has to be handed to the dynamic linker (by the compiler) and then used correctly!
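To make the proposal concrete, here is a toy model that contrasts a flat lookup (first definition in load order wins, whoever asks) with a lookup scoped to the requesting unit's own dependencies. It is a simplification for discussion, not a description of how any real dynamic linker is implemented; `ToySharedObject`, `resolveFlat`, `resolveScoped` and the library names are hypothetical.

```cpp
// Toy model of symbol resolution; not a real dynamic linker.
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <vector>

struct ToySharedObject {
  std::string name;
  std::map<std::string, int> symbols; // symbol name -> stand-in "address"
};

using ToyDeps = std::map<std::string, std::vector<const ToySharedObject *>>;

// Flat lookup: the first loaded object defining the symbol wins for everyone.
inline std::optional<int>
resolveFlat(const std::vector<const ToySharedObject *> &loadOrder,
            const std::string &symbol) {
  for (const ToySharedObject *so : loadOrder) {
    auto it = so->symbols.find(symbol);
    if (it != so->symbols.end())
      return it->second;
  }
  return std::nullopt;
}

// Scoped lookup: only the direct dependencies of `unit` are candidates.
inline std::optional<int> resolveScoped(const ToyDeps &deps,
                                        const std::string &unit,
                                        const std::string &symbol) {
  auto unitIt = deps.find(unit);
  if (unitIt == deps.end())
    return std::nullopt;
  return resolveFlat(unitIt->second, symbol);
}

// Usage: Blender and Mesa each get the func_A of "their" LLVM.
inline void demoScopedLookup() {
  ToySharedObject llvm14{"libLLVM-14.so", {{"func_A", 14}}};
  ToySharedObject llvm15{"libLLVM-15.so", {{"func_A", 15}}};
  ToyDeps deps{{"blender", {&llvm14}}, {"mesa", {&llvm15}}};

  assert(resolveFlat({&llvm14, &llvm15}, "func_A").value() == 14);
  assert(resolveScoped(deps, "blender", "func_A").value() == 14);
  assert(resolveScoped(deps, "mesa", "func_A").value() == 15);
}
```

In the flat case Mesa would silently receive the llvm14 definition; in the scoped case each unit only ever sees the objects it was linked against.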

I’m completely unfamiliar with how LLVM’s linker works and how difficult this would be to realize, but at least from the conceptual standpoint it could be how the linker behaves consistently with the idea that libraries (e.g. Mesa) are supposed to be black boxes to the user. Am I missing something?

DarkDefender · January 16, 2023, 2:13pm

How do you propose that would work in practice?
I haven’t used that before, but it seems to be more geared towards a single library being able to provide multiple functions of the same name and put them behind a version check.

tstellar · January 17, 2023, 3:35am

We already use symbol versioning. Maybe we are doing something wrong there?
