Segmentation faults when mixing versions of LLVM

Is it a known (or accepted) problem that when a program and the libraries it links against have been built with different versions of LLVM, runtime errors occur that seem to come from mixing different versions of LLVM’s libraries?

Specifically, we are trying to find the right approach to fix this bug in Blender (Cycles renderer), Mesa and OSL (all three of which can be compiled with LLVM).

I could not find any previous discussion of the subject, so I would like to know what the correct solution to this problem would be from LLVM’s perspective.

Thank you!

There was a similar issue, but with several LLVM libraries with the same version:

Thank you. The thread seems to concentrate on linking to the same version, but mentions the problem of this thread under Related topics; I think LLVM as a shared library - #31 by _sean_silva comes very close.

From that, I understand that LLVM is currently not designed to work as a shared library, implying that canonically, it has to be linked statically. Is that correct in your opinion?

I believe that it is totally fine to link against LLVM as a shared library. I did that several times today. The problem arises when LLVM is linked in several times.

Then I’m afraid I do not understand where you suggest the original mistake lies. What are Blender and Mesa (in this example) supposed to do differently?

I think the issue is that there are symbol clashes if, for example, you build Mesa with LLVM 15 and a graphical program (like Blender) pulls in something built with LLVM 14.

Then there will be symbol clashes and the program will crash because of the ambiguity.

For static linking you can solve this by hiding the llvm symbols.
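As a rough illustration of what hiding means here, the sketch below uses the GCC/Clang visibility attribute. All names (DEMO_HIDDEN, DEMO_EXPORT, bundled_llvm_major_version, renderer_backend_version) are hypothetical stand-ins and not part of LLVM, Mesa or Blender. In a real build the same effect would more likely come from compiler or linker options than from annotating individual functions, so treat this only as a picture of the idea.

```cpp
#include <cassert>

// With GCC or Clang, a hidden symbol is not exported from the shared object
// it ends up in, so other libraries in the same process cannot bind to it.
#if defined(__GNUC__) || defined(__clang__)
#define DEMO_HIDDEN __attribute__((visibility("hidden")))
#define DEMO_EXPORT __attribute__((visibility("default")))
#else
#define DEMO_HIDDEN
#define DEMO_EXPORT
#endif

// Hypothetical stand-in for an LLVM function that was linked in statically.
DEMO_HIDDEN int bundled_llvm_major_version() { return 15; }

// Hypothetical public entry point of the library that bundles LLVM.
// Only this symbol is meant to be visible to other libraries.
DEMO_EXPORT int renderer_backend_version() {
    return bundled_llvm_major_version();
}

// Inside the library the hidden function is still callable as usual.
void demo_hidden_symbol() {
    assert(renderer_backend_version() == 15);
}
```

The hidden function stays usable within the library itself; it just no longer takes part in symbol resolution across library boundaries.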

However, it would be really nice if dynamic linking also worked.
For that, I think the symbols need to be easy to namespace.

So at compile time every LLVM 15 function would be in the “llvm15” namespace.
I.e. you would have llvm15::func_A and llvm14::func_A, and then there would be no ambiguity.
This would work for forks of LLVM as well. So ROCm could use the “rocm_llvm15” namespace, for example.
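To make the namespace idea concrete, here is a minimal sketch, not something LLVM provides today. The macro MYLLVM_NAMESPACE and the alias myllvm are hypothetical; the two namespaces stand in for two LLVM versions, with func_A taken from the example above. The assumption is that a build system would pass something like -DMYLLVM_NAMESPACE=llvm14 at configure time.

```cpp
#include <cassert>
#include <string>

// Hypothetical macro that a build system (CMake, Meson) would define.
// It is not an existing LLVM option; llvm15 is just the default here.
#ifndef MYLLVM_NAMESPACE
#define MYLLVM_NAMESPACE llvm15
#endif

// Stand-ins for two library versions exporting a function of the same name.
namespace llvm14 {
inline std::string func_A() { return "func_A from llvm14"; }
}  // namespace llvm14

namespace llvm15 {
inline std::string func_A() { return "func_A from llvm15"; }
}  // namespace llvm15

// Client code only ever refers to the alias, never to a concrete version.
namespace myllvm = MYLLVM_NAMESPACE;

std::string call_selected() { return myllvm::func_A(); }

void demo_namespace_selection() {
    // Both versions coexist in one program because their qualified names differ.
    assert(llvm14::func_A() != llvm15::func_A());
    // The alias resolves to whichever namespace was configured.
    assert(call_selected() == MYLLVM_NAMESPACE::func_A());
}
```

Because the namespace is part of the mangled name, the two func_A definitions end up as different symbols, which is what removes the ambiguity at link and load time.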

However, this probably means that the build systems of both LLVM and other projects would need to change so that the namespace magic happens at compile time. At least that is what I think should happen, as I don’t think it is a good idea for people to manually specify the LLVM library namespace in the code. It should happen in the configure step in CMake or Meson, where the user can either let it be chosen automatically or manually specify the namespace and library they want to use.

The question is then whether a solution like this is something the LLVM project would like to implement to make dynamic linking less error-prone for larger projects and ecosystems.

I remembered that BoringSSL did something in that area. Indeed, they support prefixing all symbols, but it is written in C.

https://boringssl.googlesource.com/boringssl/+/HEAD/BUILDING.md#building-with-prefixed-symbols

Except for the C APIs, everything is probably hidden in llvm, clang, mlir, lld, lldb? namespaces. Long term there is a need to customise the namespaces with support from the build system.

Agh!
I edited my message, but now it seems like it got deleted?

EDIT: Ah, seems like I triggered the spam filter. It is hidden for the moment.

Hidden in namespaces in what way?
I would think that the same issue would happen when using the C++ APIs if you have multiple dynamically linked LLVM versions pulled in at runtime, right?

This is more in the dream area than reality, but if one library uses llvmFoo and the other uses llvmBar, there will be no symbol clashes.

I agree that my RFC is not related to mixing LLVM versions.

My issue turned out to be more or less a copy/paste error in ROCm. It happens because ROCm explicitly registers an option, and if that option is already registered, the missing CommandLine isolation causes LLVM to abort the program because of the duplicated option.
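The failure mode can be modelled with a small sketch. This is not LLVM's CommandLine code; option_registry and register_option are hypothetical names, and the model returns false where LLVM would abort. The point it tries to show is only that a registry shared by the whole process rejects a second registration of the same option name.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical process-wide registry. Everything loaded into the process
// shares it, which is the missing isolation described above.
std::set<std::string>& option_registry() {
    static std::set<std::string> registry;
    return registry;
}

// Returns false if the name is already taken. In this simplified model the
// caller decides what to do; the real behaviour described above is an abort.
bool register_option(const std::string& name) {
    return option_registry().insert(name).second;
}

void demo_duplicate_registration() {
    // The first copy of the tool code registers its option successfully.
    assert(register_option("demo-option"));
    // A second copy registering the same name collides with the first.
    assert(!register_option("demo-option"));
}
```

With per-user registries instead of one global one, the second registration would land in a different set and succeed, which is roughly what isolation would buy.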

Fortunately comgr-objdump is a copy of llvm-objdump and actually does not need to register that option, so it can be removed. On the other hand CommandLine isolation is being worked on for LLVM 16, so this shortcoming could be solved soon.
https://reviews.llvm.org/D129129
https://reviews.llvm.org/D129134


Symbol versioning is the better alternative to that if you really want to go down that road.

I was thinking that this problem could perhaps be solved better, and actually should be solved, in the linker itself.

The symbols in a unit (e.g. Blender) are unambiguous from compile time on. Or at least that can be demanded from the author of the unit, as they must be aware of the includes and libraries which they are using directly and can be asked to make sure there are no clashes. On the other hand they shouldn’t be asked to make sure there are no clashes in indirect dependencies from the includes (e.g. Mesa), because the latter are supposed to be black boxes to them.

The information to resolve the symbols unambiguously at runtime is there. Every time a symbol is resolved, it can be known

  1. which unit wants the symbol resolved
  2. therefore which set of SOs (with unique symbols) are candidate resolutions

It just has to be handed to the dynamic linker (by the compiler) and then used correctly!
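The two points above can be sketched as a toy model of per-unit symbol resolution. This is only an illustration of the proposal, not how any existing dynamic linker is implemented. The library names, unit names and the resolve function are hypothetical, and func_A is borrowed from earlier in the thread.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// symbol name -> label of the implementation that provides it
using SymbolTable = std::map<std::string, std::string>;

// Hypothetical shared objects, each with unique symbols internally.
const std::map<std::string, SymbolTable> kLibraries = {
    {"libLLVM-14.so", {{"func_A", "func_A from LLVM 14"}}},
    {"libLLVM-15.so", {{"func_A", "func_A from LLVM 15"}}},
};

// Hypothetical record of which shared objects each unit was linked against.
const std::map<std::string, std::vector<std::string>> kDirectDeps = {
    {"blender", {"libLLVM-14.so"}},
    {"mesa", {"libLLVM-15.so"}},
};

// Resolves a symbol by looking only at the requesting unit's own
// dependencies. Returns an empty string if nothing matches.
std::string resolve(const std::string& unit, const std::string& symbol) {
    auto deps = kDirectDeps.find(unit);
    if (deps == kDirectDeps.end()) return "";
    for (const std::string& lib : deps->second) {
        auto table = kLibraries.find(lib);
        if (table == kLibraries.end()) continue;
        auto hit = table->second.find(symbol);
        if (hit != table->second.end()) return hit->second;
    }
    return "";
}

void demo_scoped_resolution() {
    // The same symbol name resolves differently depending on who asks.
    assert(resolve("blender", "func_A") == "func_A from LLVM 14");
    assert(resolve("mesa", "func_A") == "func_A from LLVM 15");
}
```

In this model the same name never clashes, because each lookup is restricted to the candidate set of the unit that asked for it.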

I’m completely unfamiliar with how LLVM’s linker works and how difficult this would be to realize, but at least conceptually, this is how the linker could behave consistently with the idea that libraries (e.g. Mesa) are supposed to be black boxes to the user. Am I missing something?

How do you propose that would work in practice?
I haven’t used that before, but it seems to be more geared towards a single library being able to provide multiple functions of the same name and put them behind a version check.

We already use symbol versioning, maybe we are doing something wrong there?