Is it a known (or perhaps accepted) problem that when a program and the libraries it links against have been compiled with different versions of LLVM, runtime errors occur that seem to be related to mixing different versions of LLVM’s libs?
Specifically, we are trying to find the right approach to fix this bug, which involves Blender (Cycles renderer), Mesa and OSL (all three of which can be compiled with LLVM).
I could not find any previous discussion of the subject, so I would like to know what the correct solution to this problem would be from LLVM’s perspective.
I think the issue is symbol clashes: if, for example, you build Mesa with llvm15 and a graphical program (like Blender) pulls in something built with llvm14, both copies of LLVM end up loaded into the same process.
The same symbol names then exist twice, and the program will crash because of the ambiguity.
For static linking you can solve this by hiding the llvm symbols.
However, it would be really nice if dynamic linking also worked.
For that, I think the symbols need to be easy to namespace.
So at compile time every llvm15 function would be in the “llvm15” namespace.
I.e. you would have llvm15::func_A and llvm14::func_A, and then there would be no ambiguity.
This would work for forks of llvm as well. So ROCm could use the “rocm_llvm15” namespace, for example.
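To make the idea concrete, here is a minimal sketch of what I mean. The namespaces `llvm14`, `llvm15` and `rocm_llvm15` and the function `func_A` are placeholders taken from the description above, and `mesa_like` / `blender_like` are made-up consumers; none of this exists in LLVM today.

```cpp
#include <string>

// Hypothetical stand-ins for two LLVM releases and one fork.
// Each release puts the same function name into its own namespace,
// so the mangled symbol names differ and cannot clash.
namespace llvm14 {
inline std::string func_A() { return "func_A from llvm14"; }
}  // namespace llvm14

namespace llvm15 {
inline std::string func_A() { return "func_A from llvm15"; }
}  // namespace llvm15

namespace rocm_llvm15 {
inline std::string func_A() { return "func_A from rocm_llvm15"; }
}  // namespace rocm_llvm15

// Made-up consumers. Each one picks its release once, through a
// namespace alias, and then writes plain llvm::func_A() everywhere.
namespace mesa_like {
namespace llvm = ::llvm15;
inline std::string render() { return llvm::func_A(); }
}  // namespace mesa_like

namespace blender_like {
namespace llvm = ::llvm14;
inline std::string render() { return llvm::func_A(); }
}  // namespace blender_like
```

Calling `mesa_like::render()` and `blender_like::render()` in the same program reaches two different functions, even though both consumers spell the call as `llvm::func_A()`.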
This probably means, however, that the build systems of both llvm and other projects would need to change so that the namespacing happens at compile time. At least that is what I think should happen, as I don’t think it is a good idea for people to manually specify the llvm library namespace in the code. It should happen in the configure step in CMake or Meson, where the user can either let it be chosen automatically or manually specify the namespace and library they want to use.
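A rough sketch of how the configure step could drive this, assuming the build system passes a macro on the command line. The macro `DEMO_LLVM_ABI_NS` and the outer namespace `demo_llvm` are names I made up for illustration; LLVM has no such option as far as I know. The mechanism itself (an inline namespace whose name comes from a macro) is the one libc++ uses for its `std::__1` ABI namespace.

```cpp
#include <string>

// Hypothetical macro. The configure step (CMake, Meson, ...) would pass
// something like -DDEMO_LLVM_ABI_NS=rocm_llvm15; if nothing is passed,
// a default is chosen here.
#ifndef DEMO_LLVM_ABI_NS
#define DEMO_LLVM_ABI_NS llvm15
#endif

// Two-step stringification so the macro is expanded before it is quoted.
#define DEMO_STR_IMPL(x) #x
#define DEMO_STR(x) DEMO_STR_IMPL(x)

namespace demo_llvm {
// Because the namespace is inline, user code writes demo_llvm::func_A()
// and never names the version, but the version name still becomes part
// of the mangled symbol name.
inline namespace DEMO_LLVM_ABI_NS {
inline std::string abi_namespace() { return DEMO_STR(DEMO_LLVM_ABI_NS); }
inline int func_A() { return 42; }
}  // inline namespace DEMO_LLVM_ABI_NS
}  // namespace demo_llvm
```

Source code only ever says `demo_llvm::func_A()`, so nobody has to write the version by hand. Two builds configured with different values of the macro export differently named symbols and could be loaded into one process side by side.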
The question is then whether a solution like this is something the llvm project would like to implement, to make dynamic linking less error-prone for larger projects and ecosystems.
I’d agree that my RFC is not related to mixing LLVM versions.
My issue turned out to be more or less a copy/paste error in ROCm. It happens because ROCm explicitly registers an option, and if that option is already registered, the missing CommandLine isolation causes LLVM to abort the program because of the duplicate option.
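For readers who have not run into this failure mode, here is a toy model of it. This is not LLVM’s CommandLine code: `OptionRegistry`, `initComponentA`, `initComponentB` and the option name `example-flag` are all invented, and the toy throws an exception where the real library aborts the process, so that the behaviour can be observed.

```cpp
#include <set>
#include <stdexcept>
#include <string>

// Toy stand-in for a process-wide option registry. There is exactly one
// instance per process, shared by every component that is loaded.
class OptionRegistry {
public:
  static OptionRegistry &global() {
    static OptionRegistry instance;
    return instance;
  }

  // Registering a name that is already present is treated as fatal.
  void registerOption(const std::string &name) {
    if (!names_.insert(name).second)
      throw std::runtime_error("option '" + name +
                               "' registered more than once");
  }

private:
  std::set<std::string> names_;
};

// Two independent components that both register the same option,
// for example because the registration was copied from one to the other.
inline void initComponentA() {
  OptionRegistry::global().registerOption("example-flag");
}

inline void initComponentB() {
  OptionRegistry::global().registerOption("example-flag");
}
```

Either component works on its own. The failure only appears once both are initialised in the same process, because the second registration hits the shared registry.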
I was thinking that this problem can perhaps better be solved and actually should be solved in the linker itself.
The symbols in a unit (e.g. Blender) are unambiguous from compile time on. Or at least that can be demanded from the author of the unit, as they must be aware of the includes and libraries which they are using directly and can be asked to make sure there are no clashes. On the other hand they shouldn’t be asked to make sure there are no clashes in indirect dependencies from the includes (e.g. Mesa), because the latter are supposed to be black boxes to them.
The information to resolve the symbols unambiguously at runtime is there. Every time a symbol is resolved, it can be known:
- which unit wants the symbol resolved
- and therefore which set of SOs (with unique symbols) are candidate resolutions
It just has to be handed to the dynamic linker (by the compiler) and then used correctly!
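As a toy model of the lookup I have in mind (this says nothing about how any real dynamic linker is implemented): each unit carries the list of SOs it was linked against, and a symbol is only searched for in that list. `ScopedResolver`, `SharedObject`, `demoResolver`, the library names and the fake addresses are all made up for the sketch.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// A loaded shared object: its name and the symbols it defines.
// The int is a fake address, just to tell definitions apart.
struct SharedObject {
  std::string name;
  std::map<std::string, int> symbols;
};

class ScopedResolver {
public:
  void addObject(SharedObject so) {
    std::string key = so.name;
    objects_[key] = std::move(so);
  }

  // Record which SOs a unit was linked against directly.
  void setDirectDeps(const std::string &unit, std::vector<std::string> deps) {
    deps_[unit] = std::move(deps);
  }

  // Resolve `symbol` on behalf of `unit`, looking only at that unit's
  // direct dependencies. Returns nullptr if nothing matches.
  const int *resolve(const std::string &unit, const std::string &symbol) const {
    auto d = deps_.find(unit);
    if (d == deps_.end()) return nullptr;
    for (const std::string &soName : d->second) {
      auto o = objects_.find(soName);
      if (o == objects_.end()) continue;
      auto s = o->second.symbols.find(symbol);
      if (s != o->second.symbols.end()) return &s->second;
    }
    return nullptr;
  }

private:
  std::map<std::string, SharedObject> objects_;
  std::map<std::string, std::vector<std::string>> deps_;
};

// The scenario from this thread: two LLVM copies defining the same name.
inline ScopedResolver demoResolver() {
  ScopedResolver r;
  r.addObject({"libLLVM-14.so", {{"func_A", 14}}});
  r.addObject({"libLLVM-15.so", {{"func_A", 15}}});
  r.setDirectDeps("blender", {"libLLVM-14.so"});
  r.setDirectDeps("mesa", {"libLLVM-15.so"});
  return r;
}
```

With `demoResolver()`, resolving `func_A` for `"blender"` finds the definition from `libLLVM-14.so` and resolving it for `"mesa"` finds the one from `libLLVM-15.so`, although both are loaded at the same time.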
I’m completely unfamiliar with how LLVM’s linker works and how difficult this would be to realize, but at least from the conceptual standpoint it could be how the linker behaves consistently with the idea that libraries (e.g. Mesa) are supposed to be black boxes to the user. Am I missing something?
How do you propose that would work in practice?
I haven’t used that before, but it seems to be more geared towards a single library being able to provide multiple functions of the same name and put them behind a version check.