LLVM currently has (and has had for a long time) some issues when it’s used as a shared library in a plugin-like setting. I would like to fix them, and since the required changes are subtle with a wide effect I’m raising them here.
The issues I know of fall into two broad categories:
- Life-time and cleanup of global variables.
- Lack of isolation of command-line options and option values.
The second category is important but requires some major changes to the CommandLine library. These (and related) CommandLine issues have been brought up regularly over the years by different people and experience shows that it is a difficult topic that deserves its own treatment. I have some concrete ideas, but for this thread I want to focus on the first category, which is conceptually simpler.
The proposed changes are here:
- ⚙ D128166 ManagedStatic: Destroy from destructor
- ⚙ D128167 CommandLine: Unregister options in the destructor
For the category of life-time and cleanup issues around global variables, there are two issues I am aware of:
- Cleanup of ManagedStatics
- Cleanup of cl::Option (cl::opt etc.)
ManagedStatics are currently added to a global linked list during construction, which is then walked for destruction during
llvm_shutdown, which must be called explicitly.
There are various different angles of this which cause problems in the plugin setting:
- There is no clear point at which llvm_shutdown can be called. A plugin does not know whether there are any other users of LLVM in the same process, so it cannot safely call llvm_shutdown. Even if there are no other users of LLVM in the process, it is possible for the plugin .so to be unloaded without unloading the LLVM .so. But if the plugin doesn’t call llvm_shutdown, then cleanup doesn’t happen and we get memory leaks.
- If the plugin itself uses ManagedStatics, then the corresponding destructors may have to run at a different time than the ManagedStatics used by LLVM itself, since the plugin may be unloaded without LLVM itself being unloaded. Note that this issue affects LLVM itself when built with
BUILD_SHARED_LIBS=ON(i.e., the not-really-supported build options in which every library is linked into its own shared object).
The solution is to use native language features in a standard way and simply add a destructor to ManagedStatic. The compiler and linker together ensure that the destructor is run at the correct time, i.e. .when the .so containing the ManagedStatic is unloaded.
One side-effect of this change is that the order in which destructors are run may change. To support that, I am adding a
ManagedStatic<T>::peek()method which returns a pointer to the static or
nullptrif it has already been destroyed (or if it has never been constructed). The intention is to use this method when global destructors have dependencies between each other, to avoid re-constructing a static that has already been destroyed.
At least LLD relies on being able to exit a process uncleanly for faster teardown, while still enjoying the side-effects of
llvm_shutdown. This feels very underhanded to me, so my patch also introduces an explicit
llvm_fast_shutdownfunction which explicitly invokes only the side-effects that LLD wants to have.
cl::Option is the base class for cl::opt and friends. It registers itself with the global options parser in the constructor, but currently does not unregister itself in the destructor.
This is an obvious problem because it leaves dangling pointers around when the plugin .so is unloaded but the LLVM .so isn’t. Furthermore, if the plugin is then loaded again, initializing the command-line options it carries fails because they’re already registered.
Again, the solution is to use native language features in a standard way and simply unregister command-line options there.
Aside: One may ask whether it is legitimate for a plugin to define its own command-line option, since plugins don’t “have access to” the command-line in any real sense, and one expects the process command-line to be parsed before any plugins are loaded.
My answer to that is yes. Plugins can be rather complex and consist of multiple libraries, some of which may also be used in command-line utitilies. For example, in our concrete use case we have an LLPC library which contains a graphics shader compiler built on LLVM. Our Vulkan driver links against the LLPC library, but we also have standalone command-line tools (mostly used during development) which link against the LLPC library.
As stated, for the issues to occur you really need to do two things at once:
- You need to use LLVM from a “plugin”, i.e. some shared object that gets dynamically loaded and unloaded during the runtime of a process.
- The plugin needs to link against LLVM dynamically, so that (1) you may end up in a situation where LLVM is shared by multiple users within the same process and/or (2) the plugin may be unloaded without LLVM also being unloaded.
My conrete use case is graphics drivers. Drivers for all major graphics APIs are loaded at runtime when a graphics context is created, based on the GPU(s) present on the user’s system, and those drivers include a shader compiler (which is often built on LLVM).
Depending on what exactly an application is doing, the same driver may be loaded and unloaded multiple times during a process runtime – for example, because the application first probes which GPUs and drivers are present, and then creates a context only later on. The driver(s) need to be loaded during the probing phase and are then unloaded e.g. by the Vulkan loader.
Where LLVM is ultimately packaged in a binary product, static linking with symbol hiding is often an option. It bypasses all these issues. However, at least Linux distributions usually require that large libraries such as LLVM use shared linking, so we can’t always use this workaround.
I have looked for other alternatives, but ultimately I haven’t found any that really solve the problems. For example, at some point I considered registering
llvm_shutdown as an atexit handler. The problem with this approach is that it doesn’t work when
ManagedStatics are defined in different shared objects that are unloaded at different times.
Ultimately, the proposed changes move us more towards using native language features and following best practices, i.e., using constructors and destructors in a way where the destructor properly tears down all state associated with the object. All of that is a good thing in my view.
The only technical downside of the changes that I can see is that more global destructors are run during shared library unload/process exit. My view is that if process shutdown time is something that the community cares about for some tools (like, apparently, LLD), we should systematically bypass all process shutdown explicitly in those tools, via a combination of an explicit
llvm_fast_shutdown() call and unclean exit.