dlsym() and RTLD_FIRST

Can someone explain this flag to me? I’ve read the documentation, but it’s still not clear to me. If you ask dlsym() to search some module X, why would it ever search modules other than X?

The reason I ask about this is that llvm support library already has a DynamicLibrary class whose purpose almost exactly matches what we’re using the Host::DynamicLibrary related functions for. However, it doesn’t use the RTLD_FIRST flag, and so I’m not sure what the implications are of us using it and deleting our own DynamicLibrary code.

I found this old email from Greg online: http://comments.gmane.org/gmane.comp.debugging.lldb.devel/305
His claim is that without RTLD_FIRST:

> FileSpec plugin_spec ("/plugins/disassembler.dylib");
> void *lib_handle = Host::DynamicLibraryOpen (plugin_spec, error);
> if (lib_handle)
> {
>     void *init_callback = Host::DynamicLibraryGetSymbol (lib_handle, "InitPlugin", error);
>     if (init_callback)
>     {
>         // Now on linux we might have a function that comes from another shared library...
>     }
> }

The documentation is indeed fairly scarce on the subject of how things might go wrong without the flag… but the description seems compelling enough that I’d like to keep it…

Can someone explain this flag to me?

It says "only look in this binary, don't look in any others. We are looking for a plug-in initialization function and we don't want to get one back from another dylib.

As Enrico said, the email from a while back details this:

http://comments.gmane.org/gmane.comp.debugging.lldb.devel/305

I've read the documentation, but it's still not clear to me. If you ask dlsym() to search some module X, why would it ever search modules other than X?

I don't know but it does.

The reason I ask about this is that llvm support library already has a DynamicLibrary class whose purpose almost exactly matches what we're using the Host::DynamicLibrary related functions for. However, it doesn't use the RTLD_FIRST flag, and so I'm not sure what the implications are of us using it and deleting our own DynamicLibrary code.

It would be nice if we could specify this flag so we either find the symbol from libx.dylib or we don't. We don't want to find the symbol in liby.dylib and call it in our case.

This seems like the only case we ever want, so I’m going to post a patch to LLVM’s DynamicLibrary class to use RTLD_FIRST on Apple, and a similar method of checking the module filespec on other platforms, and see if they accept it. If so, I will convert our Plugin code to use LLVM’s DynamicLibrary and then delete our DynamicLibrary

Sounds good to me. Hopefully if they don't want that they might accept an extra boolean argument that can specify to only look in the current shared library and then we can switch over to using LLVM's DynamicLibrary.

I actually already got the patch put together. I posted in another thread. I don’t know how to test plugin code, so if you or someone else there is willing to help out and check that it works as expected, I would appreciate it.

The review is up on the LLVM side. One point which was raised, and which I agree with, is that the presence of the string makes the class much heavier. This string is only needed to mimic MacOSX’s RTLD_FIRST behavior on other posix platforms. However, going back through the history of when this was added, I never actually saw a use case from anyone saying “we need this on Linux”. See the full original thread at [1]. But the TL;DR is that the flag is nice to have on MacOSX, and the filename comparison was added to Linux to maintain parity.

If nobody actually knows of a specific example of why this is necessary on Linux, can we just remove this behavior on Linux? My understanding is that the only thing which will change by removing this for Linux is the following: Imagine a plugin X is loaded, and X has a library dependency on Y and Z. X doesn’t contain the plugin Initialize or Terminate symbol, but Y or Z does. With the filename comparison code, LoadPlugin would fail, and without it, it would succeed and use the symbol found in Y or Z. I can understand that with the comparison the algorithm is a bit better, but it seems such an extremely unusual edge case that I don’t think it’s a big deal to remove it from the Linux side.

Thoughts?

[1] - http://thread.gmane.org/gmane.comp.debugging.lldb.devel/300/focus=302

Probably the way I’d look at this right now is that support in Linux is a bit dicey and we’re doing our best to stabilize (starting with single path for remote/local debugging, and making that stable and fast). In an effort to stabilize, I’d prefer to limit how much code change we do on the Linux end until we have a more stable product.

So while we could potentially take that out, I’d rather avoid making changes just because it might be simpler, as it might also add yet another error scenario on the Linux side. Right now I value similarity to MacOSX execution over code reduction. Once we’re a lot more stable on the Linux side, I’d be much more interested in revisiting with some actual use cases to see diffs in performance and scope of usage.

Just my 2 cents…

Just as a counterpoint, unless I’m misunderstanding this code, I don’t see it actually having a noticeable impact on stability. The search limiting will only be a factor in a case where you attempt to load something that isn’t a valid plugin. It’s already an error path. In fact, this code worked fine before the change was made, and was only made to imitate what appears to have been an optimization that was Mac-specific. The change for Mac doesn’t seem to have been strictly necessary either, but just an optimization. It’s actually not an optimization for Linux, because the dynamic loader will still search every module on linux, it will just fail anyway.

I guess the thing to do is make sure we’re certain we understand the behavior, which is perhaps best captured in a test. (i.e. test it with the RTLD_FIRST behavior where it does something, then verify it does something different without the flag. Then, once we agree it is not useful behavior for us, look at removing it).

By valid plugin, you’re referring to shared libraries, right? (What plugins are we referring to here, at what load point?)

That part is a good question, which I don’t totally understand. There’s a function Debugger::LoadPlugin() though, which accepts a path to a plugin to load. It’s called there. This also appears to be exposed through the “plugin load” command.

Ah ok.

It’s worth figuring out what it does (really) before we consider removing it.

I think my understanding of what it does is correct, but maybe Greg or someone can confirm. Basically, it tries to dlopen() the module at the path specified, and then search for the symbols LLDBPluginInitialize and LLDBPluginTerminate. If it finds them, it calls them. If it doesn’t, the plugin load fails. According to the documentation of dladdr(), it appears that the process for locating this symbol involves first searching the module specified in the argument to dlopen(), and then searching any dependent modules. If it is found in any of these, it succeeds. This optimization (using RTLD_FIRST and the filename comparison), causes this search to fail if the symbol is found in a dependent module, but not the original module.

I will try to verify that this is correct with someone who knows more than me about Linux, Mac, and dynamic linking on these platforms, but if correct then it doesn’t seem like there is any risk to removing this. That said, I’m interested in who is actually use these plugins. The best way to find out if it’s going to break something is to talk to the people who depend on this code.

I don’t know if anyone is actually relying on the plugin feature at this point

I believe when I first wrote it it was meant to allow people to write LLDB commands using C/C++ libraries that they already had available (e.g. for purposes like data analysis)
I also believe it didn’t find practical usage, and that it is also not documented anywhere (except looking at source code, that is!) - the One True Way ™ these days is to use Python commands

Ideally, I have always been hoping to extend this to also allow loading other kind of plugin entities (data formatters, for instance - but if one were to dream big, also frame/thread format keywords), but never gotten around to it

tl;dr I don’t think anyone is actually using these plugins, would not want them to be removed though, because I have plans for their future evolution

That was actually one of the main questions I wanted answered: Who is using them? If the answer is truly nobody, which I kind of suspected, then I don’t think there’s any harm in removing this optimization from the linux side.

Yeah given all this I’m not sure we buy anything on the Linux side by keeping it in.

I think my understanding of what it does is correct, but maybe Greg or someone can confirm. Basically, it tries to dlopen() the module at the path specified, and then search for the symbols LLDBPluginInitialize and LLDBPluginTerminate. If it finds them, it calls them. If it doesn't, the plugin load fails. According to the documentation of dladdr(), it appears that the process for locating this symbol involves first searching the module specified in the argument to dlopen(), and then searching any dependent modules. If it is found in any of these, it succeeds. This optimization (using RTLD_FIRST and the filename comparison), causes this search to fail if the symbol is found in a dependent module, but not the original module.

Bingo. We don't want to call any other version of LLDBPluginInitialize or LLDBPluginTerminate from any other plug-in. It isn't clear that the dynamic linker sticks to dependent modules, I would need to check on that. It might search all loaded shared libraries in the current process. Not sure if that differs between Mac and Linux. Probably not.

I will try to verify that this is correct with someone who knows more than me about Linux, Mac, and dynamic linking on these platforms, but if correct then it doesn't seem like there is any risk to removing this. That said, I'm interested in who is actually use these plugins. The best way to find out if it's going to break something is to talk to the people who depend on this code.

Again, we _only_ want to call LLDBPluginInitialize or LLDBPluginTerminate from the exact shared library we are trying to ask for the symbols from, not from anywhere else.

Greg

This should be relatively easy to test.

Again, we only want to call LLDBPluginInitialize or LLDBPluginTerminate from the exact shared library we are trying to ask for the symbols from, not from anywhere else.

We can try to make two plugins, and have one of the symbols missing. Then try this scenario, and verify we don’t get cross-over.

The documentation of dlsym() says this:

Combined with the fact that plugins are essentially a dead codepath (going by Enrico’s earlier comment), I think it’s probably ok to just remove this and use LLVM’s default DynamicLibrary loader. Any objections?