Creating a breakpoint on a target with no process

In my effort to get tests working on Windows, I’ve run across an issue with test\expression_command\timeout\TestCallWithTimeout.py :: TestCallWithTimeout.ExprCommandWithTimeoutsTestCase

This test creates a target and immediately puts a breakpoint on it before attempting to launch the process. Is this something that is supposed to work? BreakpointLocation::ResolveBreakpointSite() contains this line:

Process *process = m_owner.GetTarget().GetProcessSP().get();

if (process == NULL)
    return false;

So naturally the breakpoint site cannot be resolved because there is no process. The end result of this is that this breakpoint never gets hit and the test fails.

Presumably this test works on other platforms, so any tips as to where I should look to track down this bug on Windows?

It is the responsibility of the dynamic loader plugin to tell the breakpoints to re-scan for new locations when shared libraries get added to the process. You should do this by collecting a list of the added libraries, and calling:

m_process->GetTarget().ModulesDidLoad(added_list);

How are you adding new modules as they get loaded?

Jim

I actually don’t even have a dynamic loader plugin implemented at all. I wasn’t completely sure what the purpose of it was. I saw that Virgile had implemented one in his original patch to get debugging working on Windows [https://github.com/xen2/lldb/commit/515956244784a9162183a6135068e893ba994532], but it did very little actual work, and in particular does not seem to do anything related to what you are suggesting above.

As for adding new modules when they load, basically this is the entirety of what I do.

Error error;

ModuleSP module = GetTarget().GetSharedModule(module_spec, &error);
module->SetLoadAddress(GetTarget(), module_addr, false, false);

However, as mentioned I don’t do this from a DynamicLoader plugin. Instead I just run this code directly from the same background thread that gets other debug events from the process, such as thread creation, exceptions, etc.

Can you elaborate a little on the interaction between the DynamicLoader plugin and the process plugin, and the responsibilities of each?

+Virgile Bello

Actually maybe Virgile can respond and explain the purpose of the DynamicLoaderWindows plugin he’s written here [https://github.com/xen2/lldb/commit/515956244784a9162183a6135068e893ba994532]. The description of the plugin seems to indicate that it watches for Dynamic Library loads and unloads, but it’s not clear what the code itself does, or how it’s related to watching for DLL loads and unloads.

The dynamic loader plugin has a couple of different jobs.

The one that is relevant to your question is that it is responsible for hooking up the mechanism whereby lldb gets notified of new shared library loads. It gets called when we attach or launch a process, at which point it is supposed to set up whatever instrumentation is needed for tracking the loader. On most platforms this is done by setting a breakpoint at the correct place in the loader code and then decoding the meaning of the event when the breakpoint gets hit (load or unload, what got loaded, etc.). Since this is often a non-trivial bit of code, and one that changes as versions of the OS go by, it seemed worthwhile to have it be a separate module. If you wanted to use this model for Windows, you would have your DynamicLoader plugin register the callback for the "Shared libraries changed" event that your main loop is getting, and then call into that to process the event.

In the short term you can probably just call ModulesDidLoad in the code you have below. Note, this isn't done in GetSharedModule because it is expensive to go looking through new modules for breakpoints, so you don't want to hang it off some call that might be called many times. Instead we have an explicit "Okay here's the set of new libraries" type call.
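To illustrate why the rescan is a separate, explicit call, here is a minimal self-contained sketch. It is a toy model, not lldb code: ToyTarget, ToyModule, ToyBreakpoint and all of their members are hypothetical names invented for the example. Adding a module is cheap, and only the explicit ModulesDidLoad-style call walks the breakpoints.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Simplified stand-ins; these are NOT the real lldb classes.
struct ToyModule {
    std::string name;
    std::vector<std::string> symbols;
};

struct ToyBreakpoint {
    std::string symbol;                  // function name to break on
    std::vector<std::string> locations;  // modules where it resolved
};

class ToyTarget {
public:
    void AddBreakpoint(const std::string &symbol) {
        m_breakpoints.push_back({symbol, {}});
    }

    // Cheap: only records the module, never touches breakpoints.
    void AddModule(const ToyModule &module) { m_modules.push_back(module); }

    // Expensive: every breakpoint scans every newly announced module,
    // so callers batch new modules and call this once per stop.
    void ModulesDidLoad(const std::vector<ToyModule> &added) {
        ++m_rescans;
        for (ToyBreakpoint &bp : m_breakpoints)
            for (const ToyModule &module : added)
                for (const std::string &sym : module.symbols)
                    if (sym == bp.symbol)
                        bp.locations.push_back(module.name);
    }

    std::size_t LocationCount(std::size_t index) const {
        assert(index < m_breakpoints.size());
        return m_breakpoints[index].locations.size();
    }
    int RescanCount() const { return m_rescans; }

private:
    std::vector<ToyModule> m_modules;
    std::vector<ToyBreakpoint> m_breakpoints;
    int m_rescans = 0;
};
```

In this model a breakpoint set before any module is announced simply has zero locations until the batch call runs, which mirrors the symptom in the original question.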

There isn't good documentation on this in the code, which we should fix. Also, it would arguably be cleaner to separate out the "discover new modules" part of the DynamicLoader, and the "Make these new modules work correctly" into separate steps within the Dynamic loader plugin. The former is going to be specific to the various subclasses, but the latter job is pretty generic. Then each port would know it had to call the DynamicLoader::RegisterNewModules or whatever it was when it was done with the platform specific side of registering them. But since that job currently consists of calling Target::ModulesDidLoad, we haven't been motivated to move the code around to do this.

The other main dynamic loader job is not relevant to your question, but for completeness' sake: it is also the place where knowledge of the library inter-calling mechanism resides. Most importantly, most inter-library calls are implemented using some sort of stub that trampolines over to the actual call. That stub generally doesn't have debug information, so the normal behavior of "next" when it lands in the stub would be to say "I've stepped into code with no debug information, so I'll step out". But if the stub was going to resolve to a routine that did have debug info, that would be the wrong behavior. So before we decide to step out of unknown code, we always ask the current dynamic loader plugin to "GetStepThroughTrampolinePlan" to see if it knows how to get from this PC to somewhere more interesting, and if so to return a plan that does that job.

Jim
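The trampoline check described in the message above can be sketched roughly as follows. This is a toy model under stated assumptions: ToyDynamicLoader, GetTrampolineTarget, StepDecision and DecideStep are hypothetical names, and the real GetStepThroughTrampolinePlan returns a thread plan rather than an address.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

enum class StepAction { StopHere, RunToAddress, StepOut };

struct StepDecision {
    StepAction action;
    std::uint64_t address;  // meaningful only for RunToAddress
};

// Hypothetical loader that knows which PCs are call stubs.
class ToyDynamicLoader {
public:
    void AddStub(std::uint64_t stub_pc, std::uint64_t real_target) {
        assert(stub_pc != real_target);  // a stub must lead somewhere else
        m_stubs[stub_pc] = real_target;
    }

    // Returns true and fills `target` if `pc` is a known stub.
    bool GetTrampolineTarget(std::uint64_t pc, std::uint64_t &target) const {
        std::map<std::uint64_t, std::uint64_t>::const_iterator it =
            m_stubs.find(pc);
        if (it == m_stubs.end())
            return false;
        target = it->second;
        return true;
    }

private:
    std::map<std::uint64_t, std::uint64_t> m_stubs;
};

// Ask the loader before giving up on code without debug info.
inline StepDecision DecideStep(const ToyDynamicLoader &loader,
                               std::uint64_t pc, bool pc_has_debug_info) {
    if (pc_has_debug_info)
        return {StepAction::StopHere, pc};
    std::uint64_t target = 0;
    if (loader.GetTrampolineTarget(pc, target))
        return {StepAction::RunToAddress, target};
    return {StepAction::StepOut, 0};
}
```

The point of the ordering is that stepping out is the fallback, chosen only after the loader has had a chance to recognize the stub.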

Thanks. For now I’ll experiment with your suggestion of just calling ModulesDidLoad directly in the callback, since getting the actual notification that a library is loaded is trivial on Windows and all the work is done for us by the OS. Is it safe to update the module list from a thread other than the main thread? All threads of the inferior will be stopped while I process this notification, but I know for example that with thread creation / thread exit, I have to maintain this thread list, and then only in UpdateThreadList do I actually update the thread list on the target. Is this restriction not the same with the module list?

One more question, how do I find the module that is loaded at a specific address? When this shared library is unloaded, the only information I have is its load address, but the only method for getting a Module from the target is to call GetSharedModule() with a ModuleSpec, which I won’t have. Is there a way to search based only on the load address?

You must implement a DynamicLoaderWindows. Shared library loading/unloading won't work otherwise.

The theory is simple: after launching or attaching, the plug-in will find the list of shared libraries to get the initial state. Also, when your program dynamically loads/unloads DLLs, you need to update anything that changed (load/unload sections for things that got loaded/unloaded).

Please do NOT call ModulesDidLoad directly. You can do this temporarily, but you really do need a dynamic loader.

The MacOSX version finds the global list of shared libraries that are loaded, iterates through them, searches for and adds any modules that are in the target, removes any images from the target that aren't loaded, then sets the section load addresses for all sections in all modules to the correct value and then calls ModulesDidLoad(). This causes all breakpoints to get resolved.

We then set a breakpoint at a location that gets hit after /usr/lib/dyld loads/unloads new shared libraries so we can keep up. This is a synchronous breakpoint where we detect the new shared libraries that were loaded/unloaded, add/remove modules, mark them as loaded or unloaded, and then continue. So it is a very easy plug-in to write, and it is required so that dynamic plug-in loading/unloading can track breakpoints correctly.

Greg

Sounds good. I tested with calling ModulesDidLoad() directly and it seems to resolve the breakpoints, so now that I know that that was the issue blocking me, I can try to do it the “right” way via a DynamicLoader plugin.

One thing I’m uncertain about though, is that I get the notification asynchronously instead of going through this breakpoint / callback mechanism. So I can send a notification from my event listener thread to the DynamicLoader plugin, but it’s not going to be on the main thread. Will this cause a problem?

It shouldn't be a problem. Access to the module list should lock around itself on the off chance that the main thread was also updating the module list while you were doing it here. Since you've stopped all the threads so nothing surprising is going to happen from the process itself, this should be fine.

Jim
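As a rough picture of what "lock around itself" means in the message above, here is a minimal sketch of a list whose every accessor takes an internal mutex. LockedModuleList is a hypothetical class written for this example and only stores names; it is not lldb's ModuleList.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <vector>

// Toy list that serializes all access through one internal mutex,
// so an event thread and the main thread can both call into it.
class LockedModuleList {
public:
    void Append(const std::string &name) {
        assert(!name.empty());
        std::lock_guard<std::mutex> guard(m_mutex);
        m_names.push_back(name);
    }

    // Returns true if an entry with this name was found and removed.
    bool Remove(const std::string &name) {
        std::lock_guard<std::mutex> guard(m_mutex);
        for (std::vector<std::string>::iterator it = m_names.begin();
             it != m_names.end(); ++it) {
            if (*it == name) {
                m_names.erase(it);
                return true;
            }
        }
        return false;
    }

    std::size_t GetSize() const {
        std::lock_guard<std::mutex> guard(m_mutex);
        return m_names.size();
    }

private:
    mutable std::mutex m_mutex;  // mutable so const readers can lock
    std::vector<std::string> m_names;
};
```

Because callers never see the underlying vector, there is no way to touch it without holding the lock, which is what makes updates from a non-main thread tolerable in this model.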

You really want your shared library loads to be synchronous. There has to be a way to stop your target when a shared library loads? If not, you might miss your breakpoint if it is in a "PluginInitialize()" call and you stop the target after receiving the shared library load/unload notification. So try as hard as you can to make this happen synchronously...

Yes, the target is stopped, it’s just that I’m not on LLDB’s main thread. That was the only concern.

Then you are good as Jim said.

I started working on implementing a DynamicLoader plugin.

While it’s indeed quite simple, I still have some questions about why it’s necessary to make shared library load/unloading work.

I do understand the use case for other platforms, because there is a non-trivial amount of work required to detect shared library loading / unloading. On Windows however, there is no work involved because the OS tells us at every occurrence of a shared library load or unload. As a result, my DynamicLoader implementation basically boils down to some code like this in my process plugin

if (event is a module load)
    dynamic_loader->NotifyModuleLoad(module);
else if (event is a module unload)
    dynamic_loader->NotifyModuleUnload(module);

In each of these two methods, all I do is construct an empty ModuleList, add a single item to it, set the load address, and call GetTarget().ModulesDidLoad() or GetTarget().ModulesDidUnload().

So either way I’m calling ModulesDidLoad() / ModulesDidUnload() directly; the only question is whether the Process plugin tells the DynamicLoader to do it or does it itself. Whichever one does it, it’s the same few lines of code to prepare the call to ModulesDidLoad().
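For readers who want something compilable, the dispatch described above might look like the following sketch. Everything here is an assumption made for illustration: DebugEvent, DebugEventKind, ToyWindowsLoader and HandleDebugEvent are hypothetical names, and the places where a real plugin would call ModulesDidLoad() / ModulesDidUnload() are only marked with comments.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

enum class DebugEventKind { ModuleLoad, ModuleUnload, Other };

struct DebugEvent {
    DebugEventKind kind;
    std::string path;    // empty for unload and other events
    std::uint64_t base;  // load address reported by the OS
};

// Toy loader that tracks modules by base address.
class ToyWindowsLoader {
public:
    void NotifyModuleLoad(const std::string &path, std::uint64_t base) {
        assert(!path.empty());
        m_loaded[base] = path;
        // A real plugin would build a one-element list and call
        // ModulesDidLoad() on the target here.
    }

    // Returns false when no module is known at that base address.
    bool NotifyModuleUnload(std::uint64_t base) {
        // A real plugin would call ModulesDidUnload() here.
        return m_loaded.erase(base) == 1;
    }

    std::size_t LoadedCount() const { return m_loaded.size(); }

private:
    std::map<std::uint64_t, std::string> m_loaded;
};

// The process plugin's event loop forwards module events to the loader.
inline void HandleDebugEvent(ToyWindowsLoader &loader,
                             const DebugEvent &event) {
    switch (event.kind) {
    case DebugEventKind::ModuleLoad:
        loader.NotifyModuleLoad(event.path, event.base);
        break;
    case DebugEventKind::ModuleUnload:
        loader.NotifyModuleUnload(event.base);
        break;
    case DebugEventKind::Other:
        break;
    }
}
```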

One more question: You said this: “the MacOSX version finds the global list of shared libraries that are loaded, iterates through them, searches for any modules that are in the target, removes any images from the target that aren’t loaded, then sets the section load address for all sections in all modules to the correct value”.

Just to clarify some terminology, are “shared library”, “module”, and “image” here the same thing? Why would you have a shared library that is loaded but not in the target? Where else would it be? As for setting the section load address for all sections, it sounds like this is the same as just calling module->SetModuleLoadAddress() to the load address of the entire shared library. Is this correct?

I started working on implementing a DynamicLoader plugin.

While it's indeed quite simple, I still have some questions about why it's necessary to make shared library load/unloading work.

I do understand the use case for other platforms, because there is a non-trivial amount of work required to detect shared library loading / unloading. On Windows however, there is no work involved because the OS tells us at every occurrence of a shared library load or unload. As a result, my DynamicLoader implementation basically boils down to some code like this in my process plugin

if (event is a module load)
    dynamic_loader->NotifyModuleLoad(module);
else if (event is a module unload)
    dynamic_loader->NotifyModuleUnload(module);

In each of these two methods, all I do is construct an empty ModuleList, add a single item to it, set the load address, and call GetTarget().ModulesDidLoad() or GetTarget().ModulesDidUnload().

So either way I'm calling ModulesDidLoad() / ModulesDidUnload() directly; the only question is whether the Process plugin tells the DynamicLoader to do it or does it itself. Whichever one does it, it's the same few lines of code to prepare the call to ModulesDidLoad().

That is indeed simple, and if you have all the info you need in the message that is sent to your handler, then it doesn't make sense to do this in the dynamic loader. For us, we can use the same dynamic loader plug-in with different Process subclasses since it just uses the process to find a symbol and read data via memory reads. So our dynamic loader will work on core files with ProcessMachCore and it also works with ProcessGDBRemote. In your case, you are getting messages straight from your process runner via the OS, which is quite different from any other platform, so don't change anything: you can continue to do the load/unload directly from the process plug-in and have your DynamicLoaderWindows just not do anything.

One thing you might actually need to do in the DynamicLoaderWindows is find out all the shared libraries that are loaded when you attach to a process. When you attach to a running process on Windows, does it give you a callback for each shared library that is already loaded, just like when you are running and a shared library loads/unloads? Or must you discover them in a different way?

One more question: You said this: "the MacOSX version finds the global list of shared libraries that are loaded, iterates through them, searches for any modules that are in the target, removes any images from the target that aren't loaded, then sets the section load address for all sections in all modules to the correct value".

Indeed.

Just to clarify some terminology, are "shared library", "module", and "image" here the same thing?

Yes.

Why would you have a shared library that is loaded but not in the target?

You wouldn't, but you might have said "file a.out" on the command line, and from just inspecting the "a.out" mach-o file it added a bunch of modules to your target. Before you run you would have:

(lldb) image list
/tmp/a.out
/usr/lib/libc.dylib
/usr/lib/libxml2.dylib

But when you run, you specified DYLD_LIBRARY_PATH=/tmp/my_dylibs in the environment, so when you actually attach to "a.out" you really have a different set of shared libraries:

(lldb) image list
/tmp/a.out
/tmp/my_dylibs/libc.dylib
/usr/lib/libxml2.dylib

We start out with a target that contains the first three images, but when we run and get our first batch of shared library loaded notifications, we look for "/tmp/a.out", leave it because it is correct, and set its load address; we remove "/usr/lib/libc.dylib" from the image list in the target, add "/tmp/my_dylibs/libc.dylib", and set its load address; and then we look for "/usr/lib/libxml2.dylib", keep it because it is correct, and set its load address.
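The reconciliation just described can be sketched with plain path lists. This is only an illustration: ReconcileImages is a hypothetical helper, images are represented by their paths alone, and setting load addresses is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Makes `target_images` match `loaded_images` and returns the paths
// that had to be added. Images already present are kept in place.
inline std::vector<std::string>
ReconcileImages(std::vector<std::string> &target_images,
                const std::vector<std::string> &loaded_images) {
    assert(&target_images != &loaded_images);  // must be distinct lists

    auto contains = [](const std::vector<std::string> &list,
                       const std::string &path) {
        return std::find(list.begin(), list.end(), path) != list.end();
    };

    // Drop images the process did not actually load.
    target_images.erase(
        std::remove_if(target_images.begin(), target_images.end(),
                       [&](const std::string &path) {
                           return !contains(loaded_images, path);
                       }),
        target_images.end());

    // Add images the process loaded that the target did not know about.
    std::vector<std::string> added;
    for (const std::string &path : loaded_images) {
        if (!contains(target_images, path)) {
            target_images.push_back(path);
            added.push_back(path);
        }
    }
    return added;
}
```

Run against the two image lists above, this would drop /usr/lib/libc.dylib, keep /tmp/a.out and /usr/lib/libxml2.dylib, and report /tmp/my_dylibs/libc.dylib as the only newly added image.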

Where else would it be?

See above.

As for setting the section load address for all sections, it sounds like this is the same as just calling module->SetModuleLoadAddress() to the load address of the entire shared library. Is this correct?

If your system always loads all sections by sliding them all by a constant amount, then yes. On MacOSX, all shared libraries in the shared cache will have all their sections moved separately. So we need to load all sections at completely different addresses (we don't just add a constant slide to all of them like module->SetModuleLoadAddress() does). Some sections aren't loaded sometimes (like the "__LINKEDIT" section isn't always loaded for our kernel). So you need to do what you need to do to load things correctly on your system.
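The difference between a constant slide and per-section placement can be sketched as below. ToySection, ApplyConstantSlide and ApplyPerSectionAddresses are hypothetical names for this example, and the addresses used with them are made up.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

struct ToySection {
    std::string name;
    std::uint64_t file_addr;  // address recorded in the file
    std::uint64_t load_addr;  // address in the running process
    bool loaded;
};

// Every section moves by the same amount.
inline void ApplyConstantSlide(std::vector<ToySection> &sections,
                               std::uint64_t slide) {
    for (ToySection &section : sections) {
        assert(section.file_addr <= UINT64_MAX - slide);  // no overflow
        section.load_addr = section.file_addr + slide;
        section.loaded = true;
    }
}

// Each section gets its own address; sections missing from the map
// are marked as not loaded.
inline void
ApplyPerSectionAddresses(std::vector<ToySection> &sections,
                         const std::map<std::string, std::uint64_t> &addresses) {
    for (ToySection &section : sections) {
        std::map<std::string, std::uint64_t>::const_iterator it =
            addresses.find(section.name);
        section.loaded = (it != addresses.end());
        if (section.loaded)
            section.load_addr = it->second;
    }
}
```

A platform that maps each image as one contiguous block only needs the first function; the second covers the shared-cache style layout and the case where a section such as __LINKEDIT is absent.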

Whether attaching delivers a notification for each already-loaded module is a good question, and it’s actually still an unknown. There’s still a bit more work I need to do before being able to attach to a process. It might give me a module load notification for each existing module right when I attach, or I might have to enumerate them. Even if I have to enumerate them, the code to do that is pretty simple. But it might make sense to put that code in the DynamicLoader::DidAttach method anyway. For that matter, it might also make sense to add the main executable’s module in DynamicLoader::DidLaunch, if nothing else for consistency. But modules that load or unload while the debugger is connected seem to be a natural fit for just calling ModulesDidLoad.

Makes sense, thanks. Last question. When a module loads, the notification I get contains the load address, so I call module->SetLoadAddress(x) and then ModulesDidLoad. At some point in the future I’m going to get a module unloaded event with the only parameter being x. It doesn’t seem there’s an easy way for me to find the ModuleSP in the target given only this value. Do I need to just keep my own map in the DynamicLoader or in the Process plugin?

Actually I checked the documentation, and the behavior is documented. When attaching to a process, the system will automatically send a load DLL event for every DLL currently loaded in the process. So the same code that works for creating processes will also be able to populate the initial DLL set when attaching.

You can just resolve the "load address" x using the target:

Address addr;
if (m_process->GetTarget().ResolveLoadAddress (x, addr))
{
    ModuleSP module_sp = addr.GetModule();
    if (module_sp)
    {
        // module_sp is the module that contains load address x
    }
}
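To show the idea behind resolving a load address back to a module, here is a small standalone sketch of a range lookup. It does not reproduce how lldb implements ResolveLoadAddress; LoadedRanges and its methods are hypothetical names, and modules are represented by a name and a size only.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Toy table mapping [base, base + size) ranges to module names.
class LoadedRanges {
    struct Range {
        std::uint64_t size;
        std::string name;
    };

public:
    void Add(std::uint64_t base, std::uint64_t size,
             const std::string &name) {
        assert(size > 0);
        Range range = {size, name};
        m_ranges[base] = range;
    }

    // Returns true if a module was registered at exactly this base.
    bool Remove(std::uint64_t base) { return m_ranges.erase(base) == 1; }

    // Finds the module whose range contains `addr`, if any.
    bool Resolve(std::uint64_t addr, std::string &name) const {
        // First range starting strictly above addr, then step back one.
        std::map<std::uint64_t, Range>::const_iterator it =
            m_ranges.upper_bound(addr);
        if (it == m_ranges.begin())
            return false;
        --it;
        if (addr - it->first >= it->second.size)
            return false;
        name = it->second.name;
        return true;
    }

private:
    std::map<std::uint64_t, Range> m_ranges;
};
```

With a table like this, the unload event's address x is enough to find the module, which is the same property the ResolveLoadAddress call above relies on.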