Process plugin implementation with multiple threads

With ProcessWindows, most of the OS notifications from the process (breakpoints, thread creation, etc) are going to come in via a background thread.

The way I see this being handled in Virgile’s implementation of ProcessWindows which I’m working on merging up is as follows:

  1. When events happen, put their parameters into some struct and put it in a queue
  2. Stop the process, which triggers a RefreshStateAfterStop call to the process plugin on the main thread, and then block the debugger thread.
  3. In RefreshStateAfterStop, process the message that was put onto the queue, update the thread list, etc, and then signal the debugger thread to wake up.

Is there a technical reason why we have to do this with message passing and doing all the work in RefreshStateAfterStop?

For example, suppose we get a create thread notification. Shouldn’t it just be possible to acquire the thread list’s mutex and stick the new thread onto the end of it without the message passing and all that? Why do all threads have to stop just because one new thread was created?

Is this necessary for some reason common to all plugins and platforms, or is it something specific to the implementation here, and there’s no underlying fundamental reason why it shouldn’t work with the right synchronization.

I can't say anything for windows, but if you don't need to do anything with threads as they are created, then you can just wait until you stop to discover them all. Linux has a unique problem in that it must attach to each thread as they are created in order to be able to get exceptions for that thread, so they must be notified of threads being created and do the attach. There should be no event for this if the process doesn't stop. For MacOSX, we connect to the task which is akin to the process and we don't need to be notified when threads are created as we can ask a task for its threads. MacOSX doesn't need to do anything to the threads as they are created, so we can just ask the task for its threads when we stop.

What is the story for windows? Does the process plug-in need to attach to each thread as they are created? Even if it does, there should be no LLDB event that is required for thread creation.

Greg

Windows doesn’t need to do anything special for each thread as it is created. Only need to update the thread list in the Process object as far as we’re concerned, but that’s it.

The main thing I’m worried about is that the Process object itself runs a couple of threads in the background (the private state thread, for example), so I was worried about possible race conditions when modifying state asynchronously (i.e. from my debugger thread).

What state needs to be updated asynchronously? You shouldn't be touching the thread list (no need to update it, you should be able to update it on demand each time you actually stop right?) at all until you have a valid stop reason and then the process will update the threads itself when and if it needs them.

The way Windows debug notifications work is like this:

while (WaitForDebugEvent(&event)) // Block until something happens in the process
{
// Windows stopped the process for us automatically. Do something with the event

ContinueDebugEvent(&event); // Resume the process
}

After WaitForDebugEvent and before I call ContinueDebugEvent, the event might say that a new thread was created. So at this point I’d like to update the thread list. The process is actually stopped here, but it shouldn’t appear so to the user. So at the beginning I tell LLDB the process is stopped – SetPrivateState(eStateStopped), and at the end I tell it that it’s running again – SetPrivateState(eStateRunning).

However, this code is still running on a different from what LLDB is going to invoke methods on the plugin. RefreshStateAfterStop, DoResume, etc are all going to be running on a different thread from this loop. The code is simpler if I can just stick the new thread in the ThreadList (or whatever other state needs updating if it’s not thread related) right here from this thread. Alternatively, when I set the state to eStateStopped presumably that will trigger LLDB to call RefreshStateAfterStop() from the other thread. I can also modify the state then, it just requiers a little extra work as I have to do some synchronization and marshalling / storage of the event parameters.

Is there a way to ask Windows "give me all the threads for this process"? If so, the path of least resistance would be to ignore the Thread Created events in this loop, and then only generate the thread list on demand when someone has stopped.

At some point we'll want to do keep alive debugging and then we'll have to handle posting interesting events while the process is still running. But I don't want to do that piecemeal.

For now, putting threads in the thread list while the debugger is running from both the private & public state's perspective does no good, and putting them in between a stop & an immediate continue doesn't do anybody any good either. You're just making your plugin more fragile.

Jim

It’s possible, but Windows gives us a handle to the thread which has all privileges associated with it (in other words we can use it for just about anything), so it’s considered better to use the handle windows gives us. At the very least we’d want to queue up all the incoming event notifications that we get from this loop, and then use it to update the state when someone stops.

Just to be certain, when you and Greg refer to “when someone has stopped”, I assume you are referring to a public stop, i.e. someone has to enter a command to resume the debugger?

I was wondering why my RefreshStateAfterStop override was never being called, I’m guessing now that it’s because I was only updating the private state, and this only gets called on a public stop.

It's possible, but Windows gives us a handle to the thread which has all privileges associated with it (in other words we can use it for just about anything), so it's considered better to use the handle windows gives us. At the very least we'd want to queue up all the incoming event notifications that we get from this loop, and then use it to update the state when someone stops.

You get one view of the thread if you catch the event but another if you enumerate them when stopped? That's awkward.

Just to be certain, when you and Greg refer to "when someone has stopped", I assume you are referring to a public stop, i.e. someone has to enter a command to resume the debugger?

I was wondering why my RefreshStateAfterStop override was never being called, I'm guessing now that it's because I was only updating the private state, and this only gets called on a public stop.

That's not quite what happens, I don't know if what actually happens also explains what you are seeing, but this is how this bit of logic works:

When the private state gets changed we generate an event and put it on the public event queue. Then we call "Process::ShouldBroadcastEvent", to decide whether to deliver the event to the listener on the process broadcaster. RefreshStateAfterStop gets called in ShouldBroadcastEvent, just before we go into the decision loop about whether to broadcast a stopped event or not. The event might not be broadcast, for instance it might get consumed by the thread plans because the stop was just one part of their larger designs. If we do decide to broadcast the event, it gets put on the event queue. Then when the event gets consumed by the process listener THAT'S when the public state finally gets updated. Note, even if we broadcast an event we might not change the state, since one of the StopInfo actions might have restarted the process (e.g. a breakpoint command). In that case we don't change the public state, but just mark the event as a stopped event with the restarted flag set.

This little dance is kind of obnoxious, but it is required so that the thread plans can do their jobs without spamming the public event consumer with all sorts of spurious stops and starts, and also so that the state is kept in sync with the last event you've consumed. After all, if your process ran and stopped again right away, and you had consumed the "running" event, but hadn't consumed the "stopped" event, it would be kind of disconcerting for the state to be "stopped"...

Hope that helps.

Jim

It's possible, but Windows gives us a handle to the thread which has all privileges associated with it (in other words we can use it for just about anything), so it's considered better to use the handle windows gives us. At the very least we'd want to queue up all the incoming event notifications that we get from this loop, and then use it to update the state when someone stops.

Just to be certain, when you and Greg refer to "when someone has stopped", I assume you are referring to a public stop, i.e. someone has to enter a command to resume the debugger?

No just a private stop. You should _not_ be changing the process state to stopped and running again (via Process::SetPrivateState()) for thread creation or thread death, only for stops where an existing thread stops for some reason (trace (single step), breakpoint, exception, etc). The private state thread will move the thread plans along for private stops, which agains are only actual stops where the process stops due to a existing thread (not a new or dying thread) stopping for a reason. These thread creation events should be suppressed from the private state thread and only handled for your own purposed. You also do not want to be updating the thread list as you are running (the one in Process::GetThreadList()), so you should keep your own ThreadList object within the windows event loop that only your windows event will update. Later when a stop gets propagated to the private state of the process, it will call:

    virtual bool
    UpdateThreadList (ThreadList &old_thread_list, ThreadList &new_thread_list) = 0;

And you can merge your private copy of your ThreadList into new_thread_list as needed. You should be doing something like:

ThreadList newWindowsThreads;
ThreadList deadWindowsThreads;

while (WaitForDebugEvent(&event)) // Block until something happens in the process
{
    if (event is thread creation)
    {
         newWindowsThreads.Append(new thread)
    }
    else if (event is thread death)
    {
         deadWindowsThreads.Append(...)
    }
    // Windows stopped the process for us automatically. Do something with the event
    ContinueDebugEvent(&event); // Resume the process
}

Then in your

bool
ProcessWindows::UpdateThreadList (ThreadList &old_thread_list, ThreadList &new_thread_list)
{
     // Append any threads from old_thread_list into new_thread_list if they are still alive by checking if any of them are in deadWindowsThreads and only adding them if they aren't in the list
     // Append all threads from newWindowsThreads into new_thread_list
}

I was wondering why my RefreshStateAfterStop override was never being called, I'm guessing now that it's because I was only updating the private state, and this only gets called on a public stop.

Yes this seems to be for public stops only. ProcessWindows::UpdateThreadList() isn't only for public stops, it is used in private stops as well.

It’s likely possible to get the same view of the thread, but there’s no guarantee. handles are funny things, and there’s a huge amount of complexity and security checks that go on behind the scenes when you try to open a handle, so it’s safer to just use the one that’s been blessed by the OS.

Thanks for the additional explanations

You could probably maintain these:

ThreadList newWindowsThreads;
ThreadList deadWindowsThreads;

As just vectors of your handles:

std::vector<HANDLE> newWindowsThreadHandles;
std::vector<HANDLE> deadWindowsThreadHandles;

Then in:

bool
ProcessWindows::UpdateThreadList (ThreadList &old_thread_list, ThreadList &new_thread_list)

You can search the old_thread_list and see if any of these have handles that are in deadWindowsThreadHandles and if so don't copy them over into new_thread_list, and if they aren't do copy them over. Then you would create new windows threads (subclasses of lldb_private::Thread) for all items in newWindowsThreadHandles...

There is no need to create any lldb_private::Thread subclasses objects until you have a stop that might use them for debugger purposes as the thread might be created and also die before you stop, so you would want to remove any items from newWindowsThreadHandles that are in deadWindowsThreadHandles or remove dead threads from newWindowsThreadHandles without adding them to deadWindowsThreadHandles if they already exist in newWindowsThreadHandles before a stop comes along...

Yea that’s more or less what I’m planning to do. Thanks!

I will need to dig into the code again (it’s been almost a year, so I don’t remember details)!

From what I remember, some of the Windows-specific things were:

  • You could only launch and process most of debugging Windows API command from the thread where CreateProcess was done (let’s call it the “Debugger control thread”)
  • As mentioned earlier, causing a process to break was actually launching another “control” thread inside the debugged Windows process, which would do the int3.
  • The way multiple threads could hit the same breakpoint (probably partly due to having to go back to main debugger thread) needed special handling (https://github.com/xen2/lldb/commit/515956244784a9162183a6135068e893ba994532#diff-826417f27b16bb5b36023b7e6cbc6829R474 )

I think I was passing message of thread creation/deletion instead of registering them immediately so that “state view” would be consistent from within this “Debugger control thread” and externally.
However, I was still learning LLDB architecture at the time (wasn’t sure about what kind of invariant/state was expected from outside), so it might be not necessary.

Digging up this old thread a little bit. So on Windows now I can launch a process with -s to stop at the initial breakpoint, and when I do this I can print register values, stack frames, and lots of other things.

But RefreshStateAfterStop isn’t being called for the initial stop. This is where I’ve put all my code to update the StopInfo of the various threads, so currently for the initial breakpoint I’ve got no way to tell the threads why they’re stopped.

There’s a comment in Process::Launch() that seems to justify this, which says this:

// Note, the stop event was consumed above, but not handled. This was done
// to give DidLaunch a chance to run. The target is either stopped or crashed.
// Directly set the state. This is done to prevent a stop message with a bunch
// of spurious output on thread status, as well as not pop a ProcessIOHandler.

But still, this is a real stop that will cause the debugger to break. Is it a bug that RefreshStateAfterStop isn’t being called for this?

Sounds like it to me. You don't want to call RefreshStateAfterStop if you don't need to (for instance if processing this first stop when you're just going to auto-continue or any other "synchronous" breakpoint callback like the shared library load notification) since it could be a little expensive. But if you are ever going to return control to the user you need to call RefreshStateAfterStop.

Jim