Race condition during process launch

Greetings fellow developers,

I have been debugging an issue where starting an inferior process from the lldb command line locks up lldb. I have traced the problem to a race condition in waiting for process events.

Normally, we have the event handler thread waiting for (all) events in Debugger::DefaultEventHandler. Additionally, when we start a process with -o "process launch", we end up calling (in the main thread) Target::Launch with synchronous_execution=true. This results in a call to Process::WaitForProcessToStop(), which tries to wait for the process state change messages on the same listener as the first thread.

This results in a race between the two threads. If all the events are processed by the event-handler thread, the main thread will not receive the Stop event and will end up waiting forever, locking up the debugger. When the main thread manages to catch the Stop event everything proceeds normally (as far as I can tell).
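The losing interleaving can be shown with a small toy model. This is only a sketch in Python, not LLDB code: the `Listener` class, the `"stopped"` event string, and `main_thread_wait_for_stop` are hypothetical stand-ins, and a short timeout replaces the indefinite wait so the example terminates.

```python
import queue


class Listener:
    """Hypothetical stand-in for a listener: one queue of pending events."""

    def __init__(self):
        self._events = queue.Queue()

    def add_event(self, event):
        self._events.put(event)

    def wait_for_event(self, timeout):
        """Return the next event, or None if nothing arrives in time."""
        try:
            return self._events.get(timeout=timeout)
        except queue.Empty:
            return None


def main_thread_wait_for_stop(listener, timeout=0.1):
    """Stand-in for the synchronous wait done on the main thread."""
    while True:
        event = listener.wait_for_event(timeout)
        if event is None:
            return False  # the real wait has no timeout and would hang here
        if event == "stopped":
            return True


# Case 1: the main thread reaches the shared listener first.
shared = Listener()
shared.add_event("stopped")
main_won = main_thread_wait_for_stop(shared)

# Case 2: the event-handler thread pulls the stop event off first.
shared = Listener()
shared.add_event("stopped")
stolen = shared.wait_for_event(timeout=0.1)  # event-handler thread's turn
main_lost = main_thread_wait_for_stop(shared)

print(main_won, stolen, main_lost)
```

Each event is delivered to exactly one consumer of the queue, so whichever thread dequeues the stop first leaves nothing for the other.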

This happens on linux in about 80% of cases when I run:
lldb a.out -o "br set -n main" -o "process launch"
It happens both with local debugging and llgs. I haven’t managed to reproduce it on mac, but I suspect this is simply due to different thread scheduling, as the code in question is platform independent.

I was wondering if someone could advise on the correct fix. Obviously, we need the main thread to always receive the stop event, but I am unsure how best to guarantee that. Can I just hijack all the events from the event-handler thread? Will something bad happen if those events are not processed there? Or should there be another listener listening for the stop events here?

regards,
pavel

At present we really only support one listener handling process events at a time. And for complex little two-steps like handling the initial stops in launch, or running functions in the target, there are intermediate events which would just confuse the agent that requested the dance. You do want to hide all those events from the caller.

So something like process launch, which is going to handle a few stops and restarts for its own purposes, should hijack the process broadcaster.

The HandleProcessEvent stuff doesn't really care which Listener is causing events to get pulled off the broadcaster, so that isn't a worry. And then if the hijacking listener stops at some event that is interesting to the original Listener, then it can just forward the event on.

For instance, RunThreadPlan - which is actually the runner for function-calling plans (I had more ambitions for it at first, but it's become pretty specialized...) - does a pretty complex set of tricks. It handles auto-continuing from breakpoints if requested, and will interrupt execution to switch from single-thread execution to running all threads. None of this would make any sense to somebody who said "run this expression for me." So we install a hijacking listener and then handle all the stages of running the function using that. If the function call succeeds, then we eat all the events and return the result. But if, for instance, you hit a breakpoint (and the caller wants to stop at breakpoint hits in function calls), then on the way out we forward the breakpoint stop event to the original listener, so that the stop can be handled by the normal process listener just like any other stop.
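The hijack-then-forward pattern described above can be sketched as a toy model. This is Python and deliberately not LLDB's API: `Broadcaster`, `Listener`, `run_function`, and the event strings are all hypothetical names chosen for illustration, and the "target" is just a list of canned events.

```python
from collections import deque


class Listener:
    """Hypothetical listener: a named queue of delivered events."""

    def __init__(self, name):
        self.name = name
        self.events = deque()


class Broadcaster:
    """Delivers each event to the most recent hijacker, else to the primary."""

    def __init__(self, primary):
        self._primary = primary
        self._hijackers = []

    def hijack(self, listener):
        self._hijackers.append(listener)

    def restore(self):
        self._hijackers.pop()

    def broadcast(self, event):
        target = self._hijackers[-1] if self._hijackers else self._primary
        target.events.append(event)


def run_function(broadcaster, primary, target_events, stop_at_breakpoints=True):
    """Run a pretend function call, hiding intermediate stops from `primary`."""
    hijacker = Listener("hijacker")
    broadcaster.hijack(hijacker)
    try:
        for event in target_events:
            broadcaster.broadcast(event)      # the target reports a state change
            seen = hijacker.events.popleft()  # only the hijacker receives it
            if seen == "function-done":
                return "result"               # success: all events were eaten
            if seen == "breakpoint-hit" and stop_at_breakpoints:
                primary.events.append(seen)   # forward the interesting stop
                return None
            # Any other stop is an internal step: swallow it and keep going.
        return None
    finally:
        broadcaster.restore()                 # always hand events back


primary = Listener("event-handler")
broadcaster = Broadcaster(primary)

ok = run_function(broadcaster, primary,
                  ["stop-to-switch-threads", "function-done"])
after_success = list(primary.events)

hit = run_function(broadcaster, primary,
                   ["stop-to-switch-threads", "breakpoint-hit"])
after_breakpoint = list(primary.events)

print(ok, after_success, hit, after_breakpoint)
```

In this sketch the primary listener never sees the intermediate stop, and it sees the breakpoint stop only because the hijacker chose to forward it. Restoring in a `finally` block is a design choice of the sketch, so that an error partway through cannot leave the primary listener permanently cut off.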

Jim

Thanks a lot for the explanation. This makes things much clearer. I’ll try to implement something along these lines.

regards,
pavel

Yeah, this stuff is a little tricky. Feel free to ask about anything that seems opaque...

Jim

Thanks!