Limbo

There’s an issue on Linux where LLDB stops with “stop reason = thread exited” and displays a brief assembly dump from somewhere in libc. This seems to happen because it is stopping in the “limbo” state. I can make it go away by having POSIXLimboStopInfo::ShouldStop() return false instead of true.

Is there any reason I shouldn’t do that?

Thanks,

Andy

I don't know enough about the Linux threading model to know what is really going on. Is the thread being in this "Limbo" state the reason why the process as a whole stopped, or did it stop for some other reason, and the thread in Limbo is just along for the ride? If the latter, then should it have a stop reason at all? In general, in lldb, threads only have stop reasons if they were one of the threads that caused the process to stop.

You are achieving pretty much the same thing by returning false from its ShouldStop. But note that if you happen to hit a breakpoint on another thread when the Limbo'ed thread exits, then both threads will be reported to have stopped, one with reason breakpoint and one with "thread exited". Is that what you want?

Jim

Hi Jim,

We're setting the limbo state because we got a 'SIGTRAP | PTRACE_EVENT_EXIT << 8' signal -- that is, the inferior process is exiting. It looks like this is only getting used by Linux and FreeBSD.

I'm not sure it's even possible for another thread to hit a breakpoint at this stage, but if it is then the behavior you describe is what we'd want.

-Andy
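
For reference, the event described above can be recognized from the raw wait status. This is a minimal sketch, not LLDB's actual code: the helper names are made up for illustration, and the constants are the Linux values (SIGTRAP is 5, PTRACE_EVENT_EXIT is 6), hard-coded so the snippet needs no system headers.

```cpp
// Hypothetical helpers; not taken from the LLDB sources.
// Linux values, hard-coded so the sketch needs no system headers.
constexpr int kSigTrap = 5;          // SIGTRAP
constexpr int kPtraceEventExit = 6;  // PTRACE_EVENT_EXIT

// A tracee stop is encoded with 0x7f in the low byte of the wait status.
constexpr bool IsStopStatus(int status) { return (status & 0xff) == 0x7f; }

// Per the ptrace man page, an exit event satisfies
//   status >> 8 == (SIGTRAP | (PTRACE_EVENT_EXIT << 8)).
constexpr bool IsExitEvent(int status) {
  return IsStopStatus(status) &&
         (status >> 8) == (kSigTrap | (kPtraceEventExit << 8));
}

// Builds the status a tracer would see for a given stop signal and event.
constexpr int MakeStopStatus(int signal, int event) {
  return ((signal | (event << 8)) << 8) | 0x7f;
}

static_assert(IsExitEvent(MakeStopStatus(kSigTrap, kPtraceEventExit)),
              "SIGTRAP stop carrying the exit event");
static_assert(!IsExitEvent(MakeStopStatus(kSigTrap, 0)),
              "plain SIGTRAP stop is not an exit event");
```

A plain SIGTRAP stop (for example, a breakpoint) has no event bits set, so it is not mistaken for the limbo stop.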

When multi-threaded debugging works on Linux, this signal will be received for any inferior thread that exits, including non-main spawned threads, so it would be possible for another thread to hit a breakpoint in this case. I'm wondering whether it's even useful to stop lldb and create a limbo stop reason when a thread exits. Is there any use in examining a thread in the limbo state (i.e., a thread that has finished execution and is about to exit, but whose registers we can still read)? If anything, we would update the process thread list to remove the exiting thread and make sure it exits, but I don't think the debugger needs to stop for this.

Matt

So if the process actually gets a signal and stops on thread exit, then I'm inclined to think you should use the StopInfo mechanism, since that's the general mechanism for handling target stops, and subverting it doesn't seem wise. As Andrew pointed out, returning false to the ShouldStop will mean you won't ever stop. Then the other bit is whether the user should be notified about the stop info. That's already needed in the case where thread A has stopped for an internal breakpoint and thread B has stopped for a user breakpoint. In that case, you want to tell them about the user breakpoint but not the internal breakpoint. That is governed by the StopInfo's ShouldNotify function. So for instance, in the StopInfoBreakpoint, it checks whether the breakpoint is internal or not and returns true or false accordingly. So you could use the same mechanism for these Limbo stop events. They should just always return false to ShouldNotify. That way, for instance, if you needed to do something when a thread exits you could stuff it into the StopInfo's PerformAction...

I have a sneaking suspicion that there are some places in lldb that don't obey the ShouldNotify, because I have occasional reports of some of our internal breakpoint hits getting printed. But I haven't had time to go track down who the bad guy is yet. But that is the way it is supposed to work, and seems a pretty reasonable scheme...

Jim
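
The division of labor described above (ShouldStop decides whether the process halts, ShouldNotify decides which threads are reported, PerformAction carries side effects) can be sketched roughly as follows. This is a simplified, hypothetical model: the class layout, the Evaluate function and the m_cleanups counter are illustrative assumptions, and LLDB's real StopInfo interface is larger and differs in detail.

```cpp
#include <vector>

// Simplified stand-in for a per-thread stop description (hypothetical).
class StopInfo {
public:
  virtual ~StopInfo() = default;
  virtual bool ShouldStop() const { return true; }    // halt the process?
  virtual bool ShouldNotify() const { return true; }  // report to the user?
  virtual void PerformAction() {}                     // side effects on stop
};

class BreakpointStopInfo : public StopInfo {
public:
  explicit BreakpointStopInfo(bool is_internal) : m_internal(is_internal) {}
  // Internal breakpoints are not reported to the user.
  bool ShouldNotify() const override { return !m_internal; }

private:
  bool m_internal;
};

class ThreadExitingStopInfo : public StopInfo {
public:
  bool ShouldStop() const override { return false; }
  bool ShouldNotify() const override { return false; }
  // Placeholder for real work such as pruning the thread list.
  void PerformAction() override { ++m_cleanups; }
  int m_cleanups = 0;
};

struct StopDecision {
  bool stop = false;          // does the process stay stopped?
  int threads_to_report = 0;  // how many threads the user is told about
};

// The process stops if any thread asks for it; only threads whose
// stop info wants notification are counted for reporting.
StopDecision Evaluate(const std::vector<StopInfo *> &infos) {
  StopDecision decision;
  for (StopInfo *info : infos) {
    info->PerformAction();
    if (info->ShouldStop())
      decision.stop = true;
  }
  if (decision.stop) {
    for (StopInfo *info : infos)
      if (info->ShouldNotify())
        ++decision.threads_to_report;
  }
  return decision;
}
```

In this model, a lone exiting thread never halts the process, and when a user breakpoint on another thread does halt it, only the breakpoint thread is reported.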

If the thread is exiting and nothing can be done with it, it shouldn't even be in the process thread list. Just omit any threads in this state and do any cleanup needed to reap it/let it die.

The current trunk implementation of POSIXLimboStopInfo is returning 'true' for both ShouldStop() and ShouldNotify(). Having ShouldStop() return 'false' gets the process to run to completion, but I get a line saying "Process <pid> stopped and was programmatically restarted" even if I also return 'false' from ShouldNotify().

To get rid of the 'restarted' message, I also have to add an 'eStopReasonTrace' handler to ThreadPlanBase.

The attached patch addresses all three of these changes. If it looks right to everyone else I'll commit it. (BTW, this is adapted from some earlier work that Matt did that had never been committed to trunk, but I think we've done some testing with it here on Linux, Mac and FreeBSD.)

-Andy

limbo-stop-plan.patch (1.13 KB)

Why is the stop reason for hitting this POSIXLimboStopInfo eStopReasonTrace? That's specifically the stop reason for single stepping, which should generally be handled by the plan that was doing the single stepping (which is why there didn't need to be a handler for it in the base thread plan). I have no objections to adding a handler in ThreadPlanBase, but it seems weird to me that that's the stop reason for this thread-exit stop event.

Jim

I'm unsure why it is eStopReasonTrace, I think that's something we inherited. Would eStopReasonNone or a new thread exiting stop reason be a better candidate?

Though there's probably some busy-work with handling a new stop reason, that would be my vote. If you make it eStopReasonNone, it will be hard to get us to do something non-trivial with the stop should we choose to. At some point, I want to expand the "target stop-hook" mechanism so you can hook into a bunch of interesting system events, including shared library loads, process spawning and thread creation and destruction. So while we're at it it probably is worth putting in something for this thread exit.

Jim

Like Matt, I have no idea why eStopReasonTrace was chosen here. It probably stems from the fact that the limbo state happens in response to a "PTRACE" event, so someone probably guessed PTRACE_EVENT == eStopReasonTrace.

Frankly, I'm not even clear what the purpose of the limbo state is. We get there because the Linux ProcessMonitor::Launch method calls PTRACE(...PTRACE_O_TRACEEXIT...), and a comment there says, "This is used to keep the child in limbo until it is destroyed." So it seems like someone thought we should be stopping there. (FWIW, svn blame attributes the comment to 'wilsons' as part of the original Linux process plugin on 7/23/2010.) I notice that the PTRACE_O_TRACEEXIT call is currently excluded for FreeBSD.

The ptrace man page says this about PTRACE_O_TRACEEXIT:

"This stop will be done early during process exit when registers are still available, allowing the tracer to see where the exit occurred, whereas the normal exit notification is done after the process is finished exiting. Even though context is available, the tracer cannot prevent the exit from happening at this point."

So it might be a useful place to let the user stop and look around, but I wouldn't think we'd ordinarily want to stop there.

-Andy

Ah, interesting. The single-step bit on a processor is sometimes called the "trace" bit, so eStopReasonTrace was the stop reason you get when you set the trace bit and allow the process to run, and then it comes back having executed a single instruction. Instruction single stepping is used a lot to implement the various stepping strategies, so calling a stop eStopReasonTrace when it is really just some random stop or other is definitely going to confuse us. I'm sure Greg and I thought that naming was as self-evident as eStopReasonBreakpoint. But we should document the stop reasons to avoid future confusion.

Right. There's a similar thing with shared library loads. For the most part you would never want to stop there; that would just annoy you. But for some odd cases you might want to. The same goes for when a program spawns a subprogram, makes a thread, etc. Right now there's no way to do that from the lldb command line, but I want to add that possibility for all these sorts of program events.

Jim

How about this patch (attached)?

-Andy

limbo-reason.patch (5.47 KB)

I hate to be obnoxious, but can you use eStopReasonThreadExiting instead of eStopReasonLimbo? That seems like a really weird name, and I'm not sure anybody will understand what it means. I think eStopReasonThreadExiting is correct. If we find some other platform that can only tell us about threads dying AFTER they are already gone, we can add eStopReasonThreadExited to handle that case. Do you also get one of the "Limbo" events when the process is on its way out too? Or is that just the main thread exiting? If there's a distinction we may want to reflect that in the stop reason. But if you only get this for threads going away, eStopReasonThreadExiting should be fine.

I don't care what you call the POSIX*StopInfo, that's down in the platform layer and can be as quirky as the platform is.

Other than that it looks fine.

Jim
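
To illustrate why a dedicated stop reason is easier to act on than eStopReasonNone or a reused eStopReasonTrace, here is a small sketch. The enum is a trimmed-down, hypothetical copy that only borrows the names used in this thread, and BasePlanShouldStop and DescribeStopReason are invented helpers, not LLDB functions.

```cpp
#include <string>

// Hypothetical, trimmed-down enum; the names follow this thread's proposal.
enum StopReason {
  eStopReasonNone,
  eStopReasonTrace,         // a single-instruction step completed
  eStopReasonBreakpoint,
  eStopReasonSignal,
  eStopReasonThreadExiting  // thread is about to exit; registers still readable
};

// What a base plan might decide when no other plan explains the stop.
// The policy here is an assumption made for illustration.
bool BasePlanShouldStop(StopReason reason) {
  switch (reason) {
  case eStopReasonBreakpoint:
  case eStopReasonSignal:
    return true;
  case eStopReasonThreadExiting:  // bookkeeping only, keep running
  case eStopReasonTrace:
  case eStopReasonNone:
    return false;
  }
  return false;
}

// Human-readable label, e.g. for a future stop-hook or status line.
std::string DescribeStopReason(StopReason reason) {
  switch (reason) {
  case eStopReasonNone:          return "none";
  case eStopReasonTrace:         return "trace";
  case eStopReasonBreakpoint:    return "breakpoint";
  case eStopReasonSignal:        return "signal";
  case eStopReasonThreadExiting: return "thread exiting";
  }
  return "unknown";
}
```

Because the exit stop has its own enumerator, code that cares about single stepping never sees it, and code that cares about thread exit can match on it directly.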

No problem on the name change. It makes sense.

My understanding is that we'll get the event in question for each thread as the process is exiting, but no single event is necessarily correlated with the process exiting. For instance, I believe there are circumstances under which the main thread can exit and the process stay alive.

-Andy

Once the main thread receives the thread-exiting signal and lldb handles it, there will be another call in lldb to waitpid() to get the next signal/event of the process. At that point, if we get a WIFEXITED status, then the main thread/process is gone. This is the behavior in the single-threaded case. Maybe we can eventually look into whether the WIFEXITED status alone is sufficient for detecting thread exit...

Matt
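
The sequence described above (an exit-event stop first, then a WIFEXITED status from the next wait) can be sketched as a small classifier over wait statuses. This is an illustration only: WaitEvent, Classify and kExitEvent are hypothetical names, the value 6 for PTRACE_EVENT_EXIT is the Linux one, and the status macros come from the POSIX <sys/wait.h> header.

```cpp
#include <csignal>
#include <sys/wait.h>

// Hypothetical classification of a status returned by waitpid().
enum class WaitEvent {
  ThreadExiting,  // ptrace exit-event stop ("limbo")
  ProcessExited,  // WIFEXITED: normal termination
  ProcessKilled,  // WIFSIGNALED: terminated by a signal
  OtherStop,      // any other ptrace stop
  Unknown
};

constexpr int kExitEvent = 6;  // PTRACE_EVENT_EXIT on Linux (assumption)

WaitEvent Classify(int status) {
  if (WIFEXITED(status))
    return WaitEvent::ProcessExited;
  if (WIFSIGNALED(status))
    return WaitEvent::ProcessKilled;
  if (WIFSTOPPED(status)) {
    // Exit event: status >> 8 == (SIGTRAP | (PTRACE_EVENT_EXIT << 8)).
    if ((status >> 8) == (SIGTRAP | (kExitEvent << 8)))
      return WaitEvent::ThreadExiting;
    return WaitEvent::OtherStop;
  }
  return WaitEvent::Unknown;
}
```

For a single-threaded inferior, a tracer using something like this would see ThreadExiting followed by ProcessExited on the next waitpid() call.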