thread state question on Ctrl-C

Hey guys,

When we hit Ctrl-C in lldb while debugging a multithreaded inferior with 5 threads, at a low level we’ll end up stopping all 5 threads. It looks like our expectation is that one of the 5 threads gets marked with a stop reason, while the other 4 threads in this case would simply report they are stopped for no reason (gdb-remote T00 stop reason). Is that correct?

Right now in llgs I believe I’m doing this in a non-compliant way. I am marking all threads as stopped with the stop reason for an interrupt. This seems to translate to lldb thinking it needs multiple restarts to get it going again.

Related - if I do a qThreadStopInfo gdb-remote command and the thread is not stopped, that is an error that gets an E response. But if it is stopped but not for any user-visible reason, that is a T00 response. Is that correct?

The difference between MacOSX/debugserver and Linux/llgs is visible in the test/tools/lldb-gdbserver/TestGdbRemote_qThreadStopInfo.py test if you log the stop signals returned by the qThreadStopInfo that loops over all threads. MacOSX is only marking one with a stop reason of non-zero, whereas Linux/llgs is marking all threads with the SIGSTOP stop reason. That particular test isn’t checking that aspect, but obviously I want to add a test that verifies we’re handling the stop reason marking correctly on the llgs side.
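
A check along these lines could be what the new test does. This is only a sketch, assuming the qThreadStopInfo reply payloads have already been collected per thread with the `$...#xx` framing stripped. `parse_reply` and `threads_with_stop_reason` are hypothetical helpers, not part of the existing test suite, and the signal value 0x13 in the usage note assumes Linux's SIGSTOP (19).

```python
def parse_reply(payload):
    """Classify a gdb-remote reply payload.

    Returns ("stop", signo) for T/S stop replies and ("error", code)
    for Exx replies. Only the leading type byte and the two hex digits
    that follow are inspected; key:value pairs are ignored.
    """
    if len(payload) >= 3 and payload[0] in ("T", "S"):
        return ("stop", int(payload[1:3], 16))
    if len(payload) == 3 and payload[0] == "E":
        return ("error", int(payload[1:3], 16))
    raise ValueError("unrecognized reply: %r" % payload)


def threads_with_stop_reason(replies):
    """Return the sorted thread ids whose stop reply has a non-zero signal.

    `replies` maps thread id -> reply payload (hypothetical shape).
    """
    marked = []
    for tid, payload in sorted(replies.items()):
        kind, value = parse_reply(payload)
        if kind == "stop" and value != 0:
            marked.append(tid)
    return marked
```

With replies such as `{1: "T13thread:1;", 2: "T00thread:2;"}` only thread 1 is returned, so the test could assert that the returned list has exactly one entry after an interrupt.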

Thanks!

Hey guys,

When we hit Ctrl-C in lldb while debugging a multithreaded inferior with 5 threads, at a low level we'll end up stopping all 5 threads. It looks like our expectation is that one of the 5 threads gets marked with a stop reason, while the other 4 threads in this case would simply report they are stopped for no reason (gdb-remote T00 stop reason). Is that correct?

We should probably invent a "stopped by user interrupt" stop reply that we can get from debugserver. Right now debugserver just reports the results of whatever mechanism it uses to interrupt the target, which is generally a SIGSTOP. Signals get delivered to one thread, so the stop reply will say that thread stopped with a SIGSTOP.

But I don't quite understand what you mean by "at a low level we'll end up stopping all 5 threads". When you hit a breakpoint on one thread, do you also have to stop the other threads? And if so, do you report that as the stop reason for those threads? That would be wrong, since the stopping of those threads is an implementation detail, and showing that those threads stopped for some reason that doesn't relate to their current state would be confusing. If you don't report the internal stopping of threads for a breakpoint hit, why do you have to do that for interrupt?

Right now in llgs I believe I'm doing this in a non-compliant way. I am marking all threads as stopped with the stop reason for an interrupt. This seems to translate to lldb thinking it needs multiple restarts to get it going again.

I don't know why this would take multiple restarts. If you hit two breakpoints on two different threads simultaneously, then internally you'll have to step over each of them to get the process going again. But that's just a detail of how software breakpoints work. If they were hardware breakpoints, you could just restart. What's going on that you have to do multiple restarts?

Related - if I do a qThreadStopInfo gdb-remote command and the thread is *not* stopped, that is an error that gets an E response. But if it is stopped but not for any user-visible reason, that is a T00 response. Is that correct?

We don't currently do "keep alive" debugging, so I'm not quite sure why the thread would not be stopped. How is that happening? Anyway, I don't think there's a stop reason for "not actually stopped". E seems a crude way to indicate this, however, so when we get around to keep alive debugging we should invent a reply that indicates "not stopped" as distinct from "got an error reading the thread status".

Jim

Hey guys,

When we hit Ctrl-C in lldb while debugging a multithreaded inferior with 5 threads, at a low level we'll end up stopping all 5 threads. It looks like our expectation is that one of the 5 threads gets marked with a stop reason, while the other 4 threads in this case would simply report they are stopped for no reason (gdb-remote T00 stop reason). Is that correct?

You should show what actually happens in the system. If all 5 threads actually stop due to a signal, show that. So try to show the truth as much as you can. It would be great if we were actually able to stop the program some other way that doesn't involve sending a signal, but none of us have that as far as I know. It would be great to get all threads stopped with no stop reason, but for now we should show the truth about how the program stopped (SIGSTOP): if the SIGSTOP gets delivered to your program when you resume, you would want to be able to understand why it behaved that way.

Right now in llgs I believe I'm doing this in a non-compliant way. I am marking all threads as stopped with the stop reason for an interrupt.

You should tell the truth and just tell us which thread got the signal if you can determine this.

This seems to translate to lldb thinking it needs multiple restarts to get it going again.

Signals shouldn't require multiple restarts. They don't on MacOSX. Using the current UnixSignals from the process, we determine whether to suppress them or not and just resume the program.

Related - if I do a qThreadStopInfo gdb-remote command and the thread is *not* stopped, that is an error that gets an E response. But if it is stopped but not for any user-visible reason, that is a T00 response. Is that correct?

qThreadStopInfo should always return a valid response for any real thread. It can't be sent while the process is running, because when the process is running you are in the middle of a "c" or "vCont..." packet and waiting for the stop reply. T00 should be returned for any thread that was simply suspended because we wanted to stop the process; yes, that is correct.

The difference between MacOSX/debugserver and Linux/llgs is visible in the test/tools/lldb-gdbserver/TestGdbRemote_qThreadStopInfo.py test if you log the stop signals returned by the qThreadStopInfo that loops over all threads. MacOSX is only marking one with a stop reason of non-zero

We aren't marking anything; we are just telling the truth by converting the actual system exception information into a stop reply packet.

whereas Linux/llgs is marking all threads with the SIGSTOP stop reason.

So the question is: when you interrupt your target, how are you doing it? Sending a SIGSTOP? Does any thread actually have a stop reason? If you have no threads with real stop reasons, you will need to mark at least one thread with a stop reason of "reason:interrupted;" (and add corresponding code to ProcessGDBRemote::SetThreadStopInfo(...) to handle the new "interrupted" reason).

That particular test isn't checking that aspect, but obviously I want to add a test that verifies we're handling the stop reason marking correctly on the llgs side.

So tell the truth as much as possible, and if you are interrupting and have to make up a new stop reason, use the "interrupted" reason on a single thread (the first one, probably) if no threads actually have a stop reason to report after halting your process.
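
The selection rule above could be sketched roughly as follows. This is an illustration only, not llgs or debugserver code: `assign_stop_reasons` and its arguments are hypothetical, `None` stands for a thread that would report T00, and "interrupted" is the proposed new reason, which does not exist in the protocol yet.

```python
def assign_stop_reasons(thread_ids, real_reasons):
    """Decide what each thread reports after the process is halted.

    thread_ids   -- ordered list of thread ids (hypothetical input)
    real_reasons -- dict of thread id -> reason string for threads that
                    genuinely stopped for a reason (signal, breakpoint, ...)

    Threads with a real reason keep it. Every other thread gets None
    (reported as T00). Only if no thread has a real reason is the first
    thread given the made-up "interrupted" reason.
    """
    reasons = {tid: real_reasons.get(tid) for tid in thread_ids}
    if thread_ids and not any(reasons.values()):
        reasons[thread_ids[0]] = "interrupted"
    return reasons
```

For example, `assign_stop_reasons([10, 11, 12], {})` marks only thread 10 as "interrupted", while `assign_stop_reasons([10, 11, 12], {11: "signal"})` leaves thread 11 as the only thread with a reason.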

Greg

I’ll need to go through it in detail. I suspect I’m actually sending a SIGSTOP to each thread on Ctrl-C, which I think is the root of the issue. I only need to send it to one.

You guys gave me enough info to work through it from here. Thanks!