Resuming traced process on Linux and SIGSTOP

Currently when attaching to a running process on Linux, a SIGSTOP signal is (incorrectly I think) injected into the inferior on resume. This can be reproduced by simply launching any process and then in a separate terminal doing:

sudo lldb -p

c

process interrupt
c

On the second continue the SIGSTOP that was used to stop the process is injected into the inferior by being passed to PTRACE_CONT, resulting in the process going into a stop state outside the control of LLDB. The SIGSTOP comes from the SetResumeSignal() call in POSIXThread and StopInfo.

I can’t think of any reason why a SIGSTOP should ever be injected into the inferior so as a test I simply prevented it from ever happening:

diff --git a/source/Plugins/Process/Linux/ProcessMonitor.cpp b/source/Plugins/Process/Linux/ProcessMonitor.cpp
index 3dec6de…3079379 100644
— a/source/Plugins/Process/Linux/ProcessMonitor.cpp
+++ b/source/Plugins/Process/Linux/ProcessMonitor.cpp
@@ -2209,6 +2209,9 @@ ProcessMonitor::Resume(lldb::tid_t tid, uint32_t signo)
bool result;
Log *log (ProcessPOSIXLog::GetLogIfAllCategoriesSet (POSIX_LOG_PROCESS));

  • if (signo == SIGSTOP)
  • signo = eResumeSignalNone;

Somebody is not paying attention to the "process handle" settings. Normally SIGSTOP is set not to pass:

(lldb) process handle SIGSTOP
NAME PASS STOP NOTIFY
========== ===== ===== ======
SIGSTOP false true true

This check should be done in Process::WillResume:

    virtual void
    WillResume (lldb::StateType resume_state)
    {
        ThreadSP thread_sp (m_thread_wp.lock());
        if (thread_sp)
        {
            if (thread_sp->GetProcess()->GetUnixSignals().GetShouldSuppress(m_value) == false)
                thread_sp->SetResumeSignal(m_value);
        }
    }

I wonder why this isn't happening in your case?

Jim

Sorry, that's StopInfoSignal::WillResume.

Jim

Thanks Jim, LinuxSignals.cpp had SIGSTOP marked as suppress = false unlike UnixSignals.cpp which had it correctly marked suppress = true. With that fixed though the SIGSTOP is still getting set due to the call to SetResumeSignal() in POSIXThread::SignalDeliveredNotify() it looks like. Should this code also be using the signals table to decide whether to suppress it?

I'm not very familiar with the Linux side of things, but this should all be happening automatically from generic code. The thread gets asked "Thread::ShouldResume" which first consults the StopInfo::WillResume, and that does the suppression of the signal. Then the thread plans get consulted, and if everybody decides the thread wants to resume, the virtual Thread::WillResume gets called. You should probably walk through ShouldResume in this case and see why the signal isn't getting suppressed.

Jim

Actually it looks like those calls may be redundant now as it’s being handled by StopInfo. This patch gets attach resume/interrupt working for me on Linux.

SIGSTOP-resume.patch (1.75 KB)

Indeed, setting the resume signal explicitly on the thread on stop was not the right thing to do. You don't want to clear the thread's resume signal out in Thread::ShouldStop because somebody might be trying to resume the thread with a hand-provided signal, and you wouldn't want to interrupt that. But the thread itself should let the StopInfo carry the stop signal information.

Jim

Great, thanks Jim. I take it that patch is ok to apply then?

Yes. Thanks for tracking this down.

Jim

Hey Andrew,

Any chance you can put a test into lldb to verify the fix? I’m doing some work on (essentially currently) parallel code (implementations of NativeProcessProtocol/NativeThreadProtocol) for Linux that will be used in lldb-gdbserver and I’d love us to have tests that verify we don’t gunk that up or regress it in the future.

Hey Todd, sure thing. I haven’t written a test before and I’m out of time tonight but I can take a look over the weekend or early next week, I’m looking into another issue with a SIGSTOP leaking out on detach as well so would you be okay if I apply this small patch now and then write a test that covers both this case and the detach case for when I submit a patch for that?

Cheers,
Andrew

Oh yeah sure that’s fine. When testing the test to make sure it catches the failure, you can just essentially reverse out the patch and then reapply to verify the fix.

No rush either, it would just be great to get more tests into the system to catch these things as we discover proper behavior.

No problem, I’ll ping the list when I have a patch ready.

I'm pretty sure we have some test cases that test attach. If we don't, then it would be great to add one. If we do, then it would be interesting to see why they didn't fail. For instance, maybe they just didn't bother to try "continue" after the attach. Just adding that would be good too.

Jim

Right - basically whatever didn’t fail in your attach scenario we should test and make sure that it fails without your patch, and (ideally grin) passes with your patch.

We definitely have some, because they used to fail early on in the
FreeBSD porting effort. I suspect that your suggestion is correct and
they just don't do anything but attach and then quit.

On a related note, I'm pretty sure we don't have tests for detach.
I'll see about extending tests to include that at some point.

-Ed

I’m attaching a patch here that resolves the issue of a signal being delivered to the inferior on detach, let me know if I should start a new thread for this but since it’s related to what we were discussing I’m attaching it here. To reproduce the issue just launch a process, then in a new terminal:

sudo lldb -p

c

q

When quitting the inferior receives a SIGSTOP because the process is already running when the PTRACE_DETACH is sent. I can’t find a clear answer on what’s supposed to happen here (PTRACE_DETACH sent to running thread) but this is the behaviour I’m seeing. The docs do indicate that PRACE_DETACH is like PTRACE_CONT and should only be sent to a stopped thread.

I also tried to add a unit test for this and the previous patch, however I hit http://llvm.org/pr16172. The tests these two patches require are similar to the one mentioned in that case and I get the same errors in my sample test as when running that one:

attach -p (ok)

c (ok)

detach (error: Detach failed: No such process)

And:

attach -p (ok)
c (ok)

process interrupt (ok)
c (error: Failed to resume process: Resume request failed - process still running…)

Linux.detach.patch (561 Bytes)

Just pinging this again in case anyone has a chance to review the small Linux detach patch here. Thanks!

I just finished clearing out some Linux test errors (I’m now all green on Linux x86_64). I’ll add this to a queue I’ve got with some changes from Piotr, Ed and you in it now.

Great, thanks Todd!