LLDB keeps on detaching with "exited with status = -1 (0xffffffff) lost connection" on Linux

Hi,

We have a repro from a customer who is running a service on Linux. He is using LLDB to attach to the process and debug it.
The first issue is that we keep getting SIGSTOPs at random. We worked around that with “process handle -s false SIGSTOP”, but then we got tons of output saying the process stopped and restarted on SIGSTOP:

Process 13500 stopped and restarted: thread 673 received signal: SIGSTOP
Process 13500 stopped and restarted, reasons:
	thread 674 received signal: SIGSTOP
	thread 675 received signal: SIGSTOP
Process 13500 stopped and restarted, reasons:
	thread 676 received signal: SIGSTOP
	thread 677 received signal: SIGSTOP
	thread 678 received signal: SIGSTOP
Process 13500 stopped and restarted: thread 679 received signal: SIGSTOP
Process 13500 stopped and restarted, reasons:
	thread 680 received signal: SIGSTOP
	thread 681 received signal: SIGSTOP
...

Eventually, lldb detaches with this error:
exited with status = -1 (0xffffffff) lost connection

It seems that LLDB has lost its connection to debugserver. I have captured gdb-remote logs, but I am not familiar enough with the protocol to decipher them. I do not know how to attach the log file, so the last several entries are shared here:

intern-state     <  20> send packet: $p7;thread:275d1;#02
intern-state     <  20> read packet: $80cfbf521e7f0000#53
intern-state     <  21> send packet: $x7f1e52bfce00,200#c0
intern-state     < 516> read packet: $20cebf521e....
intern-state     <  21> send packet: $x7f1e88bfce00,200#c9
intern-state     < 516> read packet: $20cebf881e7f0000babda33a000000000400000000000000b0d7bf881e7f000060cebf881e7f000049c9a33a00000000806d3b3b00000000806d3b3b00000000b0d7bf881e7f0000b0d7bf881e7f0000e865a15e1f7f0000e865a15e1f7f00006809c0881e7f00004006c0881e7f00004006c0881e7f00004006c0881e7f0000000000000000000000000000000000009097bfa21e7f0000e0a9895e1f7f0000c0cebf881e7f00004006c0881e7f00004006c0881e7f000000000000000000009097bfa21e7f0000e0a9895e1f7f000070cfbf881e7f000082ac895e1f7f000000000000000000004006c0881e7f00004006c0881e7f00002c7cf7f9cdad77194006c0881e7f000000000000000000009097bfa21e7f0000e0a9895e1f7f00002c7cd7facdad77192c7c4532a1017419000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000e0cfffffffffffff4006c0881e7f00000000000000000000dcd1925e1f7f00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000#17
dbg.evt-handler  <  16> send packet: $jThreadsInfo#c1
dbg.evt-handler  <85322> read packet: $[{"name":"ConvStaticBalan","....
dbg.evt-handler  <  21> send packet: $z0,7f1f5fa86cc0,1#c7
dbg.evt-handler  <   6> read packet: $OK#9a
dbg.evt-handler  <  22> send packet: $D;00000000000034bc#eb
dbg.evt-handler  <   6> read packet: $OK#9a
b-remote.async>  <  20> send packet: $vCont;c:p34bc.-1#0a
b-remote.async>  <   7> read packet: $E1e#db

Any suggestion is appreciated. cc @clayborg, @labath

Jeffrey

Trivial point: if this is Linux, you are using lldb-server not debugserver…

W.r.t. SIGSTOP, if you want lldb not to print the SIGSTOP, then also pass -n false to process handle. Note, however, that lldb doesn’t have any control over whether the process gets all these SIGSTOPs, so those are all real, and the fact that the system seems to be pounding on this process telling it to stop is interesting at least…

The last bit of the transaction between lldb & lldb-server is lldb asking lldb-server to continue the process ($vCont) and lldb-server returning an error: E1e. lldb doesn’t really know what to do with a process that can’t continue and assumes the debug session is dead and disconnects. I didn’t see any place in the lldb-server code where it calls sendError(0x1e) or sendError(30) so this might actually be a system error number? It wouldn’t be altogether surprising that whoever is continually telling this process to stop eventually tried some more strenuous method, and that’s why the process can’t continue.
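
To make the packets in the log easier to read, here is a minimal Python sketch that checks a packet's checksum and decodes the three replies that matter here. It assumes the standard gdb-remote framing of `$payload#xx`, where `xx` is the sum of the payload bytes modulo 256 in hex. The function names `parse_packet` and `describe` are made up for this illustration and are not part of lldb. It only handles plain ASCII payloads, not the binary `x` memory reads.

```python
import re

# '$' + payload + '#' + two hex checksum digits.
PACKET_RE = re.compile(r"^\$(?P<payload>.*)#(?P<checksum>[0-9a-fA-F]{2})$")


def parse_packet(raw):
    """Return the payload of '$payload#xx' after verifying the checksum."""
    match = PACKET_RE.match(raw)
    if match is None:
        raise ValueError(f"not a framed packet: {raw!r}")
    payload = match.group("payload")
    expected = int(match.group("checksum"), 16)
    # Checksum is the sum of the payload bytes, modulo 256.
    actual = sum(payload.encode("ascii")) % 256
    if actual != expected:
        raise ValueError(f"checksum mismatch: got {actual:02x}, want {expected:02x}")
    return payload


def describe(raw):
    """Give a short human-readable description of a few packet kinds."""
    payload = parse_packet(raw)
    if payload == "OK":
        return "success"
    if re.fullmatch(r"E[0-9a-fA-F]{2}", payload):
        # The two digits after 'E' are a hex error code.
        code = int(payload[1:], 16)
        return f"error code {code} (0x{code:02x})"
    if payload.startswith("D;"):
        # Detach request; the pid is in hex.
        return f"detach from pid {int(payload[2:], 16)}"
    return payload


if __name__ == "__main__":
    for line in ("$D;00000000000034bc#eb", "$OK#9a", "$E1e#db"):
        print(line, "->", describe(line))
```

Run against the tail of the log above, this reports the detach as targeting pid 13500 (the same process the SIGSTOP messages mention) and the final reply as error code 30, which is the 0x1e / 30 referred to above.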

Jim

This makes sense. Sorry, I should have made the ask more explicit: for SIGSTOP, I am not complaining about the output, but I need tips on how best to troubleshoot who is sending SIGSTOP to the process. For example, is it code in the process itself or an external process?
I originally thought the lost connection was the more important issue to focus on, but it seems that the two (SIGSTOP and lost connection) may share the same underlying cause, which we need to find.

Any suggestions going forward here?

Jeffrey

Set breakpoints on pthread_kill and kill in the program you are debugging (there may be other ways to send yourself a signal on Linux, you might need to research that) to see if your program is sending it to itself. If those breakpoints don’t fire and the machine you’re running the debug session on has strace installed, you can get that to watch the kill system call to see if another user process is killing your process. If the kernel is doing it directly, then I don’t know how you would figure out why it is doing that on Linux.

Jim

You can use the thread siginfo command to dump the siginfo_t structure corresponding to the signal. That should give you some indication as to who is sending that signal (si_pid) and why (si_code).