I’m in the midst of implementing an lldb-compatible (v13.0.1) gdb-server for my platform. I have already implemented everything required to attach to a running process using gdb-remote <host>:<port>. All of that is working as expected.
Now I’m adding support for the A and qLaunchSuccess packets so that lldb can launch a new process on the remote target. Although the packets exchanged after A and qLaunchSuccess are the same as in the attach path, lldb ends up in a strange state: it can see all the threads, but commands like reg read fail with error: invalid thread. The thread list command shows the threads, but even after thread select, reg read still reports invalid thread.
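For anyone following along who hasn't looked at the launch path, here is a minimal sketch of how the A packet payload and the packet framing can be built. It assumes the usual gdb-remote layout (A<arglen>,<argnum>,<hex-arg>, with arglen being the decimal length of the hex-encoded argument, and a two-digit modulo-256 checksum after the #). The helper names HexEncode, BuildLaunchPayload and FramePacket are made up for illustration and are not lldb functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hex-encode raw bytes, two lowercase hex digits per byte.
std::string HexEncode(const std::string &raw) {
  static const char kDigits[] = "0123456789abcdef";
  std::string out;
  out.reserve(raw.size() * 2);
  for (unsigned char c : raw) {
    out.push_back(kDigits[c >> 4]);
    out.push_back(kDigits[c & 0x0f]);
  }
  return out;
}

// Build the payload of an A packet: A<arglen>,<argnum>,<hex-arg>[,...]
// (hypothetical helper; arglen is the length of the hex string, in decimal).
std::string BuildLaunchPayload(const std::vector<std::string> &argv) {
  assert(!argv.empty() && "a launch needs at least the program path");
  std::string payload = "A";
  for (std::size_t i = 0; i < argv.size(); ++i) {
    const std::string hex = HexEncode(argv[i]);
    if (i > 0)
      payload.push_back(',');
    payload += std::to_string(hex.size()) + "," + std::to_string(i) + "," + hex;
  }
  return payload;
}

// Wrap a payload as $<payload>#<checksum>, where the checksum is the
// sum of the payload bytes modulo 256, written as two hex digits.
std::string FramePacket(const std::string &payload) {
  unsigned sum = 0;
  for (unsigned char c : payload)
    sum += c;
  char checksum[3];
  std::snprintf(checksum, sizeof(checksum), "%02x", sum & 0xffu);
  return "$" + payload + "#" + checksum;
}
```

A launch would then send FramePacket(BuildLaunchPayload(argv)) followed by FramePacket("qLaunchSuccess"). This only covers the encoding; it says nothing about the thread-selection problem itself.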
Opening up lldb in the debugger, I can see that the CommandInterpreter instance does not have a thread set. I’ve stepped through ProcessGDBRemote::DoLaunch(), and RefreshStateAfterStop() and verified that at the ProcessGDB* level, all the thread data is communicated correctly. I just do not see where the selected thread is communicated to the CommandInterpreter.
Looking at the packets going back and forth, I see nothing different in the packet contents between what is being sent in the attach case versus the launch case. I’ve been digging through the lldb code for a couple of days now, but I’m kind of at a loss.
I’m attaching the protocol exchange for launch (stripped of some contents). If anyone could take a look, and maybe spot what I’m missing, I would greatly appreciate it.
When the reg read command is run, it gets stopped in CommandObject::CheckRequirements() due to m_thread_sp == nullptr.
When CommandInterpreter::IOHandlerInputComplete() returns after handling the process launch command, CommandInterpreter::GetExecutionContext() returns an ExecutionContext with m_process_sp set, but m_thread_sp is NULL.
I’ve verified that the process in m_process_sp has one thread, which should be the selected thread, but it’s not being assigned to ExecutionContext.m_thread_sp.
If anyone can share a rough idea of where the logic is for filling out the execution context on the process launch code-path, I would greatly appreciate it.
Looks like there’s a state mismatch between Process::GetState() and the RunLock for the process. process_sp->GetState() reports eStateStopped, but process_sp->GetRunLock().m_running == true. As a result ExecutionContextRef::SetTargetPtr() does not set the selected thread because it believes the process is running.
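To make the mismatch concrete, here is a toy model of what I believe is happening. This is not lldb's code: ToyProcess, ToyContext and ResolveContext are hypothetical stand-ins, and the only behavior taken from the observation above is that the selected thread is filled in only when the run lock says the process is stopped, regardless of the public state.

```cpp
#include <cassert>

// Simplified stand-in for the process state (not lldb's StateType).
enum class ToyState { Running, Stopped };

// Hypothetical model of the two pieces of state that disagree.
struct ToyProcess {
  ToyState public_state = ToyState::Running; // what GetState() would report
  bool run_lock_running = true;              // what the run lock believes
  int selected_thread_id = 0;                // thread that should be selected
};

// Hypothetical model of the resolved execution context.
struct ToyContext {
  bool has_process = false;
  bool has_thread = false;
  int thread_id = 0;
};

// Models the gating: the thread is only resolved when the run lock
// reports "stopped". The public state is never consulted here.
ToyContext ResolveContext(const ToyProcess &process) {
  ToyContext ctx;
  ctx.has_process = true;
  if (!process.run_lock_running) {
    ctx.has_thread = true;
    ctx.thread_id = process.selected_thread_id;
  }
  return ctx;
}
```

With public_state set to Stopped but run_lock_running still true, ResolveContext returns a context with a process and no thread, which matches the m_process_sp set / m_thread_sp NULL combination seen in the debugger.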
I’m looking into how eStateStopped can be set while the RunLock still thinks we’re running.
Looks like this is a bug in lldb 13.0.1, and happens for other platforms when doing remote debugging.
When remote debugging a Linux system, the same error appears:
```
# on remote system
lldb-server p --server --listen "*:2323"
# on local system
lldb
(lldb) platform select remote-linux
Platform: remote-linux
Connected: no
(lldb) platform connect connect://remote-host:2323
Platform: remote-linux
Triple: x86_64-unknown-linux-gnu
OS Version: 3.10.0 (3.10.0-1127.el7.x86_64)
Hostname: remote-host
Connected: yes
WorkingDir: /home/user
Kernel: #1 SMP Tue Feb 18 16:39:12 EST 2020
(lldb) file ./hello
Current executable set to '/Users/user/src/hello' (x86_64).
(lldb) process launch --stop-at-entry
Process 14578 launched: '/Users/user/src/hello' (x86_64)
(lldb) th list
Process 14578 stopped
* thread #1: tid = 14578, 0x00007ffff7ddc140 ld-2.17.so`_start, name = 'hello', stop reason = signal SIGSTOP
(lldb) reg r
error: invalid thread
(lldb) si
error: invalid thread
(lldb) thread select 1
* thread #1, name = 'hello', stop reason = signal SIGSTOP
frame #0: 0x00007ffff7ddc140 ld-2.17.so`_start
ld-2.17.so`_start:
-> 0x7ffff7ddc140 <+0>: movq %rsp, %rdi
0x7ffff7ddc143 <+3>: callq 0x7ffff7ddc850 ; _dl_start
ld-2.17.so`_dl_start_user:
0x7ffff7ddc148 <+0>: movq %rax, %r12
0x7ffff7ddc14b <+3>: movl 0x220c37(%rip), %eax ; _dl_skip_args
(lldb) si
error: invalid thread
```
It looks like it may be a bug in Process::SetPublicState(). It will only call m_public_run_lock.SetStopped() if !StateChangedIsExternallyHijacked(), but stepping through the code, that always comes out false for me.
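Here is a rough model of the suspected failure mode, only to illustrate the shape of the problem. ModelProcess and its members are hypothetical and heavily simplified; the one assumption carried over from the description above is that the run lock is only marked stopped when the state change is not considered externally hijacked.

```cpp
#include <cassert>

// Simplified stand-in for the process state (not lldb's StateType).
enum class ModelState { Running, Stopped };

// Hypothetical model of a process whose public state and run lock
// are updated by the same setter, but under different conditions.
struct ModelProcess {
  ModelState public_state = ModelState::Running;
  bool run_lock_running = true;     // models the public run lock
  bool externally_hijacked = false; // models the hijack check result

  void SetPublicState(ModelState new_state) {
    // The public state is always updated.
    public_state = new_state;
    // Assumption: the run lock is only touched when the state change
    // is not externally hijacked.
    if (!externally_hijacked) {
      run_lock_running = (new_state == ModelState::Running);
    }
  }
};
```

If the hijack check takes the wrong branch on the launch path, the model ends up with public_state == Stopped and run_lock_running == true, which is exactly the inconsistent pair observed after process launch.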
With stop reason signal coming from the gdb-server, Process::StateChangedIsExternallyHijacked() fails when comparing: