LLDB protocol question; custom gdb-server; launch; threads visible, but `invalid thread` on `reg read`

I’m in the midst of implementing an lldb-compatible (v13.0.1) gdb-server for my platform. I have already implemented everything required to attach to a running process using gdb-remote <host>:<port>. All of that is working as expected.

Now I’m adding support for the A & qLaunchSuccess combo to support launching a new process on the remote target from lldb. Although the packets exchanged after A & qLaunchSuccess are the same as in the attach path, lldb is ending up in a strange state where it can see all the threads, but commands like reg read output error: invalid thread. thread list shows the threads, but even after thread select, reg read still reports invalid thread.
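For readers who have not looked at the launch packets before, here is a minimal sketch of how the payloads are put together on the wire. It assumes the usual gdb-remote framing ($payload#checksum, with a modulo-256 checksum) and the documented A packet layout (arglen,argnum,hex-encoded arg, with arglen counting hex characters). The helper names HexEncode, FramePacket and BuildLaunchPayload are made up for illustration and are not taken from lldb or from my server.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hex-encode raw bytes as lowercase two-digit pairs.
std::string HexEncode(const std::string &raw) {
  static const char kDigits[] = "0123456789abcdef";
  std::string out;
  out.reserve(raw.size() * 2);
  for (unsigned char c : raw) {
    out.push_back(kDigits[c >> 4]);
    out.push_back(kDigits[c & 0xf]);
  }
  return out;
}

// Wrap a payload as $<payload>#<checksum>. The checksum is the modulo-256
// sum of the payload bytes, written as two hex digits.
std::string FramePacket(const std::string &payload) {
  unsigned sum = 0;
  for (unsigned char c : payload)
    sum = (sum + c) & 0xff;
  char checksum[3];
  std::snprintf(checksum, sizeof(checksum), "%02x", sum);
  return "$" + payload + "#" + checksum;
}

// Build the payload of an A packet: A<hexlen>,<argnum>,<hexarg>[,...]
// Assumption: lengths and indices are written in decimal.
std::string BuildLaunchPayload(const std::vector<std::string> &argv) {
  std::string payload = "A";
  for (std::size_t i = 0; i < argv.size(); ++i) {
    const std::string hex = HexEncode(argv[i]);
    if (i > 0)
      payload += ',';
    payload += std::to_string(hex.size()) + ',' + std::to_string(i) + ',' + hex;
  }
  return payload;
}
```

With these helpers, FramePacket(BuildLaunchPayload({"./hello"})) gives the framed A packet, and FramePacket("qLaunchSuccess") gives the follow-up query that the server answers with OK on success.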

Opening up lldb in the debugger, I can see that the CommandInterpreter instance does not have a thread set. I’ve stepped through ProcessGDBRemote::DoLaunch(), and RefreshStateAfterStop() and verified that at the ProcessGDB* level, all the thread data is communicated correctly. I just do not see where the selected thread is communicated to the CommandInterpreter.

Looking at the packets going back and forth, I see nothing different in the packet contents between what is being sent in the attach case versus the launch case. I’ve been digging through the lldb code for a couple of days now, but I’m kind of at a loss.

I’m attaching the protocol exchange for launch (stripped of some contents). If anyone could take a look, and maybe spot what I’m missing, I would greatly appreciate it.

launch-packets.txt (13.6 KB)

Still digging into this.

When the reg read command is run, it gets stopped in CommandObject::CheckRequirements() due to m_thread_sp == nullptr.

When CommandInterpreter::IOHandlerInputComplete() returns after handling the process launch command, CommandInterpreter::GetExecutionContext() returns an ExecutionContext with m_process_sp set, but m_thread_sp is NULL.

I’ve verified that the process in m_process_sp has one thread, and that it should be the selected thread, but it’s not being assigned to ExecutionContext.m_thread_sp.

If anyone can share a rough idea of where the logic is for filling out the execution context on the process launch code-path, I would greatly appreciate it.

Process 12797 stopped
* thread #1, name = 'lldb', stop reason = step over
    frame #0: 0x00007f4e4bd4dd4a liblldb.so.13`lldb_private::CommandInterpreter::IOHandlerInputComplete(this=0x0000000001885730, io_handler=0x00000000019cec40, line="process launch --stop-at-entry ") at CommandInterpreter.cpp:2906:1
   2903	    io_handler.SetIsDone(true);
   2904	    m_result.SetResult(lldb::eCommandInterpreterResultInferiorCrash);
   2905	  }
-> 2906	}
   2907
   2908	bool CommandInterpreter::IOHandlerInterrupt(IOHandler &io_handler) {
   2909	  ExecutionContext exe_ctx(GetExecutionContext());
(lldb) p this->GetExecutionContext()
(lldb_private::ExecutionContext) $24 = {
  m_target_sp = std::__shared_ptr<lldb_private::Target, __gnu_cxx::_S_atomic>::element_type @ 0x00000000019b8620 {
    _M_ptr = 0x00000000019b8620
  }
  m_process_sp = std::__shared_ptr<lldb_private::Process, __gnu_cxx::_S_atomic>::element_type @ 0x0000000001a3a1f0 {
    _M_ptr = 0x0000000001a3a1f0
  }
  m_thread_sp = nullptr {
    _M_ptr = nullptr
  }
  m_frame_sp = nullptr {
    _M_ptr = nullptr
  }
}

### check the process
(lldb) p $24.m_process_sp.get()
(lldb_private::process_gdb_remote::ProcessGDBRemote *) $32 = 0x0000000001a3a1f0

### check for threads
(lldb) p $24.m_process_sp.get()->GetThreadList()
(lldb_private::ThreadList) $30 = {
  lldb_private::ThreadCollection = {
    m_threads = size=1 {
      [0] = std::__shared_ptr<lldb_private::Thread, __gnu_cxx::_S_atomic>::element_type @ 0x0000000001a5ae90 {
        _M_ptr = 0x0000000001a5ae90
      }
    }
    m_mutex = {
      std::__recursive_mutex_base = {
        _M_mutex = {
          __data = {
            __lock = 0
            __count = 0
            __owner = 0
            __nusers = 0
            __kind = 1
            __spins = 0
            __elision = 0
            __list = {
              __prev = nullptr
              __next = nullptr
            }
          }
          __size = ""
          __align = 0
        }
      }
    }
  }
  m_process = 0x0000000001a3a1f0
  m_stop_id = 1
  m_selected_tid = 0
}

Looks like there’s a state mismatch between Process::GetState() and the RunLock for the process. process_sp->GetState() reports eStateStopped, but process_sp->GetRunLock().m_running == true. As a result ExecutionContextRef::SetTargetPtr() does not set the selected thread because it believes the process is running.

I’m now looking into how the state can be set to eStateStopped while the RunLock still thinks we’re running.

frame #2: 0x00007f4e4be4debd liblldb.so.13`lldb_private::ExecutionContextRef::SetTargetPtr(this=0x00007ffc5b1dfd10, target=0x00000000019b8620, adopt_selected=true) at ExecutionContext.cpp:517:36
   514 	            // resuming.
   515 	            Process::StopLocker stop_locker;
   516
-> 517 	            if (stop_locker.TryLock(&process_sp->GetRunLock()) &&
   518 	                StateIsStoppedState(process_sp->GetState(), true)) {
   519 	              lldb::ThreadSP thread_sp(
   520 	                  process_sp->GetThreadList().GetSelectedThread());
(lldb) p process_sp->GetState()
(lldb::StateType) $43 = eStateStopped
(lldb) p process_sp->GetRunLock()
(lldb_private::ProcessRunLock) $44 = {
  m_rwlock = {
    __data = {
      __lock = 0
      __nr_readers = 1
      __readers_wakeup = 0
      __writer_wakeup = 0
      __nr_readers_queued = 0
      __nr_writers_queued = 0
      __writer = 0
      __shared = 0
      __pad1 = 0
      __pad2 = 0
      __flags = 0
    }
    __size = ""
    __align = 4294967296
  }
  m_running = true
}
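To make the mismatch concrete, here is a small toy model of the check shown in the frame above. It is only a sketch: ToyProcess, ToyContext and AdoptSelectedThread are hypothetical names, and the real lldb code uses a reader/writer lock rather than a plain bool. The point is just that the thread is adopted only when the run lock and the public state both say "stopped".

```cpp
// Toy stand-ins for the lldb types involved; not lldb source.
enum class ToyState { Running, Stopped };

struct ToyProcess {
  ToyState public_state;   // models Process::GetState()
  bool run_lock_running;   // models ProcessRunLock::m_running
  int selected_tid;        // hypothetical id of the selected thread

  // Models StopLocker::TryLock(): succeeds only if the lock says "stopped".
  bool TryStopLock() const { return !run_lock_running; }
};

struct ToyContext {
  bool has_thread;  // false is analogous to m_thread_sp == nullptr
  int tid;
};

// Models the adoption step: the selected thread is only copied into the
// context when BOTH the run lock and the public state agree on "stopped".
ToyContext AdoptSelectedThread(const ToyProcess &process) {
  ToyContext ctx = {false, 0};
  if (process.TryStopLock() && process.public_state == ToyState::Stopped) {
    ctx.has_thread = true;
    ctx.tid = process.selected_tid;
  }
  return ctx;
}
```

In this model, a process with public_state == Stopped but run_lock_running == true yields a context without a thread, which matches the state in the dump above and would explain the invalid thread errors.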

Looks like this is a bug in lldb 13.0.1, and happens for other platforms when doing remote debugging.

When remote debugging a Linux system, the same error appears:

# on remote system
lldb-server p --server --listen "*:2323"

# on local system
lldb
(lldb) platform select remote-linux
  Platform: remote-linux
 Connected: no
(lldb) platform connect connect://remote-host:2323
  Platform: remote-linux
    Triple: x86_64-unknown-linux-gnu
OS Version: 3.10.0 (3.10.0-1127.el7.x86_64)
  Hostname: remote-host
 Connected: yes
WorkingDir: /home/user
    Kernel: #1 SMP Tue Feb 18 16:39:12 EST 2020
(lldb) file ./hello
Current executable set to '/Users/user/src/hello' (x86_64).
(lldb) process launch --stop-at-entry
Process 14578 launched: '/Users/user/src/hello' (x86_64)
(lldb) th list
Process 14578 stopped
* thread #1: tid = 14578, 0x00007ffff7ddc140 ld-2.17.so`_start, name = 'hello', stop reason = signal SIGSTOP
(lldb) reg r
error: invalid thread
(lldb) si
error: invalid thread
(lldb) thread select 1
* thread #1, name = 'hello', stop reason = signal SIGSTOP
    frame #0: 0x00007ffff7ddc140 ld-2.17.so`_start
ld-2.17.so`_start:
->  0x7ffff7ddc140 <+0>: movq   %rsp, %rdi
    0x7ffff7ddc143 <+3>: callq  0x7ffff7ddc850            ; _dl_start

ld-2.17.so`_dl_start_user:
    0x7ffff7ddc148 <+0>: movq   %rax, %r12
    0x7ffff7ddc14b <+3>: movl   0x220c37(%rip), %eax      ; _dl_skip_args
(lldb) si
error: invalid thread

It looks like it may be a bug in Process::SetPublicState(). It will only call m_public_run_lock.SetStopped() if !StateChangedIsExternallyHijacked() is true, but stepping through the code, that condition always comes out false for me.

With stop reason signal coming from the gdb-server, Process::StateChangedIsExternallyHijacked() fails when comparing:

  • hijacking_name = "LaunchEventHijack"
  • g_resume_sync_name = "lldb.Process.ResumeSynchronous.hijack"
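Here is a simplified model of that guard, as I understand it from stepping through the code. This is a sketch under assumptions, not the lldb source: IsExternallyHijacked, HijackToyRunLock and SetPublicStopped are hypothetical names, and the model assumes that any hijacker whose name differs from the resume-sync name counts as "external".

```cpp
#include <cstring>

// Name of the one hijacker that does not count as external (from the post).
static const char *const kResumeSyncName =
    "lldb.Process.ResumeSynchronous.hijack";

// Models the check: a hijacker is "external" when it has a name and that
// name differs from the resume-sync name. nullptr means "not hijacked".
bool IsExternallyHijacked(const char *hijacking_name) {
  return hijacking_name != nullptr &&
         std::strcmp(hijacking_name, kResumeSyncName) != 0;
}

// Minimal stand-in for the public run lock.
struct HijackToyRunLock {
  bool running = true;
  void SetStopped() { running = false; }
};

// Models the stop transition: the run lock is only released when the
// state change is not externally hijacked.
void SetPublicStopped(HijackToyRunLock &lock, const char *hijacking_name) {
  if (!IsExternallyHijacked(hijacking_name))
    lock.SetStopped();
}
```

In this model, a stop delivered while "LaunchEventHijack" is the hijacker leaves running == true, whereas the same stop with no hijacker, or with the resume-sync hijacker, clears it.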

At this point, I’m just going to open an issue on GitHub.