strange issues on linux

I’m running into some issues trying to debug with lldb on linux.
It started with errors that debugserver aborted. I ran lldb-server manually and then used process connect, and got this error:

lldb-server: …/tools/lldb/source/Plugins/Process/Linux/NativeThreadLinux.cpp:227: virtual NativeRegisterContextSP lldb_private::process_linux::NativeThreadLinux::GetRegisterContext(): Assertion `reg_interface && “OS or CPU not supported!”’ failed.

I synced lldb and llvm to head and rebuilt, and this error went away. But then later it showed up again on a different machine when debugging the same binary with the newly built lldb.

Once I get the binary running under lldb I’m seeing strange things on my OperatingSystem plugin. Sometimes things work normally, sometimes the OS plugin seems to see incorrect register values but they’re correct at the lldb prompt, and sometimes the registers just seem completely wrong.
The behaviour is pretty consistent when I set a breakpoint at the same line, but if I move it to a different line or move to a different machine I’ll get a different behavior.

Here’s an example where the plugin sees invalid registers:

(lldb) run
OperatingSystemGo::UpdateThreadList(0, 1, 1) fetching thread data from Go for pid 678
Process 678 launched: ‘vc.test’ (x86_64)
I0528 17:09:01.105108 678 rand.go:86] Seeding pseudo-random number generator with 1432858141105104474
OperatingSystemGo::UpdateThreadList(1, 1, 1) fetching thread data from Go for pid 678
OperatingSystemGo: RealThread 0 has sp 7fffffffd430
Process 678 stopped

  • thread #1: tid = 0x02a6, 0x000000000049f379 vc.testvc.createVCs(~r0=stream.XVC at 0x000000c208030668, ~r1=stream.XVC at 0x000000c208030678) + 73 at xvc_test.go:23, name = 'vc.test', stop reason = breakpoint 1.1 frame #0: 0x000000000049f379 vc.testvc.createVCs(~r0=stream.XVC at 0x000000c208030668, ~r1=stream.XVC at 0x000000c208030678) + 73 at xvc_test.go:23
    (lldb) register read
    General Purpose Registers:
    rax = 0x000000c208030580
    rbx = 0x0000000000000000
    rcx = 0x000000c208001a40
    rdx = 0x000000000095e678 vc.testvc.TestXVC.f rdi = 0x0000000000000000 rsi = 0x000000c208030768 rbp = 0x0000000000abe578 main.statictmp_0002 + 312 rsp = 0x000000c208030500 r8 = 0x000000005567ae1d r9 = 0x0000000000000007 r10 = 0x000000000300c146 r11 = 0x000000000000007b r12 = 0x0000000000000025 r13 = 0x000000000095d840 vc.testruntime.gcbits.3082880518
    r14 = 0x0000000000000008
    r15 = 0x0000000000000000
    rip = 0x000000000049f379 vc.test`vc.createVCs + 73 at xvc_test.go:23
    rflags = 0x0000000000000246
    cs = 0x0000000000000033
    fs = 0x0000000000000000
    gs = 0x0000000000000000
    ss = 0x000000000000002b
    ds = 0x0000000000000000
    es = 0x0000000000000000

So my plugin sees one real thread, but the SP is bogus: 7fffffffd430
When I print registers at the prompt i get the correct value: 0x000000c208030500

Then I run stepi:

(lldb) stepi
OperatingSystemGo::UpdateThreadList(8, 3, 3) fetching thread data from Go for pid 678
OperatingSystemGo: RealThread 0 has sp c2080304f8
OperatingSystemGo: RealThread 1 has sp 7ffff77f5df8
OperatingSystemGo: RealThread 2 has sp 7ffff6fb4d70

Now my plugin is getting three threads. But we’ve only gone one instruction, so where are these other threads coming from? Thread 0 has the correct SP this time, but not the other threads.
Now that we’ve got the right SP value, the OS plugin is able to find the memory thread which confuses the threadplan so it keeps running till it hits a breakpoint:

OperatingSystemGo::UpdateThreadList(9, 3, 3) fetching thread data from Go for pid 678
OperatingSystemGo: RealThread 0 has sp c208030500
OperatingSystemGo: RealThread 1 has sp 7ffff77f5df8
OperatingSystemGo: RealThread 2 has sp 7ffff6fb4d70
Process 678 stopped

  • thread #8: tid = 0x0006, 0x000000000049f3b1 vc.testvc.createVCs(~r0=stream.XVC at 0x000000c208030668, ~r1=stream.XVC at 0x000000c208030678) + 129 at xvc_test.go:24, stop reason = breakpoint 2.1 frame #0: 0x000000000049f3b1 vc.testvc.createVCs(~r0=stream.XVC at 0x000000c208030668, ~r1=stream.XVC at 0x000000c208030678) + 129 at xvc_test.go:24
    (lldb)

Debugging the same binary on a different machine with the same lldb binary, I get different behavior:

(lldb) r
There is a running process, kill it and restart?: [Y/n] y
Process 127429 exited with status = 9 (0x00000009)
OperatingSystemGo::UpdateThreadList(0, 1, 1) fetching thread data from Go for pid 127449
Process 127449 launched: ‘vc.test’ (x86_64)
I0528 17:10:43.950815 127449 rand.go:86] Seeding pseudo-random number generator with 1432858243950810731
OperatingSystemGo::UpdateThreadList(1, 1, 1) fetching thread data from Go for pid 127449
OperatingSystemGo: RealThread 0 has sp 7fffffffde50
Process 127449 stopped

  • thread #1: tid = 0x1f1d9, 0x00007ffff7ddb2d0, name = ‘vc.test’, stop reason = signal SIGSTOP
    frame #0: 0x00007ffff7ddb2d0
    → 0x7ffff7ddb2d0: movq %rsp, %rdi
    0x7ffff7ddb2d3: callq 0x7ffff7ddea70
    0x7ffff7ddb2d8: movq %rax, %r12
    0x7ffff7ddb2db: movl 0x221b17(%rip), %eax
    (lldb) register read
    General Purpose Registers:
    rax = 0x0000000000000000
    rbx = 0x0000000000000000
    rcx = 0x0000000000000000
    rdx = 0x0000000000000000
    rdi = 0x0000000000000000
    rsi = 0x0000000000000000
    rbp = 0x0000000000000000
    rsp = 0x00007fffffffde50
    r8 = 0x0000000000000000
    r9 = 0x0000000000000000
    r10 = 0x0000000000000000
    r11 = 0x0000000000000000
    r12 = 0x0000000000000000
    r13 = 0x0000000000000000
    r14 = 0x0000000000000000
    r15 = 0x0000000000000000
    rip = 0x00007ffff7ddb2d0
    rflags = 0x0000000000000200
    cs = 0x0000000000000033
    fs = 0x0000000000000000
    gs = 0x0000000000000000
    ss = 0x000000000000002b
    ds = 0x0000000000000000
    es = 0x0000000000000000

(lldb) bt

  • thread #1: tid = 0x1f1d9, 0x00007ffff7ddb2d0, name = ‘vc.test’, stop reason = signal SIGSTOP
  • frame #0: 0x00007ffff7ddb2d0

This time the plugin gets the invalid register values, but when lldb stops at the prompt things are still invalid. The rip is garbage, so lldb doesn’t recognize that it’s hit a breakpoint and the backtrace is wrong.

Any ideas what could be going on here, or how I can debug this?

Hello,

TBH, I don't think anybody here tests the functionality of the
"process connect". I'm not even sure what is it supposed to do.

In your case, I would try to enable packet logging (log enable
gdb-remote packets) to see the communication with lldb-server. This
way you can at least determine whether the register corruption happens
in the server or the client. You can send it over and I'll try to take
a look.

It would help if you could explain what the OS plugin is trying to do
(for example, when it is fetching go threads). I'm afraid I don't know
much about the purpose of these plugins. Does it try to change
registers? Evaluate expressions? Allocate memory?

cheers,
pl

Hello,

I'm not really sure what is going on, but I don't think the problem is
in the server component. The server reports the correct register
values. You say your plugin does not do anything which should change
the registers (and the packet log confirms this), so it looks like the
corruption happens in LLDB. I'm afraid I don't know enough about the
client to know what is the problem, but it seems like you are hitting
some kind of a race there. Possibly you are reading the register
values before they have been populated (possibly even before the whole
thread list is populated, which could explain why you are getting
inconsistent thread counts).

It seemed like this is the way to connect to an already running gdbserver. I
was trying to see the output of gdb server to see why it was crashing.
Although I suppose the "OS or CPU not supported!" message might just mean I
started lldb-server with the wrong args and not the real problem.

Yeah, debugging the server is not very easy. I usually just attach to
a running instance, or load the resulting core file. As I said, this
looks like a client issue, but it should still not crash the server.
If you encounter a server crash (without using "process connect"), I
would be interested in it and can probably fix it.

Here's a log from one session:

<snip>

< 588> read packet:
$T05thread:5cdf;name:vc.test;threads:5cd5,5cdb,5cdc,5cdd,5cde,5cdf,5ce0,5ce1,5ce2,5ce3;00:883e4c08c2000000;01:0000000000000000;02:401a0008c2000000;03:90d3950000000000;04:0000000000000000;05:683f4c08c2000000;06:78d5ab0000000000;07:083e4c08c2000000;08:2e8d685500000000;09:0700000000000000;0a:403d053600000000;0b:7b00000000000000;0c:1100000000000000;0d:7cc4950000000000;0e:0800000000000000;0f:0000000000000000;10:79f1490000000000;11:4602000000000000;12:3300000000000000;13:0000000000000000;14:0000000000000000;15:2b00000000000000;16:0000000000000000;17:0000000000000000;reason:breakpoint;#14

This is the packet notifying the client you've hit a breakpoint. It
also provides the values of the general purpose registers. The values
seem reasonable here.
After this, you are only doing memory read commands, and these should
not affect the registers.

(lldb) register read
General Purpose Registers:
       rax = 0x0000000000000000
       rbx = 0x0000000000000000
       rcx = 0x0000000000000000
       rdx = 0x0000000000000000
       rdi = 0x0000000000000000
       rsi = 0x0000000000000000
       rbp = 0x0000000000000000
       rsp = 0x00007fffffffdde0
        r8 = 0x0000000000000000
        r9 = 0x0000000000000000
       r10 = 0x0000000000000000
       r11 = 0x0000000000000000
       r12 = 0x0000000000000000
       r13 = 0x0000000000000000
       r14 = 0x0000000000000000
       r15 = 0x0000000000000000
       rip = 0x00007ffff7ddb2d0
    rflags = 0x0000000000000200
        cs = 0x0000000000000033
        fs = 0x0000000000000000
        gs = 0x0000000000000000
        ss = 0x000000000000002b
        ds = 0x0000000000000000
        es = 0x0000000000000000

"register read" does not need to issue any new server commands, since
it thinks the register values have already been provided. However, it
seems they got corrupted on their way...

hope this helps,
pl

I can’t reproduce the server crash at the moment, but it was crashing immediately at process launch, so I don’t think attaching to it would work.

I think I found the other problem.
ProcessGDBRemote now has a stack of stop infos.
RefreshStateAfterStop loops through them and calls SetThreadStopInfo, which in turn calls UpdateThreadListIfNecessary.

But what if these stop infos have different thread lists? Which do you believe? Currently the first update succeeds and then the others seem to be ignored because the stop info hasn’t changed.

So ignoring the threads for the 2nd stop packet is part of the problem. But why are there two in the first place?
ProcessGDBRemote::DoLaunch calls SetLastStopPacket which pushes the packet onto the stack. But then it directly calls SetThreadStopInfo, bypassing RefreshStateAfterStop. So the packet stays on the stack and gets processed again along with the next stop packet.

Hi Ryan,
I made changes to StopInfo recently to add the stack, r238323. I’ll have a closer look at this and see if I can figure out what’s going on.
Thanks for bringing this up.

Best,
Ewan

One question for you:

Are you associating your real threads with Go threads? Or are you letting the real threads show and also showing the go threads? Any ThreadMemory that has had ThreadMemory::SetBackingThread() called with a real thread will be just a thread that is in memory. Any that have had this function called with answer all thread related questions using the ThreadMemory::m_backing_thread_sp. For any threads that aren't backed by a real thread the thread registers will be created by calling OperatingSystemGo::CreateRegisterContextForThread(...) so you should be able to debug what is going wrong by stepping through this. For any threads that are backed by a real thread, it will just get the register context from the real thread via ThreadGDBRemote.

If you can send me a copy of your OperatingSystemGo.cpp and .h file I might be able to help out.

Greg

The codes is at http://reviews.llvm.org/D5871
But I don’t think that will help. OperatingSystemGo::UpdateThreadList is getting called with the wrong real_thread_list the first time lldb stops.

This seems to fix it:

diff --git a/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp b/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp
index 3ab8c18…174d3e7 100644
— a/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp
+++ b/source/Plugins/Process/gdb-remote/ProcessGDBRemote.cpp
@@ -790,6 +790,8 @@ ProcessGDBRemote::DoConnectRemote (Stream *strm, const char *remote_url)
}

const StateType state = SetThreadStopInfo (response);

  • m_stop_packet_stack.clear();