Different Thread Plan Depending on when `target module add` is Invoked

Hi again everyone,

I have a curious issue, and I wondered if any thread plan experts could help me understand what I’ve done wrong.

I’ve pored over log output with the following enabled, and have some observations below.

log enable lldb process
log enable lldb step
log enable lldb thread
log enable gdb-remote process
log enable gdb-remote step
log enable gdb-remote thread
log enable gdb-remote packets

I can try to redact some info from the logs to make them shareable, if no one has guesses based on the descriptions below. Suggestions for other logs I should turn on would be great, too.

Summary

  • I’m manually loading modules/debuginfo on the host with target module add
  • If I add the module before hitting a breakpoint, everything is fine.
  • If I add the module after hitting a breakpoint, I can’t continue past the breakpoint.
    • NOTE: It can be any ELF given to the command. For instance, if I just load another app’s or library’s debug info (so there cannot be a matching module on the target), the issue still repros.
  • It’s a remote target, with a from-scratch lldb-server which largely mimics Linux behavior to utilize the remote-linux platform already in the client (client is stock, unchanged).
  • The inferior has two threads and 4 modules loaded on the target, though only 2 (one main, and another with lower-level functionality) are involved in the repro. One thread is the main thread where most stuff happens; it largely executes code from the main module, but it also executes code in the second module. The second thread only executes code in the second module.
  • In a pure Linux environment, with stock client and stock lldb-server, I can’t recreate the issue, so, I suspect something is amiss in my server.

Additional Observations

  • In both repro and no-repro scenarios, both threads are attach-halting in the same module and at the same address.
  • In the repro case (target module add invoked after breakpoint hit), when attempting to continue past the breakpoint, both threads are given a plan that includes stopping the other thread, which may be okay? I suppose the intent is to step each thread over the breakpoint, and then continue? At first, the plan state has one thread stepping and the other thread suspended. Next, the client has thread plans: main thread is suspended + stop others = 0, second thread is stepping + stop others = 1, which again may be okay. Two vCont;s packets are sent to do the steps, but then no vCont;c is ever sent to actually continue the process, even though the client believes the process is running.

At the high level, what operation are you doing that changes behavior? Step, next, are the threads sitting at a breakpoint?

When a thread stops at a breakpoint instruction, lldb will say the thread has hit that breakpoint. When the process resumes, lldb will push a thread plan to instruction step past the breakpoint: only allowing this thread to execute, put the original instruction back / disable the breakpoint, then instruction-step, then re-insert the breakpoint instruction. It will do this for all threads sitting at a breakpoint site. Then it will resume the process, or signal a new breakpoint hit if one of the threads is now at another breakpoint location.
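
As a rough illustration of the sequence described above, here is a toy Python model. It is not lldb's implementation: `Thread`, `resume_actions`, and the action strings are invented for this sketch, and it leaves out the case where a step lands on another breakpoint. It only shows the expected ordering: each thread sitting on a breakpoint site is stepped alone, and a process-wide continue comes last.

```python
from dataclasses import dataclass


@dataclass
class Thread:
    tid: int
    pc: int


def resume_actions(threads, breakpoint_sites):
    """Return the ordered actions a debugger would take on resume.

    Toy model: every thread whose pc sits on a breakpoint site is stepped
    over it alone (other threads suspended) before the process continues.
    """
    actions = []
    for t in threads:
        if t.pc in breakpoint_sites:
            actions.append(f"disable breakpoint at {t.pc:#x}")
            actions.append(f"step thread {t.tid} (other threads suspended)")
            actions.append(f"re-enable breakpoint at {t.pc:#x}")
    # Only after all step-overs finish is the whole process resumed.
    actions.append("continue all threads")
    return actions
```

In the failing scenario from the original post, the per-thread steps were observed but the final continue never went out.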

When you are doing a source-level step or next, it is often implemented either as a series of instruction-steps (which usually only runs that one thread), or if lldb sees that there is a contiguous block of instructions that do not branch, it will set a breakpoint at the end of the block of instructions and continue all threads.

If you’re doing a step or next and the code branches into a function call that lldb does not want to show the user, it will set a breakpoint on the return address and resume execution.

Hi Jason,

Thanks so much for such a fast reply.

I suppose from the high level, the exact series of invocations might be useful? I’ll give those for the repro case and no-repro:

Repro Steps

platform select remote-linux
platform connect connect://localhost:12345  # remote target is surfaced as a local port.
target create path/to/app.elf
b InsideLoop    # app is simple infinite loop for testing
process attach --pid PID
continue   # continue to breakpoint
target module add some/other/thing.elf
continue
# Process remains halted, but, client believes it's running

No Repro Steps

platform select remote-linux
platform connect connect://localhost:12345  # remote target is surfaced as a local port.
target create path/to/app.elf
b InsideLoop    # app is simple infinite loop for testing
process attach --pid PID
target module add some/other/thing.elf   # Simply moved the add before continuing from the attach-break to the breakpoint.
continue   # continue to breakpoint works
continue   # continue past breakpoint works (comes back and hits the break again.)

It does also occur if I step instead of continue.

I hope this is the high level you meant. I’m happy to provide more description, though.

In the “No Repro” case, can you keep doing a continue, and hit the breakpoint an arbitrary number of times?

What’s the output for “target modules list”? Before and after the continue in the “Repro” case, before and after a 2nd continue in the “No Repro” case. I’m interested in seeing what lldb is saying after the module is added. Feel free to redact anything that might contain proprietary info.

Is InsideLoop in the main module? Are both threads stopped at that breakpoint when you do the continue?

Is some/other/thing.elf a static or dynamic binary? Does it have a static or dynamic symbol table? Are you using “target modules load” to tell lldb the address its symbols are loaded at?

Can you provide packet logs from the Repro and Non Repro continues? I’m interested in seeing if there’s a divergence, because I wouldn’t expect lldb to care about additional modules if we’re stopped in the main module, and don’t do anything with the additional modules at that time.

I rarely work with Linux, but are you sure target modules add thing.elf is sufficient to load the symbols at the correct virtual address? Normally when you add a binary to the Target by hand like this, you need to do target modules load to tell lldb where the binary is loaded. If you don’t do that, it is possible for the “file addresses” in the binary to overlap with the actual virtual address space of the process, and you may see inconsistent behavior if something uses those file addresses.
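
To make the overlap concern concrete, here is a small Python sketch. All numbers are hypothetical, and `resolved_range` and `overlaps` are names invented for this example. It assumes that a module without a load address is looked up by its raw file addresses, while a loaded module has a slide added to them.

```python
def resolved_range(file_addr, size, slide=None):
    """Half-open address range used for a section.

    slide=None models a module that was added but never given a load
    address, so its file addresses are used as-is (an assumption here).
    """
    start = file_addr if slide is None else file_addr + slide
    return (start, start + size)


def overlaps(a, b):
    """True if two half-open ranges share at least one address."""
    return a[0] < b[1] and b[0] < a[1]


# Hypothetical layout: the main module is slid to 0x6020000.
app_text = resolved_range(0x1000, 0x4000, slide=0x6020000)
# Hypothetical second module whose file addresses happen to land there.
thing_unloaded = resolved_range(0x6022000, 0x1000)
thing_loaded = resolved_range(0x6022000, 0x1000, slide=0x1000000)

print(overlaps(app_text, thing_unloaded))  # True
print(overlaps(app_text, thing_loaded))    # False
```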

In a pure Linux environment, with stock client and stock lldb-server, I can’t recreate the issue, so, I suspect something is amiss in my server.

I realized I didn’t clarify this well. The stock client is a Windows lldb.exe, the stock server is a Linux server. It doesn’t repro here at all (which is why I suspect a deficiency in my custom Linux-mimicking server).

And one other note, my actual use case is using the SB C++ API, where I do SBTarget::AddModule followed by SBTarget::SetModuleLoadAddress. As @jasonmolenda points out below, I should probably do a target modules load for the repro in lldb.exe – my thought was that since I’m not actually executing code in the module I’m loading, or interacting with the symbols in any way besides loading them, the target module add would be sufficient.



In the “No Repro” case, can you keep doing a continue, and hit the breakpoint an arbitrary number of times?

Yup. Can just hold down enter and let it rip.

What’s the output for “target modules list”? Before and after the continue in the “Repro” case, before and after a 2nd continue in the “No Repro” case. I’m interested in seeing what lldb is saying after the module is added. Feel free to redact anything that might contain proprietary info.

Repro

(lldb) target modules list
(lldb) attach 42
Process 42 stopped
# NOTE: Both threads attach-break in second module
* thread #1, name = 'Primary', stop reason = signal SIGTRAP
(lldb) c
Process 42 resuming
Process 42 stopped
* thread #1, name = 'Primary', stop reason = breakpoint 1.1
(lldb) target module add C:\Users\me\work\thing.elf
(lldb) target modules list
[  0] 26DF71DF-0E19-5DB1-4D52-B5D056FB7C99-CFC9ECF0 0x0000000006020000 C:\Users\me\work\app.elf
[  1] 3B32420F-917E-E4B8-3F47-0F41171440A5-E861BF9E thing.elf[0x0000000000000000] C:\Users\me\work\thing.elf
(lldb) c
Process 42 resuming
(lldb) target modules list
[  0] 26DF71DF-0E19-5DB1-4D52-B5D056FB7C99-CFC9ECF0 0x0000000006020000 C:\Users\me\work\app.elf
[  1] 3B32420F-917E-E4B8-3F47-0F41171440A5-E861BF9E thing.elf[0x0000000000000000] C:\Users\me\work\thing.elf

No Repro

(lldb) attach 42
Process 42 stopped
# NOTE: Both threads attach-break in second module
(lldb) target modules list
[  0] 26DF71DF-0E19-5DB1-4D52-B5D056FB7C99-CFC9ECF0 0x0000000006020000 C:\Users\me\work\app.elf
(lldb) target modules add C:\Users\me\work\thing.elf
(lldb) target modules list
[  0] 26DF71DF-0E19-5DB1-4D52-B5D056FB7C99-CFC9ECF0 0x0000000006020000 C:\Users\me\work\app.elf
[  1] 3B32420F-917E-E4B8-3F47-0F41171440A5-E861BF9E thing.elf[0x0000000000000000] C:\Users\me\work\thing.elf
(lldb) c   # FIRST
Process 42 resuming
Process 42 stopped
* thread #1, name = 'Primary', stop reason = breakpoint 1.1
(lldb) target modules list
[  0] 26DF71DF-0E19-5DB1-4D52-B5D056FB7C99-CFC9ECF0 0x0000000006020000 C:\Users\me\work\app.elf
[  1] 3B32420F-917E-E4B8-3F47-0F41171440A5-E861BF9E thing.elf[0x0000000000000000] C:\Users\me\work\thing.elf
(lldb) c   # SECOND
Process 42 resuming
Process 42 stopped
* thread #1, name = 'Primary', stop reason = breakpoint 1.1
(lldb) target modules list
[  0] 26DF71DF-0E19-5DB1-4D52-B5D056FB7C99-CFC9ECF0 0x0000000006020000 C:\Users\me\work\app.elf
[  1] 3B32420F-917E-E4B8-3F47-0F41171440A5-E861BF9E thing.elf[0x0000000000000000] C:\Users\me\work\thing.elf

Is InsideLoop in the main module? Are both threads stopped at that breakpoint when you do the continue?

Yup, it is in the main module. Only the first thread should execute code inside the main module, and it is the only thread that hits the breakpoint there.

Is some/other/thing.elf a static or dynamic binary? Does it have a static or dynamic symbol table? Are you using “target modules load” to tell lldb the address its symbols are loaded at?

Dynamic for the actual use-case, but, I’ve repro’d by just loading any module (including static) at all (i.e. a module that isn’t linked or used by the app).

Can you provide packet logs from the Repro and Non Repro continues? I’m interested in seeing if there’s a divergence, because I wouldn’t expect lldb to care about additional modules if we’re stopped in the main module, and don’t do anything with the additional modules at that time.

Yeah, I will attempt to clean up some logs to share. I’ll respond again when I have them. I will note, the attach-halt (in both repro and no-repro) shows both threads halted at the same address (which does make sense, as the main thread does execute code in the second module as well, and the address is a sync point for event handling.)

I rarely work with Linux, but are you sure target modules add thing.elf is sufficient to load the symbols at the correct virtual address? Normally when you add a binary to the Target by hand like this, you need to do target modules load to tell lldb where the binary is loaded. If you don’t do that, it is possible for the “file addresses” in the binary to overlap with the actual virtual address space of the process, and you may see inconsistent behavior if something uses those file addresses.

This is a good thought, and hadn’t occurred to me. I noted just now, in my response to Ted, that my original repro was with the SB C++ API, where I use SBTarget::AddModule to load the symbols, and then use SBTarget::SetModuleLoadAddress to provide the correct load address (which I get from the target).
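
For reference, the add-then-set-load-address sequence might look like the sketch below. It uses lldb's Python bindings, which mirror the SB C++ API, and is not the poster's actual code. `add_module_at` is a hypothetical helper name. The second argument to `SetModuleLoadAddress` is treated here as a slide applied to every section's file address, which is how I understand that API.

```python
def add_module_at(target, path, slide):
    """Add the file at `path` to `target`, then slide its sections.

    `target` is expected to behave like lldb.SBTarget. The triple and
    UUID arguments to AddModule are left unspecified (None).
    """
    module = target.AddModule(path, None, None)
    if not module.IsValid():
        raise RuntimeError(f"could not add module: {path}")
    # Tell the debugger where the module lives in the process.
    error = target.SetModuleLoadAddress(module, slide)
    if error.Fail():
        raise RuntimeError(f"could not set load address: {error.GetCString()}")
    return module
```

Inside lldb's script interpreter this could be called as `add_module_at(lldb.target, "some/other/thing.elf", slide)`, with `slide` obtained from the target.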

It was after that, I attempted to repro on CLI, but, you’re correct I probably should have done target modules load there. Though, I would expect (perhaps wrongly) that since I’m not actually doing anything with the module, and no threads are executing code in it in any repros, that it should be okay?

EDIT: Although I guess even though the module has a 0 load address, section address offsets could still conflict with modules for which load addresses were resolved?

A little bit of a tangent, but several years ago we improved the remote-linux platform, so it didn’t care what the host triple was. My motivation for the part I did was debugging Hexagon Linux while on x86 Windows. I wanted to use the Visual Studio debugger instead of command line (or Eclipse) LLDB or GDB on Linux.

Bottom line - debugging a “Linux” program from Windows LLDB is 100% ok.


I can absolutely attest to it working exceptionally well, Windows+Linux. Until I break things, everything is extremely robust, and Just Works™.

So, big thanks for that, as well as all your help in this thread.

Got distracted and meant to come fill in some details. Huge thanks to Ted, no telling how long I’d have beat my head against this without their help.

The underlying issue turned out to be the p packets. The server reported supporting the thread-id extension, but, didn’t respond with the requested thread ID, and I was assuming it would get that info from the jThreadsInfo packet, instead.
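
For anyone writing their own server, here is a minimal Python sketch of honoring the thread suffix on a p packet. It assumes the "thread-id extension" above is the `QThreadSuffixSupported` extension from lldb's gdb-remote documentation, where the payload looks like `p<regnum>;thread:<tid>;` with both numbers in hex. The function names, the register table layout, and the `E01` error reply are invented for illustration.

```python
import re

# "p<hex regnum>" optionally followed by ";thread:<hex tid>;"
_P_PACKET = re.compile(r"^p([0-9a-fA-F]+)(?:;thread:([0-9a-fA-F]+);)?$")


def parse_p_packet(payload):
    """Split a p packet payload into (regnum, tid); tid may be None."""
    m = _P_PACKET.match(payload)
    if m is None:
        raise ValueError(f"not a p packet: {payload!r}")
    regnum = int(m.group(1), 16)
    tid = int(m.group(2), 16) if m.group(2) else None
    return regnum, tid


def handle_p(payload, registers_by_thread, current_tid):
    """Reply with the register value for the thread named in the packet.

    registers_by_thread maps tid -> {regnum: hex string}. The current
    thread is used only when no thread suffix is present.
    """
    regnum, tid = parse_p_packet(payload)
    if tid is None:
        tid = current_tid
    try:
        return registers_by_thread[tid][regnum]
    except KeyError:
        return "E01"  # arbitrary error code chosen for this sketch
```

The key point is that the reply comes from the thread named in the suffix, not from whichever thread the server currently considers selected.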