strange behaviour at lldb cmd line

Hi All,

I would like to start contributing to the lldb project and help improve it on Linux. I am seeing some strange behaviour that makes lldb appear a little flaky. Some details of my system:

  • Ubuntu 14.04, 64 bit, running inside a VM on Windows

  • built from top of tree with gcc 4.8.2. The issue happens with either configure/make or cmake/ninja

  • stock lldb-3.4 version shipped with Ubuntu does not exhibit this behaviour

There are two intermittent issues: 1. When I run a program, I see messages that do not belong (indicating the process was stopped). 2. There appears to be a race condition in sending text to the console: the (lldb) prompt comes out of order, making it appear there is no command prompt.

shawn@shawn-VirtualBox:~/Projects$ ./lldb.sh
(lldb) file a.out
Current executable set to 'a.out' (x86_64).
(lldb) br se -l 7
Breakpoint 1: where = a.out`main + 35 at hello2.cpp:7, address = 0x0000000000400553
(lldb) run
Process 2509 launching
Process 2509 launched: '/home/shawn/Projects/a.out' (x86_64)
Process 2509 stopped
* thread #1: tid = 2509, 0x00007f50bd2af2d0, name = 'a.out', stop reason = trace
    frame #0: 0x00007f50bd2af2d0
-> 0x7f50bd2af2d0: movq %rsp, %rdi
   0x7f50bd2af2d3: callq 0x7f50bd2b2a70
   0x7f50bd2af2d8: movq %rax, %r12
   0x7f50bd2af2db: movl 0x221b17(%rip), %eax
Hello world!
Process 2509 stopped
* thread #1: tid = 2509, 0x0000000000400553 a.out`main + 35 at hello2.cpp:7, name = 'a.out', stop reason = breakpoint 1.1
    frame #0: 0x0000000000400553 a.out`main + 35 at hello2.cpp:7
   4 {
   5 printf("Hello world!\n");
   6
-> 7 return 0;
   8 }
(lldb) cont
Process 2509 resuming
(lldb) Process 2509 exited with status = 0 (0x00000000)

My process was:

Build simple hello world program. gcc -g hello.cpp

Run lldb:

file a.out

br se -l 7

run

cont

Notice all the unexpected output before it prints “Hello world!”. Also notice the (lldb) prompt that shows up before the “Process 2509 exited” message.

Any suggestions where I can look in the code and start tracking this down?

Thanks,
Shawn Best.

Blueshift Inc.

Shawn,

All of this code is in:

Debugger::HandleProcessEvent (const EventSP &event_sp);

A little background:

LLDB has IOHandlers to handle the stdin/out/err given to the debugger. There is a stack of IOHandler objects that allows us to redirect the stdin/out/err to the appropriate place. The command line interpreter is the topmost IOHandler when the process isn't running. So when you launch LLDB but haven't launched your process yet, you get an IOHandler stack like:

1 - LLDB Command Interpreter

When your process is launched, it hooks a pseudo terminal (pty) up to the stdin/out/err of the program you are debugging. When we resume the process, if we launched it and stdio was enabled for it, we push a ProcessIOHandler by calling:

process->PushProcessIOHandler ()

While your process is running you have:

1 - LLDB Command Interpreter
2 - Process IOHandler

The top item on the IOHandler stack is the "active" IOHandler. As soon as your process stops the Process IOHandler should pop itself off of the stack. When this happens it causes the "LLDB command interpreter" to become active again. As soon as that happens, the "(lldb)" prompt comes out again. Again, all of this is managed in Debugger::HandleProcessEvent().
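The push/pop behaviour described above can be pictured with a small C++ sketch. This is a toy model only: the class name, its methods, and the string labels are made up for illustration and are not LLDB's real IOHandler classes.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for the IOHandler stack; names are illustrative only.
class IOHandlerStack {
public:
    void Push(const std::string &name) { m_stack.push_back(name); }

    // Pop only if `name` is the active (top) handler; returns true on success.
    bool Pop(const std::string &name) {
        if (m_stack.empty() || m_stack.back() != name)
            return false;
        m_stack.pop_back();
        return true;
    }

    // The top of the stack is the handler that currently owns the terminal.
    const std::string &Active() const {
        assert(!m_stack.empty());
        return m_stack.back();
    }

private:
    std::vector<std::string> m_stack;
};
```

In this model, pushing "process" on top of "command-interpreter" makes the process handler active, and popping it again hands the terminal back to the command interpreter, which is the point at which the prompt would be redrawn.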

Now if you attach to a process, then we don't have anything hooked up to the debugged process' in/out/err, so we don't have an IOHandler to push/pop (even though the process->PushProcessIOHandler() and process->PopProcessIOHandler() functions might be called, they will not actually push/pop anything).

So now suppose we launch the process and it stops, then we resume it and it prints "hello" to stdout. We might get this IO asynchronously through a GDB remote packet, and the flow is:

1 - get the process stopped event
2 - check if there is any stdout or stderr to display, and if so tell the current IOHandler to hide itself (which will cause the "(lldb) " prompt to disappear), then display the stdout ("hello"), followed by a refresh of the top IOHandler (which causes "(lldb) " to redisplay itself).
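A minimal sketch of step 2, assuming a simplified terminal where the prompt, when visible, is always the last thing on screen. The Terminal struct and DisplayAsyncOutput function are hypothetical names for this example, not LLDB APIs.

```cpp
#include <cassert>
#include <string>

const std::string kPrompt = "(lldb) ";

// Hypothetical terminal model: `screen` is everything currently displayed.
struct Terminal {
    std::string screen;
    bool prompt_visible = false;

    void ShowPrompt() {
        if (!prompt_visible) {
            screen += kPrompt;
            prompt_visible = true;
        }
    }

    // Assumes the prompt, when visible, is the last thing on the screen.
    void HidePrompt() {
        if (prompt_visible) {
            assert(screen.size() >= kPrompt.size());
            screen.erase(screen.size() - kPrompt.size());
            prompt_visible = false;
        }
    }
};

// Hide the prompt, write the inferior's output, then redraw the prompt.
void DisplayAsyncOutput(Terminal &term, const std::string &text) {
    if (text.empty())
        return;
    term.HidePrompt();
    term.screen += text;
    term.ShowPrompt();
}
```

If the hide and refresh steps run on one thread while another thread writes the prompt, the ordering guarantee is lost, which is the kind of scrambled output reported at the start of this thread.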

The Debugger::HandleProcessEvent() tries to carefully do all of this to avoid any overlaps or any cases where the prompt comes out in the wrong place. The first thing to do is set a breakpoint in "Debugger::HandleProcessEvent()" and slowly follow the code and see where things come out incorrectly. No other code should be pushing or popping the ProcessIOHandler. So if you find the ProcessLinux plug-in or some other code trying to pop the process IOHandler (process->PopProcessIOHandler()), then you should remove that and let Debugger::HandleProcessEvent() handle it.

Let me know what you find and I hope the above info gives you enough to go on.

Greg

Hi Shawn,

There are also some previous threads on this topic in the archives
that could help give you some background. The discussion was in reply
to the r202525 commit message.

I described a hacky workaround that works for me in:
http://lists.cs.uiuc.edu/pipermail/lldb-commits/Week-of-Mon-20140519/011166.html

Can you give it a try? If it "works" for you that will at least
suggest we share the same issue on FreeBSD and Linux.

-Ed

Thanks for the pointers Greg and Ed. Lots to chew on. I will get back with what I find.

Shawn.

I usually find it best to debug one LLDB using another and step through the Debugger::HandleProcessEvent() and watch as things come out wrong.

Hi Ed,

I tried your hack. I can confirm that inserting a 100ms sleep around line 3530 of Process.cpp fixes the issue of the (lldb) prompt getting scrambled on linux.

Shawn.

Hi Greg,

As far as I can tell what's happening here is just that
Process::Resume() completes and the next prompt is emitted (from the
main-thread?) before the IOHandler gets pushed in another thread.

Output from "log enable -n lldb process" with an added log printf
where ::Resume returns:

step
main-thread Process::Resume -- locking run lock
main-thread Process::PrivateResume() m_stop_id = 4, public state:
stopped private state: stopped
main-thread Process::SetPrivateState (running)
main-thread Process thinks the process has resumed.
internal-state(p Process::ShouldBroadcastEvent (0x80c410480) => new
state: running, last broadcast state: running - YES
main-thread Process::PrivateResume() returning
(lldb) internal-state(p Process::HandlePrivateEvent (pid = 15646)
broadcasting new state running (old state stopped) to public
wait4(pid=15646) MonitorChildProcessThreadFunction ::waitpid (pid =
15646, &status, options = 0) => pid = -15646, status = 0x0000057f
(STOPPED), signal = 5, exit_state = 0
internal-state(p PushIOHandler
wait4(pid=15646) Process::SetPrivateState (stopped)

As before, I don't see how we intend to enforce synchronization
between those two threads. It looks like my tiny usleep in
::PrivateResume delays the next prompt just long enough for the other
IOHandler to be pushed.

-Ed

That will do it. It is tough because Process::Resume() might not succeed so we can't always push the ProcessIOHandler.

I need to find a better way to coordinate the pushing of the ProcessIOHandler so it happens from the same thread that initiates the resume. Then we won't have this issue, but I need to carefully do this so it doesn't push it when the process won't be resumed (since it might already be resumed) or in other edge cases.

Other ideas would be to have the Process::Resume() do some synchronization between the current thread and the internal-state thread so it waits for the internal-state thread to get to the running state before it returns from Process::Resume()...
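The second idea (have Resume() wait until the internal-state thread has reached the running state) could look roughly like the following. This is a self-contained toy, not LLDB code: ToyProcess and its members are invented for the example, the internal-state thread is spawned inside Resume() only to keep the sketch short, and the case where the resume fails is ignored.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

enum class State { Stopped, Running };

// Toy model: Resume() returns only after the "internal-state" thread has
// pushed the process IO handler and published the running state.
class ToyProcess {
public:
    void Resume() {
        // In LLDB the internal-state thread is long-lived; spawning it here
        // just keeps the sketch self-contained.
        std::thread internal_state([this] { PublishRunning(); });
        {
            std::unique_lock<std::mutex> lock(m_mutex);
            m_cv.wait(lock, [this] { return m_state == State::Running; });
            assert(m_io_handler_pushed);  // pushed before the state changed
        }
        internal_state.join();
    }

    bool IsRunningWithIOHandler() {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_state == State::Running && m_io_handler_pushed;
    }

private:
    void PublishRunning() {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_io_handler_pushed = true;  // stand-in for pushing the ProcessIOHandler
            m_state = State::Running;
        }
        m_cv.notify_all();
    }

    std::mutex m_mutex;
    std::condition_variable m_cv;
    State m_state = State::Stopped;
    bool m_io_handler_pushed = false;
};
```

Because the caller cannot return from Resume() until the handler is in place, the next prompt cannot be emitted in the gap that the usleep workaround papers over.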

Greg

In addition to the (lldb) prompt out of order, I am also investigating some other strange messages when I run a simple application with no breakpoints. It seems related to thread synchronization surrounding the startup/management of the inferior process.

(lldb) run

Process 4417 launching
Process 4417 stopped
* thread #1: tid = 4417, 0x00007f3b99b9c2d0, name = 'a.out', stop reason = trace
    frame #0: 0x00007f3b99b9c2d0
-> 0x7f3b99b9c2d0: movq %rsp, %rdi
   0x7f3b99b9c2d3: callq 0x7f3b99b9fa70
   0x7f3b99b9c2d8: movq %rax, %r12
   0x7f3b99b9c2db: movl 0x221b17(%rip), %eax

Process 4417 launched: '/home/shawn/Projects/a.out' (x86_64)
Hello world!
The string is Test String : 5
Process 4417 exited with status = 0 (0x00000000)
(lldb)

------------- or ----------------

(lldb) run

Process 4454 launching
Process 4454 launched: '/home/shawn/Projects/a.out' (x86_64)
Process 4454 stopped
* thread #1: tid = 4454, 0x00007ffdec16c2d0, name = 'a.out', stop reason = trace
    frame #0: 0x00007ffdec16c2d0
error: No such process

Hello world!
The string is Test String : 5
Process 4454 exited with status = 0 (0x00000000)
(lldb)

As it is launching the target application, it appears to stop in a random place (stop reason = trace), and then continue executing. When it momentarily stops, I see it pop/push an IOHandler.

I added some logging to ProcessPOSIX, and see it hitting RefreshAfterStop() and DoResume() many times. Is this normal/expected?

I have added a bunch of logging to Push/Pop IOHandler, ThreadCreate, HandleProcessEvent and see big differences in the order of events changing from run to run.

One other small thing, in POSIX/ProcessMonitor, it calls waitpid() and checks the return code,

    lldb::pid_t wpid;
    if ((wpid = waitpid(pid, &status, 0)) < 0)
    {
        args->m_error.SetErrorToErrno();
        goto FINISH;
    }
    else …

lldb::pid_t is a uint64, while waitpid returns an int32 and uses a negative value to signal an error, so the "< 0" check can never be true.
This bug is repeated in a few places.

As it is launching the target application, it appears to stop in a random place (stop reason = trace), and then continue executing. When it momentarily stops, I see it pop/push an IOHandler.

Yes, the Process IO Handler is pushed and popped on every _public_ stop. There are notions of public stops that the user finds out about, and private stops where the Process might be in the process of trying to single step over a source line and might start/stop the process many many times.

This stopping at random locations seems like a racy bug in the ProcessLinux that we should really look into fixing.

I added some logging to ProcessPOSIX, and see it hitting RefreshAfterStop() and DoResume() many times. Is this normal/expected?

When you start a process, you will run/stop many times as the shared libraries get loaded. Normally a breakpoint is set in the dynamic loader that allows us to intercept when shared libraries are loaded/unloaded so that may explain a few stops you are seeing.

Other run/stop flurries can result when single stepping over a source line, or when stepping past a software breakpoint (disable bp, single instruction step, re-enable breakpoint, resume).

I have added a bunch of logging to Push/Pop IOHandler, ThreadCreate, HandleProcessEvent and see big differences in the order of events changing from run to run.

We have a lot of threading in LLDB so some of this will be normal, but other times it can indicate a bug, much like you are seeing when the process stops at a random location 0x00007ffdec16c2d0. This could also be an uninitialized variable in ProcessLinux (or one of many other classes like ThreadLinux, etc.) that gets a random value when a class instance is initialized. Please do try and track that down. To get a handle on process control you can enable process and step logging:

(lldb) log enable -T -f /tmp/process.txt lldb process step

Then compare a good and bad run and see what differs.

One other small thing, in POSIX/ProcessMonitor, it calls waitpid() and checks the return code,

    lldb::pid_t wpid;
    if ((wpid = waitpid(pid, &status, 0)) < 0)
    {
        args->m_error.SetErrorToErrno();
        goto FINISH;
    }
    else ...

lldb::pid_t is a uint64, while waitpid returns an int32, with negative numbers used for error codes.
This bug is repeated in a few places

This is bad, please use native types (::pid_t) for these locations so that this works correctly.
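To make the difference concrete, here is a small POSIX-only sketch contrasting the two patterns. The alias wide_pid_t is a stand-in for a 64-bit unsigned pid type as described above, and the function names are invented for the example. It relies on the fact that a process is never its own child, so waitpid() on our own pid fails and returns -1.

```cpp
#include <cassert>
#include <cstdint>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

using wide_pid_t = std::uint64_t;  // stand-in for a 64-bit unsigned pid type

// Buggy pattern: -1 is converted to a huge unsigned value, so "< 0" never fires.
bool BuggyWaitFailed(::pid_t pid) {
    int status = 0;
    wide_pid_t wpid = static_cast<wide_pid_t>(::waitpid(pid, &status, 0));
    return wpid < 0;  // always false for an unsigned type
}

// Suggested pattern: keep the result in the native signed ::pid_t.
bool NativeWaitFailed(::pid_t pid) {
    int status = 0;
    ::pid_t wpid = ::waitpid(pid, &status, 0);
    return wpid < 0;
}

// waitpid() on our own pid fails (we are not our own child), yet only the
// native-typed check notices the failure.
void Demonstrate() {
    assert(!BuggyWaitFailed(::getpid()));
    assert(NativeWaitFailed(::getpid()));
}
```

Many compilers will also flag the unsigned comparison as always false when extra warnings are enabled, which is one way to find the other places where the pattern is repeated.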

So a few things regarding your race conditions:
1 - On Linux, does a process start running first, and then you quickly try to attach to it? If so, this could explain the differences you are seeing when connecting to a process. On Darwin, our posix_spawn() has a non-portable flag that stops the process at the entry point with a SIGSTOP, so we are guaranteed not to have a race condition when we launch a process for debugging.
2 - The messages coming out of order seem to be caused by eStateLaunching and eStateStopped not being delivered in the correct order. In your first example they came through OK, and in the second case we got an eStateStopped first, followed by the eStateLaunching. I would take a look at who is sending these out of order. If you fix these out-of-order events, it might fix your random stopping at a wrong location.

Greg

This stopping at random locations seems like a racy bug in the ProcessLinux that we should really look into fixing.

(Just doing some correlation with the llgs NativeProcessLinux code, which is from the same lineage as the Linux ProcessMonitor but has diverged somewhat).

Here’s another interesting bit. I have been combing through the NativeProcessLinux code in the llgs branch as I tighten everything up and work through known issues.

So for this part Shawn called out above:

(lldb) run

Process 4417 launching
Process 4417 stopped
* thread #1: tid = 4417, 0x00007f3b99b9c2d0, name = 'a.out', stop reason = trace
    frame #0: 0x00007f3b99b9c2d0
-> 0x7f3b99b9c2d0: movq %rsp, %rdi
   0x7f3b99b9c2d3: callq 0x7f3b99b9fa70
   0x7f3b99b9c2d8: movq %rax, %r12
   0x7f3b99b9c2db: movl 0x221b17(%rip), %eax

Process 4417 launched: '/home/shawn/Projects/a.out' (x86_64)

I have been seeing this in lldb connected to llgs as well as on local debugging. It happens sometimes. However, on lldb <=> llgs, I have a complete view of the protocol (the gdb-remote packets), and I can see that the disassembled stop point is not ever coming up in the gdb-remote log as a stop notification (a T). I am starting to suspect that there might be something in the stack unwinding, symbolication or something else that is perhaps racy or maybe sensitive to taking too long to perform some step. The address listed in the llgs case is close to some addresses that are getting probed on the gdb-remote protocol.

I’m still tracking this down in the llgs case and I’ll double check to make sure I’m not somehow misreading the gdb-remote packet. More on this later if I find anything useful.

-Todd

Er…

Ok - so that’s not quite right.

In the llgs case, there is an initial probe with a $? command that asks where things are at after the inferior is launched. One of the expedited registers (well, the PC) has the little-endian value for the address I see listed in the T response to the $? on startup. I’m going to look and see if the $? is really supposed to respond with a T on startup. I’ve noticed (and adhere to) not responding with a T on launch for the initial stop. It might be that $? should likewise not respond with anything there.

So this still might be a different thing on llgs than on local Linux.

-Todd

One of the expedited registers (well, the PC) has the little-endian value for the address I see listed in the T response to the $? on startup.

Meaning, for the spurious disassembly on “(lldb) run”, e.g.

thread #1: tid = 4417, 0x00007f3b99b9c2d0, name = ‘a.out’, stop reason = trace

from above, the address of the stop is in fact the PC from the expedited registers in the first T response to the $? query.

As soon as we attach with any GDB server, we expect one of two things:
1 - a process and all its threads are stopped. The $? packet should respond with any thread that has a stop reason _or_ if no threads have a stop reason, then report a $T packet for the first thread. You can probably respond with $T00 if there is no real signal or reason the thread stopped. Some debuggers like to respond with $T05 (SIGTRAP), but I would rather we don't lie and report a bogus SIGTRAP signal and tell the truth ($T00 or no signal).
2 - there is no process, which means qProcessInfo responds with an invalid value or error, or for backward compatibility qC responds with an invalid value. This lets us know we don't have a process and no $? should be issued.
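For readers unfamiliar with the wire format, the sketch below frames a simplified stop reply. The framing ('$', payload, '#', two hex digits holding the sum of the payload bytes modulo 256) is the standard GDB remote packet layout; the helper names are made up, and a real stop reply carries more key/value pairs (for example expedited registers) than the single thread entry shown here.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Frame a payload as '$' payload '#' checksum, where the checksum is the sum
// of the payload bytes modulo 256 written as two lowercase hex digits.
std::string FramePacket(const std::string &payload) {
    unsigned sum = 0;
    for (unsigned char c : payload)
        sum = (sum + c) & 0xffu;
    char checksum[3];
    std::snprintf(checksum, sizeof(checksum), "%02x", sum);
    return "$" + payload + "#" + checksum;
}

// Simplified stop reply: signal number (0 = no signal, 5 = SIGTRAP) followed
// by a single "thread:<hex tid>;" pair.
std::string StopReply(unsigned signo, std::uint64_t tid) {
    assert(signo <= 0xffu);
    char payload[64];
    std::snprintf(payload, sizeof(payload), "T%02xthread:%llx;", signo,
                  static_cast<unsigned long long>(tid));
    return FramePacket(payload);
}
```

With these helpers, StopReply(0, tid) produces the "no signal" form preferred above, while StopReply(5, tid) produces the SIGTRAP form that some stubs send.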

If you use the packet disassembler, you should be able to tell where this is stopping and see the packets that lead up to it to see what it is doing. If you have a trace of this issue happening, please send me a copy offline and I might be able to help.

Greg

Thanks, Greg. I’ll get back to you offline on this on the llgs side after I finish the upstream. Since we’re seeing it with both local and llgs-based linux debugging, I’m not going to hold the upstreaming for it.

-Todd

Hi Greg,

I appreciate this question may be slightly off-topic, but you mentioned that on attaching to a GDB server you expect "a process and all its threads are stopped" (or no process).

Does the concept of a non-invasive attach exist in lldb? By that I mean can you attach to a process without stopping it.

We have this feature in our current debugger. In addition to permitting memory and register reads for certain areas of our chips, it may be useful to a developer to inspect whether a device is "still running".

I've also observed that the Microsoft Visual Studio debugger permits attaching to a running process without interrupting it.

Is such a feature achievable using lldb via its "target" or remoting commands?

thanks
Matt

Member of the CSR plc group of companies. CSR plc registered in England and Wales, registered number 4187346, registered office Churchill House, Cambridge Business Park, Cowley Road, Cambridge, CB4 0WZ, United Kingdom
More information can be found at www.csr.com. Keep up to date with CSR on our technical blog, www.csr.com/blog, CSR people blog, www.csr.com/people, YouTube, www.youtube.com/user/CSRplc, Facebook, www.facebook.com/pages/CSR/191038434253534, or follow us on Twitter at www.twitter.com/CSR_plc.
New for 2014, you can now access the wide range of products powered by aptX at www.aptx.com.

Does the concept of a non-invasive attach exist in lldb? By that I mean can you attach to a process without stopping it.

It does, just not with GDB remote. The GDB remote protocol has a very simple design: send a packet then wait for a response and the protocol can do nothing else unless interrupted (by sending a CTRL+C byte (0x03)).

So we would have to issue a command like "launch process" (the 'A' packet) and we get a response ("OK"). Now the process is assumed to be stopped as far as I know, since the only way to make the process run is to issue a continue packet ("c" or "vCont:c") or a step packet ("s" or "vCont:s"). When we send these packets we wait for a response ("$TSS;...", where SS is a hex signal number followed by key/value pairs).

So there is no real feature in the protocol that allows for running without first sending some form of a run packet (continue or step).

We have this feature in our current debugger. In addition to permitting memory and register reads for certain areas of our chips, it may be useful to a developer to inspect whether a device is "still running".

Yep, this can all be done by your custom lldb_private::Process subclass. With the internal LLDB API, we have no problem saying a process is running immediately after an attach; you would simply send an eStateLaunching event followed by an eStateRunning event. We might have to tweak a few things in LLDB to make sure this works, but it is possible.

But if you create a target and then you set breakpoints:
(lldb) target create /bin/ls
(lldb) b malloc
(lldb) b free

Then you launch:

(lldb) process launch

Do you really want LLDB to try and set the "malloc" and "free" breakpoints while your process is running? This will potentially make you miss the first N breakpoint hits if your process doesn't start up stopped.

I've also observed that the Microsoft Visual Studio debugger permits attaching to a running process without interrupting it.

On most systems, even though attaching says it doesn't do anything, the process often will briefly be stopped. I believe all unix variants will stop with a SIGSTOP or SIGTRAP when you call ptrace() with the attach command.

Is such a feature achievable using lldb via its "target" or remoting commands?

We have also had requests for attaching to a running process, but the debugger is going to want to stop right away and try and set breakpoints before continuing to ensure the breakpoints can be hit.

So we can easily modify LLDB to do this kind of thing if this is important, but the GDB remote protocol isn't a great target to make that work with. If we somehow taught the GDB remote protocol to not return from the 'A' packet which launches a process, the only thing we can do is interrupt the 'A' packet in order to send the "set breakpoint" packet. So it would go like:

(lldb) process attach ...

send: send attach packet
wait for response which doesn't come back since we want /bin/ls to run

Now LLDB wants to send a breakpoint packet to set the breakpoints for "malloc" and "free" but it can't since GDB remote can only send one packet at a time so we must interrupt:

send: 0x03 (interrupt)
recv: stop reply packet from interrupt
send: set breakpoint at malloc packet
recv: reply for breakpoint packet
send: continue
send: 0x03 (interrupt)
recv: stop reply packet from interrupt
send: set breakpoint at free packet
recv: reply for breakpoint packet
send: continue

So now we have stopped the target multiple times because we wanted to send the breakpoint packets while the target is running, instead of just once right after we attach when the process is stopped.
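The cost difference can be tallied with a short sketch that builds both exchanges as lists of steps. The step strings mirror the informal trace above rather than real packet contents, and the function names are invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

using Trace = std::vector<std::string>;

// Target is running: every breakpoint costs an interrupt/stop/continue cycle.
Trace TraceWhileRunning(const std::vector<std::string> &symbols) {
    Trace trace;
    for (const auto &sym : symbols) {
        trace.push_back("send: 0x03 (interrupt)");
        trace.push_back("recv: stop reply packet from interrupt");
        trace.push_back("send: set breakpoint at " + sym + " packet");
        trace.push_back("recv: reply for breakpoint packet");
        trace.push_back("send: continue");
    }
    return trace;
}

// Target is already stopped after attach: set everything, then continue once.
Trace TraceWhileStopped(const std::vector<std::string> &symbols) {
    Trace trace;
    for (const auto &sym : symbols) {
        trace.push_back("send: set breakpoint at " + sym + " packet");
        trace.push_back("recv: reply for breakpoint packet");
    }
    trace.push_back("send: continue");
    return trace;
}

// Count how many times the target had to be interrupted.
std::size_t CountInterrupts(const Trace &trace) {
    std::size_t n = 0;
    for (const auto &step : trace)
        if (step == "send: 0x03 (interrupt)")
            ++n;
    return n;
}

void Example() {
    const std::vector<std::string> symbols = {"malloc", "free"};
    assert(CountInterrupts(TraceWhileRunning(symbols)) == 2);
    assert(CountInterrupts(TraceWhileStopped(symbols)) == 0);
}
```

The number of extra interruptions grows linearly with the number of breakpoints in the running case, whereas the stopped case needs none beyond the initial stop on attach.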

So to sum up: yes this is possible with LLDB, but the ProcessGDBRemote is not a great candidate for this change due to its design.

Greg

So there is no real feature in the protocol that allows for running without first sending some form of a run packet (continue or step).

Yes, that's my conclusion, too.

Yep, this can all be done by your custom lldb_private::Process subclass. With the internal LLDB API, we have no problem saying a process is running immediately after an attach; you would simply send an eStateLaunching event followed by an eStateRunning event. We might have to tweak a few things in LLDB to make sure this works, but it is possible.

I see what you mean. But then again I think I'm stuck on using ProcessGDBRemote since that is the process subclass instantiated by lldb on invoking "gdb-remote". (Especially in the bare-metal case, previously discussed, where an ELF is not supplied, and thus lldb cannot glean the kind of information required to know the process sub-class to create). As such I'm not sure how far I can actually progress any such custom process sub-class.

Do you really want LLDB to try and set the "malloc" and "free" breakpoints while your process is running? This will potentially make you miss the first N breakpoint hits if your process doesn't start up stopped.

Right. In my world, with regard to your breakpoint example, it is up to "our users" (the people using our debuggers to develop embedded systems) to determine the "safety" of placing breakpoints in running targets. Typically they'd stop the target first, though. (We actually have various architectures, and it is hardware dependent whether a given chip supports breakpoint modification whilst it is running.)

On most systems, even though attaching says it doesn't do anything, the process often will briefly be stopped. I believe all unix variants will stop with a SIGSTOP or SIGTRAP when you call ptrace() with the attach command.

In the UNIX world I totally agree with you (regarding attach SIGSTOP/TRAP etc.) In my embedded experience, though, things can often be very different. We connect to development boards over USB/SPI interfaces and we can (hardware depending) read various "status" registers regardless of the run-state and determine whether the device is running, stopped, at break, PC value etc.

So to sum up: yes this is possible with LLDB, but the ProcessGDBRemote is not a great candidate for this change due to its design.

Greg

Yes, I agree ProcessGDBRemote is not a good place for this. It seems to me that the gdb/UNIX tradition of always assuming a stopped target upon attach is a tough one to buck. Since, for my users, the important use-case is to inspect whether the target is "still running" (prior to an optional invasive attach), I personally think that the platform commands could provide me with a route forward....

I see we have:
(lldb) platform process info <pid>

Perhaps this command can be augmented to provide status information (in a similar way to how the Linux /proc/<pid>/status file works)? It's then up to the user to attach and therefore intrude on the process. Do you think such an approach is favourable?
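As a rough idea of what such status information could be based on for a local Linux process, the sketch below extracts the "State:" line from text in /proc/<pid>/status format. The function names are hypothetical and this is not an existing lldb platform feature; a remote or bare-metal platform would need its own source for the same information.

```cpp
#include <cassert>
#include <fstream>
#include <istream>
#include <sstream>
#include <string>

// Return the value of the "State:" line, e.g. "S (sleeping)", or an empty
// string if no such line is found.
std::string ParseState(std::istream &in) {
    const std::string key = "State:";
    std::string line;
    while (std::getline(in, line)) {
        if (line.compare(0, key.size(), key) == 0) {
            const std::size_t pos = line.find_first_not_of(" \t", key.size());
            return pos == std::string::npos ? std::string() : line.substr(pos);
        }
    }
    return std::string();
}

// Read the state of a live local process without attaching to it.
// Returns an empty string if the status file cannot be opened.
std::string ReadProcessState(long pid) {
    std::ifstream status("/proc/" + std::to_string(pid) + "/status");
    return ParseState(status);
}

// Parsing a captured snippet instead of the live file.
void Example() {
    std::istringstream sample("Name:\ta.out\nState:\tS (sleeping)\nPid:\t4417\n");
    assert(ParseState(sample) == "S (sleeping)");
}
```

Reading this file does not stop or otherwise disturb the target, which matches the "inspect first, attach later" workflow described above.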

thanks
Matt
