GDB RSP's non-stop mode capability in v5.0

Hi,

It appears that the lldb-server, as of v5.0, did not implement the GDB RSP's non-stop mode (https://sourceware.org/gdb/onlinedocs/gdb/Remote-Non_002dStop.html#Remote-Non_002dStop). Am I wrong?

If the support is actually not there, what needs to be changed to enable the same in lldb-server?

Also, in lldb at least I see some code relevant to non-stop mode, but is non-stop mode fully implemented in lldb or there is only partial support?

Thanks,

Ramana

Hi,

It appears that the lldb-server, as of v5.0, did not implement the GDB RSP's non-stop mode (https://sourceware.org/gdb/onlinedocs/gdb/Remote-Non_002dStop.html#Remote-Non_002dStop). Am I wrong?

The non-stop project was started a couple of years ago, but never really completed. There is some code left in the client which gives the impression that it is supported, but in reality, that’s probably not the case. Furthermore, given the lack of test coverage, there’s no way to tell if even the bits of functionality that were working in the past are still operational. As for lldb-server, it does not support non-stop mode at all (and has never supported it).

If the support is actually not there, what needs to be changed to enable the same in lldb-server?

Adding non-stop support to lldb-server should be relatively straight-forward actually. For Linux, the ptrace API natively operates in “non-stop” mode, and we have to do extra work to simulate the all-stop mode. There are some details that we would need to figure out, but conceptually, all you would need to do is rip out the all-stop simulation code for this mode (NativeProcessLinux::StopRunningThreads, NativeProcessLinux::SignalIfAllThreadsStopped, etc.). If you’re interested in taking this on, we can discuss it in more detail…
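To make that concrete, here is a minimal sketch (plain ptrace, not actual lldb-server code) of why the Linux API is naturally “non-stop”: every request is addressed to a single thread id, so resuming or interrupting one tracee leaves its siblings alone.

#include <sys/ptrace.h>
#include <sys/types.h>

// Resume exactly one thread of the inferior; the other threads keep whatever
// state they are in. `signo` is 0 or a signal number to deliver on resume.
static long resume_one_thread(pid_t tid, int signo) {
  return ptrace(PTRACE_CONT, tid, 0, signo);
}

// Stop exactly one thread (assuming it was attached with PTRACE_SEIZE);
// again, no other thread is touched.
static long interrupt_one_thread(pid_t tid) {
  return ptrace(PTRACE_INTERRUPT, tid, 0, 0);
}

The all-stop behaviour lldb presents today is simulated on top of calls like these, which is the layer that could be bypassed in non-stop mode.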

In my mind, the tricky part will be the client, as it contains a lot more code, and a lot of that code assumes that when Process.GetState()==Stopped, all threads are stopped as well. Jim Ingham should have the best idea of the kind of work that needs to be done there.

Hope this makes the situation a bit more clear.

cheers,
pl

Hi,

It appears that the lldb-server, as of v5.0, did not implement the GDB RSP's non-stop mode (https://sourceware.org/gdb/onlinedocs/gdb/Remote-Non_002dStop.html#Remote-Non_002dStop). Am I wrong?

If the support is actually not there, what needs to be changed to enable the same in lldb-server?

As Pavel said, adding support into lldb-server will be easy. Adding support to LLDB will be harder. One downside of enabling this mode will be a performance loss in the GDB remote packet transfer. Why? IIRC this mode requires a read thread where one thread is always reading packets and putting them into a packet buffer. Threads that want to send a packet and get a reply must now send the packet, then use a condition variable + mutex to wait for the response. This threading overhead really slows down the packet transfers. Currently we have a mutex on the GDB remote communication where each thread that needs to send a packet will take the mutex and then send the packet and wait for the response on the same thread. I know the performance differences are large on MacOS; not sure how they are on other systems. If you do end up enabling this, please run the “process plugin packet speed-test” command, which is available only when debugging with ProcessGDBRemote. It will send and receive various packets of various sizes and report speed statistics back to you.
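To illustrate the overhead being described, here is a rough sketch of that pattern (the names are invented for illustration; this is not the actual GDBRemoteCommunication code): a dedicated read thread decodes every incoming packet and hands replies over to whichever thread is waiting, so every reply pays for a mutex + condition variable handoff.

#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <utility>

struct PacketChannel {
  std::mutex m;
  std::condition_variable cv;
  std::queue<std::string> responses;

  // Called by the always-running read thread for every response packet it
  // decodes off the connection.
  void DeliverResponse(std::string packet) {
    {
      std::lock_guard<std::mutex> lock(m);
      responses.push(std::move(packet));
    }
    cv.notify_one(); // wake whichever thread is waiting for its reply
  }

  // Called by a thread that has just sent a request: block until the read
  // thread hands over the reply. This handoff is the extra cost compared to
  // simply reading the reply on the sending thread while holding a mutex.
  std::string WaitForResponse() {
    std::unique_lock<std::mutex> lock(m);
    cv.wait(lock, [this] { return !responses.empty(); });
    std::string reply = std::move(responses.front());
    responses.pop();
    return reply;
  }
};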

Also, in lldb at least I see some code relevant to non-stop mode, but is non-stop mode fully implemented in lldb or there is only partial support?

Everything in LLDB right now assumes a process centric debugging model where when one thread stops all threads are stopped. There will be quite a large amount of changes needed for a thread centric model. The biggest issue I know about is breakpoints. Any time you need to step over a breakpoint, you must stop all threads, disable the breakpoint, single step the thread and re-enable the breakpoint, then start all threads again. So even the thread centric model would need to start and stop all threads many times.

Be sure to speak with myself, Jim Ingham and Pavel in depth before undertaking this task as there will be many changes required.

Greg

Hi,

It appears that the lldb-server, as of v5.0, did not implement the GDB RSP's non-stop mode (https://sourceware.org/gdb/onlinedocs/gdb/Remote-Non_002dStop.html#Remote-Non_002dStop). Am I wrong?

If the support is actually not there, what needs to be changed to enable the same in lldb-server?

As Pavel said, adding support into lldb-server will be easy. Adding support to LLDB will be harder. One downside of enabling this mode will be a performance loss in the GDB remote packet transfer. Why? IIRC this mode requires a read thread where one thread is always reading packets and putting them into a packet buffer. Threads that want to send a packet and get a reply must now send the packet, then use a condition variable + mutex to wait for the response. This threading overhead really slows down the packet transfers. Currently we have a mutex on the GDB remote communication where each thread that needs to send a packet will take the mutex and then send the packet and wait for the response on the same thread. I know the performance differences are large on MacOS; not sure how they are on other systems. If you do end up enabling this, please run the “process plugin packet speed-test” command, which is available only when debugging with ProcessGDBRemote. It will send and receive various packets of various sizes and report speed statistics back to you.

Also, in lldb at least I see some code relevant to non-stop mode, but is non-stop mode fully implemented in lldb or there is only partial support?

Everything in LLDB right now assumes a process centric debugging model where when one thread stops all threads are stopped. There will be quite a large amount of changes needed for a thread centric model. The biggest issue I know about is breakpoints. Any time you need to step over a breakpoint, you must stop all threads, disable the breakpoint, single step the thread and re-enable the breakpoint, then start all threads again. So even the thread centric model would need to start and stop all threads many times.

If we work on this, that’s not the way we should approach breakpoints in non-stop mode (and it’s not how GDB does it). I’m not sure why Ramana is interested in it, but I think one of the main motivations to add it to GDB was systems where stopping all (or some) threads for even a small amount of time would just break things. You want a way to step over breakpoints without disrupting the other threads.

Instead of removing the breakpoint, you can just teach the debugger to execute the code that has been patched in a different context. You can either move the code someplace else and execute it there or emulate it. Sometimes you’ll need to patch it if it is PC-relative. IIRC, GDB calls this displaced stepping. It’s relatively simple and works great.
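For concreteness, here is a sketch of the displaced-stepping idea (all types and hooks below are hypothetical stand-ins; this is not GDB’s or LLDB’s actual implementation): the breakpoint stays in place, and a saved copy of the original instruction is executed from a scratch area instead.

#include <cstdint>
#include <functional>
#include <vector>

using addr_t = std::uint64_t;

// Hypothetical hooks into the debugger; a real implementation would use the
// process/thread interfaces instead.
struct StepHooks {
  std::function<void(addr_t, const std::vector<std::uint8_t> &)> write_mem;
  std::function<addr_t()> read_pc;
  std::function<void(addr_t)> write_pc;
  std::function<void()> single_step_this_thread; // steps only this thread
};

// Step one thread over a breakpoint at `bp_addr` without ever removing the
// trap from memory, so other threads can keep running through it.
// `saved_insn` is the original instruction the breakpoint overwrote (already
// fixed up for PC-relative operands, if needed); `scratch` is where we run it.
void DisplacedStepOver(const StepHooks &h, addr_t bp_addr, addr_t scratch,
                       const std::vector<std::uint8_t> &saved_insn) {
  h.write_mem(scratch, saved_insn); // copy the instruction out of place
  h.write_pc(scratch);              // execute it from the scratch area
  h.single_step_this_thread();
  addr_t pc = h.read_pc();
  if (pc == scratch + saved_insn.size())
    h.write_pc(bp_addr + saved_insn.size()); // fell through: resume after bp
  // else: the instruction branched; the (fixed-up) target PC is already correct
}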

I’ve been interested in displaced stepping for different reasons. If we had that capability, it would become much easier to patch code. I’d love to use this to have breakpoint conditions injected and evaluated without round tripping to the debugger when the condition returns false.

Fred

Hi,

It appears that the lldb-server, as of v5.0, did not implement the GDB RSP's non-stop mode (https://sourceware.org/gdb/onlinedocs/gdb/Remote-Non_002dStop.html#Remote-Non_002dStop). Am I wrong?

If the support is actually not there, what needs to be changed to enable the same in lldb-server?

As Pavel said, adding support into lldb-server will be easy. Adding support to LLDB will be harder. One downside of enabling this mode will be a performance loss in the GDB remote packet transfer. Why? IIRC this mode requires a read thread where one thread is always reading packets and putting them into a packet buffer. Threads that want to send a packet and get a reply must now send the packet, then use a condition variable + mutex to wait for the response. This threading overhead really slows down the packet transfers. Currently we have a mutex on the GDB remote communication where each thread that needs to send a packet will take the mutex and then send the packet and wait for the response on the same thread. I know the performance differences are large on MacOS; not sure how they are on other systems. If you do end up enabling this, please run the “process plugin packet speed-test” command, which is available only when debugging with ProcessGDBRemote. It will send and receive various packets of various sizes and report speed statistics back to you.

Also, in lldb at least I see some code relevant to non-stop mode, but is non-stop mode fully implemented in lldb or there is only partial support?

Everything in LLDB right now assumes a process centric debugging model where when one thread stops all threads are stopped. There will be quite a large amount of changes needed for a thread centric model. The biggest issue I know about is breakpoints. Any time you need to step over a breakpoint, you must stop all threads, disable the breakpoint, single step the thread and re-enable the breakpoint, then start all threads again. So even the thread centric model would need to start and stop all threads many times.

If we work on this, that’s not the way we should approach breakpoints in non-stop mode (and it’s not how GDB does it). I’m not sure why Ramana is interested in it, but I think one of the main motivations to add it to GDB was systems where stopping all (or some) threads for even a small amount of time would just break things. You want a way to step over breakpoints without disrupting the other threads.

Instead of removing the breakpoint, you can just teach the debugger to execute the code that has been patched in a different context. You can either move the code someplace else and execute it there or emulate it. Sometimes you’ll need to patch it if it is PC-relative. IIRC, GDB calls this displaced stepping. It’s relatively simple and works great.

This indeed is one of the changes we would need to do for non-stop mode. We have the EmulateInstruction class in LLDB that is designed just for this kind of thing. You can give the emulator function read/write memory and read/write register callbacks and a baton, and it can execute the instruction and read/write memory and registers as needed through the context. It would be very easy to have the read register callback know to take the PC of the original instruction and return it if the PC is requested.
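A small sketch of that callback trick (hypothetical names and signatures, not the real EmulateInstruction API): the baton carries the original PC, and the read-register callback substitutes it whenever the emulated instruction asks for the PC, so PC-relative behaviour is preserved even though the instruction is being emulated elsewhere.

#include <cstdint>

// Everything below is a hypothetical illustration; these are not the real
// EmulateInstruction callback signatures.
constexpr int kPCRegister = 32; // made-up register number for the PC

struct EmulationBaton {
  void *thread;                 // whatever the real callbacks need
  std::uint64_t original_pc;    // where the instruction actually came from
};

// Stub for illustration; a real implementation would query the live thread.
static std::uint64_t ReadRegisterFromThread(void * /*thread*/, int /*reg*/) {
  return 0;
}

// Read-register callback handed to the emulator together with the baton.
static bool ReadRegisterThunk(void *baton, int reg, std::uint64_t &value) {
  auto *ctx = static_cast<EmulationBaton *>(baton);
  if (reg == kPCRegister) {
    value = ctx->original_pc;   // pretend the instruction never moved
    return true;
  }
  value = ReadRegisterFromThread(ctx->thread, reg);
  return true;
}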

We always got push back in the past about adding full instruction emulation support as Chris Lattner wanted it to exist in LLVM in the tablegen tables, but no one ever got around to doing that part. So we added prologue instruction parsing and any instructions that can modify the PC (for single stepping) to the supported emulated instructions.

So yes, emulating instructions without removing them from the code is one of the things required for this feature. Not impossible, just very time consuming to be able to emulate every instruction out of place. I would love to see that go in and would be happy to review patches for anyone wanting to take this on. Though the question still remains: does this happen in LLVM or in LLDB? Emulating instructions in LLVM might provide some great testing that could happen in the LLVM layers.

I’ve been interested in displaced stepping for different reasons. If we had that capability, it would become much easier to patch code. I’d love to use this to have breakpoint conditions injected and evaluated without round tripping to the debugger when the condition returns false.

Agreed!

Hi,

It appears that the lldb-server, as of v5.0, did not implement the GDB RSP's non-stop mode (https://sourceware.org/gdb/onlinedocs/gdb/Remote-Non_002dStop.html#Remote-Non_002dStop). Am I wrong?

If the support is actually not there, what needs to be changed to enable the same in lldb-server?

As Pavel said, adding support into lldb-server will be easy. Adding support to LLDB will be harder. One downside of enabling this mode will be a performance loss in the GDB remote packet transfer. Why? IIRC this mode requires a read thread where one thread is always reading packets and putting them into a packet buffer. Threads that want to send a packet and get a reply must now send the packet, then use a condition variable + mutex to wait for the response. This threading overhead really slows down the packet transfers. Currently we have a mutex on the GDB remote communication where each thread that needs to send a packet will take the mutex and then send the packet and wait for the response on the same thread. I know the performance differences are large on MacOS; not sure how they are on other systems. If you do end up enabling this, please run the “process plugin packet speed-test” command, which is available only when debugging with ProcessGDBRemote. It will send and receive various packets of various sizes and report speed statistics back to you.

Also, in lldb at least I see some code relevant to non-stop mode, but is non-stop mode fully implemented in lldb or there is only partial support?

Everything in LLDB right now assumes a process centric debugging model where when one thread stops all threads are stopped. There will be quite a large amount of changes needed for a thread centric model. The biggest issue I know about is breakpoints. Any time you need to step over a breakpoint, you must stop all threads, disable the breakpoint, single step the thread and re-enable the breakpoint, then start all threads again. So even the thread centric model would need to start and stop all threads many times.

If we work on this, that’s not the way we should approach breakpoints in non-stop mode (and it’s not how GDB does it). I’m not sure why Ramana is interested in it, but I think one of the main motivations to add it to GDB was systems where stopping all (or some) threads for even a small amount of time would just break things. You want a way to step over breakpoints without disrupting the other threads.

Instead of removing the breakpoint, you can just teach the debugger to execute the code that has been patched in a different context. You can either move the code someplace else and execute it there or emulate it. Sometimes you’ll need to patch it if it is PC-relative. IIRC, GDB calls this displaced stepping. It’s relatively simple and works great.

This indeed is one of the changes we would need to do for non-stop mode. We have the EmulateInstruction class in LLDB that is designed just for this kind of thing. You can give the emulator function read/write memory and read/write register callbacks and a baton, and it can execute the instruction and read/write memory and registers as needed through the context. It would be very easy to have the read register callback know to take the PC of the original instruction and return it if the PC is requested.

We always got push back in the past about adding full instruction emulation support as Chris Lattner wanted it to exist in LLVM in the tablegen tables, but no one ever got around to doing that part. So we added prologue instruction parsing and any instructions that can modify the PC (for single stepping) to the supported emulated instructions.

So yes, emulating instructions without removing them from the code is one of the things required for this feature. Not impossible, just very time consuming to be able to emulate every instruction out of place. I would love to see that go in and would be happy to review patches for anyone wanting to take this on. Though the question still remains: does this happen in LLVM or in LLDB? Emulating instructions in LLVM might provide some great testing that could happen in the LLVM layers.

In my porting experience, emulation is actually rarely needed. Of course, if LLVM had a readily available emulation library we could just use that, but it’s not the case. Most of the time, just copying the instruction to some scratch space and executing it there is enough (you potentially need to patch offsets if the instruction uses PC-relative addressing).

Fred

Hi,

It appears that the lldb-server, as of v5.0, did not implement the GDB RSP's non-stop mode (https://sourceware.org/gdb/onlinedocs/gdb/Remote-Non_002dStop.html#Remote-Non_002dStop). Am I wrong?

If the support is actually not there, what needs to be changed to enable the same in lldb-server?

As Pavel said, adding support into lldb-server will be easy. Adding support to LLDB will be harder. One downside of enabling this mode will be a performance loss in the GDB remote packet transfer. Why? IIRC this mode requires a read thread where one thread is always reading packets and putting them into a packet buffer. Threads that want to send a packet and get a reply must now send the packet, then use a condition variable + mutex to wait for the response. This threading overhead really slows down the packet transfers. Currently we have a mutex on the GDB remote communication where each thread that needs to send a packet will take the mutex and then send the packet and wait for the response on the same thread. I know the performance differences are large on MacOS; not sure how they are on other systems. If you do end up enabling this, please run the “process plugin packet speed-test” command, which is available only when debugging with ProcessGDBRemote. It will send and receive various packets of various sizes and report speed statistics back to you.

Also, in lldb at least I see some code relevant to non-stop mode, but is non-stop mode fully implemented in lldb or there is only partial support?

Everything in LLDB right now assumes a process centric debugging model where when one thread stops all threads are stopped. There will be quite a large amount of changes needed for a thread centric model. The biggest issue I know about is breakpoints. Any time you need to step over a breakpoint, you must stop all threads, disable the breakpoint, single step the thread and re-enable the breakpoint, then start all threads again. So even the thread centric model would need to start and stop all threads many times.

If we work on this, that’s not the way we should approach breakpoints in non-stop mode (and it’s not how GDB does it). I’m not sure why Ramana is interested in it, but I think one of the main motivations to add it to GDB was systems where stopping all (or some) threads for even a small amount of time would just break things. You want a way to step over breakpoints without disrupting the other threads.

Instead of removing the breakpoint, you can just teach the debugger to execute the code that has been patched in a different context. You can either move the code someplace else and execute it there or emulate it. Sometimes you’ll need to patch it if it is PC-relative. IIRC, GDB calls this displaced stepping. It’s relatively simple and works great.

This indeed is one of the changes we would need to do for non-stop mode. We have the EmulateInstruction class in LLDB that is designed just for this kind of thing. You can give the emulator function read/write memory and read/write register callbacks and a baton, and it can execute the instruction and read/write memory and registers as needed through the context. It would be very easy to have the read register callback know to take the PC of the original instruction and return it if the PC is requested.

We always got push back in the past about adding full instruction emulation support as Chris Lattner wanted it to exist in LLVM in the tablegen tables, but no one ever got around to doing that part. So we added prologue instruction parsing and any instructions that can modify the PC (for single stepping) to the supported emulated instructions.

So yes, emulating instructions without removing them from the code is one of the things required for this feature. Not impossible, just very time consuming to be able to emulate every instruction out of place. I would love to see that go in and would be happy to review patches for anyone wanting to take this on. Though the question still remains: does this happen in LLVM or in LLDB? Emulating instructions in LLVM might provide some great testing that could happen in the LLVM layers.

In my porting experience, emulation is actually rarely needed. Of course, if LLVM had a readily available emulation library we could just use that, but it’s not the case. Most of the time, just copying the instruction to some scratch space and executing it there is enough (you potentially need to patch offsets if the instruction uses PC-relative addressing).

That is true but that involves starting and stopping the thread one time which can be time consuming. It is easier to do it this way, but the starting and stopping of a thread is very costly. It would be better to try and emulate all the instructions we can and then fall back to emulating the instruction at another address if needed. Of course, you might be able to emulate the instruction and have a branch that branches to the next real instruction so we just have to start the process again without having to stop it. That would be a nice approach.

The breakpoints aren't a structural problem. If you can figure out a non-code modifying way to handle breakpoints, that would be a very surgical change. And as Fred points out, out of place execution in the target would be really handy for other things, like offloading breakpoint conditions into the target, and only stopping if the condition is true. So this is a well motivated project.

And our model for handling both expression evaluation and execution control is already thread-centric. It would be pretty straight-forward to treat "still running" threads the same way as threads with no interesting stop reasons, for instance.

I think the real difficulty will come at the higher layers. First off, we gate a lot of Command & SB API operations on "is the process running" and that will have to get much more fine-grained. Figuring out a good model for this will be important.

Then you're going to have to figure out what exactly to do when somebody is in the middle of, say, running a long expression on thread A when thread B stops. What's a useful way to present this information? If lldb is sharing the terminal with the process, you can't just dump output in the middle of command output, but you don't want to delay too long...

Also, the IOHandlers are currently a stack, but that model won't work when the process IOHandler is going to have to be live (at least the output part of it) while the CommandInterpreter IOHandler is also live. That's going to take reworking.

On the event and operations side, I think the fact that we have the separation between the private and public states will make this a lot easier. We can use the event transition from private to public state to serialize the activity that's going on under the covers so that it appears coherent to the user. The fact that lldb goes through separate channels for process I/O and command I/O and we very seldom just dump stuff to stdout will also make solving the problem of competing demands for the user's attention more possible.

And I think we can't do any of this till we have a robust "ProcessMock" plugin that we can use to emulate end-to-end through the debugger all the corner cases that non-stop debugging will bring up. Otherwise there will be no way to reliably test any of this stuff, and it won't ever be stable.

I don't think any of this will be impossible, but it's going to be a lot of work.

Jim

Hi,

It appears that the lldb-server, as of v5.0, did not implement the GDB RSP's non-stop mode (https://sourceware.org/gdb/onlinedocs/gdb/Remote-Non_002dStop.html#Remote-Non_002dStop). Am I wrong?

If the support is actually not there, what needs to be changed to enable the same in lldb-server?

As Pavel said, adding support into lldb-server will be easy. Adding support to LLDB will be harder. One downside of enabling this mode will be a performance loss in the GDB remote packet transfer. Why? IIRC this mode requires a read thread where one thread is always reading packets and putting them into a packet buffer. Threads that want to send a packet and get a reply must now send the packet, then use a condition variable + mutex to wait for the response. This threading overhead really slows down the packet transfers. Currently we have a mutex on the GDB remote communication where each thread that needs to send a packet will take the mutex and then send the packet and wait for the response on the same thread. I know the performance differences are large on MacOS; not sure how they are on other systems. If you do end up enabling this, please run the "process plugin packet speed-test" command, which is available only when debugging with ProcessGDBRemote. It will send and receive various packets of various sizes and report speed statistics back to you.

Also, in lldb at least I see some code relevant to non-stop mode, but is non-stop mode fully implemented in lldb or there is only partial support?

Everything in LLDB right now assumes a process centric debugging model where when one thread stops all threads are stopped. There will be quite a large amount of changes needed for a thread centric model. The biggest issue I know about is breakpoints. Any time you need to step over a breakpoint, you must stop all threads, disable the breakpoint, single step the thread and re-enable the breakpoint, then start all threads again. So even the thread centric model would need to start and stop all threads many times.

If we work on this, that’s not the way we should approach breakpoints in non-stop mode (and it’s not how GDB does it). I’m not sure why Ramana is interested in it, but I think one of the main motivations to add it to GDB was systems where stopping all (or some) threads for even a small amount of time would just break things. You want a way to step over breakpoints without disrupting the other threads.

Instead of removing the breakpoint, you can just teach the debugger to execute the code that has been patched in a different context. You can either move the code someplace else and execute it there or emulate it. Sometimes you’ll need to patch it if it is PC-relative. IIRC, GDB calls this displaced stepping. It’s relatively simple and works great.

This indeed is one of the changes we would need to do for non-stop mode. We have the EmulateInstruction class in LLDB that is designed just for this kind of thing. You can give the emulator function read/write memory and read/write register callbacks and a baton, and it can execute the instruction and read/write memory and registers as needed through the context. It would be very easy to have the read register callback know to take the PC of the original instruction and return it if the PC is requested.

We always got push back in the past about adding full instruction emulation support as Chris Lattner wanted it to exist in LLVM in the tablegen tables, but no one ever got around to doing that part. So we added prologue instruction parsing and any instructions that can modify the PC (for single stepping) to the supported emulated instructions.

So yes, emulating instructions without removing them from the code is one of the things required for this feature. Not impossible, just very time consuming to be able to emulate every instruction out of place. I would _love_ to see that go in and would be happy to review patches for anyone wanting to take this on. Though the question still remains: does this happen in LLVM or in LLDB? Emulating instructions in LLVM might provide some great testing that could happen in the LLVM layers.

In my porting experience, emulation is actually rarely needed. Of course, if LLVM had a readily available emulation library we could just use that, but it’s not the case. Most of the time, just copying the instruction to some scratch space and executing it there is enough (you potentially need to patch offsets if the instruction uses PC-relative addressing).

That is true but that involves starting and stopping the thread one time which can be time consuming. It is easier to do it this way, but the starting and stopping of a thread is very costly. It would be better to try and emulate all the instructions we can and then fall back to emulating the instruction at another address if needed. Of course, you might be able to emulate the instruction and have a branch that branches to the next real instruction so we just have to start the process again without having to stop it. That would be a nice approach.
  
The really cool trick would be to insert the breakpoint as a branch to a landing pad we insert. Then the landing pad could look like:

if (thread_is_supposed_to_stop_at_breakpoints() && breakpoint_condition())
  __builtin_trap();

emulate_instructions_you_needed_to_remove();
jump_back_to_next_instruction();

Then we could support some threads that NEVER stop, and also run breakpoint conditions locally which would make them really cheap. You could even squirrel away some state in the target that told you that a NEVER stop breakpoint hit the breakpoint, so your accounting would still be correct.

Hi,

It appears that the lldb-server, as of v5.0, did not implement the GDB RSP's non-stop mode (https://sourceware.org/gdb/onlinedocs/gdb/Remote-Non_002dStop.html#Remote-Non_002dStop). Am I wrong?

If the support is actually not there, what needs to be changed to enable the same in lldb-server?

As Pavel said, adding support into lldb-server will be easy. Adding support to LLDB will be harder. One downside of enabling this mode will be a performance loss in the GDB remote packet transfer. Why? IIRC this mode requires a read thread where one thread is always reading packets and putting them into a packet buffer. Threads that want to send a packet and get a reply must now send the packet, then use a condition variable + mutex to wait for the response. This threading overhead really slows down the packet transfers. Currently we have a mutex on the GDB remote communication where each thread that needs to send a packet will take the mutex and then send the packet and wait for the response on the same thread. I know the performance differences are large on MacOS; not sure how they are on other systems. If you do end up enabling this, please run the “process plugin packet speed-test” command, which is available only when debugging with ProcessGDBRemote. It will send and receive various packets of various sizes and report speed statistics back to you.

Also, in lldb at least I see some code relevant to non-stop mode, but is non-stop mode fully implemented in lldb or there is only partial support?

Everything in LLDB right now assumes a process centric debugging model where when one thread stops all threads are stopped. There will be quite a large amount of changes needed for a thread centric model. The biggest issue I know about is breakpoints. Any time you need to step over a breakpoint, you must stop all threads, disable the breakpoint, single step the thread and re-enable the breakpoint, then start all threads again. So even the thread centric model would need to start and stop all threads many times.

If we work on this, that’s not the way we should approach breakpoints in non-stop mode (and it’s not how GDB does it). I’m not sure why Ramana is interested in it, but I think one of the main motivations to add it to GDB was systems where stopping all (or some) threads for even a small amount of time would just break things. You want a way to step over breakpoints without disrupting the other threads.

Instead of removing the breakpoint, you can just teach the debugger to execute the code that has been patched in a different context. You can either move the code someplace else and execute it there or emulate it. Sometimes you’ll need to patch it if it is PC-relative. IIRC, GDB calls this displaced stepping. It’s relatively simple and works great.

This indeed is one of the changes we would need to do for non-stop mode. We have the EmulateInstruction class in LLDB that is designed just for this kind of thing. You can give the emulator function read/write memory and read/write register callbacks and a baton, and it can execute the instruction and read/write memory and registers as needed through the context. It would be very easy to have the read register callback know to take the PC of the original instruction and return it if the PC is requested.

We always got push back in the past about adding full instruction emulation support as Chris Lattner wanted it to exist in LLVM in the tablegen tables, but no one ever got around to doing that part. So we added prologue instruction parsing and any instructions that can modify the PC (for single stepping) to the supported emulated instructions.

So yes, emulating instructions without removing them from the code is one of the things required for this feature. Not impossible, just very time consuming to be able to emulate every instruction out of place. I would love to see that go in and would be happy to review patches for anyone wanting to take this on. Though the question still remains: does this happen in LLVM or in LLDB? Emulating instructions in LLVM might provide some great testing that could happen in the LLVM layers.

In my porting experience, emulation is actually rarely needed. Of course, if LLVM had a readily available emulation library we could just use that, but it’s not the case. Most of the time, just copying the instruction to some scratch space and executing it there is enough (you potentially need to patch offsets if the instruction uses PC-relative addressing).

That is true but that involves starting and stopping the thread one time which can be time consuming. It is easier to do it this way, but the starting and stopping of a thread is very costly. It would be better to try and emulate all the instructions we can and then fall back to emulating the instruction at another address if needed. Of course, you might be able to emulate the instruction and have a branch that branches to the next real instruction so we just have to start the process again without having to stop it. That would be a nice approach.

The really cool trick would be to insert the breakpoint as a branch to a landing pad we insert. Then the landing pad could look like:

if (thread_is_supposed_to_stop_at_breakpoints() && breakpoint_condition())
__builtin_trap();

emulate_instructions_you_needed_to_remove();
jump_back_to_next_instruction();

Then we could support some threads that NEVER stop, and also run breakpoint conditions locally which would make them really cheap. You could even squirrel away some state in the target that told you that a NEVER stop breakpoint hit the breakpoint, so your accounting would still be correct.

Even better if we get IDE support to insert a NOP big enough for a branch for each target where the user sets breakpoints that have a condition, so we don’t need to execute any instructions out of place.

I’m not sure why Ramana is interested in it

Basically http://lists.llvm.org/pipermail/lldb-dev/2017-June/012445.html is what I am trying to implement in lldb, which has been discussed in a little more detail here: http://lists.llvm.org/pipermail/lldb-dev/2017-September/012815.html.

Be sure to speak with myself, Jim Ingham and Pavel in depth before undertaking this task as there will be many changes required.

Definitely.

Thank you all for the responses. Will get back after digesting all the responses here.

Regards,

Ramana

You might not need non-stop mode to debug both the CPU and GPU. We did a similar thing, using lldb to debug an Android app and an app running on the Hexagon DSP under Linux. We didn’t use the same lldb, but that’s because Android Studio doesn’t know about Hexagon. The Android app was debugged with Android Studio’s lldb, and in Android Studio we opened a console window and ssh’d to Hexagon Linux, where we ran lldb (yes, Greg, lldb under Linux on the DSP!). We were able to debug the interaction between the CPU and DSP.

The reason I say you might not need non-stop mode is another DSP use case. On our proprietary DSP OS, the debug agent doesn’t stop all threads in a process when one thread stops. Even though lldb acts like all threads are stopped, only one is stopped and the others are still running. As long as the stub doesn’t error out when lldb checks the other threads, lldb will behave correctly. If another thread hits a breakpoint while the current one is stopped, the stub waits until it gets a resume to send the stop-reply. So lldb thinks everything is stopped, but it’s not really.

Hi,

It appears that the lldb-server, as of v5.0, did not implement the GDB RSP's non-stop mode (https://sourceware.org/gdb/onlinedocs/gdb/Remote-Non_002dStop.html#Remote-Non_002dStop). Am I wrong?

If the support is actually not there, what needs to be changed to enable
the same in lldb-server?

As Pavel said, adding support into lldb-server will be easy. Adding support to LLDB will be harder. One downside of enabling this mode will be a performance loss in the GDB remote packet transfer. Why? IIRC this mode requires a read thread where one thread is always reading packets and putting them into a packet buffer. Threads that want to send a packet and get a reply must now send the packet, then use a condition variable + mutex to wait for the response. This threading overhead really slows down the packet transfers. Currently we have a mutex on the GDB remote communication where each thread that needs to send a packet will take the mutex and then send the packet and wait for the response on the same thread. I know the performance differences are large on MacOS; not sure how they are on other systems. If you do end up enabling this, please run the "process plugin packet speed-test" command, which is available only when debugging with ProcessGDBRemote. It will send and receive various packets of various sizes and report speed statistics back to you.

So, in non-stop mode, though we can have threads running asynchronously
(some running, some stopped), the GDB remote packet transfer will be
synchronous i.e. will get queued? And this is because the packet responses
should be matched appropriately as there typically will be a single
connection to the remote target and hence this queueing cannot be avoided?

Also, in lldb at least I see some code relevant to non-stop mode, but is
non-stop mode fully implemented in lldb or there is only partial support?

Everything in LLDB right now assumes a process centric debugging model
where when one thread stops all threads are stopped. There will be quite a
large amount of changes needed for a thread centric model. The biggest
issue I know about is breakpoints. Any time you need to step over a
breakpoint, you must stop all threads, disable the breakpoint, single step
the thread and re-enable the breakpoint, then start all threads again. So
even the thread centric model would need to start and stop all threads many
times.

Greg, what if, while stepping over a breakpoint, the remaining threads can still continue and there is no need to disable the breakpoint? What else do I need to take care of?

The breakpoints aren't a structural problem. If you can figure out a
non-code modifying way to handle breakpoints, that would be a very surgical
change. And as Fred points out, out of place execution in the target would
be really handy for other things, like offloading breakpoint conditions
into the target, and only stopping if the condition is true. So this is a
well motivated project.

And our model for handling both expression evaluation and execution control is already thread-centric. It would be pretty straight-forward to treat "still running" threads the same way as threads with no interesting stop reasons, for instance.

I think the real difficulty will come at the higher layers. First off, we
gate a lot of Command & SB API operations on "is the process running" and
that will have to get much more fine-grained. Figuring out a good model
for this will be important.

Then you're going to have to figure out what exactly to do when somebody is in the middle of, say, running a long expression on thread A when thread B stops. What's a useful way to present this information? If lldb is sharing the terminal with the process, you can't just dump output in the middle of command output, but you don't want to delay too long...

Also, the IOHandlers are currently a stack, but that model won't work when
the process IOHandler is going to have to be live (at least the output part
of it) while the CommandInterpreter IOHandler is also live. That's going
to take reworking.

On the event and operations side, I think the fact that we have the
separation between the private and public states will make this a lot
easier. We can use the event transition from private to public state to
serialize the activity that's going on under the covers so that it appears
coherent to the user. The fact that lldb goes through separate channels
for process I/O and command I/O and we very seldom just dump stuff to
stdout will also make solving the problem of competing demands for the
user's attention more possible.

And I think we can't do any of this till we have a robust "ProcessMock"
plugin that we can use to emulate end-to-end through the debugger all the
corner cases that non-stop debugging will bring up. Otherwise there will
be no way to reliably test any of this stuff, and it won't ever be stable.

I don't think any of this will be impossible, but it's going to be a lot
of work.

Jim

Thanks Jim for the comments. Being new to lldb, that's a lot of food for
thought for me. Will get back here after doing some homework on what all
this means.

The really cool trick would be to insert the breakpoint as a branch to a
landing pad we insert. Then the landing pad could look like:

if (thread_is_supposed_to_stop_at_breakpoints() && breakpoint_condition())
  __builtin_trap();

emulate_instructions_you_needed_to_remove();
jump_back_to_next_instruction();

Then we could support some threads that NEVER stop, and also run
breakpoint conditions locally which would make them really cheap. You
could even squirrel away some state in the target that told you that a
NEVER stop breakpoint hit the breakpoint, so your accounting would still be
correct.

In fact, we also do something similar along these lines and it is pretty straightforward.

Hi,

It appears that the lldb-server, as of v5.0, did not implement the GDB RSP's non-stop mode (https://sourceware.org/gdb/onlinedocs/gdb/Remote-Non_002dStop.html#Remote-Non_002dStop). Am I wrong?

If the support is actually not there, what needs to be changed to enable the same in lldb-server?

As Pavel said, adding support into lldb-server will be easy. Adding support to LLDB will be harder. One downside of enabling this mode will be a performance loss in the GDB remote packet transfer. Why? IIRC this mode requires a read thread where one thread is always reading packets and putting them into a packet buffer. Threads that want to send a packet and get a reply must now send the packet, then use a condition variable + mutex to wait for the response. This threading overhead really slows down the packet transfers. Currently we have a mutex on the GDB remote communication where each thread that needs to send a packet will take the mutex and then send the packet and wait for the response on the same thread. I know the performance differences are large on MacOS; not sure how they are on other systems. If you do end up enabling this, please run the “process plugin packet speed-test” command, which is available only when debugging with ProcessGDBRemote. It will send and receive various packets of various sizes and report speed statistics back to you.

So, in non-stop mode, though we can have threads running asynchronously (some running, some stopped), the GDB remote packet transfer will be synchronous i.e. will get queued?

In the normal mode there is no queueing which means we don’t need a thread to read packets and deliver the right response to the right thread. With non-stop mode we will need a read thread IIRC. The extra threading overhead is costly.

And this is because the packet responses should be matched appropriately as there typically will be a single connection to the remote target and hence this queueing cannot be avoided?

It can’t be avoided because you have to be ready to receive a thread stop packet at any time, even if no packets are being sent. With the normal protocol, you can only receive a stop packet in response to a continue packet. So there is never a time where you can’t just send the packet and receive the response on the same thread. With non-stop mode, there must be a thread ready to receive stop-reply packets, since any thread can stop at any time. Adding threads means ~10,000 cycles of thread synchronization code for each packet.

Also, in lldb at least I see some code relevant to non-stop mode, but is non-stop mode fully implemented in lldb or there is only partial support?

Everything in LLDB right now assumes a process centric debugging model where when one thread stops all threads are stopped. There will be quite a large amount of changes needed for a thread centric model. The biggest issue I know about is breakpoints. Any time you need to step over a breakpoint, you must stop all threads, disable the breakpoint, single step the thread and re-enable the breakpoint, then start all threads again. So even the thread centric model would need to start and stop all threads many times.

Greg, what if, while stepping over a breakpoint, the remaining threads can still continue and there is no need to disable the breakpoint? What else do I need to take care of?

This is where we would really need the instruction emulation support for executing the opcodes that breakpoints overwrite out of place. I believe the other discussions have highlighted this need. Let me know if that isn’t clear. That is really the only way this feature truly works.

Greg

Hi,

It appears that the lldb-server, as of v5.0, did not implement the GDB RSP's non-stop mode (https://sourceware.org/gdb/onlinedocs/gdb/Remote-Non_002dStop.html#Remote-Non_002dStop). Am I wrong?

If the support is actually not there, what needs to be changed to enable
the same in lldb-server?

As Pavel said, adding support into lldb-server will be easy. Adding support to LLDB will be harder. One downside of enabling this mode will be a performance loss in the GDB remote packet transfer. Why? IIRC this mode requires a read thread where one thread is always reading packets and putting them into a packet buffer. Threads that want to send a packet and get a reply must now send the packet, then use a condition variable + mutex to wait for the response. This threading overhead really slows down the packet transfers. Currently we have a mutex on the GDB remote communication where each thread that needs to send a packet will take the mutex and then send the packet and wait for the response on the same thread. I know the performance differences are large on MacOS; not sure how they are on other systems. If you do end up enabling this, please run the "process plugin packet speed-test" command, which is available only when debugging with ProcessGDBRemote. It will send and receive various packets of various sizes and report speed statistics back to you.

Also, in lldb at least I see some code relevant to non-stop mode, but is
non-stop mode fully implemented in lldb or there is only partial support?

Everything in LLDB right now assumes a process centric debugging model
where when one thread stops all threads are stopped. There will be quite a
large amount of changes needed for a thread centric model. The biggest
issue I know about is breakpoints. Any time you need to step over a
breakpoint, you must stop all threads, disable the breakpoint, single step
the thread and re-enable the breakpoint, then start all threads again. So
even the thread centric model would need to start and stop all threads many
times.

If we work on this, that’s not the way we should approach breakpoints in non-stop mode (and it’s not how GDB does it). I’m not sure why Ramana is interested in it, but I think one of the main motivations to add it to GDB was systems where stopping all (or some) threads for even a small amount of time would just break things. You want a way to step over breakpoints without disrupting the other threads.

Instead of removing the breakpoint, you can just teach the debugger to
execute the code that has been patched in a different context. You can
either move the code someplace else and execute it there or emulate it.
Sometimes you’ll need to patch it if it is PC-relative. IIRC, GDB calls
this displaced stepping. It’s relatively simple and works great.

This indeed is one of the changes we would need to do for non-stop mode. We have the EmulateInstruction class in LLDB that is designed just for this kind of thing. You can give the emulator function read/write memory and read/write register callbacks and a baton, and it can execute the instruction and read/write memory and registers as needed through the context. It would be very easy to have the read register callback know to take the PC of the original instruction and return it if the PC is requested.

We always got push back in the past about adding full instruction
emulation support as Chris Lattner wanted it to exist in LLVM in the
tablegen tables, but no one ever got around to doing that part. So we added
prologue instruction parsing and any instructions that can modify the PC
(for single stepping) to the supported emulated instructions.

So yes, emulating instructions without removing them from the code is one of the things required for this feature. Not impossible, just very time consuming to be able to emulate every instruction out of place. I would _love_ to see that go in and would be happy to review patches for anyone wanting to take this on. Though the question still remains: does this happen in LLVM or in LLDB? Emulating instructions in LLVM might provide some great testing that could happen in the LLVM layers.

In my porting experience, emulation is actually rarely needed. Of course,
if LLVM had a readily available emulation library we could just use that,
but it’s not the case. Most of the time, just copying the instruction to
some scratch space and executing it there is enough (you potentially need
to patch offsets if the instruction uses PC-relative addressing).

That is true but that involves starting and stopping the thread one time
which can be time consuming. It is easier to do it this way, but the
starting and stopping of a thread is very costly. It would be better to try
and emulate all the instructions we can and then fall back to emulating the
instruction at another address if needed. Of course, you might be able to
emulate the instruction and have a branch that branches to the next real
instruction so we just have to start the process again without having to
stop it. That would be a nice approach.

The really cool trick would be to insert the breakpoint as a branch to a
landing pad we insert. Then the landing pad could look like:

if (thread_is_supposed_to_stop_at_breakpoints() && breakpoint_condition())
__builtin_trap();

emulate_instructions_you_needed_to_remove();
jump_back_to_next_instruction();

Then we could support some threads that NEVER stop, and also run
breakpoint conditions locally which would make them really cheap. You
could even squirrel away some state in the target that told you that a
NEVER stop breakpoint hit the breakpoint, so your accounting would still be
correct.

Even better if we get IDE support to insert a NOP big enough for a branch for each target where the user sets breakpoints that have a condition, so we don't need to execute any instructions out of place.

Yes, and that would enable the infrastructure to support in-place execution of the 'cond_list' in GDB RSP's breakpoint packet "Z0,addr,kind[;cond_list…][;cmds:persist,cmd_list…]", where cond_list is an optional list of conditional expressions in bytecode form that should be evaluated on the target’s side. These are the conditions that should be taken into consideration when deciding if the breakpoint trigger should be reported back to GDB.
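For reference, if I remember the RSP spec correctly each element of cond_list has the form 'X len,expr', where expr is a hex-encoded agent-expression bytecode. So a conditional breakpoint set by the client would look roughly like this (the address, kind and length are made up, and the bytecode itself is elided):

Z0,4005a0,1;X8,<hex-encoded agent bytecode>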

The breakpoints aren't a structural problem. If you can figure out a
non-code modifying way to handle breakpoints, that would be a very surgical
change. And as Fred points out, out of place execution in the target would
be really handy for other things, like offloading breakpoint conditions
into the target, and only stopping if the condition is true. So this is a
well motivated project.

And our model for handling both expression evaluation and execution control is already thread-centric. It would be pretty straight-forward to treat "still running" threads the same way as threads with no interesting stop reasons, for instance.

I think the real difficulty will come at the higher layers. First off, we
gate a lot of Command & SB API operations on "is the process running" and
that will have to get much more fine-grained. Figuring out a good model
for this will be important.

Then you're going to have to figure out what exactly to do when somebody is in the middle of, say, running a long expression on thread A when thread B stops. What's a useful way to present this information? If lldb is sharing the terminal with the process, you can't just dump output in the middle of command output, but you don't want to delay too long...

Also, the IOHandlers are currently a stack, but that model won't work when
the process IOHandler is going to have to be live (at least the output part
of it) while the CommandInterpreter IOHandler is also live. That's going
to take reworking.

On the event and operations side, I think the fact that we have the
separation between the private and public states will make this a lot
easier. We can use the event transition from private to public state to
serialize the activity that's going on under the covers so that it appears
coherent to the user. The fact that lldb goes through separate channels
for process I/O and command I/O and we very seldom just dump stuff to
stdout will also make solving the problem of competing demands for the
user's attention more possible.

Thanks Jim for the elaborate view on the non-stop mode support.

BTW, my understanding of public vs. private states is that the public state is the state as known by the user, and all process state changes will first be tracked with the private state, which will then be made public (i.e. the public state will be updated) should the user need to know about that process state change. Is there anything else I am missing on public vs. private states?

I think this is one of the least important obstacles in tackling the non-stop feature, but since we’re already discussing it, I just wanted to point out that there are many ways we can improve the performance here. The read thread is necessary, but only so that we can receive asynchronous responses when we’re not doing any gdb-remote work. If we are already sending some packets, it is superfluous.

As one optimization, we could make sure that the read thread is disabled while we are sending a packet. E.g., the SendPacketAndWaitForResponse could do something like:
SendPacket(msg); // We can do this even while the read thread is doing work

SuspendReadThread(); // Should be cheap as it happens while the remote stub is processing our packet
GetResponse(); // Happens on the main thread, as before
ResumeReadThread(); // Fast.

We could even take this further and have some sort of a RAII object which disables the read thread at a higher level for when we want to be sending a bunch of packets.
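One possible shape for that RAII object (a sketch only; SuspendReadThread/ResumeReadThread are the hypothetical primitives from the pseudo-code above, not existing APIs):

// Scoped suspension of the read thread so a burst of packets can be sent
// and their replies read directly on the calling thread.
// `Comm` is assumed to expose SuspendReadThread()/ResumeReadThread();
// this is not an existing LLDB class.
template <typename Comm>
class ScopedReadThreadSuspension {
public:
  explicit ScopedReadThreadSuspension(Comm &comm) : m_comm(comm) {
    m_comm.SuspendReadThread();
  }
  ~ScopedReadThreadSuspension() { m_comm.ResumeReadThread(); }

  // Non-copyable so there is exactly one resume per suspend.
  ScopedReadThreadSuspension(const ScopedReadThreadSuspension &) = delete;
  ScopedReadThreadSuspension &operator=(const ScopedReadThreadSuspension &) = delete;

private:
  Comm &m_comm;
};

A caller would then wrap a burst of packet traffic in a scope, e.g. { ScopedReadThreadSuspension<Comm> suspend(comm); /* send several packets, read the replies inline */ }.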

Of course, this would need to be implemented with a steady hand and carefully tested, but the good news here is that the gdb-remote protocol is one of the better tested aspects of lldb, with many testing approaches available.

However, I think the place for this discussion is once we have something which is >90% functional…