[RFC] More complete multiprocess support in LLDB

Hi, everyone.

We’d like to put some more work into multiprocess debugging support
in LLDB. We’ve gotten some initial suggestions on how to proceed from
Pavel Labath, and we’d like to put the problem up for wider discussion
on the mailing list.

Right now, LLDB has some fundamentals useful for multiprocess support.
In particular, to the best of my knowledge:

  1. LLDB client supports multiple targets, and this can be used to debug
    multiple independent processes,

  2. we’ve recently added fork tracing support to lldb-server, but it’s
    very limited and requires the client to immediately detach from one
    of the processes in order to continue tracing the other one.

In our opinion, what we’re missing the most right now is the ability to
reliably continue tracing both the parent and the child after a fork.
We’re very much into GDB compatibility, so we’d like to implement this
using the same protocol as GDB’s multiprocess extension.

The rough plan we’ve come up with is to:

  1. Implement a subset of GDB non-stop mode at the protocol level (see
    the packet sketch after this list). The rough idea is to provide
    support for asynchronous notifications, which should make it easier
    to handle events from multiple processes. We’re not planning to work
    on an actual non-stop mode in the LLDB client at the moment (i.e.
    we’re going to keep stopping all threads in the stopped process
    anyway), though this should make it easier to do so in the future.

  2. Implement complete support for GDB-style multiprocess in
    lldb-server. There’s already some support code present, but we need
    to improve all of the request and response handling to correctly
    deal with multiple traced processes.

  3. Implement the support for handling multiple LLDB client targets
    through a single serialized gdb-remote connection. Basically, when
    multiprocess support is enabled, LLDB would automatically create new
    targets on forks, and all targets would be multiplexed over the same
    connection using multiprocess extensions.
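
For context, this is roughly what the packet traffic could look like with both extensions enabled (the packet names come from the GDB Remote Serial Protocol documentation; the concrete pids/tids are made up):

```
QNonStop:1                   # client enables non-stop mode
OK
vCont;c:p1a2.-1              # continue all threads of process 0x1a2
OK                           # in non-stop mode, vCont acknowledges immediately
%Stop:T05thread:p1a2.1b3;    # async notification: thread 0x1b3 of process
                             #   0x1a2 stopped with signal 5 (SIGTRAP)
vStopped                     # client drains any further queued stop events
OK                           # queue empty
```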

We believe this to be the most feasible approach for the time being.
At the same time, it shouldn’t block further improvements in the future
(e.g. a full client-side non-stop mode).

What do you think?

Right now the Process plugin for gdb-remote communication, ProcessGDBRemote, owns the connection to the debugserver, but Process plugins also know about a particular process (threads, etc.). So you need two different Process plugin objects to support the process-specific stuff, while sharing the same gdb-remote connection.

Were you going to write a ProcessGDBRemoteShared or something like that so that multiple targets can share a connection to the server?

Jim

Actually, I was thinking of using a GDBRemoteCommunicationClient that’s shared between multiple ProcessGDBRemote instances but admittedly, I haven’t verified whether that would be sufficient.
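
To illustrate what I mean, here is a minimal sketch of the ownership structure (only the two class names correspond to real LLDB classes; the members and constructor shown are simplified stand-ins, not the actual interfaces):

```cpp
#include <cstdint>
#include <memory>

class GDBRemoteCommunicationClient {}; // stands in for the real class

class ProcessGDBRemote {
public:
  ProcessGDBRemote(std::shared_ptr<GDBRemoteCommunicationClient> comm,
                   uint64_t pid)
      : m_comm(std::move(comm)), m_pid(pid) {}

private:
  // Shared between all ProcessGDBRemote instances multiplexed over this
  // connection (today the connection is effectively owned by one process).
  std::shared_ptr<GDBRemoteCommunicationClient> m_comm;
  uint64_t m_pid; // used to address packets via the pPID.TID syntax
};
```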

Hello all,

I’d like to explain my reasoning behind this proposal a bit.
[Disclaimer: Michał consulted the initial proposal with me, but I do not speak for him.]

The way I see it, there are two main directions that fork support could take:

  1. establish a new gdb-remote connection for each forked process
  2. use gdb’s multiprocess extensions

In a way, option 1 is easier, as it does not require changing our one-connection-per-process assumptions, which are embedded pretty deeply in the source code. Its two main disadvantages are:

  • creating a new connection (possibly with a new lldb-server instance) requires a fairly elaborate dance to hand off the newly forked process
  • it’s not gdb-compatible

The main problem with option 2 is the mismatch between how events are represented on this multiprocess connection and the way we’d like to represent them in lldb. In an all-stop multiprocess connection, when a single process (thread) hits a breakpoint, the server stops all processes before notifying the client. In gdb, this is not a problem, since it exposes these events to the user exactly as they appear on the connection.

However, I’m not sure that we would want to do it that way in lldb. This kind of inter-process dependence would be a new concept to lldb, and in order to do it properly, we’d have to add a fair number of new APIs (both CLI and SB), so we can:

  • let the user know which targets are connected and controlled together
  • this may need a new “this process stopped due to an event in some other process” stop reason
  • figure out how to suspend/resume individual processes within the process group (as they cannot be controlled independently)
  • etc.

So, I figured it would be most lldb-like if we kept the processes independent (at the UI level) and handled the necessary translation behind the scenes. The simplest way I could think of was to control the processes in non-stop mode at the gdb-remote level, and then simulate the process-wide (but not inter-process) stop in the client. One could also imagine the opposite approach, where we use all-process all-stop mode at the connection level and simulate process independence in the client, but that seemed a lot more fragile.
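
To make the translation concrete, here is a sketch of the client-side logic, assuming hypothetical helper functions for the gdb-remote plumbing (none of these names exist in LLDB today):

```cpp
#include <cstdint>
#include <string>

// Hypothetical helpers standing in for the real gdb-remote plumbing:
void SendPacket(const std::string &packet);        // send one packet
void DrainStopNotifications();                     // vStopped loop until OK
void NotifyProcessStopped(uint64_t pid, uint64_t tid);
std::string ToHex(uint64_t value);

// When an asynchronous %Stop notification arrives for one thread, halt the
// remaining threads of *that process only* and then report an ordinary
// all-stop event to the LLDB core. Other processes keep running.
void OnAsyncStopNotification(uint64_t pid, uint64_t tid) {
  SendPacket("vCont;t:p" + ToHex(pid) + ".-1"); // stop the whole process
  DrainStopNotifications(); // collect stop events for the other threads
  NotifyProcessStopped(pid, tid); // tid is the thread that caused the stop
}
```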

Anyway, tl;dr: I support this approach, but I would like to hear what others think about it as well.

Yes, I don’t think that a shared Process class makes sense in this setup, though I can possibly imagine having some intermediary layer between the ProcessGDBRemote class (representing a single process) and the GDBRemoteCommunicationClient class (representing the connection and all processes within it). This class might handle some of the all-stop/non-stop translation aspects and filter the data that’s not relevant for a particular process, but it’s hard to say whether this is desirable/needed without seeing the implementation.

The “one server many process model” really is a mismatch to the lldb architecture. Supporting it in lldb will be ugly, and involve “I want to set a breakpoint in process A so now I have to interrupt both processes to do so” and other such behavior oddities we would need to paper over. We also don’t get it for free on the stub side. We would have to add support for the gdb multiprocess extensions to our other stubs (lldb-server and debugserver) - since it would be weird for lldb to have a feature that only works when talking to gdbserver.

If this were a great way to do multiprocess debugging, then I could see putting in the effort to support it in lldb and lldb-server and debugserver. But is this really the best way to do multiprocess debugging?

debug stubs tend to exercise many dark corners of their host OSes and are prone to stalls and other bad behavior. So an architecture that puts all our process eggs in one basket - having first made that basket even more complicated - doesn’t seem like a great design to me.

For the general problem of multiprocess debugging, there are two cases we want to support:

  1. Debugging two unrelated processes.
  2. Following the children of a process we are debugging.

For 1) we already have lldb-server in server mode to do that job for us; I can’t see why we would do it through a single server. There’s really no advantage in this case.

For 2) having the server handle the forked child is really attractive, because it already has control of it, and so by keeping that control you avoid the “hand off dance”.

However, lldb-server in platform mode already knows how to make a lldb-server in server mode and attach it to something, and the platform code is sitting around waiting to be used. It does not seem to me particularly hard to get lldb-server in server mode to access that code and make a new lldb-server, attach to the forked child, and send an asynchronous gdb-remote packet with the port of the new lldb-server. You would still have to do a handoff, but it’s within a single process, and the only semi-hard part is keeping it stopped when you detach, which shouldn’t actually be hard…
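
To make the flow concrete, here is a sketch under the assumption that we invent a new asynchronous packet for announcing the child (no such packet exists today; the name and fields are made up):

```
1. first lldb-server sees the fork and keeps the child stopped
2. it reuses the platform-mode code to spawn a second lldb-server,
   which listens on a new port
3. first server -> client: async packet announcing the child,
   e.g. "%Forked:pid=1234;port=5678;" (hypothetical)
4. client creates a new target and connects to port 5678
5. second server attaches to the child; first server detaches
```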

This way of doing the thing seems to fit well within the lldb toolset we’ve developed so far. And this method seems much more robust long-term, AND fits well with the client model in lldb.

So it seems to me that we are pretty close to a way to handle multiprocess debugging naturally in the lldb/lldb-server ecosystem already. The question then is: is it worth taking something that we can do pretty simply, making it take a lot more work, and adding considerable complexity and awkwardness to a really central part of lldb, just so that we can support the gdb remote protocol multiprocess extension?

If we have a lldb-server implementation on the platform, I don’t see the motivation for doing a lot of work so we can ALSO support a fancy mode of gdbserver on that platform. Our hands should be full making lldb-server work great. But I don’t think that there are that many systems that we care about that have a port of gdbserver with support for the multiprocess extensions but not a port of lldb-server. My guess is that even if there are a few, the work to port lldb-server to them is substantially less than the work to get multiplexing gdb-remote servers working in lldb-server and then working well in lldb. The former is work we would want to do, whereas the latter would be disruptive and I don’t see it having benefit on its own.

Given that I don’t think having one debug stub manage all your processes is even a particularly great idea, I don’t see this as a good solution technically. And so doing a whole lot of work to support it when we could do the same thing straightforwardly with the tools we have doesn’t seem like the right direction.


Yes, those would be tricky. Which is why we’ve come up with the non-stop mode idea. In non-stop, you have a mostly-clear line of communication to the other side regardless of what any particular process/thread is doing. You do have the opposite problem in that you now need to simulate an all-stop whenever one thread in the process stops, but this is hopefully easier than the opposite.

BTW, we already have code which temporarily interrupts a single process when we want to send some asynchronous packet. These dances are unnecessary in non-stop mode.
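
For the curious, that dance looks roughly like this on the wire (simplified; the exact stop reply varies):

```
client -> server: 0x03            # raw interrupt byte (no packet framing)
server -> client: T13thread:1b3;  # stop reply, e.g. SIGSTOP (0x13)
client -> server: <the packet we actually wanted to send>
client -> server: vCont;c         # resume the process again
```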

Agreed. No one is proposing to take that away. Even gdb supports that mode. :)

I can’t speak about debugserver, but as far as lldb-server goes, this is the least of my concerns. We had a fair number of problems in the beginning, but all of those went away when we switched to a single-threaded implementation. And I’d definitely want to keep lldb-server single-threaded regardless of the number of processes it is debugging.

Keeping the process stopped is relatively easy – you can have it enter the SIGSTOP state. Such a transition would normally be observable by the parent of the process, but since here we have the parent suspended as well, I believe it wouldn’t notice. The tricky part actually is ensuring that you can reattach to the process from the new server. These days, the default Linux setup is that you’re not allowed to attach (launching is fine) to any processes (well, except your children) unless they explicitly allow it. So it could happen that we detach from the process, and then find that we cannot reattach.

Of course, since we have control of the process in the first server, we could trick it into allowing the attach operation. This would amount to evaluating an expression like prctl(PR_SET_PTRACER, pid_of_other_server), but that is not completely trivial to do from a server, especially while pretending not to support debugging multiple processes. That’s why, even if we go with the multi-connection model, I would actually try to stick with a single lldb-server instance, and just have a dedicated connection for controlling each process.
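
For reference, the expression in question would be roughly the following; PR_SET_PTRACER is Linux-specific (it comes from the Yama security module), and pid_of_other_server is of course a placeholder:

```cpp
#include <sys/prctl.h>  // PR_SET_PTRACER; Linux-specific (Yama LSM)
#include <sys/types.h>

// Run *inside the traced process* (e.g. via expression evaluation) so that
// the second lldb-server, whose pid is passed in, is allowed to
// ptrace-attach despite a restrictive ptrace_scope setting.
int allow_attach(pid_t pid_of_other_server) {
  return prctl(PR_SET_PTRACER, (unsigned long)pid_of_other_server, 0, 0, 0);
}
```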

Which brings me to my second point, that even simply setting up multiple connections is not as easy as it may sound at first. Our existing platform-server duality suffers from all of the problems that have plagued FTP since the beginning. Having separate connections is nice, and it usually works for local connections and simple network setups, but everything falls apart as soon as you throw firewalls and NATs into the mix. We regularly get questions from people about how to set up port forwards to make the platform connections work. This is why I am not particularly keen on establishing multiple connections, even though this would be a much cleaner solution on several levels.

I would say this is true for “regular” platforms, but there are situations where this logic cannot apply. You cannot e.g. “port” lldb-server to qemu, because the implementation is an integral part of qemu proper. One could make qemu support the lldb way of doing things, but the first question will be “why don’t you do it like gdb?” (as much as we may want otherwise, gdb is still the gold standard in the non-darwin world). Admittedly, qemu is not an ideal example (maybe Michał has better reasons for wanting to support this?) as it does not actually support debugging multiple processes (it barely supports multiple threads), but if it ever does support it, I’m sure it would want to go with the gdb method.

For me, the worst case scenario in inventing our own solution is that some time later, someone will come along with a need to support the “standard” way of doing things, and so we end up needing to support both. This is less likely to happen for multiprocess debugging than for e.g. register descriptions (which has happened already), but it’s still possible…

I think Jim’s point was not that we’re worried that it would go away, but rather that we already have an existing solution for (part of) this problem.

Why is this a problem if we use multiple connections over the same port? Isn’t that what we do for lldb-server in platform mode?

I strongly agree that we don’t want to come up with our own solution. But it sounds like we already have a bunch of components to make this work in a way that fits lldb’s current abstraction.

I’m mostly asking because I want to gain a better understanding, not because I want to push for a particular approach. The way I’m (maybe incorrectly) reading this is that for one approach we’d need to do a lot of work to support a model that doesn’t fit all that well in LLDB, while the other could be done with less intrusive changes to the server but wouldn’t fit the way GDB does this.

I think it boils down to whether the goal of this RFC is to have a solution for follow-on-fork in LLDB or about supporting gdb’s multiprocess extensions. If this is about the latter then I assume this discussion is pretty much moot.

labath:

jingham:

Supporting it in lldb will be ugly, and involve “I want to set a breakpoint in process A so now I have to interrupt both processes to do so” and other such behavior oddities we would need to paper over.

Yes, those would be tricky. Which is why we’ve come up with the non-stop mode idea. In non-stop, you have a mostly-clear line of communication to the other side regardless of what any particular process/thread is doing. You do have the opposite problem in that you now need to simulate an all-stop whenever one thread in the process stops, but this is hopefully easier than the opposite.

BTW, we already have code which temporarily interrupts a single process when we want to send some asynchronous packet. These dances are unnecessary in non-stop mode.

jingham:

For 1) we already have lldb-server in server mode to do that job for us; I can’t see why we would do it through a single server. There’s really no advantage in this case.

Agreed. No one is proposing to take that away. Even gdb supports that mode. :)

jingham:

debug stubs tend to exercise many dark corners of their host OSes and are prone to stalls and other bad behavior.

I can’t speak about debugserver, but as far as lldb-server goes, this is the least of my concerns. We had a fair number of problems in the beginning, but all of those went away when we switched to a single-threaded implementation. And I’d definitely want to keep lldb-server single-threaded regardless of the number of processes it is debugging.

I’m not so much worried about lldb-server code as system code we have to call into. A slightly ill system is one you really want the debugger to work on, but is also one that will start getting flaky when you ask it questions… So keeping the stub as simple as you can make it seems like a good practice.

Also, a single-threaded lldb-server that has to service multiple processes simultaneously will be tricky. You’d better never call anything that might take the kernel a while to figure out, or you’re going to end up with stuttery behavior in the responsiveness of the processes. That’s not terrible when you are serving a single process, since you are fetching the data it is waiting for. But in multiprocess mode, each such request blocks all the other processes.

jingham:

However, lldb-server in platform mode already knows how to make a lldb-server in server mode and attach it to something, and the platform code is sitting around waiting to be used. It does not seem to me particularly hard to get lldb-server in server mode to access that code and make a new lldb-server, attach to the forked child, and send an asynchronous gdb-remote packet with the port of the new lldb-server. You would still have to do a handoff, but it’s within a single process, and the only semi-hard part is keeping it stopped when you detach, which shouldn’t actually be hard…

Keeping the process stopped is relatively easy – you can have it enter the SIGSTOP state. Such a transition would normally be observable by the parent of the process, but since here we have the parent suspended as well, I believe it wouldn’t notice. The tricky part actually is ensuring that you can reattach to the process from the new server. These days, the default Linux setup is that you’re not allowed to attach (launching is fine) to any processes (well, except your children) unless they explicitly allow it. So it could happen that we detach from the process, and then find that we cannot reattach.

Of course, since we have control of the process in the first server, we could trick it into allowing the attach operation. This would amount to evaluating an expression like prctl(PR_SET_PTRACER, pid_of_other_server), but that is not completely trivial to do from a server, especially while pretending not to support debugging multiple processes. That’s why, even if we go with the multi-connection model, I would actually try to stick with a single lldb-server instance, and just have a dedicated connection for controlling each process.

Which brings me to my second point, that even simply setting up multiple connections is not as easy as it may sound at first. Our existing platform-server duality suffers from all of the problems that have plagued FTP since the beginning. Having separate connections is nice, and it usually works for local connections and simple network setups, but everything falls apart as soon as you throw firewalls and NATs into the mix. We regularly get questions from people about how to set up port forwards to make the platform connections work. This is why I am not particularly keen on establishing multiple connections, even though this would be a much cleaner solution on several levels.

This still seems to me a problem that we can solve in much less time and as a separate, non-disruptive piece of work, by writing some utilities that help people set up the necessary port forwarding in these more complex scenarios. If we can’t get any platform connections then we can’t do anything. If we can get one, getting two shouldn’t be intractable, and helping people out in this area would be a good thing in its own right.

jingham:

If we have a lldb-server implementation on the platform, I don’t see the motivation for doing a lot of work so we can ALSO support a fancy mode of gdbserver on that platform. Our hands should be full making lldb-server work great. But I don’t think that there are that many systems that we care about that have a port of gdbserver with support for the multiprocess extensions but not a port of lldb-server. My guess is that even if there are a few, the work to port lldb-server to them is substantially less than the work to get multiplexing gdb-remote servers working in lldb-server and then working well in lldb. The former is work we would want to do, whereas the latter would be disruptive and I don’t see it having benefit on its own.

I would say this is true for “regular” platforms, but there are situations where this logic cannot apply. You cannot e.g. “port” lldb-server to qemu, because the implementation is an integral part of qemu proper. One could make qemu support the lldb way of doing things, but the first question will be “why don’t you do it like gdb?” (as much as we may want otherwise, gdb is still the gold standard in the non-darwin world). Admittedly, qemu is not an ideal example (maybe Michał has better reasons for wanting to support this?) as it does not actually support debugging multiple processes (it barely supports multiple threads), but if it ever does support it, I’m sure it would want to go with the gdb method.

IME debugging on platforms that support real multiple processes almost always involves lldb-server being one of those processes. Anything that has a fixed gdb-remote stub is unlikely to be in this category.

For me, the worst case scenario in inventing our own solution is that some time later, someone will come along with a need to support the “standard” way of doing things, and so we end up needing to support both. This is less likely to happen for multiprocess debugging than for e.g. register descriptions (which has happened already), but it’s still possible…

If we ever get to this point, we can add a mode to lldb-server where it serves as the multiplexer for the underlying gdb-remote stub. lldb on the host would talk to the various ports handed out by the multiplexing lldb-server it created on the host to communicate with the multiprocess-protocol-speaking stub from wherever. You were saying earlier that you were planning on having lldb-server multiplex outbound connections anyway, even if we decide to go the one-connection-per-ProcessGDBRemote route, to avoid the reattach difficulties. If you did it that way, writing the adaptor that routes packets from the stub to the port of the process they pertain to, and adds pid markers to packets coming from lldb, should be pretty straightforward.

Jim

Multiple connections over the same port are great, but that’s not what happens when you’re starting a debug session through lldb-server in platform mode (qLaunchGDBServer). Here the platform instance launches a new debug server process (lldb-server in gdbserver mode, or debugserver), listening on some port, and then sends that port back in the response.

I don’t think we have as many readily available components as you might think we have, but overall, I think we can agree that implementing the custom multi-connection solution would be (substantially?) simpler than supporting the gdb multiprocess method.

The trickiest problem there becomes the reliable establishment of the second connection. Drawing inspiration from the “multiple connections over the same port” observation, I think we can come up with something that should work most of the time, but it’s going to depend on the method that we used to establish the initial connection:

  • if the initial connection uses a (unix) socket pair (this is what we use for local connections nowadays), then we can create another socketpair(2) and use the SCM_RIGHTS ancillary message to send one of the descriptors over the existing connection (see the sketch after this list)
  • for forward TCP connections (client connects to server), we can have the client establish another connection to the same port and handle debugging of the fork child there. The server would need to keep the listening socket open for the duration of the debug session, to ensure the port remains available.
  • for reverse TCP connections (server connects to client) we would do the same thing, only with the roles reversed
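
For the socketpair case, the mechanism would look roughly like this (a minimal sketch, with error handling omitted):

```cpp
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

// Pass one end of a fresh socketpair over the existing unix-socket
// connection, giving the client a second, independent channel for the
// fork child.
bool send_fd(int conn_fd, int fd_to_pass) {
  char payload = 'F'; // a descriptor must ride along with at least one byte
  struct iovec iov = {&payload, 1};
  char ctrl[CMSG_SPACE(sizeof(int))] = {};
  struct msghdr msg = {};
  msg.msg_iov = &iov;
  msg.msg_iovlen = 1;
  msg.msg_control = ctrl;
  msg.msg_controllen = sizeof(ctrl);
  struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
  cmsg->cmsg_level = SOL_SOCKET;
  cmsg->cmsg_type = SCM_RIGHTS;
  cmsg->cmsg_len = CMSG_LEN(sizeof(int));
  memcpy(CMSG_DATA(cmsg), &fd_to_pass, sizeof(int));
  return sendmsg(conn_fd, &msg, 0) == 1;
}
```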

I’ll let Michał answer that.

The two are kind of related. If the app is single-threaded, then the kernel access patterns will be more predictable (and single-threaded) as well. The main problem with the original multi-threaded implementation was that the kernel would sometimes return bogus data if two threads were performing debug operations concurrently.

This is true, but given the difficulty of transferring ownership to a different process (actually, even just to a different thread, as Linux ptrace expects all actions to come from a single thread), I’d be willing to take that chance. As this should not be observable from the outside, we could leave the choice up to the individual platform.

I would disagree here. The second connection presents a fundamentally new problem, because (unlike the initial connection) it is supposed to happen automatically behind the user’s back, ideally without them even being aware of its existence. We can try to make some tools to automate the common scenarios, but it’s going to be hard to make a definitive solution, as it’s going to really depend on the exact networking setup.

I’ll draw attention to the fact that our “forward” and “reverse” connection modes are very similar to FTP’s “active” and “passive” connections – and neither of those was sufficient to save it. At one point, you even had routers inspect and/or rewrite FTP control connection data to make it work seamlessly (I doubt we’ll get that level of support). Modern protocols try hard to avoid using multiple connections (or at least multiple ports) for their work.

This is how it works in system-mode qemu (which emulates the entire stack down to the hardware level). You can run lldb-server as a regular process there. There’s also a gdb stub which provides system-level debugging, and it has no notion of processes.

OTOH in user-mode qemu (emulated userspace on top of host kernel) there definitely are real processes. While you could in theory run an emulated lldb-server there, doing it would require reimplementing the entire ptrace API in userspace, as the (host) kernel APIs would return the data for the host process. Instead you have a built-in stub, which has direct access to the emulator data structures, and can present the state of the emulated user process.

That would work, but it would still mean implementing two ways of doing things. And doing it in lldb-server would be harder, since we would have to parse the packets twice (at least to figure out how to route the packets coming from the stub, and possibly those from lldb as well).

The multiprocess extensions don’t just prefix the packet with the pid. In a way, they treat the entire world as one giant process with many threads, except the thread IDs become pairs, so you can reconstruct individual processes from that.
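
Concretely, where a single-process stub would say thread:1b3;, a multiprocess stub says thread:p1a2.1b3;, and every thread-id on the connection carries the pid. A minimal parsing sketch (the function name is made up):

```cpp
#include <cstdint>
#include <string>
#include <utility>

// Parse a multiprocess thread-id of the form "p<pid>.<tid>" (both in hex),
// e.g. "p1a2.1b3" -> {0x1a2, 0x1b3}. Minimal sketch; no error handling.
std::pair<uint64_t, uint64_t> ParseThreadId(const std::string &str) {
  size_t dot = str.find('.');
  uint64_t pid = std::stoull(str.substr(1, dot - 1), nullptr, 16); // skip 'p'
  uint64_t tid = std::stoull(str.substr(dot + 1), nullptr, 16);
  return {pid, tid};
}
```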

Doing this stuff in the client would be easier (though definitely not trivial) as it can operate on a slightly higher level.

First of all, I’m sorry for not replying earlier. I have to admit I’ve been defeated by Discourse once again, and somehow managed to make email notifications stop working.

Secondly, I’d like to thank you all for your feedback, and especially @labath for answering many of the questions directed at my proposal. In fact, he probably did it better than I would have been able to.

@JDevlieghere, to be honest, I was primarily hoping for ideas around implementing GDB-compatible multiprocess. I don’t want to discard other ideas outright but I still think a solution as close to GDB as possible is the way to go, as I’ll try to explain below.

@jingham, thank you for all your points. I agree that implementing multiprocess debugging on top of multiple connections would probably be easier. However, at this point we’re prepared to invest more effort for a harder but more future-proof solution.

I really do think that proceeding with the GDB-compatible multiprocess extension approach is the better option.

I largely dislike the necessity of establishing multiple connections, but I think @labath has covered that point very well. I’d really prefer to avoid having multiprocess support depend on special properties of the transport to work. I’d like multiprocess support to work on top of any primitive transport, even one that isn’t capable of establishing multiple connections.

To achieve that, we need to multiplex packets over a single connection. And if we’re already multiplexing stuff, why not follow the de facto standard and use the gdb-remote protocol way of doing it?

Even if it’s more work and more code, I think we’ll have to eventually implement it anyway for compatibility with gdbserver. And then, I think it’s better to use it for lldb-server as well, rather than having to maintain and test two different solutions to the same problem.

We’ve already put some effort into improving compatibility with GDB in the past, including implementing a small subset of the multiprocess extension. I think furthering the protocol support is a natural continuation of that work.

I must admit I’m still not convinced this is the right way to do this. OTOH, I don’t have the time to implement this “the right way” instead, and it’s not an unmotivated or unworkable method, so in this case the “implementor wins on strategy” rule seems to be the remaining determining factor.

I look forward (with a little trepidation) to your progress on this task.

FWIW the single-gdbserver approach is better for people using rr as their gdbserver. Spinning up a separate gdbserver per debuggee process would mean either spinning up a separate rr replay per debuggee process and trying to keep them all in sync — which would be costly and nasty, or exposing multiple gdbservers all interacting with a single replay — which would be lots of work and ultimately doesn’t make sense, because rr can’t execute one process independently of another; execution always moves the whole set of processes forwards (or backwards!) together through the replay.