RFC: Interactive kernel/user-space co-debugging with Scripted Processes

Debugging interactions between kernel-space and user-space processes can be difficult. Yet for many kernel engineers and low-level programmers, being able to debug both sides in the same session would greatly improve their debugging experience.

Thanks to the new Scripted Process infrastructure, we can solve this issue and make interactive co-debugging between a kernel process and multiple user-space processes a reality.

Before diving into the approach we’re considering, here is a quick reminder of what a Scripted Process is.

Scripted Process Refresher

As you may know, LLDB has many scriptable extension points where users can provide a python class or function to enhance their debugging session. Some examples of that are Data Formatters, Summary Providers and Stackframe Recognizers.

OS Thread Plugins are another good example of a scripting affordance in LLDB: they let the user create threads on a stopped process from a python script file. Scripted Processes extend that capability even further.

Scripted Processes allow the user to synthesize a process in LLDB from a script file. To achieve that, they rely on 2 components:

The script file

It’s a user-provided python script that implements a class derived from the base Scripted Process class.

That class should conform to the Scripted Process Interface and implement the various methods required to instantiate the process in LLDB. The class should provide a way to simulate a launch and/or an attach, read memory, fetch the thread information and get the list of loaded images.

Similarly, a Scripted Process can create Scripted Thread objects. They describe each thread (TID, name, state, stop reason) and provide its register context. Scripted Threads work similarly to the OS Threads Plugin.
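To make the shape of such a script more concrete, here is a minimal, self-contained sketch. It deliberately does not import lldb: the base class below is a simplified stand-in, and the class and method names (ScriptedProcessSketch, read_memory_at_address, get_threads_info, get_loaded_images, get_register_context) are illustrative assumptions rather than the exact Scripted Process Interface.

```python
from abc import ABC, abstractmethod


class ScriptedThreadSketch:
    """Stand-in for a Scripted Thread: describes one thread and its registers."""

    def __init__(self, tid, name, state, stop_reason, registers):
        self.tid = tid
        self.name = name
        self.state = state
        self.stop_reason = stop_reason
        self.registers = registers  # e.g. {"pc": 0x1004}

    def get_register_context(self):
        return dict(self.registers)


class ScriptedProcessSketch(ABC):
    """Stand-in for the base Scripted Process class (hypothetical names)."""

    @abstractmethod
    def read_memory_at_address(self, addr, size): ...

    @abstractmethod
    def get_threads_info(self): ...

    @abstractmethod
    def get_loaded_images(self): ...


class CoreImageProcess(ScriptedProcessSketch):
    """A static process synthesized from a flat memory image."""

    def __init__(self, base_addr, image, threads, images):
        self.base_addr = base_addr
        self.image = bytes(image)
        self.threads = {t.tid: t for t in threads}
        self.images = list(images)

    def read_memory_at_address(self, addr, size):
        offset = addr - self.base_addr
        if offset < 0 or offset + size > len(self.image):
            raise ValueError(f"unmapped read at {addr:#x}")
        return self.image[offset:offset + size]

    def get_threads_info(self):
        return {tid: {"name": t.name, "state": t.state, "stop_reason": t.stop_reason}
                for tid, t in self.threads.items()}

    def get_loaded_images(self):
        return list(self.images)


thread = ScriptedThreadSketch(1, "main", "stopped", "signal", {"pc": 0x1004})
process = CoreImageProcess(0x1000, b"\x90\x90\x90\x90\xcc", [thread],
                           [{"path": "/bin/demo", "load_addr": 0x1000}])
```

The process here is purely static, which matches the current limitation of the feature: it can answer reads and describe threads, but nothing drives its execution.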

The Scripted Process process plugin

The ScriptedProcess Process plugin is part of LLDB itself. It calls into the script file and reconstructs the process so that the user can interact with it like any other process in LLDB: see sources, unwind the stack, inspect variables and so on.

Scripted processes can already be loaded into LLDB as standalone processes. For now the feature is limited to a static view of the process, e.g. in the context of a core file, crashlog or process that’s halted and cannot be resumed. However, we think we can greatly improve the debugging experience with scripted processes by tying them to another real process that will drive their execution.

Proposed Approach

Our goal here is to have the kernel process provide a view of the user-space processes running on the system, while debugging the kernel, and to be able to interact with them as if they were real processes (e.g. step a thread, set a breakpoint, continue).

                                             +---------------------------------------+
                                             | Scripted Processes                    |
                                             |          +------------------------+   |
                                             |     +--->|  User-space Process 1  |   |
                                             |     |    +------------------------+   |
                                             |     |                                 |
+------------------+        +------------+   |     |    +------------------------+   |
|  Kernel Process  |<------>|  Debugger  |<--+-----+--->|  User-space Process 2  |   |
+------------------+        +------------+   |     |    +------------------------+   |
                                             |     |                                 |
                                             |     |    +------------------------+   |
                                             |     +--->|  User-space Process 3  |   |
                                             |          +------------------------+   |
                                             |                                       |
                                             +---------------------------------------+

As mentioned earlier, in order to have interactive debugging capabilities in scripted processes, their execution will be handled by a driving process (the kernel process in our previous example).

But first, before even creating the user-space scripted processes, we need a way to fetch the list of running processes in the kernel to be able to choose which one we will attach to. For that, we will introduce a new Scripted Platform that will be in charge of listing and creating the user-space Scripted Processes. Similarly to Scripted Processes, this platform needs to have user-defined python callbacks to list and create the scripted process, since there could be different ways to perform these tasks depending on the nature of the underlying system.
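A minimal sketch of what such a platform could look like is shown below. All names (ScriptedPlatformSketch, list_processes, create_process, read_kernel_process_table) are hypothetical, and the process table is faked; in practice the callback would walk the kernel’s own process list in whatever way the underlying system requires.

```python
class ScriptedPlatformSketch:
    """Hypothetical Scripted Platform: lists and creates user-space processes."""

    def __init__(self, read_kernel_process_table):
        # User-provided callback; how the table is found is system-specific.
        self.read_kernel_process_table = read_kernel_process_table
        self.scripted_processes = {}

    def list_processes(self):
        return {entry["pid"]: entry["name"]
                for entry in self.read_kernel_process_table()}

    def create_process(self, pid):
        processes = self.list_processes()
        if pid not in processes:
            raise KeyError(f"no process with pid {pid}")
        handle = {"pid": pid, "name": processes[pid]}
        self.scripted_processes[pid] = handle
        return handle


def fake_process_table():
    # Stands in for reading the kernel's process list from memory.
    return [{"pid": 1, "name": "init"}, {"pid": 42, "name": "demo"}]


platform = ScriptedPlatformSketch(fake_process_table)
```

Calling platform.list_processes() lets the user pick a process, and platform.create_process(42) records the scripted process that the platform is now responsible for.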

To create the user-space scripted processes, the user will need to interrupt the driving process execution. From there, the user will be able to request the process list from the platform and have LLDB read the driving process memory to fetch the metadata required to synthesize the user-space scripted processes. Once all the scripted processes are created, we can try to set a breakpoint on one of them and continue the execution.

This already causes some problems because the user-space process might be using virtual memory instead of the kernel physical memory, which would imply that they are in different address spaces. To be able to write the trap instruction for the user-space process breakpoint site into kernel memory, we need an address translation layer (ATL) between the user-space and kernel-space processes. Because we cannot anticipate the heuristic used in the ATL, it needs to be a pluggable, user-defined component, so we also need to expose a new python API for this purpose.
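As an illustration only, the sketch below models the ATL as a single-level page table and uses it to place a trap. The class name, the write_trap helper, the page size and the one-byte 0xcc opcode are all assumptions made for the example; a real ATL would implement whatever heuristic the target system needs, and this sketch does not handle traps that straddle a page boundary.

```python
PAGE_SIZE = 0x1000


class AddressTranslationLayer:
    """Hypothetical pluggable ATL using a single-level page table."""

    def __init__(self, page_table):
        # Maps virtual page number -> physical page number.
        self.page_table = dict(page_table)

    def translate(self, virtual_addr):
        page, offset = divmod(virtual_addr, PAGE_SIZE)
        if page not in self.page_table:
            raise LookupError(f"no mapping for {virtual_addr:#x}")
        return self.page_table[page] * PAGE_SIZE + offset


def write_trap(kernel_memory, atl, virtual_addr, trap_opcode):
    """Write a trap at the kernel-side address; return the bytes it replaced."""
    physical = atl.translate(virtual_addr)
    end = physical + len(trap_opcode)
    saved = bytes(kernel_memory[physical:end])
    kernel_memory[physical:end] = trap_opcode
    return saved


kernel_memory = bytearray(4 * PAGE_SIZE)
atl = AddressTranslationLayer({0x400: 2})  # virtual page 0x400 -> physical page 2
saved = write_trap(kernel_memory, atl, 0x400010, b"\xcc")
```

The saved bytes are what would have to be restored when the breakpoint site is disabled.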

                           +------------+
         +---------------->|  Debugger  |<------------------------------+
         |                 +------------+                               v
         |                                       +---------------------------------------+
         |                                       | Scripted Processes                    |
         |                                       |          +------------------------+   |
         |                                       |     +--->|  User-space Process 1  |   |
         |                                       |     |    +------------------------+   |
         v                +---------------+      |     |                                 |
+------------------+      |    Address    |      |     |    +------------------------+   |
|  Kernel Process  +----->|  Translation  |<-----+-----+--->|  User-space Process 2  |   |
+------------------+      |  Layer (ATL)  |      |     |    +------------------------+   |
                          +---------------+      |     |                                 |
                                                 |     |    +------------------------+   |
                                                 |     +--->|  User-space Process 3  |   |
                                                 |          +------------------------+   |
                                                 |                                       |
                                                 +---------------------------------------+

If we take a closer look, every scripted process that is wrapping the user-space process will be doing the same thing:

  1. The scripted process will read memory from the driving process
  2. The ATL will do the address translation and perform the read
  3. With that memory, the scripted process will construct the expected object for that specific call and return it to LLDB

The user-space scripted processes essentially act as passthrough processes, which implies that they could all share the same python implementation. But how can the ATL distinguish which process should map to which address space? We can take advantage of the scripted process instantiation in the Scripted Platform to assign each process a unique identifier. Then, by embedding the ATL in the Scripted Platform, we should be able to match the scripted process ID to a specific address space when performing memory reads.
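The three steps above, combined with per-process identifiers, can be sketched as follows. The class names, register_process and read_u32 are hypothetical, and the driving process is replaced by a plain bytearray. Two scripted processes share one implementation and read the same virtual address, yet get different data because the platform resolves each ID to its own address space.

```python
import struct

PAGE_SIZE = 0x1000


class PlatformWithATL:
    """Hypothetical Scripted Platform embedding the ATL, keyed by process ID."""

    def __init__(self, driving_memory):
        self.driving_memory = driving_memory  # stands in for the driving process
        self.address_spaces = {}  # process ID -> {virtual page: physical page}

    def register_process(self, process_id, page_table):
        self.address_spaces[process_id] = dict(page_table)

    def read_memory_from_process(self, process_id, addr, size):
        page, offset = divmod(addr, PAGE_SIZE)
        physical = self.address_spaces[process_id][page] * PAGE_SIZE + offset
        return bytes(self.driving_memory[physical:physical + size])


class SharedUserSpaceProcess:
    """One implementation shared by every user-space scripted process."""

    def __init__(self, platform, process_id):
        self.platform = platform
        self.process_id = process_id

    def read_u32(self, addr):
        # Steps 1 and 2: read through the platform, which translates the address.
        raw = self.platform.read_memory_from_process(self.process_id, addr, 4)
        # Step 3: build the object expected by the caller.
        return struct.unpack("<I", raw)[0]


memory = bytearray(3 * PAGE_SIZE)
memory[0x1000:0x1004] = struct.pack("<I", 111)
memory[0x2000:0x2004] = struct.pack("<I", 222)
platform = PlatformWithATL(memory)
platform.register_process(1, {0x10: 1})
platform.register_process(2, {0x10: 2})
first = SharedUserSpaceProcess(platform, 1)
second = SharedUserSpaceProcess(platform, 2)
```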

That means, however, that every action performed on the Scripted Process needs to pass through the Scripted Platform.
That’s not an issue: we can create a new C++ process plugin derived from the Scripted Process Plugin. This new passthrough class should have a reference to the Scripted Platform and a reference to its platform identifier (passed in the launch info dictionary). Then, for every call to that Passthrough Scripted Process, the process plugin will actually pass the call down to the Scripted Platform, including the process identifier. This suggests that the Scripted Platform should also hold a reference to, or conform to, the Scripted Process and Scripted Thread Interfaces.
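The forwarding idea can be sketched in a few lines. The real plugin would be written in C++; the Python below only illustrates the call flow, and every name in it (RecordingPlatform, PassthroughScriptedProcess, the "process_id" launch info key) is hypothetical.

```python
class RecordingPlatform:
    """Hypothetical platform implementing the process interface per identifier."""

    def __init__(self):
        self.calls = []

    def read_memory(self, process_id, addr, size):
        self.calls.append(("read_memory", process_id, addr, size))
        return bytes(size)

    def get_threads_info(self, process_id):
        self.calls.append(("get_threads_info", process_id))
        return {}


class PassthroughScriptedProcess:
    """Forwards every call to the platform, prepending its own identifier."""

    def __init__(self, platform, launch_info):
        self._platform = platform
        self._process_id = launch_info["process_id"]

    def __getattr__(self, name):
        # Only reached for attributes not found normally, i.e. interface methods.
        target = getattr(self._platform, name)

        def forward(*args, **kwargs):
            return target(self._process_id, *args, **kwargs)

        return forward


platform = RecordingPlatform()
process = PassthroughScriptedProcess(platform, {"process_id": 7})
data = process.read_memory(0x1000, 4)
process.get_threads_info()
```

Because the passthrough holds no logic of its own, the platform sees every request together with the identifier it needs to pick the right address space.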

                          +-----------------------------+
                          |         Passthrough         |
                     +--->|   Scripted Process Plugin   |
                     |    +--------------+--------------+
                     |                   |
                     |                   |
                     |                   v
   +------------+    |    +-----------------------------+    +---------------------------------------+
   |  Debugger  |<---+    |                             |    | Scripted Processes                    |
   +------------+    |    |      Scripted Platform      |    |          +------------------------+   |
          ^          |    |              +              |    |     +--->|  User-space Process 1  |   |
          |          |    |  Address Translation Layer  |    |     |    +------------------------+   |
          |          |    |                             |    |     |                                 |
          |          |    | --------------------------- |    |     |    +------------------------+   |
          |          +--->|  List/Create Processes      | <--+-----+--->|  User-space Process 2  |   |
          |               | --------------------------- |    |     |    +------------------------+   |
          v               |  Scripted Process Interface |    |     |                                 |
+------------------+      | --------------------------- |    |     |    +------------------------+   |
|  Kernel Process  |      |  Scripted  Thread Interface |    |     +--->|  User-space Process 3  |   |
+------------------+      | --------------------------- |    |          +------------------------+   |
                          |                             |    |                                       |
                          +-----------------------------+    +---------------------------------------+

This design has the advantage that all the scripted processes share the same python implementation: they all read memory from the driving process through the ATL and return the expected object to the Scripted Platform.

Once all of that is implemented, we should be able to set a breakpoint and continue the user-space process. But what would happen when the trap is hit? The exception is handled in LLDB’s debug agent and is reported to LLDB as a public stop event.

However, in our previous example, because the execution is only driven by the kernel process, the stop event will be sent to the kernel process instead of the user-space scripted process.

The main issue here is that only one process can generate events, the ProcessGDBRemote talking to the kernel. However those events could be for any one of the user-space scripted processes, or they could be for the kernel directly. So we need something that will inspect the stop events and dispatch them to the right process.

To address that, we will implement a stop event Multiplexer that will stand between the driving process, the scripted processes and the debugger listeners. When initiating co-debugging, the driving process would forward all its events to the multiplexer. The multiplexer will be able to call into a user-provided python function that determines which process the stop event should be forwarded to. The Scripted Platform is already conveniently located between the driving process and the scripted processes, so it will also implement the multiplexer.
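A minimal sketch of that dispatch logic follows. The event layout (a dictionary with an optional "task_id" key), the routing policy and all names are assumptions made for the example; they only illustrate how a user-provided function could pick the destination process.

```python
from collections import defaultdict


class StopEventMultiplexer:
    """Hypothetical multiplexer: routes driving-process stop events."""

    def __init__(self, route_event):
        # route_event is the user-provided function: event -> process ID.
        self.route_event = route_event
        self.listeners = defaultdict(list)

    def add_listener(self, process_id, callback):
        self.listeners[process_id].append(callback)

    def on_driving_process_stop(self, event):
        process_id = self.route_event(event)
        for callback in self.listeners.get(process_id, []):
            callback(event)
        return process_id


KERNEL_ID = 0


def route_by_owner(event):
    # Assumed policy: events carry the ID of the task that hit the trap,
    # and anything without one belongs to the kernel.
    return event.get("task_id", KERNEL_ID)


received = defaultdict(list)
mux = StopEventMultiplexer(route_by_owner)
mux.add_listener(KERNEL_ID, received[KERNEL_ID].append)
mux.add_listener(42, received[42].append)
mux.on_driving_process_stop({"reason": "breakpoint", "task_id": 42})
mux.on_driving_process_stop({"reason": "panic"})
```

Each event reaches exactly one destination here; a real multiplexer would also have to keep the other processes’ state in sync.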


                          +-----------------------------+
                          |         Passthrough         |
                     +--->|   Scripted Process Plugin   |
                     |    +--------------+--------------+
                     |                   |
                     |                   |
   +------------+    |                   v
   |  Debugger  |<---+    +-----------------------------+    +---------------------------------------+
   +------------+    |    |      Scripted Platform      |    | Scripted Processes                    |
                     |    |              +              |    |          +------------------------+   |
                     |    |  Address Translation Layer  |    |     +--->|  User-space Process 1  |   |
                     +--->|              +              |    |     |    +------------------------+   |
                          |         Multiplexer         |    |     |                                 |
                          |                             |    |     |    +------------------------+   |
         +--------------->| --------------------------- | <--+-----+--->|  User-space Process 2  |   |
         |                |  List/Create Processes      |    |     |    +------------------------+   |
         |                | --------------------------- |    |     |                                 |
+--------+---------+      |  Scripted Process Interface |    |     |    +------------------------+   |
|  Kernel Process  |      | --------------------------- |    |     +--->|  User-space Process 3  |   |
+------------------+      |  Scripted  Thread Interface |    |          +------------------------+   |
                          | --------------------------- |    |                                       |
                          +-----------------------------+    +---------------------------------------+

Now, we have the driving process on one side sending events to the multiplexer, which dispatches them to the user-space scripted processes on the other side. But what should we do with the driving process stop events that are also forwarded to the multiplexer? Because we don’t want to alter the driving process plugin implementation to have it listen to and handle the events that it has just sent to the multiplexer, we will also wrap the driving process in a pass-through scripted process.

That will keep us from using another side channel (like SBAPI) to fetch the driving process metadata. As with the user-space scripted processes, the wrapped driving process will listen for stop events from the multiplexer and handle them by itself.

                          +-----------------------------+
                          |         Passthrough         |
                     +--->|   Scripted Process Plugin   |
                     |    +--------------+--------------+    +---------------------------------------+
                     |                   |                   | Scripted Processes                    |
                     |                   |                   |          +------------------+         |
   +------------+    |                   v                   |     +--->|  Kernel Process  |         |
   |  Debugger  |<---+    +-----------------------------+    |     |    +------------------+         |
   +------------+    |    |      Scripted Platform      |    |     |                                 |
                     |    |              +              |    |     |    +------------------------+   |
                     |    |  Address Translation Layer  |    |     +--->|  User-space Process 1  |   |
                     +--->|              +              |    |     |    +------------------------+   |
                          |         Multiplexer         |    |     |                                 |
                          |                             |    |     |    +------------------------+   |
         +--------------->| --------------------------- | <--+-----+--->|  User-space Process 2  |   |
         |                |  List/Create Processes      |    |     |    +------------------------+   |
         |                | --------------------------- |    |     |                                 |
+--------+---------+      |  Scripted Process Interface |    |     |    +------------------------+   |
|  Kernel Process  |      | --------------------------- |    |     +--->|  User-space Process 3  |   |
+------------------+      |  Scripted  Thread Interface |    |          +------------------------+   |
                          | --------------------------- |    |                                       |
                          +-----------------------------+    +---------------------------------------+

Finally, to avoid confusing the user, we want to introduce the notion of “hidden targets”: targets that are hidden in lldb when listing all targets. Of course, the user should be able to list them all with a flag -a|--all or with a boolean setting target.always_show_hidden. This will be used to hide the real driving process target, to avoid any confusion between it and the wrapped one, but also to prevent the user from tampering with it.

This could even be used to only debug user-space scripted processes interactively, as if they were real standalone processes:


                          +-----------------------------+
                          |         Passthrough         |
                     +--->|   Scripted Process Plugin   |
                     |    +--------------+--------------+    +---------------------------------------+
                     |                   |                   | Scripted Processes                    |
                     |                   |                   |          +- - - - - - - - - +         |
   +------------+    |                   v                   |     +--->   Kernel Process            |
   |  Debugger  |<---+    +-----------------------------+    |     |    +- - - - - - - - - +         |
   +------------+    |    |      Scripted Platform      |    |     |                                 |
                     |    |              +              |    |     |    +------------------------+   |
                     |    |  Address Translation Layer  |    |     +--->|  User-space Process 1  |   |
                     +--->|              +              |    |     |    +------------------------+   |
                          |         Multiplexer         |    |     |                                 |
                          |                             |    |     |    +------------------------+   |
         +--------------->| --------------------------- | <--+-----+--->|  User-space Process 2  |   |
         |                |  List/Create Processes      |    |     |    +------------------------+   |
         |                | --------------------------- |    |     |                                 |
+- - - - + - - - - +      |  Scripted Process Interface |    |     |    +------------------------+   |
   Kernel Process         | --------------------------- |    |     +--->|  User-space Process 3  |   |
+- - - - - - - - - +      |  Scripted  Thread Interface |    |          +------------------------+   |
                          | --------------------------- |    |                                       |
                          +-----------------------------+    +---------------------------------------+
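The hidden-target listing described above could behave roughly like this sketch. TargetEntry and list_targets are hypothetical; show_all mirrors the proposed -a|--all flag and always_show_hidden mirrors the proposed target.always_show_hidden setting.

```python
from dataclasses import dataclass


@dataclass
class TargetEntry:
    name: str
    hidden: bool = False


def list_targets(targets, show_all=False, always_show_hidden=False):
    """Return the targets a listing would show; hidden ones need an opt-in."""
    if show_all or always_show_hidden:
        return list(targets)
    return [t for t in targets if not t.hidden]


targets = [
    TargetEntry("kernel (real driving process)", hidden=True),
    TargetEntry("kernel (passthrough)"),
    TargetEntry("user-space process 1"),
]
```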

Implementation Breakdown

As presented previously, many new concepts and improvements still need to be added to LLDB to achieve interactive co-debugging. The proposed approach already breaks the work down into various parts, but in order to test that, here is the list of tasks that need to be implemented:

  1. We need to ensure that a Scripted Process can act as a pass-through for another real process. It needs to behave like the real process (be able to set breakpoints, continue, step…) and broadcast stop events to the debugger listeners. That will be used to wrap the driving process.
  2. Implement the Scripted Platform to list and create scripted processes.
  3. Implement the multiplexer, in the Scripted Platform, listening to the driving process stop events and broadcasting them to the passthrough and debugger’s event listener.
  4. We need to introduce some filtering logic in the multiplexer for the stop events. This will be used to separate between kernel process events and user-space process events.
  5. Introduce the address translation layer in the multiplexer. This is necessary when reading/writing memory to translate addresses from virtual memory for user-space processes to kernel memory (and vice versa?).
  6. Add support for multiple scripted process debugging with address translation and breakpoint support. This should get us close to a real life scenario where a user would co-debug a kernel process while interacting with user-space processes at the same time.
  7. We need to make sure that the multiplexer is able to coordinate multiple process events and keep everything in sync.

Request for comments

Before moving forward with this design, I would like to get the community’s input.

What do you think about this approach? Any feedback would be greatly appreciated.

Thanks!

This is an interesting feature, but I had trouble reading the RFC to the end. I keep getting lost in all the boxes. I understand the concept of a scripted platform and why it is needed, but I am starting to get confused around the time you introduce the “address translation layer”.

I mean, I understand why address translation is necessary, but it’s not clear to me why it needs to feature so prominently in the design. In the way I would imagine this to work, the address translation could be handled completely inside the scripted process (or platform) plugin, as an implementation detail completely irrelevant to lldb. To make things more concrete, I am imagining that the scripted process interface would have a function that gets called when lldb wants to physically place a breakpoint somewhere – essentially a way to override Process::EnableBreakpointSite from python. The python function implementing that would do whatever it takes to get the job done. That will likely include translating the breakpoint address to the address space of the real process (presumably by reading the process page tables), and setting a breakpoint there (via real_process.target.BreakpointCreateByAddress ?) – but all of that is completely irrelevant to the lldb code running the scripted process.

Of course, we always can provide some helper class to simplify the address translation task for some common scenarios, but that would be just that – an optional helper that the users can choose to ignore if they so desire.
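For illustration, here is a rough, self-contained sketch of that idea. Nothing in it is an existing LLDB interface: enable_breakpoint_site is a hypothetical hook, RealProcessStub stands in for the real process, and the translation is a toy page-table lookup. The point is only that translation stays inside the user’s script.

```python
class RealProcessStub:
    """Stands in for the real (driving) process; not an LLDB class."""

    def __init__(self):
        self.breakpoints = set()

    def set_breakpoint_at(self, addr):
        self.breakpoints.add(addr)
        return True


class MyScriptedProcess:
    PAGE_SIZE = 0x1000

    def __init__(self, real_process, page_table):
        self.real_process = real_process
        self.page_table = page_table  # virtual page -> page in the real process

    def _translate(self, addr):
        page, offset = divmod(addr, self.PAGE_SIZE)
        return self.page_table[page] * self.PAGE_SIZE + offset

    def enable_breakpoint_site(self, addr):
        # Hypothetical hook called when the debugger wants a breakpoint placed.
        return self.real_process.set_breakpoint_at(self._translate(addr))


real = RealProcessStub()
proc = MyScriptedProcess(real, {0x400: 5})
ok = proc.enable_breakpoint_site(0x400020)
```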

The way you speak about these things makes me believe this is not the design you had in mind. However, I am not able to figure out what that design is. Could you maybe elaborate on this part? Perhaps by putting some (mock) interfaces on each of these components, so that I can get an idea of how they would interact with each other (for example to accomplish the task of setting a breakpoint).

Hi @labath!

Your explanation is actually exactly what I’m planning to do:

The Scripted Platform python class will have a read_memory_from_process(pid, addr, size) python method that the user will implement. As you said, the address translation layer is just an implementation detail and doesn’t have anything to do with lldb, but I think it’s important to mention it to understand how this whole system works.

I submitted a patch for review that introduces the Scripted Platform plugin: D139250 [lldb] Add ScriptedPlatform python implementation. However, the memory read affordance is not implemented yet; it will come in an upcoming patch.

Let me know if there are other things that I need to clarify.

Thanks!

Thanks for the clarification. I’m glad we agree that the ATL thingy is an implementation detail. However, I don’t think we should stop there. I would say that even this read_memory_from_process(pid, addr, size) method should be an implementation detail.
It’s not that I think this is a bad way to implement this functionality. However, I don’t think this is the only possible good way to do that either. I mean, in LLDB, “regular” platforms don’t have anything to do with reading memory from a process. Why should the scripted platforms be any different? Normally, a process class is responsible for reading memory, and we already have a scripted process class, so why don’t we just let it do that? If the user wishes to implement that process functionality by forwarding it to the platform class, then he is free to do that. He just needs to write something like:

class MyProcess:
  def read_memory(self, addr, size):
    return self.platform.read_memory_from_process(self.pid, addr, size)

However, that is completely up to him, and he can choose to do it differently as well. For example, I think that the following implementation could be equally valid:

class MyProcess:
  def read_memory(self, addr, size):
    host_addr = self.platform.get_host_addr(self.pid, addr)
    return self.platform.read_host_memory(host_addr, size)

You may prefer one or the other, but my point is that we don’t have to be the ones deciding that. We just need to provide the right building blocks so that the users can use them to build the thing they need. If we provide simpler building blocks then there’s less code to maintain for us, and the end users have more flexibility to combine them in new ways.

For example, right now I have zero interest in scripted processes or co-debugging, but the ability to write a simple platform plugin in python might be interesting for me, as I could rewrite our internal platform plugin in python. This is a very ordinary platform and the processes that it creates are completely independent of each other, so the read_memory_from_process functionality just does not make sense there. However, I would still need the ability to launch/attach to a process.

> in LLDB, “regular” platforms don’t have anything to do with reading memory from a process. Why should the scripted platforms be any different?

In our use case (co-debugging), all the user-space scripted processes will share the same implementation and every method will require going through the ATL to read the driving process memory. That’s why we want to “centralize” the implementation there, because the scripted platform will be in charge of creating and coordinating all the scripted processes.

> We just need to provide the right building blocks so that the users can use them to build the thing they need. If we provide simpler building blocks then there’s less code to maintain for us, and the end users have more flexibility to combine them in new ways.

I agree. However, the scripted platform would simply call into the user’s scripted platform implementation, so we need to agree on what that “interface” should look like. This will save us from having to maintain more code in LLDB.

I think I need to continue working on this to figure out what the simplest building blocks are that we can provide to users to implement their scripted platform, but if you have some ideas, I’m happy to hear about them :)