[RFC][LLDB] A proposal to refactor Platform

bulbazord · October 23, 2024, 12:30am

LLDB Platform Refactor RFC

Authors: Alex Langford, Jonas Devlieghere, Ismail Bennani, Jim Ingham

This RFC proposes breaking up LLDB’s Platform into different abstractions: PlatformInfo and Device. It is motivated by specific weaknesses of Platform, both functional and ergonomic. Additionally, it proposes a new abstraction (DeviceProvider) for device discovery.

Motivation

To make it easier to debug devices from the command line, LLDB in Xcode 16 now integrates with CoreDevice. CoreDevice is a framework for interacting with remote devices. Its features include listing connected devices, listing processes on a device, and attaching to a process with debugserver. While integrating CoreDevice, we hit a few limitations of the current Platform abstraction. Specifically, Platform represents both how to connect to a device (e.g. over gdb-remote) and how to debug on a particular device (e.g. what OS version it’s running). Adding a new way of talking to a device (e.g. using CoreDevice) would require duplicating all existing supported platforms. This leads to an unsustainable combinatorial explosion of implementations and makes it harder to add new platforms in the future.

In addition to the problem of Platform representing two concepts, it has additional shortcomings:

Talking to more than one device in platform mode is not really supported. Each platform owns its own connection information, so one Platform instance cannot connect to multiple devices at the same time. To talk to two Linux machines with Platforms, you would therefore need two platform objects. From the command line, this is not possible today. Running platform select remote-linux and platform connect will connect you to a remote Linux machine, but any subsequent platform connect command would terminate the previous connection. Creating additional remote-linux platforms is also not possible, because the debugger creates one Platform instance for each kind of platform. The only way to do it is through the SBAPI: create multiple SBPlatform instances and switch between them with SBDebugger::SetSelectedPlatform.
Platform implementations are more complex than they need to be. The Platform class has a large number of methods, some of which are confusing or have unclear overlap. There are functions that are duplicated to support remote variants, e.g. GetOSBuildString vs GetRemoteOSBuildString. There is also a Host platform which does not behave like a remote platform. Instead of having methods overridden like the other platforms, there is bespoke logic in the base class to check if the current platform is considered the host platform.
Some platforms support device discovery, e.g. Adb for Android (adb devices) or CoreDevice for Apple devices (devicectl list devices). Regardless of how this functionality is implemented, Platforms are a natural integration point. This functionality would be very valuable for workflows that involve connecting to and debugging on devices repeatedly. Existing workflows frequently include setting up tunnels manually and copying around files individually. Platform has support for moving files around over SSH and Rsync, but those also may require involved setups.

The Proposal

Internal LLDB Architecture

The responsibilities of Platform can be broken down into two distinct categories:

How do we debug a given platform? The answer to this question doesn’t require you to have a concrete device running that platform: it is known statically. An example of this is the library file extension on that platform. We call this PlatformInfo.
How do we debug a particular device? The answer to this question does require you to have a concrete device. You need to know how to talk to it and how to query it for information: it is known dynamically. An example of this information is the version of the operating system that’s running on the device. We call this Device.

PlatformInfo

PlatformInfo provides static information about the platform, such as Platform::GetFullNameForDylib, which, depending on the platform, returns libfoo.dylib, libfoo.so, or foo.dll.
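To illustrate the kind of static knowledge this covers, here is a minimal sketch. Only the GetFullNameForDylib name and its three outputs come from the text above. The PlatformKind enum, the constructor, and the exact signature are assumptions made for the example, not a proposed interface.

```cpp
#include <cassert>
#include <string>

// Hypothetical: the set of platforms is an assumption for this sketch.
enum class PlatformKind { Darwin, Linux, Windows };

// Static, per-platform knowledge. Nothing here needs a connected device.
class PlatformInfo {
public:
  explicit PlatformInfo(PlatformKind kind) : m_kind(kind) {}

  // Maps a bare library name to the platform's conventional file name.
  std::string GetFullNameForDylib(const std::string &basename) const {
    switch (m_kind) {
    case PlatformKind::Darwin:
      return "lib" + basename + ".dylib";
    case PlatformKind::Linux:
      return "lib" + basename + ".so";
    case PlatformKind::Windows:
      return basename + ".dll";
    }
    return basename; // Unreachable for the kinds listed above.
  }

private:
  PlatformKind m_kind;
};
```

The answer depends only on which platform was selected, so it can be computed before any device exists.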

Some of this static information might be host-specific. For example: Platform::GetSDKDirectory which attempts to find a directory on the host machine containing SDK information for said platform. Moving that complexity into the host platform is outside the scope of this RFC.

Device

Device represents a specific device, like a remote iOS device or a remote Windows machine. It will own its own connection information and supports the device and connection-specific information that Platform currently owns, such as interacting with the file system or processes. Devices will be responsible for dynamic knowledge, such as retrieving an OS version or Build ID for a remote device.

A useful consequence of a device owning its own connection is that LLDB can talk to devices using things other than the gdb-remote protocol. For example, CoreDevice can perform many of the operations that the gdb-remote platform packets can do.
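A rough sketch of what "a device owns its own connection" could look like. Every class and method name below is hypothetical and chosen for illustration. The OS version queries are stubbed rather than modelling any real protocol or framework call.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical interface: callers see a Device, never the transport behind it.
class Device {
public:
  virtual ~Device() = default;
  // Names the mechanism this device uses to reach its target.
  virtual std::string GetTransportName() const = 0;
  // Dynamic knowledge: only answerable by asking a concrete device.
  virtual std::string GetOSVersion() = 0;
};

// Reaches the target through a gdb-remote connection URL that it owns.
class GDBRemoteDevice : public Device {
public:
  explicit GDBRemoteDevice(std::string url) : m_url(std::move(url)) {}
  std::string GetTransportName() const override { return "gdb-remote"; }
  // Stub: a real implementation would send a query over m_url.
  std::string GetOSVersion() override { return "unknown"; }

private:
  std::string m_url;
};

// Reaches the target through some device service, keyed by a device ID.
class ServiceBackedDevice : public Device {
public:
  explicit ServiceBackedDevice(std::string id) : m_id(std::move(id)) {}
  std::string GetTransportName() const override { return "device-service"; }
  // Stub: a real implementation would ask the service about m_id.
  std::string GetOSVersion() override { return "unknown"; }

private:
  std::string m_id;
};

// Works for any transport because it depends only on the interface.
std::string Describe(Device &device) {
  return device.GetTransportName() + ": OS " + device.GetOSVersion();
}
```

Adding a new way of talking to a device then means adding one Device subclass, rather than one new class per supported platform.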

DeviceProvider

DeviceProvider is the abstraction that allows LLDB to discover and communicate with devices. It will use LLDB’s plugin mechanism to provide this functionality conditionally. Concretely, on Apple platforms, this will be powered by CoreDevice. Some other platforms, such as Android, could use technologies like ADB. Some platforms will have no DeviceProvider and connections to devices can only be formed using a URL.
For example, our fork currently supports a device list command that shows all devices that CoreDevice knows about.

(lldb) device list
Name          Identifier                           State       Configuration
------------- ------------------------------------ ----------- ------------------
Alex’s iPhone 57D7F4E1-3B3D-4CEE-8A5F-4E6287A4BC2C available   iOS 18.0.1
iPad          D41398E1-841E-420D-AB7F-72726923A601 unavailable iOS 18.0
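As a sketch of how a listing like the one above could be assembled from plugins: the DeviceEntry fields mirror the first three columns of the table, but the type names, the ListDevices method, and the StaticDeviceProvider stand-in are all assumptions for illustration.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical row of a device listing.
struct DeviceEntry {
  std::string name;
  std::string identifier;
  bool available;
};

// Hypothetical plugin interface for device discovery.
class DeviceProvider {
public:
  virtual ~DeviceProvider() = default;
  virtual std::vector<DeviceEntry> ListDevices() = 0;
};

// Stand-in provider that returns fixed entries. A real provider would
// query a discovery service instead.
class StaticDeviceProvider : public DeviceProvider {
public:
  explicit StaticDeviceProvider(std::vector<DeviceEntry> entries)
      : m_entries(std::move(entries)) {}
  std::vector<DeviceEntry> ListDevices() override { return m_entries; }

private:
  std::vector<DeviceEntry> m_entries;
};

// Concatenates the results of every registered provider. With no provider
// registered, the list is simply empty.
std::vector<DeviceEntry>
ListAllDevices(const std::vector<std::unique_ptr<DeviceProvider>> &providers) {
  std::vector<DeviceEntry> all;
  for (const auto &provider : providers) {
    std::vector<DeviceEntry> found = provider->ListDevices();
    all.insert(all.end(), found.begin(), found.end());
  }
  return all;
}
```

The empty-list case corresponds to platforms with no DeviceProvider, where a connection URL is the only way to reach a device.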

Changes to existing architecture

Platforms will no longer represent both the knowledge of a platform and abilities to interact with a device. The device-specific functionality will live in the new Device class. To obtain a device for a given platform, Platform will have a new overload of Platform::Connect that takes some identifier for a device and returns a handle for that device. To illustrate:

lldb_private::DeviceSP Platform::Connect(lldb_private::DeviceIdentifier device_id);

How the identifier is discovered will be addressed later. Internally, the platform will talk to DeviceProviders to form that device connection. For platforms with no existing DeviceProviders, a connection URL will need to be supplied instead.
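A minimal sketch of that delegation, under stated assumptions: the identifier is modelled as a plain string, Device is reduced to two fields, and AddProvider, ConnectURL, and SingleDeviceProvider are hypothetical names that do not appear in the proposal.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, heavily reduced Device: just enough to see where it came from.
struct Device {
  std::string identifier;
  std::string origin; // "provider" or "url"
};
using DeviceSP = std::shared_ptr<Device>;

class DeviceProvider {
public:
  virtual ~DeviceProvider() = default;
  // Returns nullptr when this provider does not know the identifier.
  virtual DeviceSP Connect(const std::string &device_id) = 0;
};

// Stand-in provider that knows exactly one device.
class SingleDeviceProvider : public DeviceProvider {
public:
  explicit SingleDeviceProvider(std::string known_id)
      : m_known_id(std::move(known_id)) {}
  DeviceSP Connect(const std::string &device_id) override {
    if (device_id != m_known_id)
      return nullptr;
    return std::make_shared<Device>(Device{device_id, "provider"});
  }

private:
  std::string m_known_id;
};

class Platform {
public:
  void AddProvider(std::unique_ptr<DeviceProvider> provider) {
    m_providers.push_back(std::move(provider));
  }

  // Asks each provider in turn and returns the first device that is found.
  DeviceSP Connect(const std::string &device_id) {
    for (auto &provider : m_providers)
      if (DeviceSP device = provider->Connect(device_id))
        return device;
    return nullptr;
  }

  // Fallback for platforms with no providers: form a device from a URL.
  DeviceSP ConnectURL(const std::string &url) {
    return std::make_shared<Device>(Device{url, "url"});
  }

private:
  std::vector<std::unique_ptr<DeviceProvider>> m_providers;
};
```

In this sketch the platform only routes the request. The provider is what actually constructs the device.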

Targets currently hold onto a Platform instance and will continue to do so. They will also optionally contain a handle to a Device. Where Target uses Platform today, it will now use Platform, Device, or a combination of the two to accomplish the same task.

The existing host platform would be treated a little differently. Specifically, the host device would be created automatically at start up and always exist. From an interface perspective, the host device will be like any other device, but implementation-wise it will talk to the local filesystem instead of talking over the network.

SBAPI

The existing SBPlatform functionality will continue to work as-is to preserve API compatibility. This will be achieved by associating each SBPlatform instance with a single Device internally.

In addition, we propose introducing a new SBDevice class to correspond to the new Device abstraction. SBDevices would be created by a new method on SBPlatform. New clients would be encouraged to use a combination of SBPlatform and SBDevice to debug instead of the existing SBPlatform functionality.

Commands

Similar to the SBAPI, we propose maintaining existing command-line functionality without breaking compatibility. To support distinguishing between different devices, we also propose adding a new device subcommand to the top-level platform command. Examples of new commands:

platform device list would list the devices that a platform is aware of. For example, if your selected platform was remote-android, this command would list every device that the ADB DeviceProvider can find as well as any devices formed from URL connections.
platform device connect would connect to a device with some kind of identifier.

After that, interacting with specific devices would involve a top-level device command. Some examples:

device list would list all connected devices, regardless of platform.
device select $ID would select a specific device ID.
device [--device-id $ID] process list would list every process on the selected device or on the device with ID $ID.
device --device-id 2 get-file /tmp/some-file.txt would transfer the file /tmp/some-file.txt off device with ID 2.

ashgti · October 25, 2024, 6:23pm

How does this affect the existing ios-simulator platform (or other *-simulator platforms)?

Today when you select an ios-simulator you can connect to a specific simulator.

$ lldb
(lldb) platform select ios-simulator
  Platform: ios-simulator
    Triple: arm64-apple-macosx
OS Version: 14.7 (23H124)
  Hostname: 127.0.0.1
    Kernel: Darwin Kernel Version 23.6.0: Wed Jul 31 20:49:39 PDT 2024; root:xnu-10063.141.1.700.5~1/RELEASE_ARM64_T6000
  SDK Path: "/Applications/Xcode_16.0.app/Contents/Developer/Platforms/iPhoneSimulator.platform/Developer/SDKs/iPhoneSimulator18.0.sdk"
Available devices:
   69F9C664-90E2-4392-8F37-41C89F45A6EC: iPhone 16
No current device is selected, "platform connect <ARG>" where <ARG> is a device UDID or a device name to connect to a specific device.

It sounds like this might be useful for improving this as well, would this be something that could be integrated with the device infrastructure?

This sounds like the process information should be coming from the Device now instead of the Platform, or would process management be a combination of the two?

Would it be up to the Device implementation to advertise what is supported? Maybe we need a set of capabilities the device can advertise are supported? e.g. if an embedded system doesn’t have a file system then device get-file might not be supported.

DavidSpickett · October 28, 2024, 11:35am

remote-linux vs. qemu-user is an example of that to some extent. An actual implementation might have to blur the lines some because there are assumptions you can make on a full system that you can’t on the emulated one e.g. finding program files in certain paths, or trying to list PIDs.

The qemu-user platform also launches it for you. I guess you’d put this in the “how to talk to” part of things? “launch some simulator first” would be the generic version of “launch qemu-user first”.

The qemu-user platform is either the best example here or the worst

Is the way of talking the “medium” here? As in:

TCP/IP
JTAG
OS level pipe
and so on

Does “launch a thing first then…” get included in the “way of talking”? I think Android does some port mapping, so maybe that would be another example of that.

labath · October 28, 2024, 2:35pm

We’ve discussed this at the llvm dev meeting, but for everyone’s benefit, and in order to continue the discussion, I’m going to repeat what I’ve said there.

I’m generally supportive of this proposal, but the thing I’d like to make sure is to have a way (or at least leave room for it) to create and configure these devices “out of thin air” (in addition to just enumerating existing ones). Let me illustrate what I mean with three examples:

Emulator platforms, e.g. “qemu-user”.

Qemu emulates processes by translating/emulating their instructions and passing the system calls to the host kernel. All of this happens within the context of the emulator (qemu) process – which exists only as long as the emulated (guest) process is running. This means there’s nothing to “enumerate” until we run the process.

Nonetheless, if I understand this proposal correctly, I think it would be natural to represent the emulated environment as a “device”: it has its own way of starting a debug session; and it has its own way of finding libraries (usually in a subdirectory of the host system).

The main difference is that this information does not exist in (cannot be determined from) the outside world. The properties of the emulated “device” are given by how we choose to launch the emulator.

Remote platforms, e.g. “remote-linux”

Unlike emulated platforms, remote linux machines definitely /exist/ in the outside world. However, we generally aren’t able to enumerate them. The user somehow has to “know” that a specific host has an lldb-server running. The user also has to “know” where the local copies of the binaries from that host are stored (if at all). There’s no central repository for “all sysroots of all linux distributions”.

cloud debugging

Google has an internal cloud test infrastructure, which can be (with some difficulty) used to debug binaries. The process involves talking to various servers, forwarding ports and whatnot, but at the end of the day, you get a gdb-remote connection to an lldb-server on the test machine. This also sounds like a good candidate for a “device”. Unlike the previous examples, this one could be enumerated (because of operating in a centralized environment). I don’t know if we’ll ever want to go through the trouble of exposing this as an lldb “device”, but I’m mentioning this because I think it’s an interesting use case and I think it’d be nice if there was upstream support for the creation of something like this.

The interesting part about this is that all these use cases represent different ways of talking to a linux machine (emulated, remote, cloud, etc.) They all have the same signal numbers, library suffixes, etc. This – separation of “information about a platform” and “how to talk to a ‘device’” is the part I like about this proposal.

The part that I’m not sure about – mainly because the proposal is somewhat vague on details – is how the interaction between platforms and devices will work. The mention of Platform::Connect makes it sound like the Platform class will be in the driver’s seat, which sounds somewhat problematic. As the examples above illustrate, there are countless ways (some of which will never be upstreamed) that one can use to connect to a linux “device”, so I don’t think we can expect that PlatformLinux will know about all of them. It sounds to me like it would be better if a specific device (device provider?) class was in charge of creating the device. Then a downstream implementation (or even just an upstream plugin) could create it in any way it deems fit. I’m sure there’s a reason why you’re proposing it this way (and I’d like to understand what it is), but given that a platform isn’t supposed to know anything about a specific device, the notion of Platform::Connect seems a bit out of place.

The other aspect of this is the configurability. In decentralized environments (like linux), enumerating automatically configured devices is going to be rare. The more common/useful use case will be creating a device and configuring it, so I think we should leave some room for that. For a qemu-user “device”, configuring it might consist of setting the path to the emulator, telling it which (sub)architecture to emulate and setting some environment variables (library path) so that the process can find its shared libraries. Currently this is done through lldb settings, but that is not ideal because they are global. Ideally these should be set on a per-device basis, so that we can (e.g.) create one qemu “device” emulating arm, and another for risc-v. I don’t know whether that should be through an lldb command like “device settings” or through (per-device) lldb settings (settings set device.???). What I do know is that I don’t think this should be done through URL-encoding the arguments (even though this “configuration” sounds a bit like “connecting”, and some sort of a “cloud device” might actually end up passing these settings through URLs), because the resulting ergonomics will be bad.
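To make the per-device idea concrete, here is a minimal sketch in which each emulated device carries its own copy of the settings instead of reading global ones. The EmulatorSettings fields follow the three items listed above. The type names, accessors, and example paths are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical per-device configuration for an emulator-backed device.
struct EmulatorSettings {
  std::string emulator_path;
  std::string architecture;
  std::map<std::string, std::string> target_env;
};

// Each device stores its own settings, so changing one device's
// configuration cannot affect another device.
class EmulatedDevice {
public:
  EmulatedDevice(std::string name, EmulatorSettings settings)
      : m_name(std::move(name)), m_settings(std::move(settings)) {}

  void SetArchitecture(std::string arch) {
    m_settings.architecture = std::move(arch);
  }

  const std::string &GetName() const { return m_name; }
  const EmulatorSettings &GetSettings() const { return m_settings; }

private:
  std::string m_name;
  EmulatorSettings m_settings;
};
```

With this shape, one device configured for arm and another for risc-v can exist side by side, which global settings cannot express.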

bulbazord · October 31, 2024, 8:28pm

Yes, this would affect the simulator platforms as well. Our goal is not to remove functionality, so this should work the same or better.

It would be a combination of the two in general. Some operations will only need a device, some will only need a Platform, some will need both.

The way I was thinking about it, a Device would have to advertise what’s supported or otherwise handle unsupported operations gracefully.
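One possible shape for this, as a sketch only: a bitmask of capabilities that a device reports, plus operations that return an error instead of failing hard when a capability is missing. The Capability values, the Result struct, and GetFile are all hypothetical names, and the file transfer itself is stubbed.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical capability flags that a device could advertise.
enum Capability : std::uint32_t {
  eCapNone = 0,
  eCapFileTransfer = 1u << 0,
  eCapProcessList = 1u << 1,
};

// Hypothetical result type carrying a success flag and a message.
struct Result {
  bool ok;
  std::string message;
};

class Device {
public:
  explicit Device(std::uint32_t capabilities) : m_caps(capabilities) {}

  // Lets commands check support before attempting an operation.
  bool Supports(Capability cap) const { return (m_caps & cap) != 0; }

  // Returns an error instead of crashing when the capability is missing.
  Result GetFile(const std::string &remote_path) {
    if (!Supports(eCapFileTransfer))
      return {false, "device does not support file transfer"};
    // Stub: a real implementation would copy the file here.
    return {true, "transferred " + remote_path};
  }

private:
  std::uint32_t m_caps;
};
```

An embedded target without a file system would leave eCapFileTransfer unset, and device get-file could then report the error message rather than attempting the transfer.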

I think from an architecture standpoint, qemu-user doesn’t make a ton of sense as a platform with our proposal. You can run lots of things under qemu, so there will be overlap with existing platforms. I don’t know enough about the qemu-user platform to definitely say what we plan to do with it, but we do want to continue supporting the ability for lldb to launch qemu instances for you.

I was thinking more like gdb-remote protocol, which sits on top of TCP/IP. Maybe JTAG would also fit into there? Either way, existing platforms heavily assume you will be talking over the gdb-remote protocol to achieve platform operations, which CoreDevice does not do.

I don’t think I did a good job of capturing this detail, but this is how we would like to architect things as well. DeviceProviders create the Devices, and downstream you could create any kind of DeviceProvider you want.

That’s fair. I think we can iterate on exactly how LLDB connects to a device, DeviceProvider will definitely play an instrumental role here.
