[RFC] Fixing incompatibilities of the `x` packet w.r.t. GDB

GDB recently (to my surprise – I thought it already had this) implemented the x packet, but its implementation is different from lldb’s. The only difference is that GDB uses a leading b character in the response in order to disambiguate errors from successful (possibly truncated) responses that happen to begin with an E.

This has already caused problems for some clients like RR, which implemented the LLDB version of the packet. GDB maintainers are now in the process of implementing a way to detect which flavour is being used by the server.

Given that we have users which use lldb with servers (e.g., qemu) that primarily (or solely) implement the gdb flavour of the protocol, I think it’s only a matter of time before we run into the same problem – a server which implements the gdb definition stops working with lldb.

I propose that we do not wait for these users to show up, but that we pre-emptively add support for the gdb implementation. I think it’d be best to use the same detection mechanism that gdb is going to use. The two options being discussed are to use a zero-length read to probe the server behavior (lldb uses an OK response to detect support) and a new qSupported feature string. Both of these seem fine to me, but the advantage of doing it now is that we may also be able to influence the gdb implementation if needed. If it is necessary, I can act as an intermediary between the two groups.

In addition to that, I propose that we start the (probably, multi-year) process of migration to the gdb definition of the packet. The reasons for that are twofold:

  • gdb is, in a sense, the official/reference implementation of the protocol (even we define our packets as additions to it).
  • frankly, I think the gdb definition is better

I propose the following process:

  • LLDB version X adds client support for the gdb implementation of the packet (detects the flavour being used and acts accordingly). Ideally X would be the current release (20). It’s somewhat tricky, as it’s being branched approximately now, but there may still be enough time to do a cherry pick. If not, it means we wait 6 months for the next step.
  • LLDB version X+1 changes servers (lldb-server and debugserver) to use the new implementation of the packet. The previous step ensures that they can still work with a slightly older client. If we can get the first step into the current release, then this could be committed to trunk straight away. Otherwise, we wait 6 months.
  • LLDB version X+n removes support for the old lldb implementation of the packet. I’m leaving n unspecified because I don’t know what the support window for old servers is. Apple probably has the longest support window, so I’d go with whatever @jasonmolenda says. I don’t think the compatibility code will be particularly intrusive, so it can stay for quite some time, but it’d be nice to have a plan to remove it eventually. (I’ll also note that removing compatibility will not completely break the servers – it will just cause us to use less efficient m packets instead.)

Thanks for coordinating this, a colleague from GDB had also pinged me about this.

To elaborate / show my reasoning for agreeing with this.

https://lldb.llvm.org/resources/lldbgdbremote.html#x-binary-memory-read

lldb-server can respond with:

  • <encoded data> - when it was successful
  • OK - when a 0 length read was successful
  • E<error number> - when there was an error
  • <encoded data that starts with an E> - actually a successful read, but it looks like an error

GDB always adds the prefix b to the data response so you instead get:

  • b<encoded data> - when it was successful
  • b - when a zero-length read was successful (I assume, seems logical)
  • E<error number> - when there was an error

Even if the <encoded data> happened to start with b, you’d get bb<rest of data>. Or in other words, the first character of the response is always the type of response, even if that same character appears later in the data.
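
Here is a minimal sketch of the parsing rule this implies (plain Python for illustration; the helper names are made up and this is not lldb’s actual code). The payload of a successful gdb-style reply uses the usual RSP binary escaping, where ‘}’ (0x7d) escapes the following byte by XORing it with 0x20:

  def unescape(payload: bytes) -> bytes:
      # Undo RSP binary escaping: '}' (0x7d) is followed by the byte XOR 0x20.
      out = bytearray()
      i = 0
      while i < len(payload):
          if payload[i] == 0x7D and i + 1 < len(payload):
              out.append(payload[i + 1] ^ 0x20)
              i += 2
          else:
              out.append(payload[i])
              i += 1
      return bytes(out)

  def parse_gdb_x_reply(reply: bytes):
      # The first byte always identifies the response type, even if the same
      # character appears later in the data.
      if reply.startswith(b"b"):
          return ("data", unescape(reply[1:]))
      if reply.startswith(b"E"):
          return ("error", reply[1:].decode())
      return ("unsupported", None)  # empty or unrecognized reply

  print(parse_gdb_x_reply(b"bE47"))  # ('data', b'E47') - memory really contains "E47"
  print(parse_gdb_x_reply(b"E47"))   # ('error', '47')
  print(parse_gdb_x_reply(b"bbcd"))  # ('data', b'bcd') - data that starts with a b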

I think the problematic situation is:

  • lldb sends x
  • lldb-server responds with <something that isn't b><rest of data>
  • lldb sees it as an invalid response

We should add a test case showing that this invalid response triggers the fallback to m, rather than assuming the remote is entirely broken. An invalid response is sort of halfway between not supported and supported – the packet is there, but it doesn’t work.

We might just give up in that case.

Also this is the commit from @jasonmolenda that introduced x to lldb.

Thanks for digging that up.

Correct. I should add that this is not how it works right now (lldb will not fall back to m if it detects that x has failed), but as we said, reliable detection of failure is not currently possible anyway. It’s definitely possible to make it fall back after support for the old/current x style is removed.

I’ve also created [lldb] Add support for gdb-style 'x' packet by labath · Pull Request #124733 · llvm/llvm-project · GitHub as a prototype of what support for both packet formats could look like. I’ve tried to make the patch cherry-pickable, so I’ve resisted the urge to sneak in bigger cleanups that I would normally have done. In particular, I did not try to implement the fallback to the m packet, as that’s only necessary at the final stage of the process.

The proposed plan sounds good to me, assuming @jasonmolenda has no objections.

It’s hard to predict what our support window will be like; we don’t knowingly/intentionally break backwards compatibility, but if something else were to happen that forces us to drop support for older debugservers, that would be an opportunity to remove the compatibility code.

I was wondering if the GDB guys had updated the RSP spec with the new behavior. The answer is yes - the ‘x’ packet now states that the reply starts with a b.

Someone should try the latest GDB and force a zero-length x packet to see what gdbserver responds with. We should go with that as our test, perhaps only performed when we want to send our first x packet.

Thanks for digging in to this issue, Pavel, I didn’t realize the gdb community had added an incompatible ‘x’ packet. I agree that we need to avoid incompatible packets between lldb and gdb, so moving lldb is the right change. I don’t see any particular benefit to gdb’s “prefix bytes with b”, it seems like a pretty odd choice to me, but whatever, it exists and we need to interoperate. I think you mentioned you prefer the gdb style behavior but I don’t see how it’s an improvement, shrug.

[ rewriting most of this after thinking about it today, commenting on the initial PR proposal ]

We have four use cases to think about:

  1. old lldb talking to new lldb-server/debugserver
  2. new lldb talking to old lldb-server/debugserver
  3. lldb talking to random gdb stub - gdbserver, jtag device, qemu
  4. gdb talking to new lldb-server/debugserver

For (1), this is a short-term support requirement; occasionally we (at Apple, anyway) will have someone using an older lldb on their mac, remote debugging a newer remote device - newer mac, newer iOS device, running a newer debugserver. One release cycle / one year?

For (2), a new lldb on a mac will need to communicate with debugservers up to about five years old (running and debugging apps on older iOS devices), that’s roughly our support window.

For now, lldb-server and debugserver should default to lldb’s-x behavior, same as currently. New lldb will send a binary-upload+ feature in the qSupported packet it sends at the start of the session, and lldb-server/debugserver changes to gdb’s-x behavior for this session. lldb-server/debugserver do not need to send anything to indicate this feature was understood and acted upon, IMO.
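
A rough sketch of the switch this paragraph proposes (hypothetical Python with made-up names, binary escaping omitted; not the actual lldb-server/debugserver code):

  class XPacketStyle:
      def __init__(self):
          self.use_gdb_x = False  # default to lldb's-x, same as today

      def handle_qSupported(self, client_features: str) -> None:
          # Switch to gdb's-x for this session if the client advertised support.
          self.use_gdb_x = "binary-upload+" in client_features.split(";")

      def make_x_reply(self, data: bytes) -> bytes:
          if self.use_gdb_x:
              return b"b" + data           # gdb's-x: always prefix the data with b
          return data if data else b"OK"   # lldb's-x: bare data, OK for a zero-length read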

When lldb sends its x0,0 packet to see if binary is supported, we can get a reply of OK, which means the remote stub only implements lldb’s-x. If we get a reply of b or Exx, then this is a stub using gdb’s-x. On any other reply (including an empty string indicating an unrecognized packet), we fall back to m asciihex memory reads.

(even if gdbserver replies with b today to x0,0, that’s not a documented behavior in the gdb remote serial protocol documentation so another stub may return an error code. lldb-server/debugserver will never return an error code, and a stub that doesn’t know the “x” packet at all will return empty string. I believe Exx definitely indicates a stub that knows gdb’s-x.)
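
Spelled out as a sketch (hypothetical Python; send_packet stands in for however the packet actually gets sent, and the return values are just labels):

  def detect_x_flavour(send_packet) -> str:
      reply = send_packet("x0,0")
      if reply == "OK":
          return "lldb-x"      # stub only implements lldb's-x
      if reply == "b" or (len(reply) == 3 and reply.startswith("E")):
          return "gdb-x"       # stub implements gdb's-x
      return "m-fallback"      # empty/unknown reply: fall back to asciihex m reads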

We get #3 for free with new-lldb when it sends x0,0 and gets back b or Exx.

#4 has to wait until we flip the default switch for lldb-server/debugserver to start in gdb’s-x behavior. lldb-server may choose to be more aggressive in this behavior change than debugserver is - I will realistically want to keep the existing behavior for closer to 2 years; I know what weird combinations my users will try to use.

That (them changing the spec during implementation – shame on them!) was my first thought as well, but no, it turns out the x packet didn’t exist in the spec at all. LLDB came up with it first, and then GDB “reinvented” it last month.

Now you could say that means they should implement our packet. In fact, @rocallahan tried that, but it wasn’t very well received, mainly because our implementation isn’t self-consistent (more on that below).

It responds with b, but maybe not in all cases. I think the problematic situations are those where gdbserver does not have any process attached (I guess it checks for this before even looking at the packet), and although a read of size zero on an unreadable region currently succeeds, it wasn’t clear whether that’s actually desirable. This is why they’ve added the new qSupported feature for detection.

The benefit is the lack of ambiguity between errors and successful reads (i.e., is E47 an error or actual memory content?). Triggering this problem is not completely easy, because lldb will normally read memory in 512-byte blocks, but I can trigger it when I disable memory caching:

(lldb) set set target.process.disable-memory-cache true
(lldb) memory read 0x00007fffffffeffc
lldb             <  20> send packet: $x7fffffffeffc,20#9b
lldb             <   8> read packet: $@E47#f0
lldb             <  20> send packet: $x7ffffffff000,1c#2f
lldb             <   7> read packet: $E08#ad
0x7fffffffeffc: 40 45 34 37 00 00 00 00 00 00 00 00 00 00 00 00  @E47............
0x7ffffffff00c: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
warning: Not all bytes (4/32) were able to be read from 0x7fffffffeffc.
(lldb) memory read 0x00007fffffffeffd
lldb             <  20> send packet: $x7fffffffeffd,20#9c
lldb             <   7> read packet: $E47#b0
error: memory read failed for 0x7fffffffeffd

The m packet does not have this problem because it will always return an even number of hex characters and it can use the lowercase e for hex digits, so a successful reply can never be mistaken for an Exx error.
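
To make the ambiguity concrete, here is a tiny illustration (plain Python, not lldb code) of how the same three bytes of memory – the ASCII characters “E47” – come back in each of the formats discussed:

  memory = b"E47"                       # the bytes 0x45 0x34 0x37

  lldb_x_reply = memory                 # b"E47"    - indistinguishable from error 0x47
  gdb_x_reply  = b"b" + memory          # b"bE47"   - unambiguously a data reply
  m_reply      = memory.hex().encode()  # b"453437" - even length, never looks like Exx
  error_reply  = b"E47"                 # what a genuine error 0x47 looks like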

That could work, but it’s not compatible with how gdb has implemented this feature (the server sends the feature, and the client does the detection). We might still be able to change that, but we’d need a very good reason, and time is running very short as they’re getting ready to do a new release containing this fix.

That said, I don’t think that’s necessary. I think my original proposal can satisfy all of these requirements – we just need to tweak the timeline a bit:

  • lldb starts recognising the binary-upload+ server feature ~now
  • in ~two years, servers start using the gdb implementation (and sending the qSupported feature)
  • in 2+5 = ~seven years, lldb stops supporting the old lldb implementation

Scenario (1) works because we hold off on the serverside change until the new clients are available. Scenario (2) works because we keep the client supporting the old protocol sufficiently long. Scenario (3) works immediately after the first step as long as those stubs follow either version of the protocol (sending binary-upload+ or responding to the x0,0 probe). Scenario (4) works, but it uses the less efficient m implementation for the first two years.

WDYT?

(the seven-year horizon could potentially be shortened, given that it just means falling back to m packets. That’s one way to encourage users to upgrade… :stuck_out_tongue: )


Ah, I forgot to think of the case where our current lldb-x could reply Exx, and it is possible for that to be either an error code or a 3-bytes-of-memory response. That’s unlikely normally, but the x packet is allowed to return fewer bytes than requested (or, hilariously, more bytes than requested, which ProcessGDBRemote ignores).

I hadn’t understood that gdbserver sends binary-upload+ in its response to the qSupported query. I only suggested sending binary-upload+ in the features list lldb sends in qSupported, so an existing lldb-server/gdbserver could default to the lldb-x behavior but send the standards-conforming one when lldb indicated it could support it. That’s clearly not important though.

I’m fine with your suggested approach; it all makes sense. Thanks for explaining those details.