Support for Error Strings in remote protocol

Hello all,
Currently the remote protocol in LLDB does not allow sending error strings in response to remote packets; it only allows the "ENN" format, where each N is a hex digit. In our ongoing work, we would like lldb-server to be able to send error strings. I would like to invite any opinions or suggestions on this.

A very simple proposal would be to attach an error string, perhaps as a Name:Value pair, like so:

EXX;"Error String"
or
EXX;M"Error String"

I guess removing EXX would make it incompatible with gdb-server. Also, adding new packets to query errors might not be desirable?

Regards,
A Ravi Theja

+1 from me. I like the idea a lot. Specific details below.

I guess the decision here depends on how forward-compatible we want to
be. If we don't anticipate adding further fields here, then the format
can just be
EXX;message
and no quoting is needed. If we want to add more fields in the future
(I don't really see what they could be, though), then we should stick
to the standard semicolon-delimited list format. So, something like:
EXX;message:<message>
But then we need to decide how to encode <message>. I guess the most
"standard" approach would be to hex-encode it, although that will make
the messages hard to read manually.
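
To make the second variant concrete, here is a rough Python sketch of
what "EXX;message:<hex>" could look like on both ends. The function
names are made up for illustration, and the field layout is only the
one floated above, not anything lldb or lldb-server implements.

```python
from typing import Dict, Tuple


def encode_error_reply(code: int, message: str) -> str:
    """Build 'Exx;message:<hex>' from an 8-bit code and a text message."""
    if not 0 <= code <= 0xFF:
        raise ValueError("error code must fit in two hex digits")
    # Hex-encoding keeps ';' and ':' in the message from breaking the list.
    return f"E{code:02x};message:{message.encode('utf-8').hex()}"


def decode_error_reply(reply: str) -> Tuple[int, Dict[str, str]]:
    """Split a reply into its numeric code and a dict of decoded fields."""
    head, *fields = reply.split(";")
    if len(head) != 3 or head[0] != "E":
        raise ValueError(f"not an error reply: {reply!r}")
    code = int(head[1:], 16)
    decoded = {}
    for field in fields:
        name, _, value = field.partition(":")
        decoded[name] = bytes.fromhex(value).decode("utf-8")
    return code, decoded
```

A plain "E2f" still decodes (to code 47 with no fields), so old-style
replies keep working with the same parser.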

I think we should keep the numeric codes. Sometimes it may be useful
to programmatically switch on them, and that's hard to do with a
string only. For compatibility's sake, I'd only send the error
messages if the client explicitly enables it via some packet.

pl

What we can't do is require the remote server to support the new protocol. lldb-server isn't the only thing we talk to, and failing because we didn't get a specific non-RSP conformant error packet would be bad.

I like Pavel's idea of enabling it via a Q packet, and after being enabled it should always be optional.
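
A minimal sketch of that opt-in, in Python for brevity. The packet name
QEnableStringErrors and the stub functions are hypothetical; the only
protocol fact relied on is that a stub answers a packet it does not
recognise with an empty reply.

```python
from typing import Callable

# Hypothetical packet name, used here for illustration only.
ENABLE_PACKET = "QEnableStringErrors"


def enable_error_strings(send_packet: Callable[[str], str]) -> bool:
    """Ask the stub to turn on error strings; return whether it agreed."""
    # Anything other than "OK" (including the empty "unsupported" reply)
    # means the client carries on with plain Exx replies.
    return send_packet(ENABLE_PACKET) == "OK"


def legacy_stub(packet: str) -> str:
    """Stand-in for a server that has never heard of the new packet."""
    return ""


def new_stub(packet: str) -> str:
    """Stand-in for a server that supports the new packet."""
    return "OK" if packet == ENABLE_PACKET else ""
```

The client never fails on a refusal: enable_error_strings(legacy_stub)
is simply False, and error strings stay optional even after a server
has said OK.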

What’s the specific use case that you’re trying to support with error messages in the protocol? My initial thought on this is that it’s not really the debug server’s job to generate human-readable error messages and that the debugger is better suited to do the job.

Can this problem be solved by extending the current integer list used for errors?

Because the gdb remote protocol docs explicitly state:

The error response returned for some packets includes a two character error number. That number is not well defined.

we don't put much stock in the actual error numbers.

If you can determine that you are talking to lldb-server, then we could actually make these meaningful by keeping a common table. But that would only work for lldb-server.

Jim

True, but the error strings would be only available with lldb-server as well. Keeping a common table of error numbers seems like a good solution.

Right. I wasn't actually arguing one method over the other. Mostly pointing out that you can't take error numbers seriously in general, and that consequently if we go the error number route, you have to know you are talking to lldb-server and particularly one that has rational error numbers. Maybe have a qUsesLLDBSERVERErrorNumbers packet as part of the handshake.

Jim
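
For illustration, the error-number route could look roughly like this
on the client side (Python sketch). The table entries are invented, and
the flag stands for whatever the handshake would report; nothing here
is an existing lldb table or packet.

```python
# Hypothetical shared table; these entries are illustrative only.
SHARED_ERROR_TABLE = {
    0x01: "operation not permitted",
    0x02: "no such file or directory",
}


def describe_error(code: int, server_confirmed_table: bool) -> str:
    """Interpret a numeric code only if the server vouched for the table."""
    if server_confirmed_table and code in SHARED_ERROR_TABLE:
        return SHARED_ERROR_TABLE[code]
    # Unknown server or unknown code: the number carries no meaning.
    return f"Error {code}"
```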

The other option is to decode error codes at the client. The obvious downside is that the common error table could become very big, considering the number of OSes and targets.
Also, lldb-server already knows the target, so it would be useful if it could generate the error message as well.

The use case is as follows:
We are currently implementing support for Intel Processor Trace in lldb. The way it is structured, lldb-server gathers the trace data, and a tool running on top of the SB APIs does all the trace-specific handling.
So the client is basically transparent. We chose this design to make the fewest changes to lldb.

I think this really depends on the use case. Take the A (launch)
packet for example. Launching a process can fail for any number of
reasons. Let's just list the main syscalls involved in doing that:
- open (for redirecting stdio)
- chdir
- fork
- execve
And each of these can fail for several reasons. So if you want to pass
full information, you'd need to pass some sort of operation+errno
combination, and the 8-bit address space is quite crowded for that,
particularly if you want it to be backward compatible. As an example,
take a look at this command:
(lldb) process launch -e /non/existing/file
error: process launch failed: 'A' packet returned an error: 8

Wouldn't it be better if it read:
error: process launch failed: failed to open file
'/non/existing/file': No such file or directory

I think it would be hard to get this level of detail for the client
with just numeric error codes, whereas producing this string is
trivial for the server.

Note that it is still the client who decides what error to print to
the user, and if it thinks it has a better error message, it is free
to ignore the one returned by the server.
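
To show how little work this is on the server side, here is a small
Python sketch of the stdio-redirect step. open_for_redirect is a
made-up helper, not lldb-server code; it just pairs the failing
operation with the OS error text.

```python
import os
from typing import Optional, Tuple


def open_for_redirect(path: str) -> Tuple[Optional[int], Optional[str]]:
    """Open a stdio redirect target; return (fd, None) or (None, message)."""
    try:
        return os.open(path, os.O_WRONLY), None
    except OSError as err:
        # The server knows both the operation and the errno at this point,
        # so it can name them instead of collapsing everything into "E08".
        return None, f"failed to open file '{path}': {os.strerror(err.errno)}"
```

The client would need an operation+errno table to rebuild the same
sentence from a number alone.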

I’m just a new lurker here, so maybe this is obvious…

Is the string part of the programmatic interface? Or just a comment?
Does the same numeric code always have the same string?
If the same numeric code can have different strings, then the string
represents a specialization of the error code? If clients depend on
the data that’s in the string, then they may not work correctly in
modes of operation where the string is not available from the server.

If it’s intended to be part of the API then maybe a structured name/value
approach might be better?

Or maybe it’s just supposed to be a form of self-documentation, so that
raw error codes are easier to diagnose on inspection? In that case maybe the string
is always 1-to-1 with the error code?

Hello,
The SBError class in lldb's public API does contain a string. In error cases, lldb seems to set the string
in the Status class more often than the error code. I seriously doubt there is a coherent structure to the error classes.

This whole error structure is borrowed from GDB, which is vague about error codes. So there are two questions we need to answer:

  1. Do we want the ability to send error strings in error packets?
  2. If yes, then how?

My main concern was that, if strings are added, there should be some
clear documentation about the relationship between the string
and the number to explain what’s going on. Based on other
emails in this thread, it seems the numbers are so unreliable that
it might not be worth the trouble.

What about this approach instead?

Define a new mode of operation called something like “extended error response”
and invent some way for the client to 1) detect whether it’s supported in the server and then
2) enable the mode in the server.

Then define a better error interface. You’d want it to resemble the existing one
to make it easy for clients to enable it without having to write a bunch of new code.

If many things can go wrong in the server, then you might want to have some arbitrary
lines of text that can be retrieved by the client, and which are defined as
“human readable only”. In other words, warn clients not to parse this extended
“error log” type of string. The client could dump that to the human on request.

That would give a lot of flexibility for the server to spit ad-hoc strings into the error buffer.

You could also define a strict set of numeric codes for things that are supposed to
be common and stable between server versions and implementations. But that would
still be within the “extended error response” mode.

The gdb remote protocol documents say the error numbers are not well defined. They are not meant to have any meaning.

The lldb Status (née Error) objects have different namespaces, some of which (like the Posix errors) have significant numbering, so the numbers in those cases are preserved, since they might be significant to somebody.

But we never saw any advantage to adding significant numbers to the error strings for lldb's internal errors, so we didn't do that. The primary things in the Status class are that you had an error and the string. This has not caused any problems internal to lldb. We don't have a lot of operations that can fail in multiple ways that can't be independently screened and which you would react to differently depending on the mode of failure. So this would have been an unnecessary complexity.

As to error strings from lldbserver, it seems useful to send them, but mostly because the server might be able to add bits of detail that are unavailable to lldb. To that end, having the server present the errors is more likely to generate helpful information for the user than having a set of numbers lldb decodes statically. Again, I doubt that making the numbers significant will be particularly useful, but carrying some extra info about the failure that might be significant to the user of lldb does seem worth a little effort.

Jim

I think the "extended error response" idea pretty much sums up what we
were talking about. The client would enable the feature via e.g. a
QEnableStringErrors packet (the servers already know to reply
"unsupported packet" to any packets they don't understand). Then the
server can send "Exx;some_error_string" (instead of the usual "Exx").
Later, when the client converts that into a Status object, it will use
that string to initialize the error string. If the error string is not
present, it will simply initialize the error string to "Error 47". All
of this would be completely invisible to all but the lowest layers of
the gdb protocol code.
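
Roughly, the client-side conversion could look like this (a Python
sketch; the Status dataclass here is a stand-in for lldb's Status
class, and status_from_reply is a made-up name):

```python
from dataclasses import dataclass


@dataclass
class Status:
    """Stand-in for lldb's Status: a numeric code plus a message."""
    code: int
    message: str


def status_from_reply(reply: str) -> Status:
    """Turn 'Exx' or 'Exx;some_error_string' into a Status."""
    head, _, text = reply.partition(";")
    if len(head) != 3 or not head.startswith("E"):
        raise ValueError(f"not an error reply: {reply!r}")
    code = int(head[1:], 16)
    # No string from the server: fall back to the generic numeric message.
    return Status(code, text if text else f"Error {code}")
```

With this, "E2f" becomes Status(47, "Error 47"), while
"E08;failed to open file" keeps the server's text.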

Right. I don't think we have a use case yet for a strict set of stable
numeric error codes, so I'd postpone that discussion until a need
arises (mainly because maintaining a backward- and forward-compatible
set of numeric error codes is a pain).

Hello,
I would just like to add one more point to this discussion about whether error strings should be human readable. I guess the whole purpose
of having error strings is to present them to human users, right? I mean, a use case for sending strings that are not human readable
shouldn't arise.

So I can work on this and upload a patch for review. I will add everyone in this discussion to the review as well.