Note: This RFC talks about a feature I’d like to see standardised across
LLDB and GDB. For now this document only deals with LLDB. I’ll be incorporating
feedback from this into a later proposal to GDB.
Current State
LLDB 18.x is able to show users the fields within a register, using the target
XML elements also supported by GDB (see the previous RFCs).
(lldb) register read fpcr
fpcr = 0x00000000
= (AHP = 0, DN = 0, FZ = 0, RMode = 0, ...
This means developers don’t have to read the manual to find out where each field
is and then decode it.
The Problem
Some register fields are easy to interpret from a number. The number of
hardware breakpoints might be in a field that simply contains that same number.
Others are arbitrary mappings of value to meaning, for instance AArch64’s
floating point rounding modes.
fpcr = 0x00000000
= (... RMode = 0, ...
Without looking in the manual, the user won’t know what 0 means. In the manual
we find:
| Value | Meaning |
|---|---|
| 0b00 | Round to Nearest (RN) |
| 0b01 | Round towards Plus Infinity (RP) |
| 0b10 | Round towards Minus Infinity (RM) |
| 0b11 | Round towards Zero (RZ) |
(From “C5.2.8 FPCR, Floating-point Control Register”)
LLDB should be able to show those meanings along with the value.
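As a rough illustration of the manual lookup this would replace, here is a minimal Python sketch that extracts RMode from an FPCR value and maps it to the meanings in the table above. It assumes RMode occupies bits 23-22 of FPCR; `extract_field` and `rmode_meaning` are hypothetical helper names, not LLDB functions.

```python
# Meanings copied from the table above.
RMODE_MEANINGS = {
    0b00: "Round to Nearest (RN)",
    0b01: "Round towards Plus Infinity (RP)",
    0b10: "Round towards Minus Infinity (RM)",
    0b11: "Round towards Zero (RZ)",
}


def extract_field(reg_value: int, start: int, end: int) -> int:
    """Return bits start..end (inclusive), where start is the least significant bit."""
    width = end - start + 1
    return (reg_value >> start) & ((1 << width) - 1)


def rmode_meaning(fpcr: int) -> str:
    # Assumption: RMode lives in bits 22-23 of FPCR.
    return RMODE_MEANINGS[extract_field(fpcr, 22, 23)]


print(rmode_meaning(0x00000000))  # Round to Nearest (RN)
print(rmode_meaning(0x00C00000))  # Round towards Zero (RZ)
```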
Changes to LLDB Commands
For register read, we’ll show the field like a C enum, and the enumerators will
be in the register type as if the field were one.
Unlike a C enum, I want to show both the value and the name of that value, since
the integer value is the one most likely to appear in the developer’s program.
For example:
uint64_t new_reg_val = (1 << 2) /* enable foo */ | (5 << 10) /* use mode bar */;
Such formatting was proposed previously so I want to add the same thing, initially limited to register
printing.
(lldb) register read fpcr
fpcr = 0x00000000
= (... RMode = RN (0), ...)
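The "name (value)" formatting could be sketched like this in Python. `format_field` is a hypothetical helper, and falling back to the bare number when a value has no enumerator is my assumption for illustration, not something this RFC specifies.

```python
def format_field(field_name: str, value: int, enumerators: dict) -> str:
    """Format one field as "NAME = ENUMERATOR (VALUE)".

    Assumption: values with no enumerator are shown as a plain number.
    """
    enumerator_name = enumerators.get(value)
    if enumerator_name is None:
        return f"{field_name} = {value}"
    return f"{field_name} = {enumerator_name} ({value})"


rmode_enum = {0: "RN", 1: "RP", 2: "RM", 3: "RZ"}
print(format_field("RMode", 0, rmode_enum))  # RMode = RN (0)
print(format_field("AHP", 0, {}))            # AHP = 0
```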
For register info, any field enums will be shown at the end of the output,
and may include descriptions as well.
(I think adding descriptions to fields is a good idea, but that is not addressed
in this work.)
(lldb) register info fpcr
<...>
| 31-27 | 26 | 25 | 24 | 23-22 | <...>
|-------|-----|----|----|-------| <...>
| | AHP | DN | FZ | RMode | <...>
<...>
RMode: 0 = RN
Round to Nearest
1 = RP
Round towards Plus Infinity
2 = RM
Round towards Minus Infinity
3 = RZ
Round towards Zero
Field Enum Properties
The following assumptions inform the design and parsing of the XML.
- Not all values of a field will have a name as they may be reserved values for
future implementations.
This means you might have 0 = Mode A, 1 = Mode B, 2 = reserved, 3 = reserved.
It is simpler to not require the debug server to write out N = reserved for
every reserved value. The consequence of that is that there will be gaps in the enum.
- Some values may map to the same name.
For example 0 = Mode A, 1 = Mode A. This could happen if new hardware only
supports this Mode A, but still accepts the value 1 to keep software compatibility.
- Fields don’t have to have an integral type.
I only have a made up example here, I think it will be rare in practice.
A register might have a field that is an 8 bit floating point value that is a
threshold for some accuracy check. Now assume that the hardware only
accepts 4 different values here, but for whatever reason the
designers decided to put the whole fp8 in the register (maybe so software can
grab it easily).
You might want to label those 4 allowed values as an enum “conservative”,
“balanced”, “risky”, “anything goes”.
This field would have a type of fp8, which is not going to work if the
debugger is modelling these as C enums.
I think these will be rare, so I propose that the target XML spec allows this
and just notes that debuggers will only show enums for fields of types they
know how to handle. In 99% of cases that will be integral types only.
If the producer of this XML really wants it to work with existing debuggers,
they could also encode the field as an integer that contains the raw value of
the fp8.
- Architecture designers do not follow C naming rules.
Field enumerators could contain spaces, reserved characters, newlines, etc.,
which may cause problems for a debugger modelling them as C enums.
LLDB will go directly to Clang AST, which doesn’t care about these rules.
It will have no problem with an enumerator called
"123 this isn't a valid C name?".
- Longer descriptions will be in a separate description attribute.
Some enumerators will be explainable just with their name, others will need a
sentence or two of background information.
This will be included in a separate attribute so that debuggers know they’re
potentially printing a lot of text.
Changes to Target XML
These changes extend GDB’s target XML format. Prior to any changes
landing in LLDB, I am going to RFC these changes to GDB as well so we can both
work from the same standard.
Register field enums will extend the existing register flags elements, which
look something like:
<flags id="fpcr_flags" size="4">
<field name="AHP" start="26" end="26"/>
<field name="DN" start="25" end="25"/>
<field name="FZ" start="24" end="24"/>
<field name="RMode" start="22" end="23"/>
<...>
</flags>
<reg name="fpcr" bitsize="32" regnum="160" encoding="uint" format="hex" type="fpcr_flags" group="Floating Point Registers" />
We tie the flags to the reg by setting the reg’s type to the id of the
flags element.
To add enums to this we’ll add an enum child element to the field, and
enumerator children to that enum.
Using RMode as the example:
<field name="RMode" start="22" end="23">
<enum>
<enumerator name="RN" value="0" description="Round to Nearest"/>
<enumerator name="RP" value="1" description="Round towards Plus Infinity"/>
<enumerator name="RM" value="2" description="Round towards Minus Infinity"/>
<enumerator name="RZ" value="3" description="Round towards Zero"/>
</enum>
</field>
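To show how little a consumer needs to read this layout, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. Parsing `value` as a decimal integer is an assumption on my part; the accepted number syntax is not specified here.

```python
import xml.etree.ElementTree as ET

FIELD_XML = """
<field name="RMode" start="22" end="23">
  <enum>
    <enumerator name="RN" value="0" description="Round to Nearest"/>
    <enumerator name="RP" value="1" description="Round towards Plus Infinity"/>
    <enumerator name="RM" value="2" description="Round towards Minus Infinity"/>
    <enumerator name="RZ" value="3" description="Round towards Zero"/>
  </enum>
</field>
""".strip()

field = ET.fromstring(FIELD_XML)
# The enum is a direct child of the field, so no id lookup is needed.
enumerators = [
    # Assumption: value is written in decimal. description may be None.
    (int(e.get("value")), e.get("name"), e.get("description"))
    for e in field.find("enum").findall("enumerator")
]

for value, name, description in enumerators:
    print(f"{value} = {name}: {description}")
```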
We could just have enumerator elements and no overall enum, however having
an enum to group them all:
- Keeps them together and means we won’t mix future children of field in with
enumerators, which is more pleasant to read.
- Leaves the door open to sharing enum definitions by writing <enum id=...> the first time
it’s defined and later <enum idref=...> (https://www.informit.com/articles/article.aspx?p=26946&seqNum=4). This could be useful for registers
repeated for example at every AArch64 exception level, but which only have a
few fields in common across all exception levels.
enum has no attributes, it’s just a container element.
Each enumerator has:
- name (required) - The short name of the value, usually an acronym or
single word. This does not have to follow C naming rules, or any other specific
language rules. However it cannot be an empty string (which is equivalent
to not naming the value at all).
- value (required) - The value that maps to the name. The type of this
value comes from the type of the field.
- description (optional) - Longer text documentation for the value. This is
used by the debugger when it’s appropriate to show more text,
e.g. register info in LLDB.
The enumerators can be in any order. I don’t see a reason to enforce increasing
value order, as it may be more logical to show them in another order. Also doing
so is more work for the debugger. Debuggers may choose to sort them if they wish.
For instance, you might have 0 = Running, 1 = Starting. Perhaps a
previous hardware revision only had a Running mode, and a later one added
Starting. Now you have a value order of Running/Starting but to a user
it may make more sense to see them in register info as Starting/Running as
that’s how the hardware moves through the modes in practice. So you can specify
them in that order in XML, and most debuggers will keep that order.
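A tiny sketch of the ordering choice, using the Running/Starting example above. The tuples stand in for parsed enumerators; a debugger that keeps document order just iterates the list, while one that prefers value order can sort it.

```python
# Enumerators as (value, name), in the order they were written in the XML.
xml_order = [(1, "Starting"), (0, "Running")]

# Keeping the producer's order shows how the hardware moves through the modes.
print([name for _, name in xml_order])

# A debugger may instead choose to sort by value.
sorted_order = sorted(xml_order)
print([name for _, name in sorted_order])
```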
I’ve only talked about adding enum to flags and not to struct. struct is
a second way to define register fields which gives you some different features
compared to flags when used with GDB.
LLDB only supports flags and as far as I can tell, could implement every
feature struct provides using only flags. So I’ve ignored it here. When I
propose changes to GDB, I’ll likely add enum to struct as well, just for
consistency.
XML Parsing
Some of this is going to be debugger specific, some of it might end up in the target XML specification.
Overall, having enum as a child of field means that we don’t have to look
elsewhere for the enum as we have to do with flags.
If there’s more than one enum for a field, LLDB will take the first one that
parses correctly and has a number of enumerators greater than zero. We’ll log
once that the enums following it are being dropped, then ignore them.
When parsing enumerator:
- It must have at least name and value.
- name must be a non-empty string (if empty, the enumerator is ignored).
- value must be within the possible range of the field for its size and type
(the enumerator is ignored if not).
- Only the first instance of a value is used and subsequent enumerators with
that value are ignored
(if you really want multiple names for one value, use one enumerator with
a name like "Foo/Bar").
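The rules above could be sketched as follows in Python. This is not LLDB's implementation: `parse_enumerators` and `parse_field_enum` are hypothetical names, the field is assumed to be an unsigned integer, values are assumed to be decimal, treating an unparseable value as "ignored" is my assumption, and logging is left out.

```python
import xml.etree.ElementTree as ET


def parse_enumerators(enum_elem, field_bits):
    """Return the valid (value, name) pairs of one enum, in document order."""
    result = []
    seen_values = set()
    for elem in enum_elem.findall("enumerator"):
        name = elem.get("name")
        raw_value = elem.get("value")
        if not name or raw_value is None:
            continue  # name missing or empty, or value missing
        try:
            value = int(raw_value)  # assumption: decimal values
        except ValueError:
            continue  # assumption: unparseable values are ignored
        if not 0 <= value < (1 << field_bits):
            continue  # outside the range of an unsigned field of this size
        if value in seen_values:
            continue  # only the first enumerator for a value is used
        seen_values.add(value)
        result.append((value, name))
    return result


def parse_field_enum(field_elem):
    """Use the first enum that yields at least one valid enumerator."""
    bits = int(field_elem.get("end")) - int(field_elem.get("start")) + 1
    for enum_elem in field_elem.findall("enum"):
        enumerators = parse_enumerators(enum_elem, bits)
        if enumerators:
            return enumerators  # any later enum elements are dropped
    return []


field = ET.fromstring(
    '<field name="Mode" start="0" end="1">'
    "<enum/>"
    '<enum><enumerator name="" value="0"/>'
    '<enumerator name="A" value="1"/>'
    '<enumerator name="B" value="1"/>'
    '<enumerator name="C" value="4"/></enum>'
    "</field>"
)
print(parse_field_enum(field))  # [(1, 'A')]
```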
XML Generation
Nothing unusual here. lldb-server already uses the RegisterFlags ToXML
method to output XML for the flags and fields, I’ll add enum to that.
XML character escaping will be applied to the name values, as we do now for
register and field names.
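As an illustration of why escaping matters for names that ignore C naming rules, here is a small Python sketch that writes one enumerator and reads it back. It does not reproduce RegisterFlags ToXML; `enumerator_to_xml` is a hypothetical helper, and escaping the description in the same way is my assumption.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr


def enumerator_to_xml(name, value, description=None):
    """Emit one enumerator element with XML-escaped attribute values."""
    attrs = f"name={quoteattr(name)} value={quoteattr(str(value))}"
    if description is not None:
        # Assumption: descriptions get the same escaping as names.
        attrs += f" description={quoteattr(description)}"
    return f"<enumerator {attrs}/>"


awkward_name = "123 this isn't a valid C name? <&>"
xml_text = enumerator_to_xml(awkward_name, 3)
print(xml_text)

# Round trip: a parser recovers exactly the name we started with.
round_tripped = ET.fromstring(xml_text).get("name")
print(round_tripped == awkward_name)
```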
Compatibility
I don’t know of any other open source debugger that can do this, so we are free
to define how this works.
GDB has not attempted this, so there is a good chance we can make this the
standard for LLDB and GDB.
Older LLDB and GDB will ignore the new enum child element of field, so
old debugger → new server will not be a problem.
Prototype Implementation
I have a branch with an implementation of this that adds the supporting
functions and field information for control registers on AArch64 Linux.
When I am confident that this approach will be accepted by GDB, I’ll split that
code into PRs to go into LLDB.
Known Limitations
Non-integral Fields
Fields with non-integral types will require special handling, as stated earlier.
This can be improved in each debugger as (and if) we encounter a need for it.
Split Values
Fields that contain one part of a split value are in general not supported well
by target XML, so this will extend to enums. For example:
| 31 | 30 | ... | 3-0 |
| N[4] | M | ... | N[3-0] |
A potential solution here is to define fields as expressions run on the register
value. Here N would be ((reg >> 27) & 0x10) | (reg & 0xF), where the mask stops
M and the bits below it from leaking into the result. If that approach were taken,
we would still be able to attach an enum to that field as before, so this is
not a problem.
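A minimal sketch of such an expression for the layout above, in Python. `split_field_n` is a hypothetical name; the point is that each part is masked before the two are combined, so the neighbouring M bit cannot affect N.

```python
def split_field_n(reg: int) -> int:
    """Reassemble the 5-bit N from N[4] at bit 31 and N[3-0] at bits 3-0."""
    n_bit4 = (reg >> 31) & 0x1  # N[4]
    n_low = reg & 0xF           # N[3-0]
    return (n_bit4 << 4) | n_low


print(hex(split_field_n(0x80000005)))  # 0x15
print(hex(split_field_n(0x4000000F)))  # 0xf, M (bit 30) is not part of N
```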
Dependent Values
A field may have one set of meanings when in one mode and another in a different
mode. Supporting this would require a way to look up that other mode. In pseudo code:
if controlreg.mode == ModeA:
    otherreg.enum = [("A0", 0), ("A1", 1)]
elif controlreg.mode == ModeB:
    otherreg.enum = [("B0", 0), ("B1", 1)]
A real world example is registers that contain exception details. Depending on
where the exception is taken from, the details are different. So programmers
need to first look up the exception type, then interpret the register.
The simplest workaround here is just to not attempt to describe these fields.
If they are that complex, maybe there is some value in sending the programmer
to the manual and making sure they understand what they’re looking at.
To support this you could have
<enum condition="some expression to evaluate">. Dynamic fields could be done
this way too. I am not proposing we do this now; the point is that enum is not
going to put us in a position where we cannot choose to later.
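To make the dependent-values idea concrete, here is a runnable Python sketch of the selection logic. None of this is proposed syntax or LLDB code: the mode names, the table, and `describe` are all made up for illustration.

```python
# Hypothetical: one set of enumerators per mode of a controlling register.
ENUMS_BY_MODE = {
    "ModeA": {0: "A0", 1: "A1"},
    "ModeB": {0: "B0", 1: "B1"},
}


def describe(control_mode: str, value: int) -> str:
    """Pick the enum that applies in the current mode, then name the value."""
    enumerators = ENUMS_BY_MODE.get(control_mode, {})
    name = enumerators.get(value)
    # Assumption: with no applicable enum, only the number is shown.
    return f"{name} ({value})" if name else str(value)


print(describe("ModeA", 1))  # A1 (1)
print(describe("ModeB", 1))  # B1 (1)
print(describe("ModeC", 1))  # 1
```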
Next Steps
Please give your feedback on what I am proposing here.
Some starter questions:
- Do any fields of your target’s registers break the assumptions stated?
- Is there a feature you’d like to see, that would use this information?
(and is there missing information that would facilitate that)
- Have you used a debugger that had this feature? What were the good and bad
points of it?
Once I have feedback from this RFC my plan is:
- Integrate it into the LLDB prototype.
- Propose to GDB the changes to the target XML spec.
- Update GDB’s target XML spec.
- Put the LLDB implementation up for review.
- Aim for LLDB 19.x to have field enums for AArch64 Linux
(supporting code will work for any architecture, if someone contributes
the register information).
I hope this will avoid the need to namespace the XML changes (e.g. <lldb_enum>)
while we wait on agreement with GDB.
There’s a good chance GDB will support this feature at some point, but I don’t
plan to do it myself, so I have no idea of timing there (I have only had informal
talks just to make sure this is in theory possible).