RFC: Adding Register Field Enums to LLDB

Note: This RFC talks about a feature I’d like to see standardised across
LLDB and GDB. For now this document only deals with LLDB. I’ll be incorporating
feedback from this into a later proposal to GDB.

Current State

LLDB 18.x is able to show users the fields within a register, using the target
XML elements also supported by GDB (see the previous RFCs).

(lldb) register read fpcr
    fpcr = 0x00000000
         = (AHP = 0, DN = 0, FZ = 0, RMode = 0, ...

This means developers don’t have to read the manual to find out where each field
is and then decode it.

The Problem

Some register fields are easy to interpret from a number. The number of
hardware breakpoints might be in a field that simply contains that same number.

Others are arbitrary mappings of value to meaning. For instance AArch64’s
floating point rounding modes.

fpcr = 0x00000000
     = (... RMode = 0, ...

Without looking in the manual, the user won’t know what 0 means. In the manual
we find:

Value Meaning
0b00 Round to Nearest (RN)
0b01 Round towards Plus Infinity (RP)
0b10 Round towards Minus Infinity (RM)
0b11 Round towards Zero (RZ)

(From the manual: “C5.2.8 FPCR, Floating-point Control Register”)

LLDB should be able to show those meanings along with the value.

Changes to LLDB Commands

For register read, we’ll show each such field like a C enum, and it will appear
in the register type as if it were one.

Unlike a C enum, I want to show both the value and the name of that value, since
the integer value is the one most likely to appear in the developer’s program.

For example:

uint64_t new_reg_val = (1 << 2) /* enable foo */ | (5 << 10) /* use mode bar */;

Such formatting was proposed previously so I want to add the same thing, initially limited to register
printing.

(lldb) register read fpcr
    fpcr = 0x00000000
         = (... RMode = RN (0), ...)
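
The name-plus-value format shown above can be sketched in a few lines of Python. This is only an illustration of the proposed output, not LLDB's implementation; `extract_field`, `format_field` and `RMODE_NAMES` are hypothetical names, and the field is assumed to be an unsigned integer.

```python
# Hypothetical helpers illustrating the proposed "NAME (value)" rendering.
RMODE_NAMES = {0: "RN", 1: "RP", 2: "RM", 3: "RZ"}

def extract_field(reg_value, start, end):
    """Return bits start..end (inclusive, start is the lowest bit)."""
    width = end - start + 1
    return (reg_value >> start) & ((1 << width) - 1)

def format_field(name, value, enumerators):
    """Render 'name = ENUMERATOR (value)', or just the number if unnamed."""
    label = enumerators.get(value)
    if label is None:
        # Reserved or otherwise unnamed values fall back to the raw number.
        return f"{name} = {value}"
    return f"{name} = {label} ({value})"

fpcr = 0x00000000
print(format_field("RMode", extract_field(fpcr, 22, 23), RMODE_NAMES))
```

Falling back to the plain number for unnamed values keeps the output usable when the enum has gaps.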

For register info, any field enums will be shown at the end of the output,
and may include descriptions as well.

(I think adding descriptions to fields is a good idea, but not addressed in this
work)

(lldb) register info fpcr
<...>
| 31-27 | 26  | 25 | 24 | 23-22 | <...>
|-------|-----|----|----|-------| <...>
|       | AHP | DN | FZ | RMode | <...>
<...>
RMode: 0 = RN
         Round to Nearest
       1 = RP
         Round towards Plus Infinity
       2 = RM
         Round towards Minus Infinity
       3 = RZ
         Round towards Zero

Field Enum Properties

The following assumptions inform the design and parsing of the XML.

  • Not all values of a field will have a name as they may be reserved values for
    future implementations.

This means you might have 0 = Mode A, 1 = Mode B, 2 = reserved, 3 = reserved.

It is simpler to not require the debug server to write out N = reserved for
every reserved value. The consequence of that is that there will be gaps in the enum.

  • Some values may map to the same name.

For example 0 = Mode A, 1 = Mode A. This could happen if new hardware only
supports Mode A, but still accepts the value 1 to keep software compatibility.

  • Fields don’t have to have an integral type.

I only have a made-up example here; I think it will be rare in practice.

A register might have a field that is an 8 bit floating point value that is a
threshold for some accuracy check. Now assume that the hardware only
accepts 4 different values here, but for whatever reason the
designers decided to put the whole fp8 in the register (maybe so software can
grab it easily).

You might want to label those 4 allowed values as an enum “conservative”,
“balanced”, “risky”, “anything goes”.

This field would have a type of fp8, which is not going to work if the
debugger is modelling these as C enums.

I think these will be rare, so I propose not disallowing this in the
target XML spec and just noting that debuggers will only show enums for fields
of a type they know how to handle, which in 99% of cases will be integral types
only.

If the producer of this XML really wants it to work with existing debuggers,
they could also encode the field as an integer that contains the raw value of
the fp8.

  • Architecture designers do not follow C naming rules.

Field enumerators could contain spaces, reserved characters, newlines, etc.,
which may cause problems for a debugger modelling them as C enums.

LLDB will go directly to Clang AST, which doesn’t care about these rules.
It will have no problem with an enumerator called
“123 this isn't a valid C name?”.

  • Longer descriptions will be in a separate description attribute.

Some enumerators will be explainable just with their name, others will need a
sentence or two of background information.

This will be included in a separate attribute so that debuggers know they’re
potentially printing a lot of text.

Changes to Target XML

These changes extend GDB’s target XML format. Prior to any changes
landing in LLDB, I am going to RFC these changes to GDB as well so we can both
work from the same standard.

Register field enums will extend the existing register flags elements, which
look something like:

<flags id="fpcr_flags" size="4">
  <field name="AHP" start="26" end="26"/>
  <field name="DN" start="25" end="25"/>
  <field name="FZ" start="24" end="24"/>
  <field name="RMode" start="22" end="23"/>
  <...>
</flags>
<reg name="fpcr" bitsize="32" regnum="160" encoding="uint" format="hex" type="fpcr_flags" group="Floating Point Registers" />

We tie the flags to the reg by setting the reg’s type to the id of the
flags element.

To add enums to this we’ll add an enum child element to the field, and
enumerator children to that enum.

Using RMode as the example:

<field name="RMode" start="22" end="23">
  <enum>
    <enumerator name="RN" value="0" description="Round to Nearest"/>
    <enumerator name="RP" value="1" description="Round towards Plus Infinity"/>
    <enumerator name="RM" value="2" description="Round towards Minus Infinity"/>
    <enumerator name="RZ" value="3" description="Round towards Zero"/>
  </enum>
</field>
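
As a rough illustration of how a consumer might read this element, here is a minimal Python sketch using the standard library's ElementTree. It is not how LLDB parses target XML; `read_field` is a hypothetical helper, and it assumes values are written in decimal.

```python
import xml.etree.ElementTree as ET

FIELD_XML = """
<field name="RMode" start="22" end="23">
  <enum>
    <enumerator name="RN" value="0" description="Round to Nearest"/>
    <enumerator name="RP" value="1" description="Round towards Plus Infinity"/>
    <enumerator name="RM" value="2" description="Round towards Minus Infinity"/>
    <enumerator name="RZ" value="3" description="Round towards Zero"/>
  </enum>
</field>
"""

def read_field(xml_text):
    """Return (name, start, end, [(value, name, description), ...])."""
    field = ET.fromstring(xml_text)
    enumerators = []
    enum = field.find("enum")  # the optional container element
    if enum is not None:
        for e in enum.findall("enumerator"):
            # Assumption: values are decimal integers; description may be absent.
            enumerators.append((int(e.get("value")), e.get("name"), e.get("description")))
    return field.get("name"), int(field.get("start")), int(field.get("end")), enumerators

name, start, end, enumerators = read_field(FIELD_XML)
for value, short_name, description in enumerators:
    print(f"{name}: {value} = {short_name} ({description})")
```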

We could just have enumerator elements and no overall enum, however having
an enum to group them all:

  1. Keeps them together and means we won’t mix future children of field
    in with enumerators, which is more pleasant to read.
  2. Leaves the door open to sharing enum definitions by writing <enum id=...> the first time
    it’s defined and later <enum idref=...> (https://www.informit.com/articles/article.aspx?p=26946&seqNum=4). This could be useful for registers
    repeated for example at every AArch64 exception level, but which only have a
    few fields in common across all exception levels.

enum has no attributes, it’s just a container element.

Each enumerator has:

  • name required - The short name of the value, usually an acronym or
    single word. This does not have to follow C naming rules, or any other specific
    language rules. However it cannot be an empty string (which is equivalent
    to not naming the value at all).
  • value required - The value that maps to the name. The type of this
    value comes from the type of the field.
  • description optional - Longer text documentation for the value. This is
    used by the debugger when it’s appropriate to show more text
    e.g. register info in LLDB.

The enumerators can be in any order. I don’t see a reason to enforce increasing
value order, as it may be more logical to show them in another order. Also doing
so is more work for the debugger. Debuggers may choose to sort them if they wish.

For instance, you might have 0 = Running, 1 = Starting. Perhaps a
previous hardware revision only had a Running mode, and a later one added
Starting. Now you have a value order of Running/Starting but to a user
it may make more sense to see them in register info as Starting/Running as
that’s how the hardware moves through the modes in practice. So you can specify
them in that order in XML, and most debuggers will keep that order.

I’ve only talked about adding enum to flags and not to struct. struct is
a second way to define register fields which gives you some different features
compared to flags when used with GDB.

LLDB only supports flags and as far as I can tell, could implement every
feature struct provides using only flags. So I’ve ignored it here. When I
propose changes to GDB, I’ll likely add enum to struct as well, just for
consistency.

XML Parsing

Some of this is going to be debugger specific, some of it might end up in the target XML specification.

Overall, having enum as a child of field means that we don’t have to look
elsewhere for the enum as we have to do with flags.

If there’s more than one enum for a field, LLDB will take the first one that
parses correctly and has a number of enumerators greater than zero. We’ll log
once that we’re dropping the following enum, then ignore the rest.

When parsing enumerator:

  • It must have at least name and value.
  • name must be a non-empty string (if empty, enumerator is ignored).
  • value must be within the possible range of the field for its size and type.
    (the enumerator is ignored if not)
  • Only the first instance of a value is used and subsequent enumerators with
    that value are ignored
    (if you really want multiple names for one value, use one enumerator with
    a name like "Foo/Bar").
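
The parsing rules in this section can be sketched as a small filter. This is a hedged illustration only: `filter_enumerators` and `first_usable_enum` are hypothetical names, the input is assumed to be already-extracted (name, value) pairs, and the field is assumed to be an unsigned integer.

```python
def filter_enumerators(raw, start, end):
    """Apply the enumerator rules to (name, value) pairs read from XML.

    Either element of a pair may be None if the attribute was missing.
    """
    max_value = (1 << (end - start + 1)) - 1
    seen = set()
    kept = []
    for name, value in raw:
        if not name or value is None:
            continue  # name and value are both required, name must be non-empty
        if not 0 <= value <= max_value:
            continue  # value does not fit in the field
        if value in seen:
            continue  # only the first instance of a value is used
        seen.add(value)
        kept.append((name, value))
    return kept

def first_usable_enum(raw_enums, start, end):
    """Pick the first enum that still has enumerators after filtering."""
    for raw in raw_enums:
        kept = filter_enumerators(raw, start, end)
        if kept:
            return kept
    return []

raw = [("RN", 0), ("", 1), ("RP", 1), ("Dup", 0), ("Big", 4), (None, 2)]
print(filter_enumerators(raw, 22, 23))
```

A real implementation would also log what it dropped; that is left out here to keep the sketch short.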

XML Generation

Nothing unusual here. lldb-server already uses the RegisterFlags ToXML
method to output XML for the flags and fields, I’ll add enum to that.

XML character escaping will be applied to the name values, as we do now for
register and field names.
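
To show why escaping matters for enumerator names, here is a minimal Python sketch of generating the enum element. It does not reflect the actual RegisterFlags ToXML code; `enum_to_xml` is a hypothetical helper built on the standard library's `quoteattr`.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

def enum_to_xml(enumerators):
    """Serialise (name, value) pairs as an <enum> element (hypothetical helper)."""
    lines = ["<enum>"]
    for name, value in enumerators:
        # quoteattr() escapes &, <, > and quotes, and returns the text
        # already wrapped in a suitable pair of quotes.
        lines.append(f'  <enumerator name={quoteattr(name)} value="{value}"/>')
    lines.append("</enum>")
    return "\n".join(lines)

awkward = '123 this isn\'t a "valid" C name? <&>'
xml_text = enum_to_xml([("RN", 0), (awkward, 1)])
print(xml_text)

# Round trip: a standard XML parser recovers the original name.
parsed = ET.fromstring(xml_text).findall("enumerator")
assert parsed[1].get("name") == awkward
```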

Compatibility

I don’t know of any other open source debugger that can do this, so we are free
to define how this works.

GDB has not attempted this, so there is a good chance we can make this the
standard for LLDB and GDB.

Older LLDB and GDB will ignore the new enum child element of field, so
old debugger → new server will not be a problem.

Prototype Implementation

I have a branch with an implementation of this that adds the supporting
functions and field information for control registers on AArch64 Linux.

When I am confident that this approach will be accepted by GDB, I’ll split that
code into PRs to go into LLDB.

Known Limitations

Non-integral Fields

Fields with non-integral types will require special handling, as stated earlier.
This can be improved in each debugger as (and if) we encounter a need for it.

Split Values

Fields that contain one part of a split value are in general not supported well
by target XML, so this will extend to enums. For example:

| 31   | 30 | ... | 3-0    |
| N[4] | M  | ... | N[3-0] |

A potential solution here is to define fields as expressions run on the register
value. Here N would be ((reg >> 27) & 0x10) | (reg & 0xF), which moves bit 31
down to bit 4 and keeps bits 3-0. If that approach were taken, we would still be
able to attach an enum to that field as before, so this is not a problem.
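
A worked version of that expression, as a Python sketch for the 32-bit layout in the table above. `split_field_n` is a hypothetical name; only bit 31 is taken as N[4], so neighbouring fields such as M do not leak into the result.

```python
def split_field_n(reg):
    """Reassemble N from N[4] (bit 31) and N[3:0] (bits 3-0)."""
    high = (reg >> 31) & 0x1  # N[4]
    low = reg & 0xF           # N[3:0]
    return (high << 4) | low

# Bit 31 set and low nibble 0x5 gives N = 0b10101.
print(split_field_n(0x80000005))  # 21
# Bit 30 (M) is ignored, so only the low nibble contributes here.
print(split_field_n(0x40000003))  # 3
```

Once N is computed this way it is an ordinary integer, so an enum lookup works the same as for a contiguous field.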

Dependent Values

A field may have one set of meanings when in one mode and another in a different
mode. Supporting this would require a way to look up that other mode. In pseudo code:

if controlreg.mode == ModeA:
  otherreg.enum = [("A0", 0), ("A1", 1)]
elif controlreg.mode == ModeB:
  otherreg.enum = [("B0", 0), ("B1", 1)]

A real world example is registers that contain exception details. Depending on
where the exception is taken from, the details are different. So programmers
need to first look up the exception type, then interpret the register.

The simplest workaround here is just to not attempt to describe these fields.
If they are that complex, maybe there is some value in sending the programmer
to the manual and making sure they understand what they’re looking at.

To support this you could have
<enum condition="some expression to evaluate">. Dynamic fields could be done
this way too. I’m not proposing we do this now; the point is that enum is not
going to put us in a position where we cannot choose to later.

Next Steps

Please give your feedback on what I am proposing here.

Some starter questions:

  • Do any fields of your target’s registers break the assumptions stated?
  • Is there a feature you’d like to see, that would use this information?
    (and is there missing information that would facilitate that)
  • Have you used a debugger that had this feature? What were the good and bad
    points of it?

Once I have feedback from this RFC my plan is:

  • Integrate it into the LLDB prototype.
  • Propose to GDB the changes to the target XML spec.
  • Update GDB’s target XML spec.
  • Put the LLDB implementation up for review.
  • Aim for LLDB 19.x to have field enums for AArch64 Linux
    (supporting code will work for any architecture, if someone contributes
    the register information).

I hope this will avoid the need to namespace the XML changes (e.g. <lldb_enum>)
while we wait on agreement with GDB.

There’s a good chance GDB will support this feature at some point, but I don’t
plan to do it myself. So no idea of timing there (I have only had informal
talks just to make sure this is in theory possible).

Just finished reading this, sorry that took a little time. It’s a real nice enhancement you’re proposing, and while I was trying to think of suggestions or potential problems, you’d already thought past anything I could come up with in your proposal. I haven’t had a chance to look at your WIP branch of changes but reading everything here, I’m a fan of this proposal. I hope it can get traction in the gdb community as well.

Thanks Jason!

I’ll answer my own questions too, in the interest of transparency.

  • Do any fields of your target’s registers break the assumptions stated?

AArch64 has everything listed in the limitations. I haven’t seen anything we couldn’t work around. Haven’t looked as closely at Arm 32 bit but I expect the same.

  • Is there a feature you’d like to see, that would use this information?
    (and is there missing information that would facilitate that)

I do want to add a description field to the fields themselves and the register, so that we can make register info “complete”. Not now, but once enumerators have it, it should be an easier pitch to do it elsewhere.

  • Have you used a debugger that had this feature? What were the good and bad
    points of it?

I wrote parts of one :) Codescape Debugger used an XML info bundle to generate similar things, allowing users to do the equivalent of register write Foo.Bar SomeEnumValue, while still being able to treat the register value like an integer.

That won’t be as smooth for LLDB I expect, as we can’t know what enum values the debug target will have ahead of time. We can come up with an equivalent though I’m sure.

The challenges of this were broadly the same as the limitations section, except this database did have the conditionals and expressions I mentioned.

It was very useful for scripting and for our kernel developers. I remember a feature request to render a register as shift and mask values to aid writing C source. Too niche to add directly to LLDB but if the information is in the API in future it would be easy to implement as a script.
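
As an idea of what such a script could look like, here is a minimal Python sketch that turns field positions into C shift and mask macros. It does not use any LLDB API; `c_shift_mask_macros` is a hypothetical helper, the field list is typed in by hand, and it assumes the field names happen to be valid C identifiers.

```python
def c_shift_mask_macros(reg_name, fields):
    """fields: list of (name, start, end), start being the lowest bit."""
    lines = []
    for name, start, end in fields:
        width = end - start + 1
        mask = ((1 << width) - 1) << start
        prefix = f"{reg_name}_{name}".upper()
        lines.append(f"#define {prefix}_SHIFT {start}")
        lines.append(f"#define {prefix}_MASK 0x{mask:08x}u")
    return "\n".join(lines)

print(c_shift_mask_macros("fpcr", [("RMode", 22, 23), ("FZ", 24, 24)]))
```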

Well, my colleague found “Enum Target Types” in Debugging with GDB. Turns out GDB does support enums! Though no upstream target supplies the information, it does work if you hack some in.

So I don’t have to send an RFC to GDB, or change it, after all. Though I will propose the “description” attributes for fields and enums after this work is done.

So llvm/llvm-project Pull Request #90059, “[lldb] Add ability to show enum as name and value at the same time” (by DavidSpickett), is the first of several PRs to implement GDB’s version of this feature.


As of llvm/llvm-project commit 08888d0, “[llvm][Docs] Add release note for lldb's support for register enums”, this proposal is done and will be included in LLDB 19.

There are a couple of differences between what was proposed and what landed:

  • Enums do not have descriptions. I plan to address this by adding descriptions to all levels of the register types instead: registers, fields and enums.
  • The format RMode = RN (0), where the enumerator and value are printed together, will not be included. I’m still pursuing this but it won’t make it into 19. For now you’ll need to use register info to see the enum values, or work it out from the hex value.