RFC: Register fields in expressions

DavidSpickett · June 6, 2024, 2:40pm

The next installment in my quest not to read the processor manuals.

Current State

lldb-server tells lldb what fields are in a register, using the features I added previously.

Note: The following output includes some pending work on register enums. I have included this because it shows you the final form of the feature I propose, but my proposal does not depend on register enum support.

(lldb) register info fpcr
       Name: fpcr
       Size: 4 bytes (32 bits)
    In sets: Floating Point Registers (index 1)

| 31-27 | 26  | 25 | 24 | 23-22 | 21-20 |  19  | 18-16 | 15  | 14-13 | 12  | 11  | 10  |  9  |  8  | 7-0 |
|-------|-----|----|----|-------|-------|------|-------|-----|-------|-----|-----|-----|-----|-----|-----|
|       | AHP | DN | FZ | RMode |       | FZ16 |       | IDE |       | IXE | UFE | OFE | DZE | IOE |     |

RMode: 0 = RN, 1 = RP, 2 = RM, 3 = RZ

register read will then format the value for you:

(lldb) register read fpcr
    fpcr = 0x00c00000
         = (AHP = 0, DN = 0, FZ = 0, RMode = RZ (0x3), FZ16 = 0, IDE = 0, IXE = 0, UFE = 0, OFE = 0, DZE = 0, IOE = 0)

You can also use registers in expressions:

(lldb) expr -- $fpcr
(unsigned int) $0 = 0

However, they are just integers, even if lldb knows they contain fields.
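To make the cost of "just integers" concrete, here is a minimal sketch of what a script has to do today to pull RMode out of the raw value. The bit positions come from the `register info fpcr` table above; `FPCR_FIELDS`, `RMODE_NAMES` and `read_field` are hypothetical names used only for this illustration, not part of lldb.

```python
# Bit ranges (msb, lsb) copied by hand from the `register info fpcr` table.
# All names here are hypothetical helpers for this sketch, not lldb APIs.
FPCR_FIELDS = {
    "AHP": (26, 26),
    "DN": (25, 25),
    "FZ": (24, 24),
    "RMode": (23, 22),
    "FZ16": (19, 19),
    "IDE": (15, 15),
    "IXE": (12, 12),
    "UFE": (11, 11),
    "OFE": (10, 10),
    "DZE": (9, 9),
    "IOE": (8, 8),
}
RMODE_NAMES = {0: "RN", 1: "RP", 2: "RM", 3: "RZ"}


def read_field(value, name):
    """Extract one named field from a raw register value."""
    msb, lsb = FPCR_FIELDS[name]
    width = msb - lsb + 1
    return (value >> lsb) & ((1 << width) - 1)


fpcr = 0x00C00000  # the value shown by `register read fpcr`
rmode = read_field(fpcr, "RMode")
print(rmode, RMODE_NAMES[rmode])  # 3 RZ
```

Every script that wants a field has to carry a table like this around, which is exactly the knowledge lldb already has.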

Enabling Register Fields in expressions

I want to allow access to register fields in an expression. I know this is possible and I have a prototype,
but there are some tradeoffs to discuss.

The obvious approach is to make registers into struct types:

(lldb) expr -- $fpcr
(__lldb_register_fields_fpcr) $1 = (0, IOE = 0, DZE = 0, OFE = 0, UFE = 0, IXE = 0, 0, IDE = 0, 0, FZ16 = 0, 0, RMode = RN, FZ = 0, DN = 0, AHP = 0, 0)

So you can read and write each field:

(lldb) expr -- $fpcr.RMode
(__lldb_register_fields_fpcr_RMode_enum) $2 = RN

(this is something Jim hinted at during the first part of the register work)

You can even use the enums for the field:

(lldb) expr -- $fpcr.RMode = RZ
(__lldb_register_fields_fpcr_RMode_enum) $3 = RZ
(lldb) expr -- $fpcr.RMode
(__lldb_register_fields_fpcr_RMode_enum) $4 = RZ
(lldb) register read fpcr
    fpcr = 0x00c00000
         = (... RMode = RZ (0x3) ...

Use Cases

Watch Windows
- Always know what mode you’re in, for example AArch64 Streaming Vector modes.
- IDEs like Visual Studio Code
  (that may not support registers being anything other than integer).
Stop commands
- As above but for the command line.
Scripting
- Flip bits at will without working out where they are in the register.
- Easier to read and maintain code.

Struct vs. Integer

This is the key reason to RFC this. The type change from integer to struct may break existing expressions and even
break expressions as you move between debug servers.

It is important to remember here that the register information comes from the debug server, not the debug client.

(I know some disagreed with this previously, and I have my doubts too but for GDB compatibility reasons that is what I
went with)

Existing Expressions

If anyone has a script that treats $fpcr as an integer, and the debug server sends field information for it,
it will turn into a struct and the expression will fail.

This can happen when you upgrade lldb-server, or even connect to another existing lldb-server, as each one may
send different field information.

Integer is Sometimes Better

In a past job I wrote scripts to simulate the work an exception handler does, stacking registers.
Except we “stacked” them into a Python object for later restoration. In this case we just wanted an integer for each register.

If any of those registers were structs, I would have to do this ugly workaround:

(lldb) expr -f hex -- *(uint32_t*)(&($cpsr))

This assumes that the user knows the size of the register. The whole point
of lldb providing this information is so that they do not have to know that, so I do not think this is reasonable to ask of them.

Struct, Integer, Both?

What does GDB do?

GDB defines register information in:

It defines 2 ways to represent register fields, struct and flags.

Which to choose? Structures or flags?

Registers defined with ‘flags’ have these advantages over defining them with ‘struct’:

- Arithmetic may be performed on them as if they were integers.
- They are printed in a more readable fashion.

Registers defined with ‘struct’ have one advantage over defining them with ‘flags’:

- One can fetch individual fields like in ‘C’.

(gdb) print $my_struct_reg.field3
$1 = 42

In other words:

- If you want to see nicely printed registers that are actually integers, use flags.
- If you want nicely printed registers that are actually structs, use struct.

This means that GDB has decided:

- Breaking expressions when moving between debug servers is not a concern. If the
  new debug server sends field information, or does not, or it is different, that is
  for the user to deal with.
- The XML information is deciding how you can interact with the register
  (which feels wrong to me, but more on that later).
- There should be no difference in the expressions, just use $register as you always do.

I am reliably informed that the reason for this struct/flags distinction is this usability choice.

For lldb I did not see the need for a distinction between struct and
flags so my plan has always been to only support flags and provide all features based on that.

Should lldb support `struct`?

lldb definitely could but I do not think that the decision of struct
vs integer being made at the target XML level is an ideal choice.

As a user I would expect to have both for the same register in the same debug session. Ideally via one interface but at least the
ability to choose from within lldb.

In addition, the code to support struct and flags would be almost identical
but for the naming and more tests to run. It would not bring any additional features to lldb.

Walks like an Integer, Quacks Like a Struct?

In that previous job, we built a feature like this in Python, which allowed us to have
objects that were for most purposes integers but could act like structs on demand.

In lldb we are confined by a C type system, so I do not think we can do this.
Even if we move to C++'s rules, we would have to JIT compile extra methods for these
struct types to allow implicit conversion to integer.

I found ways to add methods to a type that point to existing functions elsewhere,
but not to craft entirely new ones.

Though I am new to the type system, and we do have complete control over it, so please correct me if you know of a
way to achieve this, as it would be the ideal solution here.
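For readers who have not seen the Python style of this idea, a minimal sketch follows. It is not the code from that previous job and says nothing about how lldb would implement it; `RegisterValue`, `Fpcr` and `with_field` are hypothetical names, and only three FPCR fields are listed to keep it short.

```python
class RegisterValue(int):
    """An int that also exposes named bitfields as attributes (hypothetical)."""

    _fields = {}  # name -> (msb, lsb), supplied by subclasses

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails, so ordinary
        # int behaviour is untouched.
        try:
            msb, lsb = type(self)._fields[name]
        except KeyError:
            raise AttributeError(name) from None
        width = msb - lsb + 1
        return (int(self) >> lsb) & ((1 << width) - 1)

    def with_field(self, name, field_value):
        """Return a new register value with one field replaced."""
        msb, lsb = type(self)._fields[name]
        mask = ((1 << (msb - lsb + 1)) - 1) << lsb
        new_value = (int(self) & ~mask) | ((field_value << lsb) & mask)
        return type(self)(new_value)


class Fpcr(RegisterValue):
    # Subset of the FPCR layout shown earlier in this post.
    _fields = {"DN": (25, 25), "FZ": (24, 24), "RMode": (23, 22)}


fpcr = Fpcr(0x00C00000)
print(hex(fpcr + 1))                     # 0xc00001, walks like an integer
print(fpcr.RMode)                        # 3, quacks like a struct
print(hex(fpcr.with_field("RMode", 0)))  # 0x0
```

Because Python integers are immutable, a field "write" here returns a new value rather than modifying the register in place. The equivalent in a C type system is the implicit conversion to integer discussed above.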

Type Suffixes

Another approach is to say that integer registers are the norm and therefore
struct types must be opt in.

For example:

(lldb) expr -- $fpcr_struct
(__lldb_register_fields_expr_fpcr) $1 = (0, IOE = 0, DZE = 0, OFE = 0, UFE = 0, IXE = 0, 0, IDE = 0, 0, FZ16 = 0, 0, RMode = RN, FZ = 0, DN = 0, AHP = 0, 0)
(lldb) expr -- $fpcr_struct.RMode = RZ
(__lldb_register_fields_expr_fpcr_RMode_enum) $3 = RZ

This also means that if you opt into struct but we don’t have the field information,
you get a slightly more obvious error:

(lldb) expr -- $fpcr_struct.RMode = rz
error: <user expression 3>:1:22: use of undeclared identifier 'rz'
    1 | $fpcr_struct.RMode = rz
      |

We can special case this to improve it, for example
“requested struct type for a register with no field information”.

You can also use either or both types in scripting. Want to save the register
context then flip a mode bit and continue? Save them all as integers then use
the _struct type to flip the bit.
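The save, flip and restore workflow might look roughly like this in a script. This is a sketch under assumptions: the `_struct` suffix is only a proposal, so a hypothetical `set_field` helper stands in for the field write, and the register snapshot is made-up data rather than values read from a real target.

```python
def set_field(value, msb, lsb, field_value):
    """Return value with bits msb..lsb replaced by field_value."""
    width = msb - lsb + 1
    mask = ((1 << width) - 1) << lsb
    return (value & ~mask) | ((field_value << lsb) & mask)


# Hypothetical snapshot; a real script would read these from the debugger.
registers = {"fpcr": 0x00C00000, "cpsr": 0x60000000}

# Saving the context is trivial when every register is a plain integer.
saved = dict(registers)

# Flip one mode field (RMode is bits 23-22 of fpcr): RZ -> RN.
registers["fpcr"] = set_field(registers["fpcr"], 23, 22, 0)
print(hex(registers["fpcr"]))  # 0x0

# Later, restore the saved context.
registers.update(saved)
print(hex(registers["fpcr"]))  # 0xc00000
```

With the opt-in struct type, the `set_field` call and its hard-coded bit positions would be replaced by an expression such as the `$fpcr_struct.RMode = RZ` shown above, while the save and restore steps keep using integers.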

There are some drawbacks:

- It is hard to discover that the suffix exists (help expression is one place at least).
- Perhaps a suffix is not the best, maybe a prefix $struct_...?
- _struct should be a more generic name like _typed, because we may have vector and union registers in future
  (see llvm/llvm-project issue #87471, "lldb does not support "<vector>" and "<union>" when describing registers").
- This does create a difference with GDB. We do not commit to scripts being portable, but users do tend to assume it is mostly the same.

Why Not Add This To Register Read/Write?

It is possible to add these same features to those commands. However, the complexity
there is writing all the type lookup and walking of the fields. I started this and
realised that I was essentially implementing the expression parser over again.

I do think that long term it would make sense for these commands to be able to
handle fields, but short term, 80% of the value of the feature comes from expressions.

Summary

- I want to allow register fields in expressions.
- I know it can work but it has usability problems.
- I need your feedback on how to handle that.
- The prototype is here, it is surprisingly little code.
- Thanks for reading this far.

Starter questions for you the reader:

- Is there any precedent for special (perhaps variant) types like these?
- What do you think our commitment is to existing expressions?
- Do you have existing scripting that uses register expressions? What are the costs associated with changing that?
- What would least surprise you as a user?
- As a user, how would you expect to “discover” this feature?
  (perhaps we need a design doc / guide on how register information works overall in lldb)

Any other comments and questions welcome of course.

DavidSpickett · June 6, 2024, 3:03pm

Someone asked me if we support register fields that are themselves structs, which would bring up the question of whether the second struct type should also be opt-in.

We do not support this now, but in theory we could allow the XML type of a field to point to another flags type. The current workaround is to flatten the type of the field into multiple fields in the top level register.

So if register A has field f1 that is broken into f1A and f1B, the current advice would be to define f1A and f1B as parts of register A, not as parts of field f1 of register A.

If we did support nesting, the question would be whether the expression $A.f1 returns an integer or a struct type. It seems that whatever strategy is decided for the top level case should apply here too.

If/when we have vector and union, we can have limited nesting, since a vector register is usually described as a union of vectors where each vector has a different element size.

Though in that case I do not think vector registers (especially > 128 bit) can be handled as integers in the first place. Then if you are accessing part of that union, that functions as the “opt in” to getting this vector type back (an array in C terms).

And in some sense <union><vector arrays></union> is a union of register views not of register fields. Each part of the union is the whole register value. But we are getting into semantics there.

So I do not think this nesting is a concern.

jingham · June 6, 2024, 6:14pm

I would be surprised if there were lots of code around that has expressions accessing fp flags this way. As long as the transition is easy, I don’t think breaking these is a major concern. Since we have control over the struct definitions we make for these register fields on the lldb side, we could always add a field - value or something - that has the register value as an int. That way if people did have code that uses the full register value, they would only have to change their expression to $regname.value. If we did it that way, then when you do expr $regname you would clearly see the value field, so this would be self-documenting.

DavidSpickett · June 20, 2024, 3:51pm

Nice idea, self-documenting is definitely a big benefit.

Probably needs a few __ adding to the name to prevent a clash with a field, but I can figure that out.

The other issue is how to implement this. The plain C way would be a union of struct and integer like:

union {
    uint64_t value;
    struct .... {} fields;
};

Which keeps the type the same size as the register. Drawback there is that you’d have to make an explicit choice every time, as $cpsr is the union. You can dump $cpsr to see what to do at least.

The other way is to make a double sized (as in 2x the size) value that is like:

struct {
      <...64 bits of bitfields...>
      uint64_t value;
};

And we pick which half to write back to the real value. Unless someone writes to both halves in the same expression of course. We could just error to tell them not to do that.
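The write-back rule for the double-sized layout can be sketched as below. This is only an illustration of the decision, assuming both halves start out as copies of the register and that "written" can be detected by comparing each half with the original value; `write_back` is a hypothetical name, not code from the prototype.

```python
def write_back(original, fields_half, value_half):
    """Decide which half of the double-sized struct updates the register.

    Both halves are expected to start out equal to `original`.
    """
    fields_changed = fields_half != original
    value_changed = value_half != original
    if fields_changed and value_changed:
        # The expression wrote to both views, so there is no single answer.
        raise ValueError("wrote to both the fields and the value; pick one")
    if fields_changed:
        return fields_half
    # Either only the value half changed or nothing changed at all.
    return value_half


print(hex(write_back(0x00C00000, 0x00000000, 0x00C00000)))  # 0x0
print(hex(write_back(0x00C00000, 0x00C00000, 0x00000005)))  # 0x5
```

Comparing against the original cannot see a write that stores the value already present, but in that case there is nothing to write back anyway.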

Another way is to have the expression parser recognise $<register name>.<special value attribute> and return the value type, whereas $<register name> may return the struct type with the fields. However this is not self documenting.

(which I now realise is essentially the reverse of my suffix idea)

I will prototype some of this and see how it goes.
