The next installment in my quest not to read the processor manuals.
Current State
lldb-server tells lldb what fields are in a register, using the features I added previously.
Note: The following output includes some pending work on register enums. I have included this because it shows you the final form of the feature I propose, but my proposal does not depend on register enum support.
(lldb) register info fpcr
Name: fpcr
Size: 4 bytes (32 bits)
In sets: Floating Point Registers (index 1)
| 31-27 | 26  | 25 | 24 | 23-22 | 21-20 | 19   | 18-16 | 15  | 14-13 | 12  | 11  | 10  | 9   | 8   | 7-0 |
|-------|-----|----|----|-------|-------|------|-------|-----|-------|-----|-----|-----|-----|-----|-----|
|       | AHP | DN | FZ | RMode |       | FZ16 |       | IDE |       | IXE | UFE | OFE | DZE | IOE |     |
RMode: 0 = RN, 1 = RP, 2 = RM, 3 = RZ
register read will then format the value for you:
(lldb) register read fpcr
fpcr = 0x00c00000
= (AHP = 0, DN = 0, FZ = 0, RMode = RZ (0x3), FZ16 = 0, IDE = 0, IXE = 0, UFE = 0, OFE = 0, DZE = 0, IOE = 0)
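To make the formatting step concrete, here is a minimal Python sketch (illustrative only, not lldb's C++ implementation) that applies the field layout from the register info output to a raw value. The names FPCR_FIELDS, extract and format_fpcr are hypothetical, and unnamed bit ranges are simply skipped.

```python
# Field layout taken from the `register info fpcr` table: (name, msb, lsb).
# Unnamed bit ranges are left out, as they are in the formatted output.
FPCR_FIELDS = [
    ("AHP", 26, 26), ("DN", 25, 25), ("FZ", 24, 24), ("RMode", 23, 22),
    ("FZ16", 19, 19), ("IDE", 15, 15), ("IXE", 12, 12), ("UFE", 11, 11),
    ("OFE", 10, 10), ("DZE", 9, 9), ("IOE", 8, 8),
]
# Enumerators for RMode, as listed under the field table.
RMODE_ENUM = {0: "RN", 1: "RP", 2: "RM", 3: "RZ"}


def extract(value: int, msb: int, lsb: int) -> int:
    """Return bits msb..lsb (inclusive) of value, shifted down to bit 0."""
    width = msb - lsb + 1
    return (value >> lsb) & ((1 << width) - 1)


def format_fpcr(value: int) -> str:
    """Build a field summary in the same shape as `register read fpcr`."""
    parts = []
    for name, msb, lsb in FPCR_FIELDS:
        field = extract(value, msb, lsb)
        if name == "RMode":
            # Enum fields show the enumerator name plus the raw value.
            parts.append(f"{name} = {RMODE_ENUM[field]} ({field:#x})")
        else:
            parts.append(f"{name} = {field}")
    return "(" + ", ".join(parts) + ")"


value = 0x00C00000
print(f"fpcr = {value:#010x}")
print(f"     = {format_fpcr(value)}")
```

For 0x00c00000 this prints the same field summary as the register read output above; only RMode (bits 23-22) is non-zero.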
You can also use registers in expressions:
(lldb) expr -- $fpcr
(unsigned int) $0 = 0
However, they are just integers, even if lldb knows they contain fields.
Enabling Register Fields in expressions
I want to allow access to register fields in an expression. I know this is possible and I have a prototype,
but there are some tradeoffs to discuss.
The obvious approach is to make registers into struct types:
(lldb) expr -- $fpcr
(__lldb_register_fields_fpcr) $1 = (0, IOE = 0, DZE = 0, OFE = 0, UFE = 0, IXE = 0, 0, IDE = 0, 0, FZ16 = 0, 0, RMode = RN, FZ = 0, DN = 0, AHP = 0, 0)
So you can read and write each field:
(lldb) expr -- $fpcr.RMode
(__lldb_register_fields_fpcr_RMode_enum) $2 = RN
(this is something Jim hinted at during the first part of the register work)
You can even use the enums for the field:
(lldb) expr -- $fpcr.RMode = RZ
(__lldb_register_fields_fpcr_RMode_enum) $3 = RZ
(lldb) expr -- $fpcr.RMode
(__lldb_register_fields_fpcr_RMode_enum) $4 = RZ
(lldb) register read fpcr
fpcr = 0x00c00000
= (... RMode = RZ (0x3) ...
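In integer terms, assigning to a field is a read-modify-write of the whole register: mask out the field's bits, then OR in the new value. A Python sketch of that arithmetic, using a hypothetical insert_field helper, shows how setting RMode to RZ on an all-zero FPCR gives the 0x00c00000 that register read reports above.

```python
# RMode occupies bits 23-22 of FPCR; enumerator values from register info.
RMODE_MSB, RMODE_LSB = 23, 22
RMODE = {"RN": 0, "RP": 1, "RM": 2, "RZ": 3}


def insert_field(reg: int, msb: int, lsb: int, field: int) -> int:
    """Return reg with bits msb..lsb replaced by field (read-modify-write)."""
    width = msb - lsb + 1
    if field < 0 or field >> width:
        raise ValueError(f"{field} does not fit in {width} bits")
    mask = ((1 << width) - 1) << lsb
    return (reg & ~mask) | (field << lsb)


fpcr = 0x00000000  # RMode = RN, as in the struct output above
fpcr = insert_field(fpcr, RMODE_MSB, RMODE_LSB, RMODE["RZ"])
print(f"fpcr = {fpcr:#010x}")  # fpcr = 0x00c00000
```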
Use Cases
- Watch Windows
  - Always know what mode you're in, for example AArch64 Streaming Vector modes.
  - IDEs like Visual Studio Code (which may not support registers being anything other than integers).
- Stop commands
  - As above, but for the command line.
- Scripting
  - Flip bits at will without working out where they are in the register.
  - Easier to read and maintain code.
Struct vs. Integer
This is the key reason for this RFC. Changing a register's type from integer to struct may break existing expressions, and may even
break expressions as you move between debug servers.
It is important to remember here that the register information comes from the debug server, not the debug client.
(I know some disagreed with this previously, and I have my doubts too, but for GDB compatibility reasons that is what I
went with.)
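For readers less familiar with that flow: lldb asks the debug server for its target description XML over the GDB remote serial protocol, using qXfer:features:read requests, and the register field information lives in that XML. The Python sketch below only shows how such a request is framed ($payload#checksum, where the checksum is the payload's byte sum modulo 256). make_packet is a hypothetical helper, and the offset and length in the request are arbitrary example values.

```python
def make_packet(payload: str) -> str:
    """Frame a GDB remote serial protocol packet as $payload#checksum.

    The checksum is the sum of the payload bytes modulo 256, written as two
    hex digits. Assumes the payload contains no characters that would need
    escaping ('$', '#', '}' or '*').
    """
    checksum = sum(payload.encode("ascii")) % 256
    return f"${payload}#{checksum:02x}"


# Ask the debug server for the start of its target description.
print(make_packet("qXfer:features:read:target.xml:0,fff"))
```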
Existing Expressions
If anyone has a script that treats $fpcr as an integer, and the debug server sends field information for it,
the register will turn into a struct and the expression will fail.
This can happen when you upgrade lldb-server, or even when you connect to another existing lldb-server,
as each one may send different field information.
Integer is Sometimes Better
In a past job I wrote scripts to simulate the work an exception handler does, stacking registers.
Except that we "stacked" them into a Python object for later restoration. In this case we just wanted an integer for each register.
If any of those registers were structs, I would have to do this ugly workaround:
(lldb) expr -f hex -- *(uint32_t*)(&($cpsr))
This assumes that the user knows the size of the register. The whole point
of lldb providing this information is so that they do not have to know that, so I do not think this is reasonable to ask of them.
Struct, Integer, Both?
What does GDB do?
GDB defines register information in:
It defines two ways to represent register fields: struct and flags.
Which to choose? Structures or flags?
Registers defined with "flags" have these advantages over defining them with "struct":
Arithmetic may be performed on them as if they were integers.
They are printed in a more readable fashion.
Registers defined with "struct" have one advantage over defining them with "flags":
One can fetch individual fields like in "C".
(gdb) print $my_struct_reg.field3
$1 = 42
In other words:
- If you want to see nicely printed registers that are actually integers, use flags.
- If you want nicely printed registers that are actually structs, use struct.

This means that GDB has decided:
- Breaking expressions when moving between debug servers is not a concern. If the new debug server sends field information, or does not, or sends different information, that is for the user to deal with.
- The XML information decides how you can interact with the register (which feels wrong to me, but more on that later).
- There should be no difference in the expressions: just use $register as you always do.
I am reliably informed that the reason for this struct/flags distinction is this usability choice.
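For reference, here is an illustrative target description fragment that describes two FPCR fields with a flags element, plus a small Python parser that recovers the field positions for a register. The fragment is hand-written for this example rather than taken from any particular debug server, and register_fields is a hypothetical helper. Describing the same bits with a struct element instead is the alternative the GDB documentation describes.

```python
import xml.etree.ElementTree as ET

# Hand-written, illustrative fragment: the <reg> element points at a
# <flags> type by id, and each <field> gives its start and end bit.
TARGET_XML = """<feature name="org.gnu.gdb.aarch64.fpu">
  <flags id="fpcr_flags" size="4">
    <field name="AHP" start="26" end="26"/>
    <field name="RMode" start="22" end="23"/>
  </flags>
  <reg name="fpcr" bitsize="32" type="fpcr_flags"/>
</feature>"""


def register_fields(xml_text: str, reg_name: str) -> dict:
    """Map each field of reg_name to its (start, end) bit positions."""
    root = ET.fromstring(xml_text)
    reg = root.find(f"reg[@name='{reg_name}']")
    if reg is None:
        raise KeyError(reg_name)
    flags = root.find(f"flags[@id='{reg.get('type')}']")
    if flags is None:
        return {}  # No flags type: a plain integer register.
    return {
        field.get("name"): (int(field.get("start")), int(field.get("end")))
        for field in flags.findall("field")
    }


print(register_fields(TARGET_XML, "fpcr"))
```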
For lldb, I did not see the need for a distinction between struct and flags,
so my plan has always been to only support flags and provide all features based on that.
Should lldb support struct?
lldb definitely could, but I do not think that making the struct vs. integer decision at the target XML level is ideal.
As a user I would expect to have both for the same register in the same debug session: ideally via one interface, but at least with the
ability to choose from within lldb.
In addition, the code to support both struct and flags would be almost identical,
apart from the naming and the extra tests to run. It would not bring any additional features to lldb.
Walks like an Integer, Quacks Like a Struct?
In that previous job, we built something like this in Python, which allowed us to have
objects that were integers for most purposes but could act like structs on demand.
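A rough reconstruction of that kind of object, for illustration only (not the original code): an int subclass whose unknown attributes are looked up as bit fields. The class name RegisterValue and the {name: (msb, lsb)} layout format are assumptions made for this sketch.

```python
class RegisterValue(int):
    """An int that also exposes named bit fields as attributes.

    Hypothetical example: the layout is passed in as {name: (msb, lsb)}.
    """

    def __new__(cls, value, fields):
        obj = super().__new__(cls, value)
        obj._fields = fields
        return obj

    def __getattr__(self, name):
        # Only called when normal lookup fails, so int methods and the
        # _fields attribute itself are found before we get here.
        try:
            msb, lsb = self._fields[name]
        except KeyError:
            raise AttributeError(name) from None
        return (int(self) >> lsb) & ((1 << (msb - lsb + 1)) - 1)


fpcr = RegisterValue(0x00C00000, {"RMode": (23, 22), "FZ": (24, 24)})
print(fpcr.RMode)     # 3, read like a struct field
print(hex(fpcr + 1))  # 0xc00001, arithmetic still works like an int
```

Note that arithmetic on such an object returns a plain int, so the field view is lost after any operation; a real design would have to decide whether that is acceptable.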
In lldb we are confined by a C type system, so I do not think we can do this.
Even if we move to C++'s rules, we would have to JIT compile extra methods for these
struct types to allow implicit conversion to integer.
I found ways to add methods to a type that point to existing functions elsewhere,
but not to craft entirely new ones.
That said, I am new to the type systems and we do have complete control over them, so please correct me if you know of a
way to achieve this. It would be the ideal solution here.
Type Suffixes
Another approach is to say that integer registers are the norm and therefore
struct types must be opt-in.
For example:
(lldb) expr -- $fpcr_struct
(__lldb_register_fields_expr_fpcr) $1 = (0, IOE = 0, DZE = 0, OFE = 0, UFE = 0, IXE = 0, 0, IDE = 0, 0, FZ16 = 0, 0, RMode = RN, FZ = 0, DN = 0, AHP = 0, 0)
(lldb) expr -- $fpcr_struct.RMode = RZ
(__lldb_register_fields_expr_fpcr_RMode_enum) $3 = RZ
This also means that if you opt into the struct type but we don't have the field information,
you get a slightly more obvious error:
(lldb) expr -- $fpcr_struct.RMode = rz
error: <user expression 3>:1:22: use of undeclared identifier 'rz'
1 | $fpcr_struct.RMode = rz
|
We can special-case this to improve it, for example with
"requested struct type for a register with no field information".
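One possible shape for the name lookup, sketched in Python with hypothetical names (resolve_register, FIELD_INFO, REGISTERS): an exact register name wins, so a real register whose name happens to end in _struct is never shadowed, and a _struct request for a register without field information gets the special-cased error suggested above.

```python
# Hypothetical register tables: FIELD_INFO holds the field layouts sent by
# the debug server. x0 has no entry, so it has no field information.
REGISTERS = {"fpcr", "x0"}
FIELD_INFO = {"fpcr": {"RMode": (23, 22)}}
SUFFIX = "_struct"


def resolve_register(name):
    """Return (register name, wants_struct) for an expression variable."""
    # An exact match always means the plain integer register.
    if name in REGISTERS:
        return name, False
    if name.endswith(SUFFIX):
        reg = name[: -len(SUFFIX)]
        if reg in REGISTERS:
            if reg not in FIELD_INFO:
                raise LookupError(
                    f"requested struct type for register '{reg}', "
                    "which has no field information")
            return reg, True
    raise LookupError(f"no register named '{name}'")


print(resolve_register("fpcr"))         # ('fpcr', False)
print(resolve_register("fpcr_struct"))  # ('fpcr', True)
```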
You can also use either or both types in scripting. Want to save the register
context, then flip a mode bit and continue? Save them all as integers, then use
the _struct type to flip the bit.
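That save, flip and restore workflow might look like the following Python sketch. A plain dict stands in for the live register context, and set_field is a hypothetical helper; a real script would read and write registers through lldb instead.

```python
# Stand-in for the live register context (hypothetical values).
context = {"fpcr": 0x00000000, "x0": 0x1234}
RMODE = (23, 22)  # (msb, lsb) of FPCR.RMode
RZ = 3


def set_field(reg, msb, lsb, value):
    """Return reg with bits msb..lsb replaced by value (truncated to fit)."""
    mask = ((1 << (msb - lsb + 1)) - 1) << lsb
    return (reg & ~mask) | ((value << lsb) & mask)


# 1. Save every register as a plain integer.
saved = dict(context)
# 2. Use the field view to flip the mode: RMode = RZ.
context["fpcr"] = set_field(context["fpcr"], *RMODE, RZ)
print(f"modified fpcr = {context['fpcr']:#010x}")
# 3. Continue and inspect, then restore the saved integers.
context.update(saved)
```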
There are some drawbacks:
- It is hard to discover that the suffix exists (help expression is one place, at least).
- Perhaps a suffix is not the best choice. Maybe a prefix, $struct_..., would be better?
- The _struct suffix should be a more generic name like _typed, because we may have vector and union registers in future (lldb does not support "<vector>" and "<union>" when describing registers · Issue #87471 · llvm/llvm-project · GitHub).
- This does create a difference with GDB. We do not commit to scripts being portable, but users do tend to assume that the two behave mostly the same.
Why Not Add This To Register Read/Write?
It is possible to add these same features to those commands. However, the complexity
there is writing all the type lookup and walking of the fields. I started this and
realised that I was essentially implementing the expression parser over again.
I do think that long term it would make sense for these commands to be able to
handle fields, but short term, 80% of the value of the feature comes from expressions.
Summary
- I want to allow register fields in expressions.
- I know it can work but it has usability problems.
- I need your feedback on how to handle that.
- The prototype is here, and it is surprisingly little code.
- Thanks for reading this far!
Starter questions for you, the reader:
- Is there any precedent for special (perhaps variant) types like these?
- What do you think our commitment is to existing expressions?
- Do you have existing scripting that uses register expressions, what are the costs associated with changing that?
- What would least surprise you as a user?
- As a user, how would you expect to "discover" this feature?
(perhaps we need a design doc / guide on how register information works overall in lldb)
Any other comments and questions welcome of course.