I’ve been working on improving LLDB’s support for DWARF 5, and I’m hitting an
issue with the new debug_addr section. In particular, it seems like the
DWARFExpression class [1] can’t handle the extra level of indirection provided
by the new op DW_OP_addrx.
Let’s consider a variable with a single location in DWARF 4:
DW_AT_location [DW_FORM_exprloc] (DW_OP_addr <some_address>)
When LLDB needs to work on this variable, it creates a DWARFExpression on top
of a blob of data containing DW_OP_addr <some_address>. It may need to
perform two types of operations on top of this:
- Read the address (DWARFExpression::GetLocation_DW_OP_addr) [2]
- Update the address (DWARFExpression::Update_DW_OP_addr(new_address)) [3]
To update the address, it just replaces some_address with new_address in
that blob of data. This is all fairly straightforward.
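To make the DWARF 4 case concrete, here is a minimal sketch of those two operations on a raw expression buffer. It is not LLDB's implementation: the helper names (MakeAddrExpr, ReadAddr, UpdateAddr) are hypothetical, and it assumes 8-byte addresses stored in host byte order and an expression that starts with DW_OP_addr.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Opcode value of DW_OP_addr in the DWARF specification.
constexpr std::uint8_t kDW_OP_addr = 0x03;

// Hypothetical helper: build the blob "DW_OP_addr <addr>".
// Assumption: 8-byte addresses in host byte order.
Bytes MakeAddrExpr(std::uint64_t addr) {
  Bytes expr(1 + sizeof(addr));
  expr[0] = kDW_OP_addr;
  std::memcpy(expr.data() + 1, &addr, sizeof(addr));
  assert(expr.size() == 9); // 1 opcode byte + 8 address bytes
  return expr;
}

// Hypothetical helper: read the operand of a leading DW_OP_addr.
std::optional<std::uint64_t> ReadAddr(const Bytes &expr) {
  if (expr.size() < 1 + sizeof(std::uint64_t) || expr[0] != kDW_OP_addr)
    return std::nullopt;
  std::uint64_t addr = 0;
  std::memcpy(&addr, expr.data() + 1, sizeof(addr));
  return addr;
}

// Hypothetical helper: overwrite the operand in place. This works because
// the address itself lives inside the expression blob.
bool UpdateAddr(Bytes &expr, std::uint64_t new_addr) {
  if (expr.size() < 1 + sizeof(std::uint64_t) || expr[0] != kDW_OP_addr)
    return false;
  std::memcpy(expr.data() + 1, &new_addr, sizeof(new_addr));
  return true;
}
```

The point of the sketch is only that the update is a fixed-size overwrite: the blob keeps its length and its opcode.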
Now consider what happens with the newly introduced DW_OP_addrx, which is an
index into an address table. For example:
DW_AT_location [DW_FORM_exprloc] (DW_OP_addrx <some_index>)
...
debug_addr:
0: addr0
...
<some_index>: <some_address>
...
The DWARFExpression class knows how to read these addresses correctly, but
it does not know how to update them (see [3]). Intuitively, this makes sense: its
blob of data only contains an index into some other read-only table. As it
stands, LLDB can’t work with these variables if updating an address is required.
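The asymmetry between reading and updating can be sketched as follows. This is an illustration rather than LLDB code: DecodeULEB128 and ReadAddrx are hypothetical helpers, and the address table is modelled as a plain vector of 64-bit values instead of a parsed debug_addr section.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Opcode value of DW_OP_addrx in the DWARF 5 specification.
constexpr std::uint8_t kDW_OP_addrx = 0xa1;

// Decode a ULEB128 value starting at `offset`, advancing `offset` past it.
std::optional<std::uint64_t> DecodeULEB128(const Bytes &data,
                                           std::size_t &offset) {
  std::uint64_t result = 0;
  unsigned shift = 0;
  while (offset < data.size() && shift < 64) {
    const std::uint8_t byte = data[offset++];
    result |= static_cast<std::uint64_t>(byte & 0x7f) << shift;
    if ((byte & 0x80) == 0)
      return result;
    shift += 7;
  }
  return std::nullopt; // truncated or over-long encoding
}

// Hypothetical helper: resolve "DW_OP_addrx <index>" through the table.
// Assumption: `debug_addr` holds the already-parsed table entries.
std::optional<std::uint64_t>
ReadAddrx(const Bytes &expr, const std::vector<std::uint64_t> &debug_addr) {
  if (expr.empty() || expr[0] != kDW_OP_addrx)
    return std::nullopt;
  std::size_t offset = 1;
  const auto index = DecodeULEB128(expr, offset);
  assert(offset <= expr.size());
  if (!index || *index >= debug_addr.size())
    return std::nullopt;
  return debug_addr[*index];
}
```

Reading needs both the blob and the table. An in-place update has nowhere to put the new address: the blob holds only the index, and the table is passed as const to mirror the read-only section.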
ELF files seem to dodge this issue because when LLDB reads the debug_addr
section, the addresses there are already “correct” (relocated, etc.); as such,
the “update address” method is never called. This is not the case for MachO
object files in general: there, we do call the update address method. (They
also dodge the issue when using dSYM bundles, as dsymutil currently rewrites
DW_OP_addrx into DW_OP_addr.)
Assuming you are on a platform using MachO, you can reproduce the problem with:
echo 'auto myvar = 42; int main(){}' | clang -gdwarf-5 -x c++ - -c -o obj.o
clang obj.o -o main.out
lldb --batch -o "b main" -o "run" -o "v myvar" main.out
I see two alternatives to fix the issue, both involving changes to
Update_DW_OP_addr:
- Make the Update_DW_OP_addr method rewrite its blob of data so that it also
  rewrites the DW_OP_addrx opcode into DW_OP_addr. This would effectively
  change the location of the variable to be different from what is in the
  debug_info section, but it doesn’t seem to be a problem as far as I can tell.
  I have a prototype here [4].
  This could be expensive, since it involves copying buffers. However, note
  that: 1) We already do this for DW_OP_addr, 2) Variables that have a
  DW_OP_addr{x} are unlikely to have multiple locations, so these buffers are
  hardly ever bigger than 9 bytes.
- Make DWARFExpression objects carry a map of index_in_debug_addr →
  real_address, which gets updated/read as needed by Update_DW_OP_addr /
  GetLocation_DW_OP_addr. I have a prototype here [5]. This works but seems a
  bit overkill.
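For discussion purposes, the two alternatives can be sketched side by side. This is not the code from either prototype: RewriteAddrxAsAddr, AddrxOverrides and DecodeAddrxIndex are hypothetical names, the expression is assumed to start with DW_OP_addrx, and addresses are assumed to be 8 bytes in host byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <unordered_map>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

constexpr std::uint8_t kDW_OP_addr = 0x03;  // DWARF opcode values
constexpr std::uint8_t kDW_OP_addrx = 0xa1;

// Decode the ULEB128 index that follows a leading DW_OP_addrx.
// On success, `end` is the offset one past the operand.
std::optional<std::uint64_t> DecodeAddrxIndex(const Bytes &expr,
                                              std::size_t &end) {
  if (expr.empty() || expr[0] != kDW_OP_addrx)
    return std::nullopt;
  std::uint64_t index = 0;
  unsigned shift = 0;
  for (std::size_t i = 1; i < expr.size() && shift < 64; ++i, shift += 7) {
    index |= static_cast<std::uint64_t>(expr[i] & 0x7f) << shift;
    if ((expr[i] & 0x80) == 0) {
      end = i + 1;
      return index;
    }
  }
  return std::nullopt; // truncated or over-long operand
}

// Alternative 1 (sketch): copy the blob, turning "DW_OP_addrx <index>"
// into "DW_OP_addr <new_addr>" and keeping any trailing operations.
std::optional<Bytes> RewriteAddrxAsAddr(const Bytes &expr,
                                        std::uint64_t new_addr) {
  std::size_t end = 0;
  if (!DecodeAddrxIndex(expr, end))
    return std::nullopt;
  Bytes out(1 + sizeof(new_addr));
  out[0] = kDW_OP_addr;
  std::memcpy(out.data() + 1, &new_addr, sizeof(new_addr));
  out.insert(out.end(), expr.begin() + static_cast<std::ptrdiff_t>(end),
             expr.end());
  assert(out.size() == 9 + (expr.size() - end));
  return out;
}

// Alternative 2 (sketch): leave the blob alone and keep a side map of
// index_in_debug_addr -> real_address that reads consult first.
struct AddrxOverrides {
  std::unordered_map<std::uint64_t, std::uint64_t> by_index;

  bool Update(const Bytes &expr, std::uint64_t new_addr) {
    std::size_t end = 0;
    const auto index = DecodeAddrxIndex(expr, end);
    if (!index)
      return false;
    by_index[*index] = new_addr;
    return true;
  }

  std::optional<std::uint64_t>
  Read(const Bytes &expr, const std::vector<std::uint64_t> &debug_addr) const {
    std::size_t end = 0;
    const auto index = DecodeAddrxIndex(expr, end);
    if (!index)
      return std::nullopt;
    const auto it = by_index.find(*index);
    if (it != by_index.end())
      return it->second;
    if (*index >= debug_addr.size())
      return std::nullopt;
    return debug_addr[*index];
  }
};
```

In this sketch the first alternative pays for one small buffer copy per update and leaves readers with a plain DW_OP_addr, while the second keeps the original bytes intact at the cost of extra per-expression state.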
I would appreciate any thoughts / suggestions on this!