I’ve been working on improving LLDB’s support for DWARF 5, and I’m hitting an
issue with the new debug_addr section. In particular, it seems like the
DWARFExpression class [1] can’t handle the extra level of indirection provided
by the new op DW_OP_addrx.
Let’s consider a variable with a single location in DWARF 4:
DW_AT_location [DW_FORM_exprloc] (DW_OP_addr <some_address>)
When LLDB needs to work on this variable, it creates a DWARFExpression on top
of a blob of data containing DW_OP_addr <some_address>. It may need to
perform two types of operations on top of this:
- Read the address (DWARFExpression::GetLocation_DW_OP_addr) [2]
- Update the address (DWARFExpression::Update_DW_OP_addr(new_address)) [3]
To update the address, it just replaces some_address with new_address in
that blob of data. This is all fairly straightforward.
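To make the DWARF 4 case concrete, here is a minimal sketch of both operations on a raw expression blob. It is not LLDB's actual code: `Blob`, `ReadOpAddr`, `UpdateOpAddr`, and `kAddrSize` are hypothetical names, and it assumes an 8-byte little-endian address and an expression that begins with the opcode.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

constexpr uint8_t DW_OP_addr = 0x03; // opcode value from the DWARF spec
constexpr size_t kAddrSize = 8;      // assumption: 64-bit, little-endian target

using Blob = std::vector<uint8_t>; // hypothetical stand-in for the expression data

// True if the blob starts with DW_OP_addr followed by a full address operand.
bool StartsWithOpAddr(const Blob &expr) {
  return expr.size() >= 1 + kAddrSize && expr[0] == DW_OP_addr;
}

// Read: decode the little-endian operand that follows the opcode.
std::optional<uint64_t> ReadOpAddr(const Blob &expr) {
  if (!StartsWithOpAddr(expr))
    return std::nullopt;
  uint64_t address = 0;
  for (size_t i = 0; i < kAddrSize; ++i)
    address |= static_cast<uint64_t>(expr[1 + i]) << (8 * i);
  return address;
}

// Update: overwrite the operand bytes in place; the blob size is unchanged.
bool UpdateOpAddr(Blob &expr, uint64_t new_address) {
  if (!StartsWithOpAddr(expr))
    return false;
  assert(expr.size() >= 1 + kAddrSize); // guaranteed by the check above
  for (size_t i = 0; i < kAddrSize; ++i)
    expr[1 + i] = static_cast<uint8_t>(new_address >> (8 * i));
  return true;
}
```

The update works here only because the address itself lives inside the blob.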
Now consider what happens with the newly introduced DW_OP_addrx, which is an
index into an address table. For example:
DW_AT_location [DW_FORM_exprloc] (DW_OP_addrx <some_index>)
...
debug_addr:
0: addr0
...
<some_index>: <some_address>
...
The DWARFExpression class knows how to read these addresses correctly, but
it does not know how to update them (see [3]). Intuitively, this makes sense: its
blob of data only contains an index into some other read-only table. As it
stands, LLDB can’t work with these variables if updating an address is required.
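The extra indirection can be sketched as follows. This is an illustration only, not LLDB's implementation: `ReadOpAddrx` and `DecodeULEB128` are hypothetical helpers, and the address table is modeled as a plain vector. The operand of DW_OP_addrx is a ULEB128-encoded index, so the blob never contains the address.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

constexpr uint8_t DW_OP_addrx = 0xa1; // opcode value from the DWARF 5 spec

// Decode an unsigned LEB128 value at `offset`. On success, advances `offset`
// past the encoded bytes and returns true; returns false if truncated/too long.
bool DecodeULEB128(const std::vector<uint8_t> &buf, size_t &offset,
                   uint64_t &value) {
  value = 0;
  for (unsigned shift = 0; offset < buf.size() && shift < 64; shift += 7) {
    const uint8_t byte = buf[offset++];
    value |= static_cast<uint64_t>(byte & 0x7f) << shift;
    if ((byte & 0x80) == 0)
      return true;
  }
  return false;
}

// Read: the blob yields only an index; the address comes from the table.
// `debug_addr` is a simplified model of the unit's address table.
std::optional<uint64_t> ReadOpAddrx(const std::vector<uint8_t> &expr,
                                    const std::vector<uint64_t> &debug_addr) {
  if (expr.empty() || expr[0] != DW_OP_addrx)
    return std::nullopt;
  size_t offset = 1;
  uint64_t index = 0;
  if (!DecodeULEB128(expr, offset, index))
    return std::nullopt;
  assert(offset <= expr.size()); // the decoder never reads past the blob
  if (index >= debug_addr.size())
    return std::nullopt;
  return debug_addr[index];
}
```

Note that `debug_addr` is taken by const reference: there is nothing in the expression itself that an in-place update could overwrite with a new address.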
ELF files seem to dodge this issue because when LLDB reads the debug_addr
section, the addresses there are already “correct” (relocated, etc.); as such,
the “update address” method is never called. This is not the case for MachO object
files in general, where we do call the update address method (they also dodge the issue
when using dSYM bundles, as dsymutil currently rewrites DW_OP_addrx into DW_OP_addr).
Assuming you are on a platform using MachO, you can reproduce the problem with:
echo 'auto myvar = 42; int main(){}' | clang -gdwarf-5 -x c++ - -c -o obj.o
clang obj.o -o main.out
lldb --batch -o "b main" -o "run" -o "v myvar" main.out
I see two alternatives to fix the issue, both involving changes to
Update_DW_OP_addr:
- Make the Update_DW_OP_addr method rewrite its blob of data so that it
  also rewrites the DW_OP_addrx opcode into DW_OP_addr. This would
  effectively change the location of the variable to be different from what is in
  the debug_info section, but it doesn’t seem to be a problem as far as I can
  tell. I have a prototype here [4].
  This could be expensive, since it involves copying buffers. However, note that:
  1) we already do this for DW_OP_addr, and 2) variables that have a
  DW_OP_addr{x} are unlikely to have multiple locations, so these buffers are
  hardly ever bigger than 9 bytes.
- Make DWARFExpression objects carry a map of index_in_debug_addr →
  real_address, which gets updated/read as needed by Update_DW_OP_addr /
  GetLocation_DW_OP_addr. I have a prototype here [5]. This works but seems a
  bit overkill.
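For discussion purposes, the two alternatives could look roughly like this. This is a sketch and not the linked prototypes: `RewriteAddrxAsAddr` and `AddrOverrides` are hypothetical names, and it assumes an 8-byte little-endian address and an expression that starts with DW_OP_addrx.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

constexpr uint8_t DW_OP_addr = 0x03;  // opcode values from the DWARF spec
constexpr uint8_t DW_OP_addrx = 0xa1;
constexpr size_t kAddrSize = 8;       // assumption: 64-bit, little-endian target

using Blob = std::vector<uint8_t>;

// Alternative 1: replace a leading `DW_OP_addrx <index>` with
// `DW_OP_addr <new_address>`, copying any trailing operations unchanged.
bool RewriteAddrxAsAddr(Blob &expr, uint64_t new_address) {
  if (expr.empty() || expr[0] != DW_OP_addrx)
    return false;
  // Skip the ULEB128 index: its last byte has the high bit clear.
  size_t offset = 1;
  while (offset < expr.size() && (expr[offset] & 0x80) != 0)
    ++offset;
  if (offset == expr.size())
    return false; // truncated operand
  ++offset;       // step past the final index byte

  Blob rewritten;
  rewritten.reserve(1 + kAddrSize + (expr.size() - offset));
  rewritten.push_back(DW_OP_addr);
  for (size_t i = 0; i < kAddrSize; ++i)
    rewritten.push_back(static_cast<uint8_t>(new_address >> (8 * i)));
  rewritten.insert(rewritten.end(), expr.begin() + offset, expr.end());
  expr = std::move(rewritten);
  return true;
}

// Alternative 2: keep the blob untouched and carry a side table of
// index_in_debug_addr -> real_address that is consulted before the
// (read-only) address table.
struct AddrOverrides {
  std::unordered_map<uint64_t, uint64_t> index_to_address;

  void Update(uint64_t index, uint64_t address) {
    index_to_address[index] = address;
  }

  uint64_t Resolve(uint64_t index,
                   const std::vector<uint64_t> &debug_addr) const {
    auto it = index_to_address.find(index);
    if (it != index_to_address.end())
      return it->second;
    assert(index < debug_addr.size() && "index outside the address table");
    return debug_addr[index];
  }
};
```

With these assumptions, the rewritten blob for a single-location variable is 1 opcode byte plus 8 address bytes, which matches the 9-byte figure above. The second alternative avoids the copy but adds per-expression state and a lookup on every read.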
I would appreciate any thoughts / suggestions on this!