LLDB behaviour for GCed sections

Hi,

I’m currently investigating the behaviour of different debuggers when functions have been stripped by the linker because they are unused. I tried looking at the source code, but couldn’t really make enough sense of it to answer the question. Would someone be able to explain what LLDB’s behaviour is when it encounters a function in debug information with an address of zero (as is the case for lld and other linker output with --gc-sections)? In particular, does it simply ignore the relevant block of debug information, as appears to be the case for gdb? I’m coming at this from a DWARF perspective, if that makes any difference.

James

Hi James,

I performed a quick experiment, compiling with clang -g -ffunction-sections and linking with --gc-sections, using Hexagon DSP tools based on LLVM 3.5.

This shows an unused function as having DW_AT_low_pc as zero as you predict:

(readelf -W extract of DWARF4 .debug_info)

<1><2b7>: Abbrev Number: 11 (DW_TAG_subprogram)
     DW_AT_low_pc : 0
     DW_AT_high_pc : 168
     DW_AT_frame_base :
     DW_AT_name : (indirect string, offset: 0x1bb): not_used
     DW_AT_decl_file : 1
     DW_AT_decl_line : 83
     DW_AT_prototyped : 1
     DW_AT_type : <45d>
     DW_AT_external : 1
     DW_AT_accessibility: 1

Yet lldb 3.5.0 and lldb 3.9 both appear to _not_ ignore that DWARF DIE.

(lldb) expression &not_used
(int (*)(unsigned char *, const unsigned char *, unsigned long)) $0 = 0x00000000

and

(lldb) breakpoint set --name not_used
Breakpoint 1: where = hexlto.elf`not_used + 20 at lto_test.c:92, address = 0x00000014

Line 92 would be the first statement after the prologue in that source file if the function were used, but address 0x14 is simply wrong,
as it lies inside the start-up code. I assume lldb is adding the instruction offset of the first statement in that removed function to its low_pc of zero.

The command ‘disassemble --name not_used’ reveals the confusion: it makes not_used() appear to be present, but the listing does not begin with the
typical Hexagon allocframe instruction that I’d expect for C and instead shows start-up code.

The newest llvm3.9 linker for Hexagon (an internal unreleased version here) sets DW_AT_low_pc to a non-zero value apparently far beyond the end of the .text extent, which might be a hint
to a DWARF consumer or might be a mistake.

For a DW_TAG_subprogram, wouldn’t it be more sensible for the linker to set DW_AT_low_pc and DW_AT_high_pc to the same value for garbage-collected, unreferenced code?
By the DWARF definition, a high_pc of class address is the first address beyond the end of the subprogram extent, so if it equals low_pc the subprogram clearly has no machine code.
Likewise, a high_pc of class constant is an offset from low_pc, so if it is set to 0 the subprogram clearly has no machine code.
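
To make the two cases above concrete, here is a minimal sketch of how a DWARF consumer might compute a subprogram's extent and decide whether it contains any code. The function names and the "address"/"constant" strings are hypothetical, not taken from lldb or any other tool, and the example assumes the 168 in the readelf extract above is a high_pc of class constant.

```python
def subprogram_range(low_pc, high_pc, high_pc_class):
    """Return the half-open [start, end) range of a subprogram DIE.

    high_pc_class is "address" when DW_AT_high_pc holds an absolute address
    and "constant" when it holds an offset from DW_AT_low_pc (DWARF 4).
    """
    if high_pc_class == "address":
        end = high_pc
    elif high_pc_class == "constant":
        end = low_pc + high_pc
    else:
        raise ValueError(f"unexpected class: {high_pc_class!r}")
    return low_pc, end


def has_machine_code(low_pc, high_pc, high_pc_class):
    """True when the range is non-empty, i.e. end lies beyond start."""
    start, end = subprogram_range(low_pc, high_pc, high_pc_class)
    return end > start


# The not_used DIE as dumped: low_pc 0, high_pc 168 (assumed to be a size).
print(subprogram_range(0, 168, "constant"))          # (0, 168)
print(has_machine_code(0, 168, "constant"))          # True

# The two proposed encodings for discarded code both give an empty range.
print(has_machine_code(0, 0, "constant"))            # False
print(has_machine_code(0x4000, 0x4000, "address"))   # False
```

With the values as dumped, the range [0, 168) is not empty and contains 0x14, which is consistent with the breakpoint landing in the start-up code.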

For data objects on a Harvard architecture, however, address 0 is valid, as weird as that may seem to C programmers. So for the Qualcomm Kalimba DSP and the XAP RISC CPU used in
Bluetooth devices, our proprietary debuggers look for a specific address value that we hope is greater than any real address of memory attached to an embedded device.
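
A rough sketch of that kind of sentinel check follows. The sentinel value and the function name are hypothetical, chosen only for illustration; they are not the values our debuggers actually use.

```python
# Hypothetical sentinel: the largest value a 32-bit address can hold.
# A real toolchain would pick whatever value linker and debugger agree on.
DISCARDED_SENTINEL = 0xFFFFFFFF


def is_discarded(address, sentinel=DISCARDED_SENTINEL):
    """True when the address equals the agreed 'discarded' marker."""
    return address == sentinel


print(is_discarded(0x00000000))   # False: address 0 remains a valid location
print(is_discarded(0x00001234))   # False
print(is_discarded(0xFFFFFFFF))   # True
```

The point of the design is that address 0 stays usable; the cost is that the linker and the debugger both have to know the sentinel.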

As far as I know, there’s no way for DWARF to convey that something has been optimised away, other than an empty location list in DW_AT_location for a variable.

Of course, a linker should be able to remove the DWARF DIEs for unreferenced code and data when it garbage collects.
But I’ve not come across one that does this yet. I believe it is difficult for linker implementors because of the inter-section references and relative offsets in DWARF.

I have not tried this with llvm 4.0, lld or gold.

David Earlam
Staff-Senior[Engineer]/Manager ? Software : Development Tools() {
Qualcomm Technologies International, Ltd.

Thank you for the effort put into this response. It’s interesting to see that there are problems with LLDB’s handling here. This might mean we need to have a wider discussion about how LLD does its --gc-sections and/or how LLDB treats this situation. From experience working with our proprietary linker and debugger, this issue is something that both sides have to agree on. It looks like we currently follow a similar policy to your tools.

Due to changes in how the compiler emitted DWARF information, we discovered some time ago that we cannot simply patch the high_pc field to be the same as the low_pc field, because the compiler started to use the high_pc field to indicate the size instead. Whilst this reduces the number of relocations the linker has to perform, it makes it difficult for the linker to patch the high_pc field reliably without parsing the .debug_info section. A different approach is therefore necessary, hence the invalid address used in some cases.
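
As an illustration of why the size encoding defeats relocation-based patching, here is a toy model. It is not how any real linker is implemented: the Field class, the link function and the discarded_value policy are all hypothetical, and it assumes the linker writes one chosen value into every relocated field that targets a discarded section.

```python
from dataclasses import dataclass


@dataclass
class Field:
    name: str
    value: int       # addend (if relocated) or literal constant (if not)
    relocated: bool  # True when a relocation targets this field


def link(fields, section_addr, discarded_value=0):
    """Resolve the fields for one function.

    section_addr is the function's final address, or None when its section
    was garbage collected. discarded_value is what this toy linker writes
    into relocated fields of a discarded section (hypothetical policy).
    """
    result = {}
    for f in fields:
        if not f.relocated:
            result[f.name] = f.value          # no relocation: never touched
        elif section_addr is None:
            result[f.name] = discarded_value  # patched to the chosen value
        else:
            result[f.name] = section_addr + f.value
    return result


# Older encoding: high_pc is an address, so it carries its own relocation.
old_style = [Field("low_pc", 0, True), Field("high_pc", 168, True)]
# Newer encoding: high_pc is a size, stored as a plain constant.
new_style = [Field("low_pc", 0, True), Field("high_pc", 168, False)]

print(link(old_style, None))    # {'low_pc': 0, 'high_pc': 0}
print(link(new_style, None))    # {'low_pc': 0, 'high_pc': 168}
print(link(new_style, 0x4000))  # {'low_pc': 16384, 'high_pc': 168}
```

In the newer encoding the size survives untouched, so the discarded function still appears to span 168 bytes from whatever value low_pc was given.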

Anyway, thanks for this once again.

Regards,

James

Thank you for the effort put into this response. It’s interesting to see that there are problems with LLDB’s handling here. This might mean we need to have a wider discussion about how LLD does its --gc-sections. From experience working with our proprietary linker and debugger, this issue is something that both sides have to agree on. We too ended up going with an invalid address to mark the low_pc field for discarded elements.

Due to changes in how the compiler emitted DWARF information, we discovered some time ago that we cannot simply patch the high_pc field to be the same as the low_pc field, because the compiler started to use the high_pc field to indicate the size instead. Whilst this reduces the number of relocations the linker has to perform, it makes it difficult for the linker to patch the high_pc field reliably without parsing the .debug_info section.

Regards,

James

We encounter low pc values of 0 quite often on macOS, since we leave the DWARF in the .o files and then use a debug map to fix up the addresses (and remove unused functions). The first function in a .o file is often at 0. Just a data point in favor of choosing something other than a low pc of 0 to mean discarded.

Jim