DWARF: Reconstituting mangled names (& skipping DW_AT_linkage_name)

Encoding names more efficiently is a great idea. I would have no concerns if the following were true:

  • we can always reconstruct linkage names when we need to
  • accelerator tables that debuggers trust (.debug_names or .apple_XXX) and that used to contain linkage names still contain them after this change

The main reason for this is the LLDB expression parser. When the expression parser needs to call a function, the interface we have with the JIT code in LLVM means we always look up functions by linkage (mangled) name. So if the accelerator tables don’t contain the mangled names, we will need to know how and when to ignore the accelerator tables and manually index the DWARF, which would then happen each time you debug. Right now LLDB and GDB don’t trust .debug_pubnames or .debug_pubtypes because they don’t index everything. .debug_names has stricter rules on what needs to be included, so any solution should make sure the contents of this section are the same for a binary compiled with and without this new feature.

I like the idea of being able to refer to a string in the main string table of the object file (.strtab for ELF, or the string table described by LC_SYMTAB in Mach-O) if it already exists there. It would be interesting to compare the symbols that are in both .debug_str and .symtab for one of these large C++ binaries, just to see how much space we could save if we had a new form, DW_FORM_symtab_str, that could refer to this section.
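
To get a rough number for that comparison, something like the following could work once the strings have been dumped from both sections. This is only a sketch: `sharedStringBytes` is a made-up helper, it assumes the two string lists were already extracted by some other tool, and it counts each shared string once, including its NUL terminator.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: returns how many bytes .debug_str spends on strings
// that also appear in the symbol string table. Each distinct shared string
// is counted once, plus one byte for its NUL terminator.
std::size_t sharedStringBytes(const std::vector<std::string> &debugStr,
                              const std::vector<std::string> &symtabStr) {
  std::unordered_set<std::string> symbols(symtabStr.begin(), symtabStr.end());
  std::unordered_set<std::string> counted;
  std::size_t bytes = 0;
  for (const std::string &s : debugStr) {
    // Only count strings present in both tables, and only the first time.
    if (symbols.count(s) != 0 && counted.insert(s).second)
      bytes += s.size() + 1;
  }
  return bytes;
}
```

The result is an upper bound on what a form like DW_FORM_symtab_str could remove from .debug_str. It says nothing about strings that only partially overlap.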

Another idea would be to have new attributes that rely on the parent DIE chain, where each child encodes its partial mangled name. Something like DW_AT_linkage_prefix and/or DW_AT_linkage_suffix. Then you could traverse the parent DIEs to reconstruct the full linkage name.

So if we have

namespace foo {
class bar {
  void print(const char *) const;
};
}

The DWARF could be something like:

DW_TAG_namespace
  DW_AT_name("foo")
  DW_AT_linkage_prefix("_Z3foo")

  DW_TAG_class_type
    DW_AT_name("bar")
    DW_AT_linkage_prefix("3bar")

    DW_TAG_subprogram
      DW_AT_name("print")
      DW_AT_linkage_prefix("5print")
      DW_AT_linkage_suffix(" const")

      DW_TAG_formal_parameter
        DW_AT_name("format")
        DW_AT_linkage_prefix("int")

This might allow a lot more name sharing between templated functions since their function base names like “erase”, “begin”, “end” and many more could be shared in the string tables.
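
The parent-chain traversal could look roughly like this. It is a sketch under assumptions, not a worked-out design: `Die` is a hypothetical in-memory struct, DW_AT_linkage_prefix and DW_AT_linkage_suffix are the proposed attributes and do not exist today, and the rule used here (prefixes from the outermost scope inward, then suffixes from the innermost DIE outward) is just one possible ordering. Parameter DIEs are children, not parents, so they are left out.

```cpp
#include <string>
#include <vector>

// Hypothetical in-memory DIE holding only what this sketch needs.
struct Die {
  const Die *parent = nullptr; // enclosing DIE, nullptr at the top level
  std::string linkagePrefix;   // proposed DW_AT_linkage_prefix, may be empty
  std::string linkageSuffix;   // proposed DW_AT_linkage_suffix, may be empty
};

// Rebuild a name by walking the parent chain of `die`.
std::string reconstructLinkageName(const Die &die) {
  // Collect the chain, innermost DIE first.
  std::vector<const Die *> chain;
  for (const Die *d = &die; d != nullptr; d = d->parent)
    chain.push_back(d);

  std::string name;
  // Prefixes are emitted from the outermost scope down to `die`.
  for (auto it = chain.rbegin(); it != chain.rend(); ++it)
    name += (*it)->linkagePrefix;
  // Suffixes are emitted from `die` back out to the outermost scope.
  for (const Die *d : chain)
    name += d->linkageSuffix;
  return name;
}
```

Fed the fragments from the example above, this yields "_Z3foo3bar5print const", which only illustrates the concatenation. The Itanium name for this method is _ZNK3foo3bar5printEPKc, so a real encoding would also have to carry the nested-name markers and the parameter types somewhere.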