Encoding names more efficiently is a great idea. I would have no concerns if the following were true:
- we could always reconstruct linkage names if we need to
Yep, that’d certainly be the plan. (Well, at least that any place where we omit a linkage name in the DWARF could be reconstructed. We can always keep linkage names in places where the DWARF isn’t expressive enough to produce all the info required for the linkage name.)
- accelerator tables that are trusted by debuggers (.debug_names, or .apple_XXX) that used to contain linkage names still do after this change
Sure - it gets a bit trickier in the LLVM IR, but it’s doable. (We’d need some way to specify that the pretty name (for my other proposal about simplified template names) and/or the linkage name (for this proposal) are only present, or only qualified with template parameters, for accelerated access and not for the DIE attributes.)
The main reason for this is the LLDB expression parser. When the expression parser needs to call a function, the interface we have with the JIT code in LLVM means we always look up functions by linkage (mangled) name. So if the accelerator tables don’t have the mangled names in them, we would need to know how and when to ignore the accelerator tables and manually index the DWARF each time you debug. Right now LLDB and GDB don’t trust .debug_pubnames or .debug_pubtypes because they don’t index everything. .debug_names has stricter rules on what needs to be included, so any solution should make sure the contents of this section are the same for a binary compiled with or without this new feature.
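To make the concern concrete, here is a minimal C++17 sketch of the lookup path described above. It is not LLDB's implementation: `Die`, `AcceleratorTable`, `manually_index` and `find_function` are hypothetical names invented for illustration, and the `has_linkage_names` flag stands in for whatever mechanism a debugger would use to decide whether a table can be trusted for mangled-name lookups.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, heavily simplified stand-in for a DIE.
struct Die {
    std::uint64_t offset;      // offset of the DIE in .debug_info
    std::string linkage_name;  // mangled name (stored or reconstructed); empty if none
};

// Hypothetical stand-in for an accelerator table such as .debug_names.
struct AcceleratorTable {
    bool has_linkage_names;  // false if the producer left mangled names out
    std::unordered_map<std::string, std::uint64_t> entries;  // name -> DIE offset
};

// Slow path: walk every DIE and build a mangled-name index by hand.
std::unordered_map<std::string, std::uint64_t>
manually_index(const std::vector<Die>& dies) {
    std::unordered_map<std::string, std::uint64_t> index;
    for (const Die& die : dies) {
        if (!die.linkage_name.empty())
            index.emplace(die.linkage_name, die.offset);
    }
    return index;
}

// Look up a function by mangled name, as an expression parser would.
std::optional<std::uint64_t> find_function(const AcceleratorTable& table,
                                           const std::vector<Die>& dies,
                                           const std::string& mangled) {
    if (table.has_linkage_names) {
        // Fast path: the table is complete, so a miss is a real miss.
        auto it = table.entries.find(mangled);
        if (it == table.entries.end()) return std::nullopt;
        return it->second;
    }
    // The table cannot answer mangled-name queries: index everything.
    const auto index = manually_index(dies);
    auto it = index.find(mangled);
    if (it == index.end()) return std::nullopt;
    return it->second;
}
```

The slow path touches every DIE, which is the cost that would be paid on each debug session if the accelerator tables stopped carrying mangled names.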
I like the idea of being able to refer to a string from the main string table of the object file (.strtab for ELF, or LC_SYMTAB in Mach-O) if it already exists there. It would be interesting to compare the symbols that are in both .debug_str and .symtab for one of these large C++ binaries, just to see how much space we could save if we had a new form, DW_FORM_symtab_str, that could refer to this section.
Yeah, that should be pretty close to the numbers I’ve seen. Not every linkage name is in the symtab, though: we have linkage names for fully inlined functions, which wouldn’t be in the symtab.
But I also have ideas about removing the linkage names from the symtab too - or, depending on how you think about it, changing the mangling from Itanium to a hashed name. Then there’s an interesting question of what a given consumer wants when they talk about the linkage name: if they want the name of the ELF symbol, that’ll be correct, but if they want something that can be demangled, they would need a different name.
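The .debug_str versus .symtab comparison suggested earlier could be approximated along these lines. This is only a sketch: `SharingEstimate` and `estimate_sharing` are hypothetical names, the two input lists are assumed to have been extracted from the binary by some other tool, and the byte count assumes each distinct string is stored once with a terminating NUL.

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

struct SharingEstimate {
    std::size_t shared_names = 0;      // linkage names that also appear in the symbol table
    std::size_t shared_bytes = 0;      // bytes (including NUL) those names occupy
    std::size_t debug_only_names = 0;  // e.g. fully inlined functions with no symbol
};

// Estimate how many bytes of linkage-name strings could instead refer to
// the symbol table's string table.
SharingEstimate estimate_sharing(const std::vector<std::string>& debug_linkage_names,
                                 const std::vector<std::string>& symtab_names) {
    const std::unordered_set<std::string> symbols(symtab_names.begin(),
                                                  symtab_names.end());
    std::unordered_set<std::string> seen;  // count each distinct name once
    SharingEstimate result;
    for (const std::string& name : debug_linkage_names) {
        if (!seen.insert(name).second) continue;  // duplicate, already counted
        if (symbols.count(name) != 0) {
            ++result.shared_names;
            result.shared_bytes += name.size() + 1;  // +1 for the NUL terminator
        } else {
            ++result.debug_only_names;
        }
    }
    return result;
}
```

`debug_only_names` counts the linkage names that a form like DW_FORM_symtab_str could not cover, such as those of fully inlined functions.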
Another idea would be to have a new attribute that relies on the parent DIE chain, where each child would encode its partial mangled name. Something like DW_AT_linkage_prefix and/or DW_AT_linkage_suffix. Then you could traverse the parent DIEs to reconstruct the full linkage name.
So if we have
namespace foo {
  class bar {
    void print(const char *) const;
  };
}
The DWARF could be something like:
DW_TAG_namespace
  DW_AT_name("foo")
  DW_AT_linkage_prefix("_Z3foo")
  DW_TAG_class_type
    DW_AT_name("bar")
    DW_AT_linkage_prefix("3bar")
    DW_TAG_subprogram
      DW_AT_name("print")
      DW_AT_linkage_prefix("5print")
      DW_AT_linkage_suffix(" const")
      DW_TAG_formal_parameter
        DW_AT_name("format")
        DW_AT_linkage_prefix("int")
This might allow a lot more name sharing between templated functions since their function base names like “erase”, “begin”, “end” and many more could be shared in the string tables.
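The parent-chain traversal could look roughly like the sketch below. Everything here is an assumption made for illustration: `DieNode` and `reconstruct_linkage_name` are hypothetical, DW_AT_linkage_prefix and DW_AT_linkage_suffix are only proposed attributes, and the concatenation order (prefixes outermost first, then suffixes innermost first) is one possible reading of the proposal. Contributions from child DIEs such as parameters are left out.

```cpp
#include <string>

// Hypothetical, minimal view of a DIE for this sketch.
struct DieNode {
    const DieNode* parent = nullptr;  // enclosing DIE, or nullptr at the top level
    std::string linkage_prefix;       // value of the proposed DW_AT_linkage_prefix
    std::string linkage_suffix;       // value of the proposed DW_AT_linkage_suffix
};

// Rebuild a full name by walking from the DIE up through its parents.
std::string reconstruct_linkage_name(const DieNode& die) {
    std::string prefixes;
    std::string suffixes;
    for (const DieNode* d = &die; d != nullptr; d = d->parent) {
        prefixes.insert(0, d->linkage_prefix);  // outermost scope ends up first
        suffixes += d->linkage_suffix;          // innermost suffix ends up first
    }
    return prefixes + suffixes;
}
```

Applied to the namespace, class and subprogram entries of the example, this joins "_Z3foo", "3bar", "5print" and " const" in that order. The result follows the illustrative fragments as written and is not a valid Itanium mangled name, so a real scheme would need fragments chosen so that the concatenation matches the actual mangling.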
Yeah, that doesn’t capture the majority of the cost I’m dealing with, which comes from the complexity of various very complicated template parameters.