[RFC] Improve Dwarf 5 .debug_names Type Lookup/Parsing Speed

That might present a challenge by introducing an ambiguity between an entry that has been emitted without any parent data, and an entry that has no parent/is at the root of the DIE tree? So we know when things are at the root of the tree, we might have to emit empty parent entries, or parent entries that reference the DW_TAG_compile_unit to disambiguate the two situations.

ambiguity between an entry that has been emitted without any parent data, and an entry that has no parent/is at the root of the DIE tree?

IIUC we should never have a DIE that is at the root of the tree, since TAG_compile_unit and TAG_type_unit don’t fall under the description of:

The name index must contain an entry for each debugging information entry that
defines a named subprogram, label, variable, type, or namespace:

Even if root DIEs did show up, I think the important point is to not be ambiguous about:

  • Is this parentless because it has no indexed parent? (this includes the root DIE case, as well as top level functions, types, namespaces, etc)
  • Is this parentless because the table does not index parents?

We need to answer that question in order to know whether we can do a fast search or a slow search.

In the prototype implementation, I assumed that if a table has any IDX_parent in some abbreviation table, then the entire table was created with fully populated parents. Under this assumption, we would be able to consider entries that have no IDX_parent as entries whose parents are not indexed.

If we cannot make this assumption (e.g. because of a linker combining different kinds of accelerator tables), my suggestion to save data is to use a different kind of abbreviation entry for parentless nodes, i.e., something like {dwarf::DW_IDX_parent, dwarf::DW_FORM_flag_present}. This would mean: “the parent information is present but inexistent, i.e., this doesn’t have an indexed parent”, and it would cost zero bytes per entry.

To give an idea, the object file for APInt.cpp has 1432 accelerator table entries, 830 of which do not have an indexed parent.

This sounds pretty good to me. Bit of a flexible use of DWARF, probably want to propose it for standardization, perhaps - but sounds pretty plausible.

& you make a good point about linked/merged indexes - even if we could use some hint about whether a single index has parents or doesn’t (so if it has parents, the absence of a parent means the node has no parent, not that the parent information has been omitted) - it wouldn’t apply to a merged index. So we need a per-entry parent-absent disambiguator.

I think I am ready to open all the PRs for this work and get feedback.

With the previously discussed idea of using a DW_FORM_flag_present, the debug_names size increase is reduced to 19% (down from 30%). Total object size is now increased by 0.62% (down from 0.92%).

The complete PR is here

* 6f62cfd1b509 [lldb][DWARFIndex] Use IDX_parent to implement GetFullyQualifiedType query
* 9ca21052af4b [lldb][DWARFUnit] Implement PeekDIEName query
* 58b6d9b4038c [llvm][DebugNames] Implement Entry::GetParentEntry query
* b99113215416 [AsmPrinter][DebugNames] Implement DW_IDX_parent entries
* fa08a9dbc97d [lldb][[DWARFDeclContext] Add function to extract qualified names as vector
* 683cfd5217e5 [lldb][DWARFIndex][nfc] Factor out fully qualified name query
* 8ec587dc1414 [AccelTable][nfc] Add helper function to cast Data

Sadly I am pretty lost in Github-land when it comes to stacked patches, so the idea is to open PRs as dependencies are merged.
The bottom two commits already have PRs.

The most important commits are:

* b99113215416 [AsmPrinter][DebugNames] Implement DW_IDX_parent entries
* 6f62cfd1b509 [lldb][DWARFIndex] Use IDX_parent to implement GetFullyQualifiedType query

The description of that first commit is fairly detailed.

The first of the main PRs is up:
[AsmPrinter][DebugNames] Implement DW_IDX_parent entries #77457