From a performance standpoint, we do win when things are in the accelerator tables. LLDB doesn't grub around in the DWARF unless our accelerator tables are _not_ there (in which case we must build the accelerator tables ourselves by slowly parsing all of the DWARF). If they are there, they are assumed to be valid, and we have been trying to hold clang to that standard. clang must put both the plain and the mangled name in the DWARF for them to be emitted in the accelerator tables, so we have been slowly improving clang as much as we can to make the accelerator tables as accurate as possible. So LLDB won't even look in the DWARF unless it sees something in the accelerator tables (the __apple_XXX accelerator tables that I created and that Eric Christopher then implemented in clang and brought to the DWARF committee).
OK, so rather than improving this incrementally by having users file bugs about missing data, I'm trying to understand the underlying principles so we can strive to implement this correctly (while also not including data we don't really need, because people seem to care about debug info size).
Agreed. Our main concern with the old DWARF accelerator tables is that they don't really accelerate anything: they only provide a vague way to find the info you are looking for by base name. Any additional information requires ingesting all of the DWARF for a compile unit in order to reconstruct the decl context.
So our approach with the accelerator tables is to make them actually useful as accelerator tables. For example, unlike the old accelerator tables, they share their strings with the .debug_str table. They are pre-sorted, and they contain both the base name and the mangled name for everything we need to look up.
Though I wasn't actually suggesting that the accelerator tables were related to this; I was just giving another example of a size/perf tradeoff. If they are related, then that's another piece of the puzzle I would like to understand so I can better implement these requirements in Clang/LLVM.
So how does the linkage name relate to the accelerator tables?
We place both the base name (DW_AT_name) and the linkage name (either DW_AT_MIPS_linkage_name or DW_AT_linkage_name) in the accelerator tables. This lets us look things up by base name and also find the exact information given a linkage name. LLDB expressions use MCJIT, which often looks things up by linkage name. Also, when users set breakpoints they might use a demangled name; we can then correlate it with the linkage name and set the breakpoint efficiently.
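As a rough sketch of the indexing side, assuming a toy DIE represented as a Python dict keyed by attribute name (a hypothetical stand-in for real DWARF attribute parsing; the helper `index_die` is illustrative only), each DIE contributes one entry for its base name and one for its linkage name. A base-name lookup then returns every candidate, while a linkage-name lookup pins down the exact one:

```python
from collections import defaultdict

# Attribute names as strings; a real producer works with DWARF attribute codes.
LINKAGE_ATTRS = ("DW_AT_linkage_name", "DW_AT_MIPS_linkage_name")

def index_die(index, die_offset, attrs):
    """Add one accelerator entry per name this DIE can be looked up by."""
    if "DW_AT_name" in attrs:
        index[attrs["DW_AT_name"]].append(die_offset)
    for attr in LINKAGE_ATTRS:
        if attr in attrs:
            index[attrs[attr]].append(die_offset)
            break  # one linkage name per DIE is enough

# Hypothetical DIEs for ns::foo() and other::foo().
index = defaultdict(list)
index_die(index, 0x2A, {"DW_AT_name": "foo", "DW_AT_linkage_name": "_ZN2ns3fooEv"})
index_die(index, 0x40, {"DW_AT_name": "foo", "DW_AT_MIPS_linkage_name": "_ZN5other3fooEv"})
```

Here a breakpoint on plain `foo` finds both candidates, while an MCJIT request for `_ZN2ns3fooEv` gets exactly DIE 0x2a.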
I haven't really looked into them at all, but I thought they only contained "public" names (externally visible),
They currently do, and that is why they are less useful for debugging. They might be useful for a dynamic linker, but they really don't serve the needs of debuggers, where users will want to set breakpoints on any function (internal, private, public, external, etc.).
but maybe that's just a limitation/misfeature of the GNU pubnames stuff that will be addressed in the DWARF 5 accelerator tables feature/proposal.
Eric and I both believe that the new tables really do address and solve this issue in an efficient and effective way.
The benefits of the tables at a high level include:
- they allow debuggers to use them and trust that a certain level of content is there
- they share strings in the .debug_str table, so the strings can be used in .debug_info and all accelerator tables with minimal cost
- they can be mmap'ed in and used as-is for efficient searches, and don't require an initial parse + sort like all other DWARF tables do
- they are extensible, so they can be used for any by-name lookup in any kind of table
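To illustrate why a pre-built, pre-sorted table can be searched without any up-front parse, here is a minimal Python sketch. The layout is a simplified, hypothetical one loosely modeled on the hashed __apple_XXX tables (a header, buckets, 32-bit DJB hashes sorted by bucket, then per-entry string and DIE offsets); it is not the exact on-disk format. Strings live once in a separate pool standing in for .debug_str. Because `lookup` reads the table only through `struct.unpack_from`, `blob` could just as well be an `mmap.mmap` object.

```python
import struct

EMPTY = 0xFFFFFFFF  # marks a bucket with no entries

def djb_hash(name: bytes) -> int:
    """Bernstein (DJB) hash: h = h * 33 + c, kept to 32 bits."""
    h = 5381
    for c in name:
        h = (h * 33 + c) & 0xFFFFFFFF
    return h

def build(entries, bucket_count):
    """Serialize (name, die_offset) pairs into (string pool, table blob)."""
    strtab, str_off, rows = bytearray(), {}, []
    for name, die in entries:
        raw = name.encode()
        if raw not in str_off:  # each string is stored once and shared
            str_off[raw] = len(strtab)
            strtab += raw + b"\0"
        rows.append((djb_hash(raw), str_off[raw], die))
    # Pre-sort by bucket, then hash, so each bucket is one contiguous run.
    rows.sort(key=lambda r: (r[0] % bucket_count, r[0]))
    buckets = [EMPTY] * bucket_count
    for i, (h, _, _) in enumerate(rows):
        if buckets[h % bucket_count] == EMPTY:
            buckets[h % bucket_count] = i
    blob = struct.pack("<II", bucket_count, len(rows))
    blob += struct.pack(f"<{bucket_count}I", *buckets)
    blob += struct.pack(f"<{len(rows)}I", *(r[0] for r in rows))
    for _, soff, die in rows:
        blob += struct.pack("<II", soff, die)
    return bytes(strtab), blob

def lookup(strtab, blob, name):
    """Search the serialized table in place; nothing is parsed or sorted first."""
    raw = name.encode()
    h = djb_hash(raw)
    nbuckets, nhashes = struct.unpack_from("<II", blob, 0)
    hashes_at = 8 + 4 * nbuckets
    values_at = hashes_at + 4 * nhashes
    bucket = h % nbuckets
    (i,) = struct.unpack_from("<I", blob, 8 + 4 * bucket)
    dies = []
    while i != EMPTY and i < nhashes:
        (hv,) = struct.unpack_from("<I", blob, hashes_at + 4 * i)
        if hv % nbuckets != bucket:
            break  # walked past the end of this bucket
        if hv == h:
            soff, die = struct.unpack_from("<II", blob, values_at + 8 * i)
            if strtab[soff:strtab.index(b"\0", soff)] == raw:  # hashes can collide
                dies.append(die)
        i += 1
    return dies

strtab, blob = build([("foo", 0x2A), ("_ZN2ns3fooEv", 0x2A), ("bar", 0x40)], bucket_count=4)
print(lookup(strtab, blob, "foo"))  # [42]
print(lookup(strtab, blob, "baz"))  # []
```

The only work done at lookup time is hashing the query and touching the few words in one bucket; the string comparison at the end guards against hash collisions.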
In LLDB we do several kinds of lookups:
- base name lookups
- fully qualified demangled name lookups (user-entered breakpoint strings)
- mangled name lookups (from MCJIT and other JIT relocations)
Since DWARF is not a great format for partial parsing, we use these accelerator tables to vastly reduce the amount of data we need to parse when doing lookups. The accelerator tables let us quickly find candidates by base name and then use the mangled/demangled name to eliminate non-matching candidates before we need to pull in all of the DWARF for a compile unit. This saves a lot of memory when debugging.
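A minimal sketch of that filtering step, assuming a hypothetical in-memory view of the table keyed by base name, with each candidate's demangled name precomputed (a stand-in for running a demangler on its linkage name) and a placeholder `parse_compile_unit` that only records which compile units would have been fully parsed. The names `Candidate`, `BY_BASE_NAME`, and `find_function` are illustrative, and the sketch treats the user's breakpoint string as a fully qualified name:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    die_offset: int
    cu_index: int
    linkage_name: str
    demangled: str  # stand-in for demangling linkage_name on the fly

# Hypothetical accelerator-table contents keyed by base name (DW_AT_name).
BY_BASE_NAME = {
    "foo": [
        Candidate(0x10, 0, "_Z3foov", "foo()"),
        Candidate(0x2A, 1, "_ZN2ns3fooEv", "ns::foo()"),
        Candidate(0x33, 2, "_ZN5other3fooEv", "other::foo()"),
    ],
}

parsed_cus = []  # compile units whose full DWARF we had to ingest

def parse_compile_unit(cu_index):
    """Placeholder for the expensive step: parsing a whole CU's DWARF."""
    parsed_cus.append(cu_index)

def find_function(qualified):
    """Resolve a fully qualified name like 'ns::foo' to DIE offsets."""
    base = qualified.rsplit("::", 1)[-1]
    # Cheap filter using only accelerator-table data: no CU is parsed yet.
    hits = [c for c in BY_BASE_NAME.get(base, [])
            if c.demangled.split("(", 1)[0] == qualified]
    # Only CUs that still hold a surviving candidate get fully parsed.
    for cu in sorted({c.cu_index for c in hits}):
        parse_compile_unit(cu)
    return [c.die_offset for c in hits]

print(find_function("ns::foo"), parsed_cus)  # [42] [1]
```

Of the three `foo` candidates, only the one in CU 1 survives the name check, so only that compile unit is parsed; the other two are rejected using table data alone.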
So our concern does include .o file size, but it also includes how that information is later consumed by debuggers and symbolication tools.
Greg