[RFC] Improve DWARF 5 .debug_names Type Lookup/Parsing Speed

DWARF 5 .debug_names is one of the most important index tables for reducing lldb startup time. While dogfooding and using it, we have identified some performance issues. For instance, there is a code path in lldb that takes 50-60 seconds to display local variables for a stack frame.

Investigation reveals that the bottleneck occurs when lldb is resolving the type for a type declaration within a compile unit. The key issue is that the current LLVM .debug_names implementation only emits the base name for a type. For example, when lldb attempts to resolve/complete a type declaration like company::folly::Optional<Foo>::StorageTriviallyDestructible, there can be numerous StorageTriviallyDestructible entries in .debug_names pointing to various split DWARF DWO files. Additionally, each DWO file may contain several template instantiations for company::folly::Optional<int>, company::folly::Optional<float>, and so on.

As a result, LLDB has to load thousands of DWO files and parse each Debugging Information Entry (DIE) and its parent chain to determine if the full type declaration context matches, leading to a significant slowdown.
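To make the cost concrete, here is a rough sketch (with hypothetical types and helper names, not the actual LLDB SymbolFileDWARF code) of the work that every base-name hit currently forces on the consumer: hydrate the DIE, walk its parent chain, and only then discover whether the qualified name matches.

#include <string>

// Hypothetical, simplified stand-in for a parsed DIE; the real DIE objects in
// LLVM/LLDB are richer and only become available after the DWO is loaded.
struct FakeDie {
  std::string Name;
  const FakeDie *Parent; // namespace/class parent chain, nullptr at the top
};

// Rebuild "company::folly::Optional<Foo>::StorageTriviallyDestructible" by
// walking parent DIEs; this is only possible once the DIE tree is parsed.
static std::string qualifiedName(const FakeDie &Die) {
  std::string Result = Die.Name;
  for (const FakeDie *P = Die.Parent; P != nullptr; P = P->Parent)
    Result = P->Name + "::" + Result;
  return Result;
}

// The slow path: this has to run for each of the (potentially thousands of)
// "StorageTriviallyDestructible" index hits just to reject the non-matches.
static bool declContextMatches(const FakeDie &Die, const std::string &Wanted) {
  return qualifiedName(Die) == Wanted;
}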

To address this issue, ideally we would like .debug_names to be capable of type lookup without the need to touch or hydrate DIEs in DWO files. There are three options to consider (with my brief comments on their pros/cons):

  1. Emitting fully qualified type names instead of base type names.
     • Pros:
       • This is the simplest approach to implement for both producers and consumers.
       • I heard that PDB uses a similar approach.
     • Cons:
       • It may increase the size of the string table.
  2. Generating a type hash (DW_IDX_type_hash) for fully qualified type names.
     • Pros:
       • Very quick and straightforward.
     • Cons:
       • Does it align with the semantics of DW_IDX_type_hash?
       • There might be an increase in compile time due to the hash calculation.
  3. Emitting the parent type declaration chain via DW_IDX_parent.
     • Pros:
       • No risk of violating the DWARF specification.
     • Cons:
       • Could this be too slow (for the consumer) for a large .debug_names table?

We welcome your thoughts on these options. cc @dblaikie, @ayermolo, @clayborg


So the name-without-template-parameters behavior happened as part of support for -gsimple-template-names, which was motivated by DWARF size (in Google’s particular case, putting the full names in the string debug info in the DWP was exceeding 4 GB, and 32-bit offsets were overflowing in .debug_str.dwo).

Since it seemed suitable/important to have the index contain the same names as the DWARF - the simplified names were also added to the index. (this also, yes, would reduce .debug_names size - putting the full names in there could be quite expensive)

But, also/relatedly - the names of template instantiations aren’t precise. The simplest example is that t1<int, int> and t1<int,int> are the same - and the index could only contain one of these, but the user might write the other. This gets worse with extra parentheses, casts, and other ways to write the same template instantiation name.

We do already have type hashes/the mangled name of the type - but it has the problems of canonicalization above.

If the DW_IDX_parent is enough to get adequate performance it might be the best option. Though is it likely to be enough? I’d have thought most of the perf problems come from multiple instantiations of the same template - so they’ll all have the same parent chain, so this might not help improve performance by much?

For the Apple accelerator tables we include the 32-bit hash of the fully qualified type name. It was unclear from reading the DWARF spec whether DW_IDX_type_hash is supposed to be the 32-bit DJB hash of the fully qualified type name. If it is, then I would vote for this solution, as it stops LLDB from having to even parse the main compile unit or type unit DIE in the first place.

If we want to look up the type for “std::vector::iterator”, the only thing in the accelerator tables is the type base name “iterator”, so we have to check any DIE that matches “iterator”. So the flow for the Apple accelerator tables is:

  • uint32_t type_hash = djb_hash(“std::vector::iterator”)
  • lookup “iterator” and get many many results from every STL type in the universe
    • filter out any accelerator table entries whose type hash doesn’t match
  • only parse a few types that actually match

This saves us from parsing all of the DIEs for any compile units or type units that have a matching entry, and it also stops us from having to convert the DWARF into clang AST types. The result is a huge memory savings from not having to parse all of the DIEs in all CUs/TUs that contain “iterator”.
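A minimal sketch of that hash-filtering step, assuming a hypothetical entry type that carries the 32-bit hash of the fully qualified name (the struct and function names below are illustrative, not the actual LLDB accelerator-table API):

#include <cstdint>
#include <string>
#include <vector>

// 32-bit Bernstein (DJB) hash, the hash family used by the Apple tables.
static uint32_t djbHash(const std::string &S) {
  uint32_t H = 5381;
  for (unsigned char C : S)
    H = (H << 5) + H + C; // H * 33 + C
  return H;
}

// Illustrative accelerator-table entry: where the DIE lives, plus the hash of
// the fully qualified type name recorded by the producer.
struct AccelEntry {
  uint64_t DieOffset;
  uint32_t QualifiedNameHash;
};

// Filter the many "iterator" hits down to plausible matches for the full
// qualified name without touching a single DIE.
static std::vector<AccelEntry>
filterByQualifiedName(const std::vector<AccelEntry> &BaseNameHits,
                      const std::string &QualifiedName) {
  const uint32_t Want = djbHash(QualifiedName);
  std::vector<AccelEntry> Survivors;
  for (const AccelEntry &E : BaseNameHits)
    if (E.QualifiedNameHash == Want)
      Survivors.push_back(E);
  return Survivors;
}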

The flow without this feature goes:

  • lookup “iterator” and get many many results from every STL type in the universe
  • convert all of the above matches into clang AST types
  • filter out the types whose fully qualified names don’t match after they have all been created

The spec says, in non-normative text (& without actually using the DW_IDX_type_hash name to refer to it, which makes it a bit hard to find):

The type hash attribute, not to be confused with the type signature for a TU, may be provided for type entries whose declarations are not in a type unit, for the convenience of link-time or post-link utilities that wish to de-duplicate type declarations across compilation units. The type hash, however, is computed by the same method as specified for type signatures.
Which means, no, it isn’t the hash of the fully qualified name. Though LLVM uses the hash of the mangled name for type units anyway, not the DWARF-spec’d type unit hash. We could put that in the hash table too, but it wouldn’t be all that much use - since it’s the mangled name, not the user-visible/‘pretty’ name.

That said - using a hash of the pretty name seems rather problematic due to the canonicalization concerns I mentioned above - if a user writes std::vector<int >::iterator the extra space would throw off the hash (& even efforts to canonicalize this wouldn’t be perfect, or would be very involved - knowing where extra parens can be removed, whether to represent something as an enum constant or a cast of an integer constant, & having to map between those two things to implement such canonicalization). Not to mention, std::vector has some other template parameters, so its name is really std::vector<int, std::allocator<int>>::iterator (& then you get the >> canonicalization problem too).
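To make that concrete, here is a tiny self-contained illustration (the spellings are hypothetical user input, and the DJB hash is the one mentioned above for the Apple tables): the three strings below all denote the same type to a C++ programmer, yet a plain hash of the pretty name will in general place them in different buckets.

#include <cstdint>
#include <cstdio>
#include <string>

// 32-bit Bernstein (DJB) hash.
static uint32_t djbHash(const std::string &S) {
  uint32_t H = 5381;
  for (unsigned char C : S)
    H = (H << 5) + H + C; // H * 33 + C
  return H;
}

int main() {
  // Same type, three plausible spellings; the hash keys do not line up, so a
  // naive hash-of-pretty-name lookup would miss.
  std::printf("%u\n", djbHash("std::vector<int>::iterator"));
  std::printf("%u\n", djbHash("std::vector<int >::iterator"));
  std::printf("%u\n", djbHash("std::vector<int, std::allocator<int> >::iterator"));
  return 0;
}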

Given that class members can’t be used unqualified anyway (except in the context of the class - in which case you’ve already got the right context and wouldn’t need an accelerator table to tell you anything) - would it not be more general & make for a more compact index, while admittedly being somewhat less efficient (but probably not “build the AST of every unit with an entity named ‘iterator’ inefficient”) to look up “vector” in the accelerator table, find every instance (again, this is the cost of non-canonicalization) and look for the ones that match the template parameters the user wrote - then look for an “iterator” inside that/those instances?

I am not so concerned with what the user types with pretty names, but more of what is in the DWARF when doing lookups for the expression parser.

If we have some string that has the qualified name then we can avoid pulling in all of the DIEs from each CU/TU just to find out we don’t need to actually parse the type. Any other solution means we will parse all of the DIEs in the CU/TU so we can get more context on the type so we can know if we want it or not.

Given that class members can’t be used unqualified anyway (except in the context of the class - in which case you’ve already got the right context and wouldn’t need an accelerator table to tell you anything) - would it not be more general & make for a more compact index, while admittedly being somewhat less efficient (but probably not “build the AST of every unit with an entity named ‘iterator’ inefficient”) to look up “vector” in the accelerator table, find every instance (again, this is the cost of non-canonicalization) and look for the ones that match the template parameters the user wrote - then look for an “iterator” inside that/those instances?

There is no “vector” anywhere in the accelerator tables (or in the DIE names). They are always specialized instances with template parameters (“vector<int>”), unless the simple template names option is enabled.

When simple template names are enabled, there is code in LLDB to ignore any matches to something like “vector”, because if we end up making a type from the “vector” match, we return a specialized instance such as vector<int>, and that messes up the expression parser. If the expression parser is parsing text like std::vector<int>(), it will look up “std”, get the namespace, then say “within the decl context of ‘std’ please find ‘vector’”, and this will fail in LLDB due to the lack of accelerator table entries, as there is no simple “vector” class that represents just a “vector”. And if we wanted to find “vector” from a standard accelerator table, we would need to manually parse all accelerator table names and chop them up in case we needed to find “vector” in type matches. There is a Google Stadia patch I would like to try and get into LLDB at some point that fixes this issue, though I haven’t fully looked at the patch:

I do have a type lookup patch that unifies how all type searches are done. Right now we have two APIs users can use to look up types, and this patch simplifies things so users can specify the decl context in a simple way. I need to revive it and get it into LLDB soon:

https://reviews.llvm.org/D137900

This patch makes type lookups a bit more efficient: any matching entry from the accelerator tables will still need the DIE tree for the containing CU/TU parsed, but it can filter out non-matching entries before we actually make clang AST types from them.

But it would still be nice to be able to filter out any non-matching entries at accelerator table access time if possible, so we can avoid parsing any of the unneeded DWARF DIEs up front. Not sure what the right approach is to make that happen, as you have pointed out that the pretty names can be problematic. Does this issue go away if we assume the pretty name is created by traversing the DWARF decl context parent DIEs, or if we ask the compiler for the qualified name at compile time when making the accelerator table entry?

I see the markup is swallowing some of the template parameters we both probably typed, so I will keep that in mind when reading from now on!

Not to change the subject too much, but what is the status of DWARF6? Did your proposal for enabling 64-bit in DWP get accepted? To put it another way, is it at the stage where we can implement it in llvm?

Hmm, not sure I follow - where would these strings appear in the DWARF/how would they be used for lookups into the accelerator tables? Like if I were debugging something inside a member function of a std::vector instantiation and I wanted to print out a variable that was of the local iterator type - that wouldn’t involve using the accelerator table, would it? That’d be able to be found directly using the DWARF from the current context, I’d have thought.

Ah, right, sorry - I was assuming simplified template names - both because it’s what we use at Google (& figured you folks at Meta might be exploring it, given you’ve hit similar scaling issues with DWARF, by the sounds of it) and because I have some hope/desire that we could move towards that model for DWARF by default/more generally, as it is quite expensive to store these full names (& their lack of canonicalization harms their applicability as well).

Hmm, I’ve missed a step here - are you describing the state of things now, or the state of things if we made some change to the accelerator tables?

Because as it is today, my understanding is that the simplified names are still used for index lookups - but they’re ambiguous. Are you saying they aren’t used and then what? The index is ignored and manual parsing is done again?

When you say “standard accelerator table” - what are you comparing/contrasting that to? Do you mean a .debug_names accelerator table, but without simplified names?

I don’t think it does go away in either case - in simple template names, traversing just the parent tree wouldn’t produce all the template parameters anyway (“iterator” would have a parent named “vector” with a parent named “std”).

But even if it’s an unsimplified name (either because you’re building without simplified template names, or because it’s simplified but we put the unsimplified name in the accelerator table) - there’s still the issue of canonicalization. The name coming from DWARF isn’t necessarily going to match the way a user spells it… - so I’m not sure how having template argument lists in the accelerator table can lead to a good user experience. It’d require the user to spell the names exactly the same as the compiler, punctuation and all.

But you mentioned up top you’re concerned about something else/not human spellings and I’m still a bit confused by that - perhaps we can dig into that further.

Ongoing. I doubt it’ll stabilize/release for another couple of years.

Yep: DWARF Issue: .debug_{c,t}u_index missing/incomplete DWARF64 support

Maybe? I believe in the past there had been some drama about implementations jumping the gun on implementing the standard. There are real concerns about compatibility - we’d have to be totally clear that there’s no expectation of compatibility day-over-day of any features implemented based on an incomplete DWARF spec.

Perhaps we could have -gdwarf-next that’s unstable/not guaranteed/etc (like use it only with a particularly matching debugger/tools, do not archive any object files/expect them to work with newer tools or expect tomorrow’s compiler to work with yesterday’s tools, etc) - not sure if that’s a useful feature for anything but experimentation, though. Like if you ship that to users, how do they use it reliably? (Oh, I have a crash in a production binary from last month - which debugger do I use to debug it?)

& the issue around .debug_names and DWP files looks like it’ll involve further index work (unless I can convince you folks that only the primary type needs to be in the index - which seems questionable/unlikely) - and that change might be a bit more invasive than the 64 bit one, in some ways - more structurally complicated.

No, it wouldn’t involve any accelerator tables.

Yes indeed. Local variables don’t ever use the accelerator tables. Only expression parsing. Expression parsing in LLDB uses clang and an external AST source to do all of the lookups as they are needed.

Yes, this does reduce string table sizes when enabled. Simple template names are not great for manual type lookups by exact name, though - not expression parsing here, just a simple type lookup by a user.

The state of things now. Simple template name accelerator table entries get ignored because they mess things up compared to normal DWARF.

No, they are just ignored because they don’t match normal DWARF. The simple template names interfere with expression evaluation, as vector would match any vector<T>.

Yes, that is what I mean. These compiler features often get enabled without knowing whether they affect debugging (and they often do), so we have to learn to work around any features that might get enabled.

I am just talking about how types are requested during expression evaluation for clang. First “std”, then “find ‘vector’ inside of ‘std’”, etc.

During type lookups for the expression parser, we often know the decl context that must contain a type like “iterator”, so we know we are looking for std::vector<int>::iterator. But when we do lookups for this type using accelerator tables, we must use the basename iterator and filter out results that don’t match (filter out std::*<T>::iterator, where * is any STL class, but keep any whose decl context matches std::vector<int>). If we can’t quickly remove non-matching accelerator table entries somehow (we can with the Apple accelerator tables), we must parse all of the DWARF (all DIEs in any matching CUs/TUs), and we currently make clang AST types out of each “iterator” match from all STL classes, then throw things away by checking the decl context of the matching types that were produced.

So we made the Apple accelerator tables allow us to remove any unneeded entry matches without having to parse ANY DIEs from any CUs/TUs whose decl context doesn’t match std::vector<int>.

Do you have a small/standalone example?

Hmm, that’s not my understanding - again, perhaps a small/concrete example would be helpful to discuss?

(my understanding was we did the work to address -gsimple-template-names usage in lldb: [DWARF] using simplified template names - #13 by dblaikie and ⚙ D134378 [lldb] Support simplified template names )

Hmm - how does that square with the work done in D134378? Was that work incomplete? I’d like to know more about how to ensure this is working well.

I don’t believe that was the case with -gsimple-template-names - we did make an effort to ensure it works with lldb.

Once you reach a class type, though, like std::vector<int> would it be feasible to search inside the DWARF description or Clang AST of that type, rather than consulting the context-free index? (I understand this wouldn’t be the right choice for a namespace, like std, but for class members, it’s a bit more bounded - and “members of std::vector<int>” is probably a smaller list than “all entities named iterator”, for instance)

FYI, internally we do not use -gsimple-template-names. Yet…
Unless the option is enabled by default somehow…

Regarding enabling experimental DWP DWARF6 64-bit support: should I start a different RFC?

For environments where the toolchain/compiler/debugger is controlled for the users, it could be a good option. 🙂

Nah, it’s not enabled by default as yet. But it’d be good to get there at some point. (It’d be nice to also get it to the point where there aren’t a bunch of opt-outs - cases where DWARF doesn’t carry enough info to reconstitute full type names)

Yeah, probably.

It’s a pretty tight situation, though - if we want to be able to break backcompat (eg: if we end up needing to change the index again for .debug_names+type units) then you can’t have any archival debug info. For instance, that wouldn’t work at Google for production builds - we’d still need to be able to debug 6 month old programs, for instance.

Though Google’s also still slightly supporting debugging with gdb, so we probably wouldn’t move to an experimental index just yet - be a harder sell to have gdb adopt it too, etc. (& we’re on older gdbs/aren’t updating much/at all)

We could carve out a part of the index version for our own uses, or put it in a different section for now, with /some/ backcompat guarantees.

But, yeah, if it’s purely experimental/no backcompat guarantees at all, it might be workable.

But, yeah, another thread to discuss all these issues would be good.


Example that doesn’t work with the expression parser, with or without -gsimple-template-names:

template <class T> class A {
    T m_value;
 
public:
    A(T v) : m_value(v) {}
};

int main(int argc, const char **argv) {
  A<int> a(12);
  return 0;
}

Then compile and debug:

$ lldb a.out
(lldb) b /return 0/
(lldb) run
(lldb) expr A<int> b(12);
error: <user expression 0>:1:1: use of undeclared identifier 'A'
    1 | A<int> b(12);
      | ^

This is a known issue with our expression parser due to the way lookups are currently done.

Sorry for the late response.

I have spent some time prototyping and investigating the DW_IDX_parent approach to the decl context parent chain (option 3 as discussed in the post). The most significant performance concern arises from the consumer/lldb side, and I’ll explain it in more detail below.

According to the specification, DW_IDX_parent can represent either the parent pool entry index (the “Index of the name table entry for the parent”) or the parent pool entry offset (“This is represented as the offset of the entry relative to the start of the entry pool”). It might be a bit confusing which one to use, but it doesn’t matter for explaining the performance issue.

When a consumer locates a name entry, it needs to traverse the parent chain via DW_IDX_parent to construct the full decl context. For each DW_IDX_parent, it must quickly find two pieces of information: 1. the parent pool entry, and 2. the parent pool entry’s name. Since the second piece lacks a lookup table (the name index is not designed for this kind of reverse lookup), the consumer has to create a mapping table from pool entry to name entry to obtain the name. As some parent pool entries can be nameless (e.g., types inside an anonymous namespace), the consumer has to scan through all pool entries and cross-check with the name entries’ range to build this mapping. The net result is that any potential type match in a name index will require all pool entries in that index to be indexed.
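For concreteness, here is a minimal sketch of that eager pass, using simplified stand-in types rather than the real llvm::DWARFDebugNames classes: to put a name on any DW_IDX_parent target, the consumer first has to walk every name table entry and record which pool offsets it owns.

#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Simplified stand-ins for the two kinds of index records discussed above.
struct PoolEntry {
  uint64_t PoolOffset; // offset of this entry within the entry pool
  // ... tag and DW_IDX_* attributes, including an optional DW_IDX_parent ...
};

struct NameTableEntry {
  std::string Name;               // base name from the string table
  std::vector<PoolEntry> Entries; // pool entries reachable from this name
};

// Eager pass: map pool-entry offset -> owning name. Nameless pool entries
// (e.g. anonymous namespaces) are never reachable from the name table, which
// is why the whole pool effectively has to be materialized up front before
// any parent chain can be resolved by name.
static std::unordered_map<uint64_t, std::string>
buildPoolToNameMap(const std::vector<NameTableEntry> &Names) {
  std::unordered_map<uint64_t, std::string> PoolToName;
  for (const NameTableEntry &NTE : Names)
    for (const PoolEntry &PE : NTE.Entries)
      PoolToName.emplace(PE.PoolOffset, NTE.Name);
  return PoolToName;
}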

Based on this analysis, it appears that option 2 (using DW_IDX_type_hash) would be a simpler and faster design. Please let me know if this explanation makes sense.

I was just independently investigating the exact same issue pointed out by the OP here, and was also looking into the DW_IDX_parent solution. Glad to have found this discussion.

it must quickly find two pieces of information: 1. the parent pool entry, and 2. the parent pool entry’s name.
the name index is not designed for this kind of reverse lookup

@jeffreytan81 I don’t think we need to perform the reverse lookup here. We know the parent’s name (since we have the entire parent chain), so we know its hash in the table, and therefore we know which offsets have that name. If the DW_IDX_parent is one such offset, we have a match.

DW_IDX_parent can represent either the parent pool entry index […] or the parent pool entry offset

I agree it doesn’t matter much, but on page 141, lines 22-24, it says that:

Parent debugging information entry, a reference to the index entry for the parent. This is represented as the offset of the entry relative to the start of the entry pool.

More generally though, I think the bigger problem of template / simple / canonical names remains: when we are looking for A::B::C, the spelling of “B” may be different in the table and in what LLDB is querying (see the template examples provided in this thread). Right now I don’t see a way to reconcile DW_IDX_parent with the template canonical name problem, unless we can guarantee that LLDB’s context query will use the same spelling used in the table.

To answer @dblaikie question from an earlier post:

If the DW_IDX_parent is enough to get adequate performance it might be the best option. Though is it likely to be enough? I’d have thought most of the perf problems come from multiple instantiations of the same template - so they’ll all have the same parent chain, so this might not help improve performance by much?

This should be enough because the slow part is parsing the DIEs in the debug_info section for the false hits. If we can resolve everything exclusively through the accelerator table – without parsing DIEs – then things would be fast even with multiple instantiations of the same template.

@Felipe,

since we have the entire parent chain, so we know its hash in the table, and therefore we know which offsets have that name

That may not be accurate. The index table consists of two types of entries: name table entries and pool entries. One name table entry can have multiple pool entries (multiple types can share the same base name). Therefore, DW_IDX_parent should point to a pool entry (the second type), not a name table entry. Otherwise, it would be impossible to determine the specific pool entry that DW_IDX_parent is pointing to.

The issue with the current implementation in LLDB is that “pool entries” are lazily parsed and do not have a backward pointer to their corresponding “name table entry.” As a result, the pool entry only contains the tag and lacks the “name” information. To address this, LLDB will need to eagerly parse all pool entries to create a reverse mapping from pool entry to name table entry (i.e., <pool entry => name table entry>).

To add more complexity, the parent pool entry can be nameless (see the specification excerpt below), which means it can’t be reached from a name table entry. So we will have to scan all pool entries to fully build the parent chain:

It is possible that an indexed debugging information entry has a parent that is not indexed (for example, if its parent does not have a name attribute). In such a case, a parent attribute may point to a nameless index entry (that is, one that cannot be reached from any entry in the name table), or it may point to the nearest ancestor that does have an index entry.

Hope this makes sense.

Therefore, DW_IDX_parent should point to a pool entry (the second type), not a name table entry.

Yup, this makes sense. But consider this example, where we are looking for A::B and do the following queries:

  1. Query the table for B; let’s say it gives us a single result, and its DW_IDX_parent is pool_entry_B_parent (a pool entry).
  2. Query the table for A; let’s say it gives us two results: pool_entry_A1 and pool_entry_A2. If either pool_entry_A1 == pool_entry_B_parent or pool_entry_A2 == pool_entry_B_parent, we know that pool_entry_B_parent has the name A.

This precludes the need for parent name information. Does that make sense?
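Here is a small sketch of that offset comparison, again with hypothetical types rather than the real .debug_names reader: the child’s DW_IDX_parent offset is simply checked against the pool offsets returned when querying the parent’s name.

#include <algorithm>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical, minimal view of a pool entry for this illustration.
struct Entry {
  uint64_t PoolOffset;                  // where this entry sits in the pool
  std::optional<uint64_t> ParentOffset; // DW_IDX_parent, if emitted
};

// Did querying the table for the parent's name (e.g. "A") return the pool
// entry that `Child` (e.g. a "B" hit) points at via DW_IDX_parent?
static bool parentMatches(const Entry &Child,
                          const std::vector<Entry> &ParentNameHits) {
  if (!Child.ParentOffset)
    return false; // no (named) parent recorded: cannot be the A::B we want
  return std::any_of(ParentNameHits.begin(), ParentNameHits.end(),
                     [&](const Entry &P) {
                       return P.PoolOffset == *Child.ParentOffset;
                     });
}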

To add more complexity, the parent pool entry can be nameless

Would that be a concern for us though? Whenever we are doing type lookup, we have names for everything (A::B::C). If we found something on the accelerator table without a parent – or a nameless parent – we know this is not the entry we are looking for, no?

I think the bigger problem of template / simple / canonical names remains

Even if we resolve the parent name issue, I still don’t see how we get around the template names problem, either with IDX_parent or IDX_type_hash. Did you have a plan for those?