[DebugInfo][DWARFv5][LLD] .debug_names with -fdebug-types-section

Hello

I am trying to enable the .debug_names acceleration table together with -fdebug-types-section. One part I am not sure about is the local TU list. It contains offsets into the .debug_info section. Entries for DIEs in a type unit carry an index into the local TU list, and the DIE offset stored in each entry is relative to that TU.
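To make the layout concrete, here is a minimal sketch of how a consumer might turn a DW_IDX_type_unit value into a location. The class and function names are hypothetical, only the unit lists are modelled (no hash table or abbreviations), and it assumes foreign TUs are numbered after the local ones, which is my reading of the DWARF v5 name index description.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class NameIndex:
    """Simplified model of the unit lists in one .debug_names unit (hypothetical)."""
    cu_offsets: List[int] = field(default_factory=list)             # offsets into .debug_info
    local_tu_offsets: List[int] = field(default_factory=list)       # offsets into .debug_info
    foreign_tu_signatures: List[int] = field(default_factory=list)  # 8-byte type signatures


def resolve_type_unit(index: NameIndex, tu_index: int) -> Tuple[str, int]:
    """Map a DW_IDX_type_unit value to ("local", offset) or ("foreign", signature).

    Assumption: foreign TUs are numbered directly after the local TUs.
    """
    n_local = len(index.local_tu_offsets)
    if tu_index < n_local:
        return ("local", index.local_tu_offsets[tu_index])
    return ("foreign", index.foreign_tu_signatures[tu_index - n_local])


def absolute_die_offset(index: NameIndex, tu_index: int, die_offset: int) -> int:
    """DW_IDX_die_offset is unit-relative, so add the TU's .debug_info offset."""
    kind, value = resolve_type_unit(index, tu_index)
    if kind != "local":
        raise ValueError("foreign type units have no offset in this .debug_info")
    return value + die_offset
```

This is also why the list is sensitive to linking: every absolute DIE location depends on the TU offsets stored in the local TU list being correct in the final .debug_info.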

The linker de-duplicates type units using COMDAT, so the final binary will have fewer type units than the inputs and the local TU list will be invalid.
Am I missing something, or will the linker, specifically LLD, need to be aware of the contents of the .debug_names sections when it de-duplicates type units?

There is also the foreign TU list, which holds a signature instead of an offset, but that is meant for split DWARF. Maybe this is the way to go to implement type units?

Thanks.
@dblaikie @MaskRay

The more I look at it, the more I think the best way to implement support is to always use foreign type units. The entry will have a type hash. If it points to just a TU, the entry is in the current object file; if it also points to a CU, this is the split DWARF case. I am not sure whether this is abusing the spec.
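As a rough sketch of what a consumer would have to do under this proposal (this is the idea from this post, not behaviour defined by the spec, and the function name and return strings are made up for illustration):

```python
from typing import Optional


def classify_foreign_entry(type_unit: Optional[int],
                           compile_unit: Optional[int]) -> str:
    """Classify an index entry by which unit attributes it carries.

    type_unit    -- value of DW_IDX_type_unit, or None if absent
    compile_unit -- value of DW_IDX_compile_unit, or None if absent
    """
    if type_unit is None:
        # No TU attribute: the DIE lives in a compile unit.
        return "not-a-type-unit-entry"
    if compile_unit is None:
        # Proposal: TU only means the type unit is in the linked binary.
        return "type-unit-in-linked-binary"
    # Proposal: TU plus CU means the CU locates the split DWARF object file.
    return "type-unit-in-split-dwarf"
```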

" When an index entry refers to a foreign type unit, it may have attributes for both CU and (foreign) TU. For such entries, the CU attribute gives the consumer a reference to the CU that may be used to locate a split DWARF object file that contains the type unit."

Either that, or abandon the whole idea and do it in a post-build step: llvm-dwarfutil has partial support for the .debug_names accelerator table. Although, considering it lifts everything to IR, it might be a bit slow for production purposes.

@clayborg WDYT from an LLDB perspective?

I think the opposite direction might be needed/suitable - that .debug_names should refer to the type stubs in .debug_info.

The reason is this:

Types in type units are canonical, but don’t contain all the features of a type. For instance, they can’t consistently contain nested types (those types might not be defined in every translation unit the type is defined in), member function templates (similarly, you might instantiate some instances of such a template in some translation units, and some in others), or implicit special members (similarly, they may be instantiated only in some translation units).

So if a DWARF consumer wants to reconstruct the totality of a type (e.g. they want to evaluate TypeName::func<int>()), then they’ll need to search for all type stubs.

This also means it’s simpler/the same whether using type units or not.

(it does mean the index is bigger, since it’ll need to mention every place a type appears - whereas if it referenced the type unit directly, indeed, there would only need to be one result for the given term in the index)
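A toy count shows the size difference. The object names and type names are invented, and it assumes one stub per type per CU and one surviving type unit per type after de-duplication:

```python
from typing import Dict, List


def entries_referencing_stubs(types_per_cu: Dict[str, List[str]]) -> int:
    """One index entry per (CU, type) pair, i.e. per type stub."""
    return sum(len(set(types)) for types in types_per_cu.values())


def entries_referencing_type_units(types_per_cu: Dict[str, List[str]]) -> int:
    """One index entry per distinct type, i.e. per surviving type unit."""
    return len(set().union(*types_per_cu.values()))


# Hypothetical link of three objects that all use Foo, two of which use Bar.
example = {"a.o": ["Foo", "Bar"], "b.o": ["Foo"], "c.o": ["Foo", "Bar"]}
stub_entries = entries_referencing_stubs(example)
type_unit_entries = entries_referencing_type_units(example)
```

With these inputs the stub-based index has five entries against two for the type-unit-based one, and the gap grows with the number of translation units that use each type.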

"Either that or abandon the whole idea and do it in post build step: llvm-dwarfutil has partial support for debug_names accelerator table. Although considering it lifts everything to IR, might be a bit slow for production purposes."

Yeah, that’s no good for us (Google) - we don’t want to have to parse all the DWARF again to make an index to get acceptable debugger performance. We’re likely interested in something like the existing .debug_gnu_pubnames/types → .gdb_index solution, where individual .debug_names tables are put in .o files and the linker then does a smart/DWARF-aware merge of these indexes. Alternatively, it could even do non-aware traditional linking and get something like the performance of Apple’s situation with .o files, where each .o has its own little index. The difference is that in this case all those little indexes would be side-by-side in the executable, which saves time and avoids having to pull all the .o files over a network filesystem before queries can be run.
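The two linking modes can be sketched like this. Each per-object index is modelled as a plain dict from name to a list of (cu_offset, die_offset) pairs, which is a simplification I am assuming so that entries stay valid without rewriting CU indices; a real merge would also have to renumber the CU and TU lists.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Entry = Tuple[int, int]          # (cu_offset, die_offset), hypothetical encoding
Index = Dict[str, List[Entry]]   # one per-object name table


def lookup_side_by_side(indexes: List[Index], name: str) -> List[Entry]:
    """Non-aware link: the debugger probes every small index in turn."""
    hits: List[Entry] = []
    for index in indexes:
        hits.extend(index.get(name, []))
    return hits


def merge_indexes(indexes: List[Index]) -> Index:
    """DWARF-aware link: build one table so a lookup is a single probe."""
    merged: Dict[str, List[Entry]] = defaultdict(list)
    for index in indexes:
        for name, entries in index.items():
            merged[name].extend(entries)
    return dict(merged)
```

Both give the same answers. The difference is lookup cost, which scales with the number of input objects in the side-by-side case.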

I’m not sure what you mean by stubs in .debug_info. Can you elaborate, please?
I asked @clayborg to give his perspective from the LLDB point of view.
I agree that a post-build tool that parses all of the debug info is not good. Ideally the linker can do something.
I guess that would be a question for @MaskRay (anyone else?): what level of involvement should LLD have in supporting .debug_names?

(here’s the thread I was talking about, @cmtice )

@dblaikie
When you refer to “type stubs”, are you referring to this kind:

0x0000004e:   DW_TAG_structure_type [10]   (0x0000000c)
                DW_AT_declaration [DW_FORM_flag_present]  (true)
                DW_AT_signature [DW_FORM_ref_sig8]  (0x104ec427d2ebea6f)

Right, sorry for the delay - been swamped with email after getting back from vacation and working through at least some of the pull request migration fallout.

Yes, those are the stubs I’m thinking of.

So breaking this down:

Non-Split DWARF:

  • Yes, the local type unit list would be affected by linking in one of two ways:
    • local type unit references could become dead (like references to linker-gc’d functions)
    • local type units could end up shared (sometimes this happens to comdat functions, if their definitions are identical)

I think either of these things are acceptable, and should happen without linker-special casing. The dead case should be handled by the usual debug info tombstoning rules - a relocation to a gc’d section gets replaced with a tombstone value & consumers should be ready to handle that.
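Here is a small simulation of the dead case under a non-DWARF-aware link. It is a sketch with several assumptions: only type units are laid out (no CUs in between), the first copy of each signature wins, every reference to a discarded copy is tombstoned rather than redirected to the kept copy, and the tombstone is the all-ones value for 32-bit DWARF.

```python
from typing import Dict, List, Tuple

TOMBSTONE = 0xFFFFFFFF  # assumption: max-value tombstone for 32-bit DWARF


def link_local_tu_lists(objects: List[List[Tuple[int, int]]]) -> List[List[int]]:
    """Compute each object's local TU list after COMDAT de-duplication.

    objects -- per object, its type units as (signature, size_in_bytes)
    Returns per-object lists of final .debug_info offsets, with TOMBSTONE
    where that object's copy of the type unit was discarded.
    """
    kept: Dict[int, int] = {}   # signature -> offset of the surviving copy
    next_offset = 0
    result: List[List[int]] = []
    for type_units in objects:
        tu_list: List[int] = []
        for signature, size in type_units:
            if signature in kept:
                tu_list.append(TOMBSTONE)      # duplicate copy was dropped
            else:
                kept[signature] = next_offset  # first copy survives
                tu_list.append(next_offset)
                next_offset += size
        result.append(tu_list)
    return result


def live_type_units(tu_list: List[int]) -> List[int]:
    """What a consumer should do: skip tombstoned local TU entries."""
    return [offset for offset in tu_list if offset != TOMBSTONE]
```

A consumer that filters on the tombstone simply never follows index entries that point at a discarded copy, and the surviving copy is still reachable through the index of the object that contributed it.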

For Split DWARF it’s more problematic because, as you say, you need a CU and a TU to find a type unit when a dwp isn’t built (& at compile/link time you don’t know, so you’ve got to err on the side of caution & make it usable even if the user is debugging from dwo files directly). But, yeah I think that could be addressed by using DW_IDX_type_unit and DW_IDX_compile_unit on any type unit references into Split DWARF. The foreign TU list wouldn’t get cleaned up - it’d continue to contain all the hashes, none would be tombstoned, etc (because the linker doesn’t observe the DWARF linking happening in the dwp, if it happens at all)
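A consumer-side sketch of that lookup, assuming the entry carries both DW_IDX_type_unit (resolved to a signature) and DW_IDX_compile_unit (resolved to the dwo name of the skeleton CU). The function, its parameters, and the dict-based stand-ins for the dwp index and dwo contents are all hypothetical.

```python
from typing import Dict, Optional, Tuple


def locate_foreign_type_unit(
    signature: int,
    cu_dwo_name: Optional[str],
    dwp_index: Optional[Dict[int, int]],
    dwo_files: Dict[str, Dict[int, int]],
) -> Optional[Tuple[str, int]]:
    """Find a foreign type unit by signature.

    dwp_index -- signature -> offset in the .dwp, or None if no dwp was built
    dwo_files -- dwo name -> (signature -> offset in that dwo file)
    Returns (container, offset), or None if the type unit cannot be found.
    """
    # With a dwp, the signature alone is enough.
    if dwp_index is not None and signature in dwp_index:
        return ("dwp", dwp_index[signature])
    # Without one, the CU attribute tells us which dwo file to open.
    if cu_dwo_name is not None:
        type_units = dwo_files.get(cu_dwo_name, {})
        if signature in type_units:
            return (cu_dwo_name, type_units[signature])
    return None
```

The second branch is the reason the CU attribute is needed: without it, a consumer debugging straight from dwo files would have to open every dwo to find the one holding the signature.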

But, yes, the question of how to reference type units is a valid one - I think for Clang’s debug info, at least, we always produce a stub type description, which may have extra entities in it, etc. So we might as well, whenever there’s a type stub (which is always for LLVM), reference that - and not use the type unit mechanisms of .debug_names at all. This’ll make the index a bit larger than if we referenced type units directly (well, when the .debug_names is merged by a smart linker/or otherwise unified index is produced) - but is more comprehensive/useful for consumers to find all these possible features of a type. If a consumer finds a type unit, it could then ignore all the other versions of a type if it wanted to.

(I guess, arguably, to find all these extra bonus features that aren’t in the canonical type, for Clang’s debug info (with type homing, -fno-standalone-debug) you’d need type declarations that have such features to be included in the index, which isn’t valid according to the spec - so I guess we’ll ignore that case)

If we didn’t want to reference the type stubs, then for non-split DWARF I think the approach would be to reference the type unit, using a relocation (same as we must be doing for CUs) for the TU offset, and let that become tombstoned by the linker if the linker isn’t doing a DWARF-aware .debug_names merge. And for split DWARF, no relocations are required: just use foreign TU hashes and merge on that basis.

Does that make sense?