Implementing a DWP tool in LLVM

Much like the recent efforts to provide a port of dsymutil in the LLVM project, I’m looking at providing an implementation of the Fission/Split DWARF DWP tool ( https://gcc.gnu.org/wiki/DebugFissionDWP ) in LLVM.

While there’s potentially some overlap between the two tools, I’m thinking of keeping them separate at least initially since much of the debug info doesn’t need to be touched by a DWP tool, unlike dsymutil.

Basically all the tool needs to do is concatenate (or deduplicate, in the case of type units) the sections and apply a few domain-specific relocations, but that doesn’t include having to read the DIE tree in debug_info.dwo (only the headers). The other thing is to build a couple of indexing data structures to allow fast lookup of CUs and TUs.
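To make the "just concatenate the sections" step concrete, here is a minimal Python sketch (not LLVM code). It assumes each .dwo input has already been read into a dict of section name to bytes; `concat_sections` and that in-memory layout are hypothetical names chosen for illustration. It records the (offset, size) each input contributes to each merged section, which is the information the indexes would later need.

```python
from collections import defaultdict


def concat_sections(inputs):
    """Concatenate same-named sections from several .dwo inputs.

    `inputs` is a list of dicts mapping section name -> bytes (a hypothetical
    stand-in for parsed .dwo files). Returns the merged sections plus, for
    each input, the (offset, size) that input contributed to each section.
    """
    merged = defaultdict(bytearray)
    contributions = []
    for sections in inputs:
        contrib = {}
        for name, data in sections.items():
            # The contribution starts where the merged section currently ends.
            contrib[name] = (len(merged[name]), len(data))
            merged[name] += data
        contributions.append(contrib)
    return {name: bytes(buf) for name, buf in merged.items()}, contributions
```

Type unit deduplication and the relocations discussed here are deliberately left out; this only covers the concatenation and the bookkeeping.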

Likely I’ll start with:

  • adding llvm-dwarfdump support for the DWP indexes
  • basic prototype of llvm-dwp just concatenating sections
  • handle each of the domain-specific relocations in turn
      • abbr_offset
      • debug_str_offsets.dwo entries
      • type_unit’s DW_AT_stmt_list
      • references to debug_loc.dwo from debug_info.dwo
          • this one, at first blush, makes me particularly sad, as it’ll involve actually walking all the DIEs in any CUs (stmt_list isn’t great either, but at least that’d only be the header - same for accessing the signature for the CU, it’s always in the root DIE)
  • deduplicate type units
  • add CU/TU indexes
  • DWP merging (being able to read existing indexes and merge those into larger indexes)
  • possibly support the thin DWP mode
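As an illustration of one of the relocations in the list above, the following Python sketch rebases debug_str_offsets.dwo entries. It rests on several assumptions not stated in this post: the strings are simply concatenated rather than deduplicated, the table is a bare array of 4-byte little-endian offsets (DWARF32, no table header), and `relocate_str_offsets` is a hypothetical helper name.

```python
import struct


def relocate_str_offsets(str_offsets, str_base):
    """Rebase every entry of one input's debug_str_offsets.dwo contribution.

    Assumes 4-byte little-endian entries with no table header, and that this
    input's debug_str.dwo data was appended to the output at `str_base`.
    """
    if len(str_offsets) % 4 != 0:
        raise ValueError("string offsets table is not a multiple of 4 bytes")
    count = len(str_offsets) // 4
    entries = struct.unpack(f"<{count}I", str_offsets)
    # Each entry was relative to the input's own string section; shift it so
    # it is relative to the start of the merged string section.
    return struct.pack(f"<{count}I", *(entry + str_base for entry in entries))
```

A tool that merges the string pool instead of concatenating it would need a per-string mapping rather than a single base, so this is only the simplest possible case.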

Does this all seem feasible/plausible/reasonable to do in LLVM? Any particular points of contention/interest/clarification?

It’s possible at some point in the future it might be nice to share the type merging logic of dsymutil (type units make sense when you don’t have a debug aware linker - but they do have unfortunate overhead which would be nice to avoid, if possible), in which case there might be some code sharing opportunity. But the two tools are still going to be fairly different in their purpose/handling (dsymutil has to get the address mappings and update all of that, DWP won’t have to deal with code addresses, etc).

- Dave

SGTM. This will bring us closer to the point when we can write tests, where we strip out the .dwo files from executables, package them together with llvm-dwp, and then verify that we still get all we need from llvm-symbolizer.

I didn’t fully understand the part about walking DIEs to patch references from .debug_info.dwo to .debug_loc.dwo: shouldn’t their values stay the same, as they will be treated relative to the value of DW_SECT_LOC offset?
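The point of this question can be sketched in a few lines of Python. The assumption is that a consumer resolves a location list reference by adding the unit's DW_SECT_LOC contribution offset from the index row to the unmodified attribute value; `resolve_loc_offset` and the dict-shaped `index_row` are hypothetical, not an existing API.

```python
def resolve_loc_offset(index_row, attr_value):
    """Map a unit-relative debug_loc.dwo offset to an offset in the package.

    `index_row` maps a section identifier to the (offset, size) contribution
    of one unit, as recorded in the package index.
    """
    base, size = index_row["DW_SECT_LOC"]
    if attr_value >= size:
        raise ValueError("offset lies outside this unit's contribution")
    # The attribute value itself is never rewritten; only the base differs
    # between the standalone .dwo (base 0) and the package file.
    return base + attr_value
```

Under that reading the packaging tool only has to record the contribution, and no DIE walking is needed for these references.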

> SGTM. This will bring us closer to the point when we can write tests,
> where we strip out the .dwo files from executables, package them together
> with llvm-dwp, and then verify that we still get all we need from
> llvm-symbolizer.

Not quite following here - dwo sections are already stripped from .o files
and should never appear in executables.

llvm-symbolizer is tested against .o files that contain no dwo contents,
right? Or is the dwo/dwp information optionally used in some way?

> I didn't fully understand the part about walking DIEs to patch references
> from .debug_info.dwo to .debug_loc.dwo: shouldn't their values stay the
> same, as they will be treated relative to the value of DW_SECT_LOC offset?

Ah, right you are - hadn't spotted that bit. I guess this'll all become
more clear to me as I implement dumping support for the indexes and can
start to look at some examples of the behavior of the existing dwp tool.
(thanks muchly for the pointer there)

- Dave

> > SGTM. This will bring us closer to the point when we can write tests,
> > where we strip out the .dwo files from executables, package them together
> > with llvm-dwp, and then verify that we still get all we need from
> > llvm-symbolizer.
>
> Not quite following here - dwo sections are already stripped from .o files
> and should never appear in executables.
>
> llvm-symbolizer is tested against .o files that contain no dwo contents,
> right? Or is the dwo/dwp information optionally used in some way?

Well, technically llvm-symbolizer is able to read the reference from
skeleton compile unit in the executable, load the necessary .dwo file, and
fetch the information from there, but we don't do much testing of this.

Ah, OK - thanks for the explanation/reminder/details!

> Much like the recent efforts to provide a port of dsymutil in the LLVM project, I'm looking at providing an implementation of the Fission/Split DWARF DWP tool ( https://gcc.gnu.org/wiki/DebugFissionDWP ) in LLVM.

Sounds good!

> While there's potentially some overlap between the two tools, I'm thinking of keeping them separate at least initially since much of the debug info doesn't need to be touched by a DWP tool, unlike dsymutil.

While dsymutil and dwp appear to be similar on the surface, the actual functional overlap isn’t all that great. We should definitely look at opportunities to extract common code between the two, but developing them as separate tools makes total sense to me.

> Basically all the tool needs to do is concatenate (or deduplicate, in the case of type units) the sections and apply a few domain-specific relocations, but that doesn't include having to read the DIE tree in debug_info.dwo (only the headers). The other thing is to build a couple of indexing data structures to allow fast lookup of CUs and TUs.
>
> Likely I'll start with:
>
> * adding llvm-dwarfdump support for the DWP indexes
> * basic prototype of llvm-dwp just concatenating sections
> * handle each of the domain specific relocations in turn
>   * abbr_offset
>   * debug_str_offsets.dwo entries
>   * type_unit's DW_AT_stmt_list
>   * references to debug_loc.dwo from debug_info.dwo
>      * this one, at first blush, makes me particularly sad, as it'll involve actually walking all the DIEs in any CUs (stmt_list isn't great either, but at least that'd only be the header - same for accessing the signature for the CU, it's always in the root DIE)
> * deduplicate type units
> * add CU/TU indexes
> * DWP merging (being able to read existing indexes and merge those into larger indexes)
> * possibly support the thin DWP mode

What is thin DWP mode? (It’s not mentioned in the DWARF standard).

> Does this all seem feasible/plausible/reasonable to do in LLVM? Any particular points of contention/interest/clarification?

Sounds all quite reasonable. How are you going to build a test suite? Checking in binaries?

> It's possible at some point in the future it might be nice to share the type merging logic of dsymutil (type units make sense when you don't have a debug aware linker - but they do have unfortunate overhead which would be nice to avoid, if possible), in which case there might be some code sharing opportunity. But the two tools are still going to be fairly different in their purpose/handling (dsymutil has to get the address mappings and update all of that, DWP won't have to deal with code addresses, etc).

Agreed.
-- adrian

> > Much like the recent efforts to provide a port of dsymutil in the LLVM
> > project, I'm looking at providing an implementation of the Fission/Split
> > DWARF DWP tool ( https://gcc.gnu.org/wiki/DebugFissionDWP ) in LLVM.
>
> Sounds good!
>
> > While there's potentially some overlap between the two tools, I'm
> > thinking of keeping them separate at least initially since much of the
> > debug info doesn't need to be touched by a DWP tool, unlike dsymutil.
>
> While dsymutil and dwp appear to be similar on the surface, the actual
> functional overlap isn’t all that great. We should definitely look at
> opportunities to extract common code between the two, but developing them
> as separate tools makes total sense to me.

Yeah, once we/someone starts wanting to do type merging in DWP, then it'll
be interesting - until then, the DWP tool won't need to actually parse
debug info or much of anything, really - except just a tiny bit of the
first DIE in compile units, to get the hash. (speaking of - re: bag of
dwarf, perhaps all units should be able to have a hash->offset mapping,
then CUs in DWOs could have a hash->offset (where the offset would be 0,
the first DIE) for their CU, so we could get the CU hash without parsing
any DWARF)
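The hash mentioned here is also the key of the CU/TU indexes. Below is a rough Python sketch of a signature-keyed, open-addressed table of the kind described on the GCC wiki page. The probing details (power-of-two slot count larger than 3N/2, low bits as the primary hash, the upper 32 bits OR 1 as the step) are written from memory of that description and should be checked against it; `build_index` and `lookup` are hypothetical names, and the on-disk layout is not modelled.

```python
def _probe(sig, mask):
    """Return (start slot, step) for a 64-bit signature."""
    return sig & mask, ((sig >> 32) & mask) | 1


def build_index(signatures):
    """Place unique 64-bit signatures into a table; rows are numbered from 1."""
    slots = 1
    while slots <= 3 * len(signatures) // 2:
        slots *= 2  # keep the slot count a power of two above 3N/2
    mask = slots - 1
    table = [None] * slots
    for row, sig in enumerate(signatures, start=1):
        slot, step = _probe(sig, mask)
        while table[slot] is not None:
            slot = (slot + step) & mask  # odd step visits every slot
        table[slot] = (sig, row)
    return table


def lookup(table, sig):
    """Return the row number for `sig`, or None if it is not present."""
    mask = len(table) - 1
    slot, step = _probe(sig, mask)
    while table[slot] is not None:
        if table[slot][0] == sig:
            return table[slot][1]
        slot = (slot + step) & mask
    return None
```

The row number would then select that unit's per-section (offset, size) contributions; that second table is omitted here.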

> > Basically all the tool needs to do is concatenate (or deduplicate, in
> > the case of type units) the sections and apply a few domain-specific
> > relocations, but that doesn't include having to read the DIE tree in
> > debug_info.dwo (only the headers). The other thing is to build a couple of
> > indexing data structures to allow fast lookup of CUs and TUs.
> >
> > Likely I'll start with:
> >
> > * adding llvm-dwarfdump support for the DWP indexes
> > * basic prototype of llvm-dwp just concatenating sections
> > * handle each of the domain specific relocations in turn
> >   * abbr_offset
> >   * debug_str_offsets.dwo entries
> >   * type_unit's DW_AT_stmt_list
> >   * references to debug_loc.dwo from debug_info.dwo
> >      * this one, at first blush, makes me particularly sad, as it'll
> >        involve actually walking all the DIEs in any CUs (stmt_list isn't great
> >        either, but at least that'd only be the header - same for accessing the
> >        signature for the CU, it's always in the root DIE)
> > * deduplicate type units
> > * add CU/TU indexes
> > * DWP merging (being able to read existing indexes and merge those into
> >   larger indexes)
> > * possibly support the thin DWP mode
>
> What is thin DWP mode? (It’s not mentioned in the DWARF standard).

Like a thin archive, as I understand it (it's described here:
https://gcc.gnu.org/wiki/DebugFissionDWP ) - indexes, but without actually
merging all the stuff into a monolithic file.

> > Does this all seem feasible/plausible/reasonable to do in LLVM? Any
> > particular points of contention/interest/clarification?
>
> Sounds all quite reasonable. How are you going to build a test suite?
> Checking in binaries?

Likely - can check in asm at least, if that's preferable (or at least check
it in besides, potentially) - though DWARF in asm isn't a terribly legible
format anyway.