DWARF parsing code

It seems to me like the code in source\Plugins\SymbolFile\DWARF was forked from llvm\lib\DebugInfo. Can anyone offer some history here?

Is there any technical reason, aside from the work simply not having been done yet, that we can’t merge this back up into LLVM’s DWARF parsing code and then just re-use that?

It's the other way around -- the code started out in LLDB, and was
then reused in LLVM, but nobody refactored LLDB to use the new one.
They have now diverged and each has some functionality that's not
present in the other one.

There's no technical reason we couldn't converge on one implementation
-- it's just a sizable effort that hasn't made it to the top of
anyone's priority list.

Thanks. Do you happen to know, even at a high level, what functionality is present in each one that is not present in the other?

Note that lldb's needs for the DWARF parser are pretty specialized. It parses incrementally and puts a great deal of effort into not touching any more of the DWARF than it absolutely needs to in order to respond to the queries made to it. It is way overkill for a simple dwarf dumper. I'm a little leery of putting it into the llvm sources, since somebody there who didn't understand all it was trying to do could make some seemingly innocent simplification. That would not cause any correctness issues that the lldb testsuite would catch, "only" performance problems that are probably noticeable only for sizable apps, so the testsuite might not catch those either...
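To make the "parse only what a query needs" idea concrete, here is a minimal sketch. It is not LLDB's actual code: `LazyUnit`, `dieOffsets`, and `countParsed` are hypothetical names, and the "parser" is just a callback standing in for real DIE parsing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for one compile unit whose DIEs are parsed on demand.
class LazyUnit {
public:
  using Parser = std::function<std::vector<uint32_t>()>;

  explicit LazyUnit(Parser parser) : parser_(std::move(parser)) {
    assert(parser_ && "parser callback must be callable");
  }

  // Parse on first access only; later calls reuse the cached result.
  const std::vector<uint32_t> &dieOffsets() {
    if (!parsed_) {
      die_offsets_ = parser_();
      parsed_ = true;
    }
    return die_offsets_;
  }

  bool isParsed() const { return parsed_; }

private:
  Parser parser_;
  std::vector<uint32_t> die_offsets_;
  bool parsed_ = false;
};

// Count how many units have been parsed so far.
size_t countParsed(const std::vector<LazyUnit> &units) {
  size_t n = 0;
  for (const LazyUnit &u : units)
    if (u.isParsed())
      ++n;
  return n;
}
```

A counter like `countParsed` is one way a test could notice an "innocent simplification" that starts parsing everything eagerly: the results would stay correct, but the count after a single query would jump from one unit to all of them.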

Did you have some need for the lldb DWARF parser's capabilities or was this just a software-hygiene issue?


I was interested in looking at refactoring it mostly as a software hygiene issue.

I'm not aware of broad classes of functionality that differ, but low
level examples include some DWARF4 support added to LLVM's version,
and 64-bit DWARF support in LLDB's.

Thinking about it some more, I think I’ll come back to this later. I still want to do it, but I want to have a look at other things first.

If / when I do get around to it, I think the approach that makes the most sense is:

  1. Make a new project called lldwarf, or something similar, that contains all of LLDB’s dwarf parsing code.
  2. lldwarf depends on LLVM, no other dependencies.
  3. symbolizer depends on lldwarf
  4. lldb depends on lldwarf
  5. All of the clang specific stuff (seems like this exists just for processing ASTs) in lldb’s dwarf parsing code stays in LLDB’s dwarf plugin.

A few things.

Some history:

The DWARF parser started in LLDB and parts of it were used to make the start of a DWARF parser in LLVM just so line tables could be accessed. The minimum porting was done to make this happen. Unfortunately you must parse the info in the .debug_info + .debug_abbrev sections just so you can get to the info in the .debug_line section. This meant having to parse all of the DWARF DIEs, attributes, forms, etc.

A few things that are required for LLDB to use any new parser:
1 - Must be fast to parse up a skeleton of the DWARF DIEs
2 - DIEs must be stored very efficiently to take up the smallest amount of memory (debuggers load a LOT of DWARF and memory footprint is very important)
3 - The DWARF parser should not use _any_ object file classes to extract the data, it should be handed the data manually (for .debug_info, .debug_abbrev, etc) so the data can come from mmap() or from malloced buffers. LLDB reads data not only from single architecture files, but also from universal files (files that have more than one architecture) and also from BSD archive files, which are files full of .o files (which also might be universal). So it would be best to keep the DWARF parser pure so anyone can easily re-use it. (Currently the LLDB DWARF parser does use its own object file code, but this could easily be removed.)
4 - Must support manually indexing the DWARF as well as be able to use any accelerator tables (.debug_aranges, .apple_names, .apple_types, etc)
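
Requirements 2 and 3 could look roughly like the sketch below. This is only an illustration under assumptions, not the layout LLDB or LLVM actually uses: `SectionData`, `SkeletonDIE`, and `appendSkeletonDIE` are hypothetical names. The parser is handed a plain pointer and size, so the bytes can come from mmap(), a malloced buffer, or a slice of an archive, and each skeleton entry is a small fixed-size record.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical non-owning view over one DWARF section's bytes.
// No object file class is involved; the caller supplies pointer and size.
struct SectionData {
  const uint8_t *bytes;
  size_t size;
};

// Decode an unsigned LEB128 value starting at *offset and advance *offset.
// Returns false if the data ends before the value is complete.
bool decodeULEB128(const SectionData &s, size_t *offset, uint64_t *value) {
  uint64_t result = 0;
  unsigned shift = 0;
  size_t pos = *offset;
  while (pos < s.size) {
    uint8_t byte = s.bytes[pos++];
    if (shift < 64)
      result |= static_cast<uint64_t>(byte & 0x7f) << shift;
    shift += 7;
    if ((byte & 0x80) == 0) {
      *offset = pos;
      *value = result;
      return true;
    }
  }
  return false;
}

// Hypothetical compact skeleton record: three 32-bit fields per DIE.
struct SkeletonDIE {
  uint32_t offset;       // offset of the DIE within .debug_info
  uint32_t parent_delta; // how many entries back the parent sits (0 = none)
  uint32_t abbrev_code;  // code to look up in .debug_abbrev on demand
};
static_assert(sizeof(SkeletonDIE) == 12, "skeleton entries should stay small");

// Read the abbreviation code at die_offset and record a skeleton entry.
// On success, *next_offset is where the DIE's attribute values begin.
bool appendSkeletonDIE(const SectionData &info, size_t die_offset,
                       uint32_t parent_delta, std::vector<SkeletonDIE> *out,
                       size_t *next_offset) {
  assert(out != nullptr && next_offset != nullptr);
  size_t pos = die_offset;
  uint64_t code = 0;
  if (!decodeULEB128(info, &pos, &code))
    return false;
  out->push_back({static_cast<uint32_t>(die_offset), parent_delta,
                  static_cast<uint32_t>(code)});
  *next_offset = pos;
  return true;
}
```

Skipping a DIE's attributes to reach the next DIE needs the abbreviation table, which is left out here to keep the sketch short. The 32-bit offset field also assumes a .debug_info section smaller than 4 GiB.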

I have some ideas on what I would do differently if I re-wrote things, so feel free to keep in touch during the entire development cycle here.