LLDB support for symbol files (PDB vs DWARF implementation)


My goal is to process symbol files (e.g. PE-COFF/PDB, ELF/DWARF), so I looked at how this is done in lldb. Rather than study the code for an unreasonable amount of time first, I’d like to ask a simple question.

Consider the header files that pertain to symbol files in llvm (i.e. $LLVM_ROOT/include/llvm/DebugInfo) and lldb (i.e. $LLVM_ROOT/tools/lldb/source/Plugins/SymbolFile/).

For PDB support, I see only two classes (.cpp/.h file pairs in $LLDB_ROOT/source/Plugins/SymbolFile/PDB), PDBASTParser and SymbolFilePDB, which appear to leverage the LLVM PDB implementation and act as a facade to it. This pattern doesn’t repeat for DWARF, where the file and class hierarchy appear to be duplicated between LLVM and lldb, and yet are also quite different.

It’s hard to describe the differences and similarities I see between the two sets of DWARF files, but so far my only guess is that the differences are historical (forked and never re-merged), and that lldb only ever needed to read these files, whereas the base LLVM implementation is more complete because it must also generate them and then merge or manipulate them once they are created.

Can anyone briefly explain the architectural choices made here?


LLVM’s DWARF parser was born as a copy of the LLDB DWARF parser of the time. The person who did the port never attempted to switch LLDB over to the LLVM version, because the only reason he added it was to be able to extract line table information. The LLVM DWARF parser has grown a lot over the years. Someone could take the time to switch LLDB over to the LLVM DWARF parser; that would be valuable, but it is a lot of work and will likely introduce bugs in the short term. I worked on the LLVM DWARF parser last year in an effort to make that happen, so it now has everything LLDB needs in order to switch. What is missing is someone who has the time to do the work.

LLDB provides a representation of debug info that is agnostic to both the object file format (ELF, COFF, Mach-O) and the debug info format (DWARF, PDB), and that can represent everything in these files. We have plug-ins for DWARF and PDB that convert those formats into the LLDB format. We do so lazily, parsing only what we need when we need it. The symbol file plug-ins in LLDB also convert any types in the debug info into clang::ASTContext types, so that we can use clang to evaluate expressions against the debug info once it has been converted into clang AST types.
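To make the plug-in idea concrete, here is a toy Python model of the pattern: one format-agnostic representation, one abstract symbol file interface, and one plug-in per format that converts records into the common form only when they are first requested. Every class and method name below (`CompileUnit`, `SymbolFile`, `DwarfPlugin`, `PdbPlugin`, `compile_unit`) is hypothetical and is not LLDB's real API, the "raw" DWARF and PDB records are made-up dicts and tuples rather than real parsing, and the clang::ASTContext conversion step is left out entirely.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class CompileUnit:
    """Hypothetical format-agnostic compile unit (not LLDB's real class)."""
    name: str
    functions: list = field(default_factory=list)


class SymbolFile(ABC):
    """Hypothetical plug-in interface: each debug info format implements it."""

    def __init__(self, raw_units):
        self._raw = raw_units   # opaque per-format records, keyed by unit id
        self._parsed = {}       # cache of units that have actually been requested
        self.parse_count = 0    # how many conversions were performed

    def unit_ids(self):
        return list(self._raw)

    def compile_unit(self, unit_id):
        # Lazy: convert a unit into the common representation on first use only.
        if unit_id not in self._parsed:
            self.parse_count += 1
            self._parsed[unit_id] = self._convert(self._raw[unit_id])
        return self._parsed[unit_id]

    @abstractmethod
    def _convert(self, raw):
        """Translate one format-specific record into a CompileUnit."""


class DwarfPlugin(SymbolFile):
    def _convert(self, raw):
        # Toy stand-in for a DW_TAG_compile_unit with DW_TAG_subprogram children.
        return CompileUnit(raw["DW_AT_name"],
                           [sub["DW_AT_name"] for sub in raw["subprograms"]])


class PdbPlugin(SymbolFile):
    def _convert(self, raw):
        # Toy stand-in for a PDB module: (source name, [function symbol names]).
        source_name, symbols = raw
        return CompileUnit(source_name, list(symbols))
```

Client code only ever calls `compile_unit()` and receives a `CompileUnit`, whichever format backs it, and units nobody asks for are never converted.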

So if you are looking to process symbol files, you could just use LLDB (LLDB.framework on Mac, liblldb.so on other systems). We have a stable public API, and that entire API is also available through Python in case you need to script any solutions.

Can you elaborate on what you are wanting to do with your parsing of the object/debug info files?


Your guesses are pretty much accurate.

By the way, I’ll point out that the PDB parsing code in LLDB doesn’t technically use the PDB parsing code in LLVM. At least not in the way you might expect.

This code was added to LLDB before LLVM supported native PDB reading and writing. To make this possible, and planning for a future in which LLVM could support native PDB, an interface was added to LLVM that provides access to PDB data but can be implemented in different ways.

Since there was no native PDB support in LLVM at the time, the natural path to get something working in LLDB was to provide an implementation based on the Windows DIA SDK. That is what was done, and it is the implementation LLDB uses today.

Later, we added the native PDB support to LLVM, but it was done at a lower level than this interface abstraction, and the implementation of the interface on top of this native PDB reading code is incomplete. Work on it started, but later stalled as priorities shifted around.

So, if you want to process PDB files using LLDB, you currently need to be on Windows. If you want to make it work on non-Windows systems, the first step is to complete the implementation of this interface in LLVM. After that, it should be a matter of changing one or two lines of code in LLDB to swap the implementation.