Adding PDB support to lib\DebugInfo

I’ve been working on adding pdb reading support to llvm. This started as a tool for dumping info from a pdb (similar to llvm-dwarfdump), which has been checked in and currently has limited support for dumping pdb.

There’s still more to be done on the pdb dumping tool, but at this point – to reduce duplicated effort – I think it makes the most sense to start moving some of this logic into a library in llvm, and then change llvm-pdbdump to use the library. Later, once the library is more comprehensive, I plan to then use it in LLDB for reading PDBs while debugging on Windows.

I think the best way to do this is to move all of the code in lib/DebugInfo to lib/DebugInfo/dwarf, and then make another folder called lib/DebugInfo/pdb. These would then be compiled into two separate libraries.

Another approach is to just put the PDB code in the same folder as the dwarf code, but I don’t like this approach for a number of reasons:

  1. Not every consumer of DebugInfo wants both types of DebugInfo.
  2. The pdb reading code relies very heavily on Windows APIs, and will not compile on other platforms. This is solvable with some CMake machinery, but it’s ugly and unwarranted in my opinion.

So as a first step in this direction I’d like to propose moving the code in lib/DebugInfo to lib/DebugInfo/dwarf, and then updating the rest of llvm accordingly.

Thoughts? Comments? Suggestions?
Zach

Sounds good. Naming bikeshed:

DWARF/Dwarf and PDB as directory names.

Might want to ask Frederic about any pending patches he wants to get in before you move things around under him. git should deal with it, but…

-eric

Sounds generally good. Not knowing much about PDB; are there any plans to have some kind of unified interface between dwarf and pdb (don’t know if that makes sense), or will they be kept separate?

DWARF/Dwarf and PDB as directory names.

The official spelling is DWARF but it’s visually heavy and doesn’t play well with camelcase identifiers. DWARF when it’s used alone and Dwarf when it’s part of a longer word?

– adrian

The pdb reading code relies *very heavily* on Windows APIs

Can you make sure not to have any Windows API stuff in the pdb-reader interface,
so that it’s easy, if someone wants to, to implement a Windows-API-independent version (supporting reading under Linux etc.)?
The pdb format needs to be analysed anyway for writing pdb information at compile/link time (in the future).

I think the best way to do this is to move all of the code in lib/DebugInfo to lib/DebugInfo/dwarf, and then make another folder called lib/DebugInfo/pdb. These would then be compiled into two separate libraries.

So you would have libDebugInfoDWARF and libDebugInfoPDB. Would you still have libDebugInfo at all?

I ask because there is the DIContext abstraction that’s not tied to a particular debug format (it’s used by llvm-symbolizer, and I guess you have some interest in having that working on Windows PDB files). But DIContext.cpp has one method, so having a library for just that might really be overkill.

Might want to ask Frederic about any pending patches he wants to get in before you move things around under him. git should deal with it, but…

I have the DWARFExpression review in flight that it might be nice to land before the move (although as you say git should handle it). For the things I have out of tree I’ll just deal with it.

Fred

The official spelling is DWARF but it’s visually heavy and doesn’t play well with camelcase identifiers. DWARF when it’s used alone and Dwarf when it’s part of a longer word?

I realize we are only speaking of directory names here, but keep in mind that there already are e.g. DWARFUnit and DwarfUnit classes in llvm that are different. For the sake of consistency I think DWARF would be better.

Fred

I don’t really think a unified interface between DWARF and PDB makes sense, or at the very least it would be difficult. It might be possible and useful to eventually come up with a unified interface that provides a small subset of the functionality, but the two formats don’t really map that well to each other, so it would end up being a least common denominator of the functionality. I don’t understand DWARF and PDB well enough yet to know exactly how big of a subset that least common denominator is, but I suspect it will be somewhat limited.

So at least in the short term, I think it makes the most sense to keep them separate, and then in the future potentially have a separate effort to create a unified interface, while still leaving the native interfaces available for use.
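To make the "least common denominator" idea concrete, here is a minimal sketch, assuming the only query both formats share is an address-to-line lookup. Every name in it (`CommonDebugInfo`, `LineInfo`, the two `Fake...Reader` classes, the adapters, `symbolize`) is hypothetical and invented for illustration; none of it is existing LLVM API. The native readers keep their own, richer interfaces, and thin adapters expose only the shared subset.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical result type for the one query assumed to be common.
struct LineInfo {
  std::string File;
  uint32_t Line = 0;
};

// Hypothetical least-common-denominator interface.
class CommonDebugInfo {
public:
  virtual ~CommonDebugInfo() = default;
  // Returns true and fills Out if Addr maps to a source line.
  virtual bool lineForAddress(uint64_t Addr, LineInfo &Out) const = 0;
};

// Stand-in for a native DWARF reader; it keeps its own API.
class FakeDwarfReader {
public:
  bool lookupLineTable(uint64_t Addr, std::string &File, uint32_t &Line) const {
    if (Addr < 0x1000 || Addr >= 0x1100)
      return false;
    File = "dwarf.c";
    Line = 10;
    return true;
  }
};

// Stand-in for a native PDB reader; assumed here to be addressed by RVA.
class FakePdbReader {
public:
  bool findLineByRVA(uint32_t RVA, std::string &File, uint32_t &Line) const {
    if (RVA >= 0x100)
      return false;
    File = "pdb.c";
    Line = 20;
    return true;
  }
};

// Adapters wrap the native readers without replacing them.
class DwarfAdapter : public CommonDebugInfo {
  const FakeDwarfReader &Reader;

public:
  explicit DwarfAdapter(const FakeDwarfReader &R) : Reader(R) {}
  bool lineForAddress(uint64_t Addr, LineInfo &Out) const override {
    return Reader.lookupLineTable(Addr, Out.File, Out.Line);
  }
};

class PdbAdapter : public CommonDebugInfo {
  const FakePdbReader &Reader;
  uint64_t ImageBase;

public:
  PdbAdapter(const FakePdbReader &R, uint64_t Base)
      : Reader(R), ImageBase(Base) {}
  bool lineForAddress(uint64_t Addr, LineInfo &Out) const override {
    // Translate the absolute address into the RVA the native reader expects.
    if (Addr < ImageBase || Addr - ImageBase > UINT32_MAX)
      return false;
    return Reader.findLineByRVA(static_cast<uint32_t>(Addr - ImageBase),
                                Out.File, Out.Line);
  }
};

// A format-agnostic caller sees only the shared subset.
std::string symbolize(const CommonDebugInfo &DI, uint64_t Addr) {
  LineInfo LI;
  if (!DI.lineForAddress(Addr, LI))
    return "??:0";
  assert(!LI.File.empty() && "backend reported success without a file name");
  return LI.File + ":" + std::to_string(LI.Line);
}
```

Anything that only one format can express stays on the native reader, which is why such a wrapper would sit alongside the two libraries rather than replace them.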

If you mean just the interface, then yes, I can make an effort to make sure the interface doesn’t expose anything Windows-specific. This particular implementation will obviously need to use Windows-specific things though.
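As a rough sketch of what "nothing Windows-specific in the interface" could look like: the public interface uses only standard C++ types, and any platform handles or string types would live inside an implementation class. The names `IPDBSession`, `InMemorySession`, `openSession`, and `dumpSymbols` are hypothetical, and the in-memory backend is only a stand-in so the example runs anywhere; a Windows-backed implementation would derive from the same interface.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical public interface: standard types only, no platform headers.
class IPDBSession {
public:
  virtual ~IPDBSession() = default;
  virtual uint32_t symbolCount() const = 0;
  virtual std::string symbolName(uint32_t Index) const = 0;
};

// Stand-in backend. A Windows-API-backed class, or a future
// platform-independent parser, would implement the same interface
// and keep its platform details private.
class InMemorySession : public IPDBSession {
  std::vector<std::string> Names;

public:
  explicit InMemorySession(std::vector<std::string> N) : Names(std::move(N)) {}
  uint32_t symbolCount() const override {
    return static_cast<uint32_t>(Names.size());
  }
  std::string symbolName(uint32_t Index) const override {
    assert(Index < Names.size() && "symbol index out of range");
    return Names[Index];
  }
};

// Hypothetical factory: callers never name the concrete backend.
std::unique_ptr<IPDBSession> openSession(std::vector<std::string> Names) {
  return std::unique_ptr<IPDBSession>(new InMemorySession(std::move(Names)));
}

// Client code, such as a dump tool, depends only on the interface.
std::string dumpSymbols(const IPDBSession &S) {
  std::string Out;
  for (uint32_t I = 0, E = S.symbolCount(); I != E; ++I)
    Out += S.symbolName(I) + "\n";
  return Out;
}
```

With this split, swapping the backend would not require changes to callers like `dumpSymbols`.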

See my earlier response to Adrian. But I’ll rehash the point here, which is that basically in the short term, I think it makes the most sense to keep them separate. In the future, if / when we decide to provide a unified interface (e.g libDebugInfo as you suggest), there will be additional machinery required to wrap the two interfaces, so we could move the DIContext class at that time.

Does this make sense?

Sure, no objection to moving files around :)

Out of curiosity, if the only target users of the library are the dump tool and LLDB, why put the code in LLVM and not only in LLDB? I would love to see LLDB using our DWARF parser because there is quite some code duplication here, but it wouldn’t be the case for PDB support (again, not an objection, just a candid question).

Fred

In theory it probably could go into LLDB, but when we discussed this internally, we just decided that debug info goes in LLVM on pure principle and for consistency. I also agree it would be nice for LLDB to use LLVM’s DWARF parser, but this needs some evangelism in the LLDB community before it can happen :)

There are potential use cases for PDB reading in LLVM, for example llvm-symbolizer and ASAN, which I think have hand-rolled PDB support, so there’s definitely some merit to providing this kind of thing at the LLVM level, but that’s more of a long-term thing.