Implementation of DWARF expression parser

Hi,

This is my first post to this list, so I apologize in advance if I mess up on any list etiquette. Jumping right in, I’m making use of the DebugInfo/DWARF APIs to get debugging information out of binaries (what else!). One of the bits of data I need is the location information stored in the location list section as well as inline in DW_AT_location attributes and similar.

So far I’ve succeeded in making enough sense of the API to actually extract the raw data, though I’ve found the API somewhat confusing - for example, as far as I can tell there are two different code paths that the DWARFUnit implementation takes to extract DIEs which both seem to do basically the same thing? I’m well versed in the DWARF format, but not so much in LLVM itself as of yet; it wouldn’t take much for me to be just plain wrong about what’s going on. (That being said, I could lose myself in the LLVM code base for a long time if I let myself; it’s already taught me more about compiler design and even the C++ language than my entire career up to now!)

To give some detail, the code paths I saw were DWARFCompileUnit::getNumDIEs(), which parses all DIEs via the extractDIEsToVector() path, and DWARFCompileUnit::extract(), which parses just one specified DIE via the extractImpl() path. It’s not obvious to me why the extractImpl() code exists alongside DWARFDebugInfoEntryMinimal::extractFast(). In general, the "extract" paradigm is difficult to get a handle on, and I’ve yet to find any documentation. Another example is the DWARFContext::getCompileUnitForAddress() API, which is private; I couldn’t find a way to invoke its logic (aside from iterating all the units and using DWARFDebugInfoEntryMinimal::getInlinedChainForAddress() on each). I have the sensation that I’m misunderstanding the intended usage pattern for these objects entirely, or that at the very least I’m thinking at the wrong abstraction layer.

Unfortunately, I’m stopped here. As far as I can tell, there is no implementation of a DWARF expression parser, per §2.5 of the DWARF 4 standard, which is necessary for making sense of DWARF location information. I don’t mind building one myself, but before I do that I’d like to know if I’m duplicating effort. If there is in fact an expression parser in the LLVM core, and not just in places where it would be obviously needed (such as LLDB), I haven’t found it, and I’d appreciate a pointer to what I missed. Or even if someone’s just working on such a parser already, that would be great to know.

If there are other resources I could use besides the source code and the mailing list to get answers for questions like this, I’d be grateful for pointers to those too; I haven’t managed to find much useful information from simple Web searches thus far. I’m also curious if this is the sort of thing for which filing a feature request would be appropriate.

Thanks in advance for any help!

— Gwynne Raskind

Hi,

> To give some detail, the code paths I saw were DWARFCompileUnit::getNumDIEs(), which parses all DIEs via the extractDIEsToVector() path, and DWARFCompileUnit::extract(), which parses just one specified DIE via the extractImpl() path. It’s not obvious to me why the extractImpl() code exists alongside DWARFDebugInfoEntryMinimal::extractFast(). In general, the “extract” paradigm is difficult to get a handle on, and I’ve yet to find any documentation.

The DIE extraction is handled only by extractFast(). DWARFUnit::extractImpl() is only about the Unit header parsing AFAICS. The DIEs are read lazily by the first method that requests them (see extractDIEsIfNeeded()).
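
The lazy pattern described above can be illustrated with a toy model. This is only a sketch and not LLVM code: the class LazyUnit, its one-byte "DIEs", and the parseCount() helper are all hypothetical names made up for the illustration. The only point is that the expensive parse runs once, on the first accessor that needs it.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Toy stand-in for a compile unit. Hypothetical; not LLVM's DWARFUnit.
class LazyUnit {
  std::vector<uint8_t> Raw;  // pretend section bytes, one "DIE" per byte
  std::vector<uint8_t> DIEs; // parsed entries, filled on first use
  bool Parsed = false;
  unsigned ParseCount = 0;   // how many times the expensive parse ran

  // Parse everything the first time any accessor asks for DIEs.
  void extractDIEsIfNeeded() {
    if (Parsed)
      return;
    DIEs.assign(Raw.begin(), Raw.end()); // stands in for real parsing
    Parsed = true;
    ++ParseCount;
  }

public:
  explicit LazyUnit(std::vector<uint8_t> Bytes) : Raw(std::move(Bytes)) {}

  size_t getNumDIEs() {
    extractDIEsIfNeeded();
    return DIEs.size();
  }

  uint8_t getDIE(size_t Index) {
    extractDIEsIfNeeded();
    return DIEs.at(Index);
  }

  unsigned parseCount() const { return ParseCount; }
};
```

Constructing a LazyUnit does no parsing; calling getNumDIEs() and then getDIE() triggers exactly one parse between them.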

> Another example is the DWARFContext::getCompileUnitForAddress() API, which is private; I couldn’t find a way to invoke its logic (aside from iterating all the units and using DWARFDebugInfoEntryMinimal::getInlinedChainForAddress() on each).

getCompileUnitForAddress() is only used as a helper for the high-level line-tables abstraction. It’s not exposed as a public API.

> I have the sensation that I’m misunderstanding the intended usage pattern for these objects entirely, or that at the very least I’m thinking at the wrong abstraction layer.

I believe there are 2 things here:

  • The DWARFContext methods are meant to be the implementation of a generic DebugInfo abstraction (see DIContext). This abstraction is used by llvm-symbolizer and other tools to access the DWARF (mostly (only?) line tables) data through a generic interface, but it might make the interface look a bit strange if you look at it from a pure DWARF point-of-view.
  • Apart from that libDebugInfo has been used nearly exclusively to power llvm-dwarfdump. Functionality
    might be absent/hidden because nobody has come up with a use-case for it.

> Unfortunately, I’m stopped here. As far as I can tell, there is no implementation of a DWARF expression parser, per §2.5 of the DWARF 4 standard, which is necessary for making sense of DWARF location information. I don’t mind building one myself, but before I do that I’d like to know if I’m duplicating effort. If there is in fact an expression parser in the LLVM core, and not just in places where it would be obviously needed (such as LLDB), I haven’t found it, and I’d appreciate a pointer to what I missed. Or even if someone’s just working on such a parser already, that would be great to know.

I had implemented one here: http://reviews.llvm.org/D6771
Unfortunately, I lost track of this work. If you want to revive it and contribute it, it would be great!
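
For readers who have not looked at DWARF expressions, here is a rough idea of what the decoding half of such a parser involves. This is a minimal standalone sketch and not the code from D6771: the names Op, parseExpr and describe are hypothetical, and only a handful of DWARF 4 opcodes are handled (DW_OP_reg0..31, DW_OP_breg0..31, DW_OP_fbreg, DW_OP_plus_uconst, DW_OP_piece, DW_OP_call_frame_cfa, DW_OP_stack_value). Anything else is rejected rather than guessed at.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// One decoded operation: the opcode byte plus its single operand, if any.
struct Op {
  uint8_t Opcode;
  int64_t Operand; // 0 when the opcode takes no operand
};

// Read an unsigned LEB128 value, advancing Pos. Returns false if truncated.
static bool readULEB128(const std::vector<uint8_t> &D, size_t &Pos,
                        uint64_t &Out) {
  Out = 0;
  for (unsigned Shift = 0; Pos < D.size(); Shift += 7) {
    uint8_t B = D[Pos++];
    if (Shift < 64)
      Out |= uint64_t(B & 0x7f) << Shift;
    if (!(B & 0x80))
      return true;
  }
  return false;
}

// Read a signed LEB128 value, advancing Pos. Returns false if truncated.
static bool readSLEB128(const std::vector<uint8_t> &D, size_t &Pos,
                        int64_t &Out) {
  uint64_t Value = 0;
  unsigned Shift = 0;
  while (Pos < D.size()) {
    uint8_t B = D[Pos++];
    if (Shift < 64)
      Value |= uint64_t(B & 0x7f) << Shift;
    Shift += 7;
    if (!(B & 0x80)) {
      // Sign-extend when bit 0x40 of the final byte is set.
      if (Shift < 64 && (B & 0x40))
        Value |= ~uint64_t(0) << Shift;
      Out = int64_t(Value);
      return true;
    }
  }
  return false;
}

// Decode an expression into Ops. Returns false on truncated input or on an
// opcode this sketch does not cover.
static bool parseExpr(const std::vector<uint8_t> &D, std::vector<Op> &Ops) {
  Ops.clear();
  size_t Pos = 0;
  while (Pos < D.size()) {
    uint8_t Opc = D[Pos++];
    int64_t Operand = 0;
    if ((Opc >= 0x70 && Opc <= 0x8f) || Opc == 0x91) {
      // DW_OP_breg0..31 and DW_OP_fbreg take a SLEB128 offset.
      if (!readSLEB128(D, Pos, Operand))
        return false;
    } else if (Opc == 0x23 || Opc == 0x93) {
      // DW_OP_plus_uconst and DW_OP_piece take a ULEB128 operand.
      uint64_t U = 0;
      if (!readULEB128(D, Pos, U))
        return false;
      Operand = int64_t(U);
    } else if (!((Opc >= 0x50 && Opc <= 0x6f) || Opc == 0x9c || Opc == 0x9f)) {
      // Not DW_OP_reg0..31, DW_OP_call_frame_cfa or DW_OP_stack_value.
      return false;
    }
    Ops.push_back({Opc, Operand});
  }
  return true;
}

// Render one operation in dwarfdump-like text form.
static std::string describe(const Op &O) {
  if (O.Opcode >= 0x50 && O.Opcode <= 0x6f)
    return "DW_OP_reg" + std::to_string(O.Opcode - 0x50);
  if (O.Opcode >= 0x70 && O.Opcode <= 0x8f)
    return "DW_OP_breg" + std::to_string(O.Opcode - 0x70) + " " +
           std::to_string(O.Operand);
  switch (O.Opcode) {
  case 0x23: return "DW_OP_plus_uconst " + std::to_string(O.Operand);
  case 0x91: return "DW_OP_fbreg " + std::to_string(O.Operand);
  case 0x93: return "DW_OP_piece " + std::to_string(O.Operand);
  case 0x9c: return "DW_OP_call_frame_cfa";
  case 0x9f: return "DW_OP_stack_value";
  }
  return "unknown";
}
```

Feeding parseExpr the two bytes 0x91 0x78 yields a single operation that describe renders as DW_OP_fbreg -8. Printing register names instead of numbers (reg5, breg5) is the step that needs target information, which is presumably why the patch description mentions MCRegisterInfo.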

> If there are other resources I could use besides the source code and the mailing list to get answers for questions like this, I’d be grateful for pointers to those too; I haven’t managed to find much useful information from simple Web searches thus far. I’m also curious if this is the sort of thing for which filing a feature request would be appropriate.

Filing a feature request is fine, but the only way to be sure it gets there is to actually contribute it :)

Fred

Hi Frédéric and LLVM,

I managed to finally come back to this after quite a while. Frédéric, thank you very much for the pointer to your work; it’s saving me a lot of time!

Unfortunately, I am running into one issue that my knowledge isn’t complete enough to solve on my own yet. The description of D6771 reads, “It requires a few preliminary patches like landing D6243 and adding a MCRegisterInfo in the DWARFContext”. The latter, adding an MCRegisterInfo, seems to be a bit beyond me.

Specifically, while I can get a target architecture from a given ObjectFile using getArch(), I’m not sure how to turn this into an MCRegisterInfo (or really anything, such as Target, that could lead to one). My most recent attempt, using TargetRegistry::lookupTarget() to get a Target and then calling createMCRegInfo(), fails with a complaint that no targets are registered, and I’m still too new to LLVM’s depths to know what I’m missing. I’m not even entirely convinced this is the right approach. I haven’t found anything from Googling either; almost everything out there is focused on compiling and emitting to a target, and working with an existing output file is an oddly sparse area.

So, in the end, my question boils down to: Given an ObjectFile, how can I most effectively look up an MCRegisterInfo appropriate to that file (if any such is known to LLVM in the first place)?

Thanks in advance!

– Gwynne Raskind

Hi,

> Unfortunately, I am running into one issue that my knowledge isn’t complete enough to solve on my own yet. The description of D6771 reads, "It requires a few preliminary patches like landing D6243 and adding a MCRegisterInfo in the DWARFContext". The latter, adding an MCRegisterInfo, seems to be a bit beyond me.

> Specifically, while I can get a target architecture from a given ObjectFile using getArch(), I’m not sure how to turn this into an MCRegisterInfo (or really anything, such as Target, that could lead to one). My most recent attempt, using TargetRegistry::lookupTarget() to get a Target and then use createMCRegInfo(), fails with a complaint that no targets are registered,

This is the real issue. The targets need to be registered, and until now llvm-dwarfdump hasn’t required any target-specific code, so it doesn’t do this.

I think you need something like this:

#include "llvm/Support/TargetSelect.h"

  llvm::InitializeAllTargetInfos();
  llvm::InitializeAllTargetMCs();
  llvm::InitializeAllTargets();

These calls need to run before lookupTarget(). For this to work, you will also need to add MC and ${LLVM_TARGETS_TO_BUILD} to the LLVM_LINK_COMPONENTS variable in the CMakeLists.txt.

(Note one bad side effect of this is that llvm-dwarfdump will become a much bigger executable, but I don’t see any way around this currently)

Fred

I actually tried that, but got tripped up by a laundry list of linker errors I’m still sorting through (pretty sure my config is just messed up in general; I’m about to start more or less fresh). I was hoping there was some better way than figuring out how to convert an architecture "unsigned" value to a triple string :-/ (My first attempt cheated by using the knowledge that my ObjectFile was actually a MachOObjectFile to call the getArch() form that returns a Triple directly.) At least I was on the right track. Thanks again!

-- Gwynne Raskind