Accessing ELF/DWARF data

Hi,

I'm wondering how I could use lldb to access ELF and DWARF data.
Examples being, the symbol table inside an ELF or being able to walk
the DIE tree to find the compilation unit for a given PC.

More specifically, I want to be able to perform these queries on an ELF:

1) For a given PC determine the file name
2) For a given PC determine the function name
3) For a given PC determine the line number

Is this possible with lldb? If not is this something provided by llvm?

I've tried to find the answer to this by looking at the header files,
but nothing jumped out at me.

Thanks,

Yes, this is possible with LLDB, and LLDB will insulate you from the specifics of ELF/DWARF and represent the information in its own formats.

You can currently use the "lldb" command line tool to do address lookups:

lldb my.elf
(lldb) image lookup -a 0x1234

An example on MacOSX looks like:

% lldb a.out
Current executable set to 'a.out' (x86_64).
(lldb) image lookup --address 0x0000000100000e85
      Address: a.out.__TEXT.__text + 121
      Summary: a.out`main + 4 at test.c:19
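As a rough illustration of what a lookup like this involves, here is a standalone C++ sketch (not LLDB code) that builds a "main + 4 at test.c:19" style summary. It finds the symbol whose range contains the address, then the line-table row covering it. The Symbol and LineRow types and the FindSymbol/FindLine/Summarize functions are hypothetical simplifications; LLDB builds the real equivalents from the object file's symbol table and the DWARF line table.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified stand-ins for a symbol table entry and a
// DWARF line-table row. LLDB's real types carry much more information.
struct Symbol {
  uint64_t addr;  // start (file virtual address)
  uint64_t size;  // byte size; in this sketch a size of 0 never matches
  std::string name;
};

struct LineRow {
  uint64_t addr;  // first address covered by this row
  std::string file;
  uint32_t line;
};

// Finds the symbol whose [addr, addr + size) range contains pc.
// `symbols` must be sorted by addr.
std::optional<Symbol> FindSymbol(const std::vector<Symbol>& symbols, uint64_t pc) {
  auto it = std::upper_bound(symbols.begin(), symbols.end(), pc,
                             [](uint64_t a, const Symbol& s) { return a < s.addr; });
  if (it == symbols.begin()) return std::nullopt;
  --it;  // last symbol starting at or before pc
  if (pc - it->addr < it->size) return *it;
  return std::nullopt;
}

// A row covers addresses up to (not including) the next row's address, so
// the match is the last row starting at or before pc. `rows` must be sorted
// by addr; the final row acts as an end marker and never matches itself.
std::optional<LineRow> FindLine(const std::vector<LineRow>& rows, uint64_t pc) {
  auto it = std::upper_bound(rows.begin(), rows.end(), pc,
                             [](uint64_t a, const LineRow& r) { return a < r.addr; });
  if (it == rows.begin() || it == rows.end()) return std::nullopt;
  return *(it - 1);
}

// Builds "<symbol> + <offset> at <file>:<line>" for whatever parts resolve.
std::string Summarize(const std::vector<Symbol>& symbols,
                      const std::vector<LineRow>& rows, uint64_t pc) {
  std::string out;
  if (auto sym = FindSymbol(symbols, pc))
    out = sym->name + " + " + std::to_string(pc - sym->addr);
  if (auto row = FindLine(rows, pc))
    out += " at " + row->file + ":" + std::to_string(row->line);
  return out;
}
```

For example, with a symbol {0x100000e81, 0x2a, "main"} (matching the block range in the verbose output) and a hypothetical line table whose test.c:19 row starts at 0x100000e85, Summarize(..., 0x100000e85) returns "main + 4 at test.c:19".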

You can also get more internal/verbose information by adding the verbose flag:

(lldb) image lookup --verbose --address 0x0000000100000e85
      Address: a.out.__TEXT.__text + 121
      Summary: a.out`main + 4 at test.c:19
       Module: "/Volumes/work/gclayton/Documents/src/attach/a.out"
  CompileUnit: "/Volumes/work/gclayton/Documents/src/attach/test.c", id = {0x00000000}, language = ISO C:1989
     Function: "main", id = {0x0000008f}, range =
     FuncType: id = {0x0000008f}, decl = /Volumes/work/gclayton/Documents/src/attach/test.c:18, clang_type = 0x04818850 int (int, char const **)
       Blocks: range = [0x100000e81-0x100000eab), id = {0x0000008f}
    LineEntry: range = /Volumes/work/gclayton/Documents/src/attach/test.c:19
       Symbol: "main", id = {0x00000008}, range =

Since the symbol file parser gets to pick the IDs for the objects it creates, you can usually infer where the data came from in the DWARF. Above we see the function has an "id = {0x0000008f}", which means it was the DIE at offset 0x8f in the .debug_info. Also, the Block ID "id = {0x0000008f}" shows you the deepest block in the DWARF where the address matched (in this case it is the same as the function at 0x8f).
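The "deepest block" idea can be pictured as a walk down the scope tree from the function to the innermost lexical block containing the address. This is a hypothetical, simplified model (contiguous low_pc/high_pc ranges only, no DW_AT_ranges), not LLDB's DWARF parser; each node carries the .debug_info offset that LLDB would print as its id.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical scope node: a DW_TAG_subprogram or DW_TAG_lexical_block DIE,
// identified by its .debug_info offset (what LLDB shows as "id = {...}").
struct Block {
  uint64_t die_offset;
  uint64_t low_pc;   // inclusive
  uint64_t high_pc;  // exclusive
  std::vector<Block> children;
};

// Returns the DIE offset of the innermost block containing pc, or 0 if pc
// is outside `root` (offset 0 is a CU header, never a scope DIE). Only the
// child whose range matches is entered, so the walk follows a single path.
uint64_t DeepestBlock(const Block& root, uint64_t pc) {
  if (pc < root.low_pc || pc >= root.high_pc) return 0;
  const Block* cur = &root;
  for (bool descended = true; descended;) {
    descended = false;
    for (const Block& child : cur->children) {
      if (pc >= child.low_pc && pc < child.high_pc) {
        cur = &child;
        descended = true;
        break;
      }
    }
  }
  return cur->die_offset;
}
```

With only the function DIE at 0x8f covering [0x100000e81-0x100000eab), the lookup returns 0x8f, which is why the Blocks and Function ids above are identical.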

The one thing you need to watch out for is that before you run, all addresses are "file virtual addresses", so if you have an executable and shared libraries that are based at address zero, you might get multiple matches for an address.

On MacOSX:

(lldb) image lookup --address 0x2000
      Address: a.out.__PAGEZERO + 8192
      Summary:

      Address: libSystem.B.dylib.__TEXT.__text + 3392
      Summary: libSystem.B.dylib`mach_ports_lookup + 120

      Address: libmathCommon.A.dylib.__TEXT.__const + 512
      Summary: libmathCommon.A.dylib`exp2table + 208

We see the "file virtual address" of 0x2000 matches our main executable, libSystem and libmathCommon. I didn't have debug symbols for these shared libraries, so we only see the symbols that matched.

If you want to limit your search to just a specific executable/shared library you can add it on the command line:

(lldb) image lookup --address 0x2000 a.out
      Address: a.out.__PAGEZERO + 8192
      Summary:
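To make the multiple-match behaviour concrete, here is a hypothetical C++ model of a file-address lookup across several images, with an optional filter that plays the role of naming a module on the command line. The Module/Section/Match types and LookupFileAddress are illustrative assumptions, not LLDB APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified image: a name plus its sections' file-address
// ranges. Before a process runs, lookups are done on these file addresses.
struct Section {
  std::string name;
  uint64_t file_addr;
  uint64_t size;
};

struct Module {
  std::string name;
  std::vector<Section> sections;
};

struct Match {
  std::string module;
  std::string section;
  uint64_t offset;  // offset of the address within the section
};

// Returns every (module, section) whose file-address range contains addr.
// Unrelated images can share file addresses, so several matches are normal.
// If only_module is set, all other modules are skipped.
std::vector<Match> LookupFileAddress(const std::vector<Module>& modules, uint64_t addr,
                                     std::optional<std::string> only_module = std::nullopt) {
  std::vector<Match> matches;
  for (const Module& m : modules) {
    if (only_module && m.name != *only_module) continue;
    for (const Section& s : m.sections)
      if (addr >= s.file_addr && addr - s.file_addr < s.size)
        matches.push_back({m.name, s.name, addr - s.file_addr});
  }
  return matches;
}
```

Given section start addresses chosen to reproduce the offsets above (8192, 3392 and 512), looking up 0x2000 yields three matches, and passing "a.out" as the filter leaves only the __PAGEZERO one.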

Does this help and do all that you want it to do?

Greg Clayton

Hi Greg,

Thanks for all the great information. It confirms that I should be
able to use lldb for my project. But, what I'm actually after is the
programmable API to gather this information myself. I'll start
crawling the code from those commands to see what APIs they use, but
if you have any pointers or recommendations I would appreciate it.

I have checked in some missing bits to our API for address lookups, and added sample code in the repository that allows you to specify an executable and do an address lookup in that file. The new file is in the repository at:

examples/lookup/main.cpp

Let me know if this is enough to get you going. There are sure to be things in the LLDB core that are not yet exposed through the script bridging API (the classes that start with "lldb::SB"), so let me know if/when you run into issues and need more exposed.

Greg Clayton

Hi Greg,

This looks like it does most of what I want. It's a great start. What
targets are supported? My ELFs are compiled for Cell Linux. If I just
want to find things inside an ELF does the target really matter? I
don't want decompilation or the ability to actually run the
executable. I'll need to load the main ELF and some shared ELF
libraries into certain memory addresses and then do lookups based on
PC samples. Is there a virtual or fake target that I could use if my
particular platform isn't targeted?

Thanks,
John

Targets don't care what kind of object file you have, so you shouldn't have to do anything different if you want to peruse the symbol data in your object file (ELF) or debug symbols (DWARF). When you load an ELF file, it should be able to track its dependencies as long as the dependent libraries are specified the same way (I don't know how an ELF file mentions which shared libraries it depends upon, but I hope it is done in a similar fashion no matter what the target).

The specialization comes in when you want to debug something; then you would need a new lldb_private::Process subclass. If you do want to make lookups by loading ELF files and specifying load addresses for each of the shared libraries, then you do need a process subclass that is stubbed out (doesn't do launch, attach, run, etc.). There isn't currently one of these.

So the two ways to do this would be:
1 - Make a new Process subclass, name it "generic", and stub out all functions for launch, attach, read/write memory, etc. Then create a process object, but don't launch it, and specify the load address of each section using:

    bool Process::SectionLoaded (const Section *section, lldb::addr_t load_addr);

    Then you should be able to make queries.

2 - Just load all of the ELF files into the target, manually track which one is loaded at what address, and do the mapping yourself. You would take a load address (for example a PC sample), figure out which shared library it falls in, subtract that library's slide (its load address minus its file virtual address) to get a file virtual address, find the SBModule for the shared library, and look up using that file virtual address.
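Option 2 boils down to a little arithmetic per PC sample. A minimal sketch, assuming you already know each image's file base, load base and size; the LoadedImage/FileAddress types and ToFileAddress are hypothetical names, and the resulting (path, file address) pair is what you would then hand to the module lookup.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical record of where one ELF image was placed in memory.
struct LoadedImage {
  std::string path;
  uint64_t file_base;  // lowest file virtual address of the image (often 0 for shared libs)
  uint64_t load_base;  // address that file_base was loaded at
  uint64_t size;       // bytes spanned by the image's loaded segments
};

struct FileAddress {
  std::string path;    // which image to query
  uint64_t file_addr;  // address to look up in that image's symbols/DWARF
};

// Maps a sampled PC (a load address) to an image and a file virtual address
// by subtracting that image's slide (load_base - file_base). Unsigned
// wraparound makes this correct even if an image was loaded below its
// file base.
std::optional<FileAddress> ToFileAddress(const std::vector<LoadedImage>& images, uint64_t pc) {
  for (const LoadedImage& img : images) {
    if (pc >= img.load_base && pc - img.load_base < img.size) {
      uint64_t slide = img.load_base - img.file_base;
      return FileAddress{img.path, pc - slide};
    }
  }
  return std::nullopt;  // PC is not inside any image we were told about
}
```

A linear scan is enough for a handful of libraries; with many images you would keep them sorted by load_base and binary-search instead.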

I might be able to code up a generic process if I have some free time soon. Until then, let me know if you have any other questions!

Greg Clayton

After talking this over with a colleague, we are going to move the section loading commands from Process to Target so that we don't need a process just to do symbol lookups. The affected function calls will be:

    lldb::addr_t
    GetSectionLoadAddress (const Section *section) const;

    bool
    ResolveLoadAddress (lldb::addr_t load_addr, Address &so_addr) const;

    bool
    SectionLoaded (const Section *section, lldb::addr_t load_addr);

    // The old load address should be specified when unloading to ensure we get
    // the correct instance of the section as a shared library could be loaded
    // at more than one location.
    bool
    SectionUnloaded (const Section *section, lldb::addr_t load_addr);

    // Unload all instances of a section. This function can be used on systems
    // that don't support multiple copies of the same shared library to be
    // loaded at the same time.
    size_t
    SectionUnloaded (const Section *section);
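A toy model of the semantics these declarations describe, in particular that one section can be loaded at several addresses and that the two SectionUnloaded overloads differ. This is not LLDB's implementation; Section, Address and SectionLoadList here are hypothetical stand-ins, and the model assumes loaded instances never overlap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

// Toy stand-ins; LLDB's lldb_private::Section and Address are richer.
struct Section {
  uint64_t file_addr;
  uint64_t size;
};

struct Address {
  const Section* section = nullptr;
  uint64_t offset = 0;
};

constexpr uint64_t kInvalidAddress = UINT64_MAX;

// A toy per-target load list: one section may be loaded at several
// addresses at once. Assumes loaded instances never overlap.
class SectionLoadList {
 public:
  bool SectionLoaded(const Section* section, uint64_t load_addr) {
    return by_addr_.emplace(load_addr, section).second;  // false if address taken
  }

  // Unloads one specific instance of a section.
  bool SectionUnloaded(const Section* section, uint64_t load_addr) {
    auto it = by_addr_.find(load_addr);
    if (it == by_addr_.end() || it->second != section) return false;
    by_addr_.erase(it);
    return true;
  }

  // Unloads every instance of a section; returns how many were removed.
  size_t SectionUnloaded(const Section* section) {
    size_t removed = 0;
    for (auto it = by_addr_.begin(); it != by_addr_.end();) {
      if (it->second == section) {
        it = by_addr_.erase(it);
        ++removed;
      } else {
        ++it;
      }
    }
    return removed;
  }

  // Returns the lowest load address of the section, or kInvalidAddress.
  uint64_t GetSectionLoadAddress(const Section* section) const {
    for (const auto& [addr, sect] : by_addr_)
      if (sect == section) return addr;
    return kInvalidAddress;
  }

  // Maps a load address back to (section, offset) if a loaded section covers it.
  bool ResolveLoadAddress(uint64_t load_addr, Address& so_addr) const {
    auto it = by_addr_.upper_bound(load_addr);
    if (it == by_addr_.begin()) return false;
    --it;  // closest instance loaded at or below load_addr
    if (load_addr - it->first >= it->second->size) return false;
    so_addr.section = it->second;
    so_addr.offset = load_addr - it->first;
    return true;
  }

 private:
  std::map<uint64_t, const Section*> by_addr_;  // load address -> section
};
```

Keying the map by load address keeps ResolveLoadAddress a single ordered lookup, while per-section queries fall back to a scan.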

We would also like to use LLDB as a crash symbolicator in the future, so this change makes sense. We originally had the section loading information in the process in case a Target could ever have multiple processes, but after thinking about this more we don't believe we would want this; instead, we would want a new target for each process. We already share Modules between all targets in a single LLDB instance, so the overhead of a Target is minimal.

So I would hold off a bit while I move the section loading code into the target, which will allow you to set section load addresses and make symbol queries.

Greg Clayton