Symbol Server for everyone.

Hello lldb developers,

In one of the older posts (http://blog.llvm.org/2015/01/lldb-is-coming-to-windows.html), symbol server support was mentioned. Most likely it was meant for Windows, but at FB we have our own symbol server implementation for Linux (technically it’s completely platform agnostic), which we would like to integrate with LLDB and eventually open source along with the server. As such I thought I’d ask LLDB gurus like you, if anyone is already working on symbol server support and if not, I’d appreciate your thoughts on a desired architecture.

General idea.

Based on current LLDB implementation and the fact that symbol server feature is a cross-cutting concern, the natural place to put this logic would be inside SymbolVendor plugin, which on Mac is used to resolve separate dSYM bundles. In theory symbol server logic is completely platform-agnostic, as all we need to know is some sort of binary ID (could either be a real .build-id or UUID or some custom logic to compute a stable binary hash) and binary name. This info can be used to make a network request to check whether corresponding binary exists and if so, download it to a temporary location and call symbol_vendor->AddSymbolFileRepresentation with FileSpec pointing at that temporary location.
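A minimal sketch of the lookup key described above, assuming a hypothetical HTTP layout of `<server>/<binary name>/<hex binary ID>`. The `SymbolRequest` type, the `ToHex` and `MakeSymbolUrl` helpers, and the URL scheme are illustrative assumptions and are not part of LLDB or of the symbol server discussed here.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical request description: neither this type nor the URL layout
// below comes from LLDB or from the actual symbol server.
struct SymbolRequest {
  std::string binary_name;             // e.g. "libfoo.so"
  std::vector<std::uint8_t> build_id;  // raw .build-id / UUID bytes
};

// Render the raw ID bytes as lowercase hex.
std::string ToHex(const std::vector<std::uint8_t> &bytes) {
  static const char kDigits[] = "0123456789abcdef";
  std::string out;
  out.reserve(bytes.size() * 2);
  for (std::uint8_t b : bytes) {
    out.push_back(kDigits[b >> 4]);
    out.push_back(kDigits[b & 0x0f]);
  }
  return out;
}

// Assumed layout: <server>/<binary name>/<hex binary ID>.
std::string MakeSymbolUrl(const std::string &server, const SymbolRequest &req) {
  assert(!req.build_id.empty() && "a lookup needs a binary ID");
  return server + "/" + req.binary_name + "/" + ToHex(req.build_id);
}
```

The file fetched from such a URL would be saved to a temporary location, and that location is what would be handed to AddSymbolFileRepresentation as a FileSpec.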

Implementation details.

Logic placement.

Even though symbol resolution is platform agnostic, the process of extracting/computing the binary ID is not. As such it seems like SymbolServerResolver can either be a part of LLDB core, or a separate directory in Plugins/SymbolVendor, which will then be used by SymbolVendorELF and SymbolVendorMacOSX. First both symbol vendors will try to resolve the symbols the way they currently do and only if they cannot find anything, will they try to use SymbolVendorSymbolServer.

Alternatively, the symbol server resolution logic can be placed into its own SymbolVendorSymbolServer, with SymbolVendor FindPlugin’s logic modified so that it no longer returns the first SymbolVendor instance found, and instead returns either the first SymbolVendor instance that managed to successfully resolve symbols or just the last one.

Yet another alternative would be to use a delegation chain, such that any SymbolVendor could be wrapped into a SymbolVendorSymbolServer, which would first try to invoke the delegate and, if the delegate cannot find symbols, query the symbol server itself. This approach seems nice, but does not play well with the current implementation, which is based on a static factory method.
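A rough sketch of the delegation idea, using a deliberately simplified interface. `SymbolLocator`, `LocalTableLocator` and `SymbolServerLocator` are hypothetical names introduced for illustration; LLDB's real SymbolVendor interface is different and much larger, and an empty string stands in for "not found".

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical, simplified stand-in for a symbol vendor.
class SymbolLocator {
public:
  virtual ~SymbolLocator() = default;
  // Returns the path of a symbol file for the binary ID, or "" if not found.
  virtual std::string Locate(const std::string &build_id) = 0;
};

// Stand-in for an existing vendor that only knows about local files.
class LocalTableLocator : public SymbolLocator {
public:
  explicit LocalTableLocator(std::map<std::string, std::string> table)
      : table_(std::move(table)) {}

  std::string Locate(const std::string &build_id) override {
    auto it = table_.find(build_id);
    return it == table_.end() ? std::string() : it->second;
  }

private:
  std::map<std::string, std::string> table_;
};

// Wraps any other locator and consults the symbol server only as a fallback.
class SymbolServerLocator : public SymbolLocator {
public:
  using Fetch = std::function<std::string(const std::string &)>;

  SymbolServerLocator(std::unique_ptr<SymbolLocator> delegate, Fetch fetch)
      : delegate_(std::move(delegate)), fetch_(std::move(fetch)) {
    assert(fetch_ && "a fetch callback is required");
  }

  std::string Locate(const std::string &build_id) override {
    if (delegate_) {
      std::string local = delegate_->Locate(build_id);
      if (!local.empty())
        return local;  // the wrapped locator found symbols; no network needed
    }
    return fetch_(build_id);  // fall back to the (assumed) server lookup
  }

private:
  std::unique_ptr<SymbolLocator> delegate_;
  Fetch fetch_;
};
```

The wrapper does not need to know which concrete locator it decorates, which is what makes the chain attractive; the cost is that whoever constructs vendors has to build the chain explicitly.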

Symbol server communication.

Network communication can either be implemented natively for different platforms or it can be delegated to a Python script invoked by ScriptInterpreter. Using Python seems the easier option for making this cross-platform, but it adds a dependency on Python and will require propagating ScriptInterpreter to the SymbolVendor creation factory.

Thoughts, suggestions and comments are very welcome.

Thank you,

Taras

Making the SymbolVendor dependent on Python is probably a non starter, and it would also make debugging more difficult.

We have network code for various platforms in Host already.

It would be nice to have a symbol server format that is platform agnostic. On the other hand, Microsoft tools already understand their own symbol server format, so if I ever reprioritize this, we will probably want the standard Microsoft symbol server format on Windows for interoperability.

Zachary, I agree that adding a Python dependency might not be a good idea, so I’ll take a closer look at the network code available in lldb. The symbol format we are currently using is pretty simple: every artifact is identified by a type (elf, src, etc.), an ID (build ID for a binary or hash for a source file) and a path. I’m not sure what you mean by platform agnostic, but with this approach every SymbolVendor will just have to pass the appropriate type, build ID and path to an ArtifactManager, which will download the artifact or locate a locally cached copy and return a path to it.
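To make the (type, ID, path) triple concrete, here is a small sketch of how a local cache location could be derived from it. The `ArtifactKey` struct, the helper names, and the `<cache_root>/<type>/<id>/<file name>` layout are assumptions made for illustration; the thread does not describe how ArtifactManager actually stores files.

```cpp
#include <cassert>
#include <string>

// Hypothetical artifact key mirroring the description above: a type
// ("elf", "src", ...), an ID (build ID or source hash) and a path.
struct ArtifactKey {
  std::string type;
  std::string id;
  std::string path;
};

// Last component of a '/'-separated path.
std::string BaseName(const std::string &path) {
  std::string::size_type slash = path.find_last_of('/');
  return slash == std::string::npos ? path : path.substr(slash + 1);
}

// Assumed cache layout: <cache_root>/<type>/<id>/<file name>.
std::string CachePathFor(const std::string &cache_root, const ArtifactKey &key) {
  assert(!key.type.empty() && !key.id.empty());
  return cache_root + "/" + key.type + "/" + key.id + "/" + BaseName(key.path);
}
```

Keying the cache on the ID rather than on the original path means two builds that produce the same ID share one cached file.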

By platform agnostic I mean that having a single symbol server that works across multiple platforms is very nice. It sounds like in addition to being a symbol server this can also serve source code, and should work with embedded debug info, split dwo, or even pdb?

If you want to go agnostic, then you can just integrate into the following functions from Symbols.h:

    //----------------------------------------------------------------------
    // Locate the symbol file given a module specification.
    //
    // Locating the file should happen only on the local computer or using
    // the current computers global settings.
    //----------------------------------------------------------------------
    static FileSpec
    Symbols::LocateExecutableSymbolFile(const ModuleSpec &module_spec);
        
    //----------------------------------------------------------------------
    // Locate the object and symbol file given a module specification.
    //
    // Locating the file can try to download the file from a corporate build
    // repository, or using any other means necessary to locate both the
    // unstripped object file and the debug symbols.
    // The force_lookup argument controls whether the external program is called
    // unconditionally to find the symbol file, or if the user's settings are
    // checked to see if they've enabled the external program before calling.
    //
    //----------------------------------------------------------------------
    static bool
    Symbols::DownloadObjectAndSymbolFile (ModuleSpec &module_spec, bool force_lookup = true);

Note that we have an implementation for MacOSX that uses DebugSymbols.framework which is available on all Apple systems. There are many ways to track down a symbol file that is located locally and remotely. See the settings that you can set by checking out the details:

http://lldb.llvm.org/symbols.html

We allow a command line executable to be run that returns a plist. See the section labeled "SHELL SCRIPT PROPERTY LIST FORMAT". You basically run a shell command that takes arguments that are either a path to a file on disk, or a UUID that it is supposed to locate. The shell script can then use any method it wants to in order to find the symbol file you requested.

Apple has a shell tool named "dsymForUUID" that will do such a thing. It currently uses a custom database to do the lookup and return the correct values. The information the plist returns looks like:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>23516BE4-29BE-350C-91C9-F36E7999F0F1</key>
  <dict>
    <key>DBGArchitecture</key>
    <string>i386</string>
    <key>DBGBuildSourcePath</key>
    <string>/path/to/build/sources</string>
    <key>DBGSourcePath</key>
    <string>/path/to/actual/sources</string>
    <key>DBGDSYMPath</key>
    <string>/path/to/foo.dSYM/Contents/Resources/DWARF/foo</string>
    <key>DBGSymbolRichExecutable</key>
    <string>/path/to/unstripped/executable</string>
  </dict>
  <key>A40597AA-5529-3337-8C09-D8A014EB1578</key>
  <dict>
    <key>DBGArchitecture</key>
    <string>x86_64</string>
    <key>DBGBuildSourcePath</key>
    <string>/path/to/build/sources</string>
    <key>DBGSourcePath</key>
    <string>/path/to/actual/sources</string>
    <key>DBGDSYMPath</key>
    <string>/path/to/foo.dSYM/Contents/Resources/DWARF/foo</string>
    <key>DBGSymbolRichExecutable</key>
    <string>/path/to/unstripped/executable</string>
  </dict>
</dict>
</plist>

Note that this format will tell us where the unstripped executable lives, and also allows for source remapping. The DBGBuildSourcePath value says what the path was when the binary was built (and what is in the debug info), and DBGSourcePath says what the paths should be when actually used (where they will live forever on a build server). This allows our builders to build a binary in, say, "/tmp/project/lldb" and then copy the sources to where they will live permanently in "/build/server1/project/lldb/lldb-1.2.3.4". The "DBGDSYMPath" key tells us where the symbol file is.
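The remapping itself amounts to a prefix substitution. The sketch below shows one way to express it; `RemapSourcePath` is a hypothetical helper written for illustration and is not LLDB's implementation, and the file name used in the usage note is made up.

```cpp
#include <cassert>
#include <string>

// Rewrite a path recorded in debug info by swapping the build-time prefix
// (DBGBuildSourcePath) for the permanent one (DBGSourcePath).
// Paths outside the build tree are returned unchanged.
std::string RemapSourcePath(const std::string &path,
                            const std::string &build_prefix,
                            const std::string &source_prefix) {
  assert(!build_prefix.empty() && "an empty prefix would match every path");
  if (path.compare(0, build_prefix.size(), build_prefix) != 0)
    return path;  // not under the build directory
  // Only match on a directory boundary, so that "/tmp/project/lldb2" is not
  // rewritten by the prefix "/tmp/project/lldb".
  if (path.size() > build_prefix.size() && build_prefix.back() != '/' &&
      path[build_prefix.size()] != '/')
    return path;
  return source_prefix + path.substr(build_prefix.size());
}
```

With the prefixes from the example above, a file recorded as "/tmp/project/lldb/source/main.cpp" would be looked up at "/build/server1/project/lldb/lldb-1.2.3.4/source/main.cpp".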

So our current Apple solution is:
1 - check that the debug info isn't already in the object file
2 - check for the symbols in proximity to the executable (same directory, at the bundle level, and a few other places)
3 - check for the symbol file locally in one of our dsymForUUID cache locations
4 - check in common symbol directories (~/Library/Symbols, /Library/Symbols, user specified directories)
5 - if enabled, run the dsymForUUID shell script to possibly go out and fetch the symbols and cache them locally so that step #3 above can find the symbol file the next time without having to run an external shell script tool
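The interplay between steps 3 and 5 can be sketched as an ordered list of locators in which the expensive fetch fills the cache that an earlier, cheaper step consults. `Locator`, `LocateInOrder` and `CachingFetcher` are hypothetical names used only for this illustration; the real logic lives in DebugSymbols.framework and is not shown in this thread.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// A locator maps a UUID to a symbol file path; "" means "not found".
using Locator = std::function<std::string(const std::string &)>;

// Try each locator in order and stop at the first hit.
std::string LocateInOrder(const std::vector<Locator> &locators,
                          const std::string &uuid) {
  for (const Locator &locate : locators) {
    std::string path = locate(uuid);
    if (!path.empty())
      return path;
  }
  return std::string();
}

// Pairs a cheap cache lookup with an expensive fetch that fills the cache.
class CachingFetcher {
public:
  explicit CachingFetcher(Locator fetch) : fetch_(std::move(fetch)) {
    assert(fetch_ && "a fetch callback is required");
  }

  // Cheap step: consult only what earlier fetches have stored.
  std::string FromCache(const std::string &uuid) const {
    auto it = cache_.find(uuid);
    return it == cache_.end() ? std::string() : it->second;
  }

  // Expensive step: run the external lookup and remember a successful result.
  std::string FetchAndCache(const std::string &uuid) {
    std::string path = fetch_(uuid);
    if (!path.empty())
      cache_[uuid] = path;
    return path;
  }

private:
  Locator fetch_;
  std::map<std::string, std::string> cache_;
};
```

Placing `FromCache` ahead of `FetchAndCache` in the list means the external tool runs at most once per UUID, provided the first fetch succeeds.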

Steps 3 through 5 happen in DebugSymbols.framework for us. The nice thing about an external tool is it allows the symbol locator to be updated separately from LLDB itself and makes it easy to update servers with a new version of a tool without having to update the LLDB on the server.

One thing that is key for this to work well is we build UUIDs into each binary. The same binary built with the same compiler with the same sources will produce the same UUID even if the two binaries were built in different directories. This allows our database to not have to store a new path for each build as many builds of a binary will share the same dSYM file. This also keeps the size of our caches on symbolication servers from filling up with every new symbolication request as many stack logs contain the same symbols.

The other way to do this would be to allow each Platform subclass to roll its own SymbolServer, or to allow multiple symbol server/symbol cache plugins to coexist. Right now Apple just uses the stuff from Symbols.h, but the Apple implementation could easily be plug-in-ified. If each platform, like say Linux or Windows, wants to roll its own symbol server, this could just be done by extending the Platform.h virtual functions to include functions that allow each platform to locate symbols in its own way. This might be easier than trying to make a whole new SymbolServer type of plug-in.

So we have quite a bit of experience with this at Apple. Let us know if you have any questions.

Greg Clayton

Yes, Zachary, design does not have any platform constraints, as long as we have a reliable way to identify a binary, which can always be arranged.

Thanks a lot Greg for such a detailed response! Locating dSYM bundles is indeed very similar and in fact, since it’s probably more popular than inlined symbols, it will have to be extended to look for symbols on a symbol server as well.

The only reason I didn’t consider Symbols.h initially was because it does not seem to handle source files, which would be nice to support as well. But I think it’s probably a good start indeed.

Thanks again Greg!

The "dsymForUUID" tool doesn't handle copying source files around - we tend to just remote mount them. But we do include these keys in the return plist so that lldb can automatically remap the source files from where they were at build time to where they are at debug time. So if your symbol server copies files locally and they aren't in the same location as at build time, you might want to play a similar trick on your end.

Jim

Thank you Jim! Sounds like this should work!
