RFC: Simplifying SymbolFile interface

The Native PDB symbol file plugin I think is mostly complete. It’s at least almost as good as the old Windows-only PDB plugin in 90% of ways, while actually being significantly better in other ways (for example, there was a test that took over 2 minutes to run with the Windows-only PDB plugin, which now takes about 2 seconds to run with the native PDB plugin.

While implementing this, I ran into several things that made my life quite difficult, and only later found out that I could have saved myself a lot of headache and time if the SymbolFile interface had been a little simpler and easier to understand.

Specifically, I’d like to remove the heavy use of SymbolContext in the SymbolFile / SymbolVendor interface and replace it with more narrow and targeted parameter lists.

Consider the case of someone calling FindTypes. In theory, today they can fill out any combination of Target, Module, Function, Block, CompileUnit, LineEntry, and Symbol. That’s 2^7 different possible ways the function can be called. While obviously not all of these combinations make sense, the fact is that it greatly increases the API surface, which is bad for test coverage, bad for ease of understanding, bad for usability, and leads to a lot of dead code.

For a person implementing this function for the first time, and who may not know all the details about how the rest of LLDB works, this is quite daunting because there’s an inherent desire to implement the function faithfully “just in case”, since they don’t know all of the different ways the function might be called.

This results in wasted time on the developer’s part, because they end up implementing a bunch of functionality that is essentially dead code.

We can certainly document for every single function “The implementor should be prepared to handle the case of fields X, Y, and Z being set, and handle it in such and such way”, but I think it’s easier to just change the interface to be more clear in the first place.

Here are the cases I identified, and a proposal for how I could change the interface.

  1. SymbolFile::ParseTypes(SymbolContext&)
  • In the entire codebase, this is only called with a CompileUnit set. We should change this to be ParseTypesForCompileUnit(CompileUnit&) so that the interface is self-documenting. A patch with this change is here [https://reviews.llvm.org/D56462]
  1. SymbolFile::ParseDeclsForContext(CompilerDeclContext)
  • This is intended to only be used for parsing variables in a block. But should it be recursive? It’s impossible to tell from the function name, so callers can’t use it correctly and implementors can’t implement it correctly. I spent 4 days trying to implement a generic version of this function for the NativePDB plugin only to find out that I only actually cared about block variables. I would propose changing this to ParseVariableDeclsForBlock(Block&).
  1. These functions:
  • ParseCompileUnitLanguage(SymbolContext&)
  • ParseCompileUnitFunctions(SymbolContext&)
  • ParseCompileUnitLineTable(SymbolContext&)
  • ParseCompileUnitDebugMacros(SymbolContext&)
  • ParseCompileUnitSupportFiles(SymbolContext&)

are only for CompileUnits (as the name implies. I propose changing the parameter from a SymbolContext& to a CompileUnit&.

  1. SymbolFile::ParseFunctionBlocks(SymbolContext&)
  • This is intended to be used when the SymbolContexts m_function member is set. I propose changing this to SymbolFile::ParseFunctionBlocks(Function&).
  1. SymbolFile::ParseVariablesForContext(CompilerDeclContext)
  • This function is only called with the the Context being a CompileUnit, Function, and Block. But does it need to be recursive? For a Function and Block it seems to be assumed to be recursive, and for a CompileUnit it seems to be assumed to not be recursive. For the former case, it’s not actually clear how this function differs from ParseGlobalVariables, and for the latter case I would propose changing this to ParseImmedateVariablesForBlock(Block&).
  1. SymbolFile::FindTypes(SymbolContext&).
  • This function is only called with the m_module field set, and since a SymbolFile is already tied to a module anyway, the parameter appears unnecessary. I propose changing this to SymbolFile::FindAllTypes()
  1. SymbolFile::FindNamespace(SymbolContext&, ConstString, DeclContext*) is only called with default-constructed (i.e. null) SymbolContexts, making the first parameter unnecessary. I propose changing this to FindNamespace(ConstString, DeclContext*)

  2. Module::FindTypes(SymbolContext &, ConstString, bool , size_t , DenseSet<SymbolFile *> &, TypeList&):

  • After the change in #6, we can propagate this change upwards for greater benefit. The first parameter in Module::FindTypes(SymbolContext&, …) now becomes unnecessary (and in fact, it was kind of unnecessary to begin with since in every case, the SymbolContext actually just had a single member set, which was equal to the this pointer of the Module from which this function was called). So I propose deleting this parameter and leaving the rest of the function the same.

I think all of these changes combined will make life significantlly easier for new SymbolFile implementors, delete untested code paths, and overall be easier to understand and work with.


I like all of the proposed interface changes.

As for the names, I wonder if we really need to put the argument type in the function name when that is already obvious from the type itself (so I'd think ParseTypes(CompileUnit&) is sufficient and ParseTypesForCompileUnit(CompileUnit&) is verbose).


I'm in support of this as well.