SymbolFile::FindGlobalVariables

I’m trying to implement this function for PDB. There are two overloads:

uint32_t
FindGlobalVariables (const ConstString &name, const CompilerDeclContext *parent_decl_ctx, bool append, uint32_t max_matches, VariableList& variables)

uint32_t
FindGlobalVariables(const RegularExpression& regex, bool append, uint32_t max_matches, VariableList& variables)

I know how to implement the second overload, but not the first. What is a CompilerDeclContext? Some comments in the DWARF implementation of the function seem to imply it’s related to namespaces, but there’s a lot of strange code that I don’t understand. What is the relationship between a namespace and a symbol file? And why does DeclContextMatchesThisSymbolFile contain no code at all that accesses any property of the symbol file? It just checks if decl_ctx->GetTypeSystem()->GetMinimumLanguage(nullptr) == decl_ctx->GetTypeSystem(), which appears to have nothing to do with any symbol file.

What user command or debugger operation results in FindGlobalVariables getting called with this particular overload, and how does it build the CompilerDeclContext?

On another note, why is the decl context stored as void* instead of having an actual wrapper with an abstract interface such as ClangDeclContext / JavaDeclContext, etc that all inherit from LanguageDeclContext, and pass the LanguageDeclContext around instead of a void*?

Also why does the lldb_private::Variable() class take a DWARFExpression to its constructor? Seems like this is wrong in the face of non-DWARF debug information.

I'm trying to implement this function for PDB. There are two overloads:

uint32_t
FindGlobalVariables (const ConstString &name, const CompilerDeclContext *parent_decl_ctx, bool append, uint32_t max_matches, VariableList& variables)

uint32_t
FindGlobalVariables(const RegularExpression& regex, bool append, uint32_t max_matches, VariableList& variables)

I know how to implement the second overload, but not the first. What is a CompilerDeclContext?

It is a declaration context. A variable like "foo::g_int" that is declared like:

namespace foo
{
   int g_int;
}

Has a decl context of "namespace foo". Also a variable like:

namespace foo
{
    class bar
    {
         struct baz
         {
              void foo();
         };
    };
}

The function foo would have a decl context "struct baz". "struct baz" has a parent decl context "class bar". "class bar" has a parent decl context "namespace foo". "namespace foo" has a parent decl context of the compile unit.

Some comments in the DWARF implementation of the function seem to imply it's related to namespaces, but there's a lot of strange code that I don't understand. What is the relationship between a namespace and a symbol file? And why does `DeclContextMatchesThisSymbolFile` contain no code at all that accesses any property of the symbol file? It just checks if decl_ctx->GetTypeSystem()->GetMinimumLanguage(nullptr) == decl_ctx->GetTypeSystem(), which appears to have nothing to do with any symbol file.

When it comes down to creating types I am going to guess that you will be using ClangASTContext to create any types that you hand out. This already has all of the needed calls for you to create all of the stuff that you will need. You will need to take a look at DWARFASTParserClang and see how it creates types using the ClangASTContext.

What user command or debugger operation results in FindGlobalVariables getting called with this particular overload, and how does it build the CompilerDeclContext?

The SymbolFile subclasses will create decl contexts as needed. In DWARF it uses:

void
SymbolFileDWARF::ParseDeclsForContext (CompilerDeclContext decl_ctx)
{
    TypeSystem *type_system = decl_ctx.GetTypeSystem();
    DWARFASTParser *ast_parser = type_system->GetDWARFParser();
    std::vector<DWARFDIE> decl_ctx_die_list = ast_parser->GetDIEForDeclContext(decl_ctx);

    for (DWARFDIE decl_ctx_die : decl_ctx_die_list)
        for (DWARFDIE decl = decl_ctx_die.GetFirstChild(); decl; decl = decl.GetSibling())
            ast_parser->GetDeclForUIDFromDWARF(decl);
}

On another note, why is the decl context stored as void* instead of having an actual wrapper with an abstract interface such as ClangDeclContext / JavaDeclContext, etc that all inherit from LanguageDeclContext, and pass the LanguageDeclContext around instead of a void*?

So three classes: CompilerType, CompilerDecl and CompilerDeclContext all contain a "TypeSystem *" which points to a subclass of "TypeSystem". Then each type system stores its native pointer to the thing that represents a type, decl and decl context. For objects produced by the ClangASTContext TypeSystem, CompilerType stores a QualType as an opaque pointer obtained from a call to "clang::QualType::getAsOpaquePtr()", CompilerDecl stores just the "clang::Decl *", and for CompilerDeclContext we store a "clang::DeclContext *". Each type system is different.

SwiftASTContext stores "swift::Type*" in CompilerType, and it doesn't represent CompilerDecl or CompilerDeclContext because its expression parser doesn't need access to these things.

RenderScript and Go each have their own type systems and can back CompilerType, CompilerDecl and CompilerDeclContext with whatever they need.

Right now CompilerType is the important one since all variable viewing explores CompilerType to display the children of a struct/union/class. CompilerDecl and CompilerDeclContext will help with expressions and the only thing that needs this right now is the clang expression parser for C/C++/ObjC/ObjC++.

When you are asked to parse a type for a variable in your SymbolFilePDB, you will be required to make a CompilerType. I would suggest using a ClangASTContext as a type system to create your types. You can see how DWARF does this in DWARFASTParserClang. You will need to do something very similar. If you need to create a class for something like:

namespace A
{
    class B
    {
    };
}

Part of correctly doing so involves you creating a "namespace A" in the ClangASTContext (which is a clang::DeclContext). You will specify that the translation unit is the decl context for "namespace A" when you create the namespace. When you create "class B", you will need to specify that the context it is created in is the "namespace A" from the ClangASTContext. These are parameters that are needed when you create clang types in a ClangASTContext. So you will be creating the CompilerDecl and CompilerDeclContext objects already. Classes are clang::DeclContext objects, and functions are as well. A variable is just a clang::Decl. Each of these items could be handed out through your SymbolFile if someone asks "get me the CompilerDecl for the variable with ID 123".

The type system stuff is the most complex thing you will be doing as you are implementing your SymbolFilePDB and I know you will have a lot of questions.

In other debuggers, the debugger makes up very simple structures to represent the types from your executable, and then these debuggers create expression parsers that use those simple structs. If the language changes, or a new language feature is added by compiler engineers, the debugger must go and add all sorts of functionality to the expression parser; they are always playing catch-up and can never do all of the things the compiler can do. In LLDB we took a different approach: create types using the native clang::ASTContext just as the compiler would when compiling source code, and then just use the compiler to evaluate expressions. We can do so many things no other debugger expression parser can:
- multi-line statements
- declare expression local variables
- use flow control (if/then/else, switch, etc)
- Use C++ lambdas
- declare types that you can use in your expressions
- much much more

So the price of entry is a bit high as you need to convert your types into clang types, but the payoff is huge.

Let me know what other questions you have.

Greg

lldb uses DWARF expressions internally as a convenient language to represent locations of values. We had to pick some representation, and the DWARF expression was powerful enough for our purposes, meant we didn't have to reinvent something that already existed, and had the added benefit that if your debug info is already DWARF then you don't have to transcode.

Jim

Can we abstract this somehow? Converting all my debug info to DWARF seems like a non-starter, as it doesn’t look like you can just do it partially, you have to go all the way (just based on glancing at the DWARFExpression header file)

Also why does the lldb_private::Variable() class take a DWARFExpression to its constructor? Seems like this is wrong in the face of non-DWARF debug information.

They are powerful enough to handle any variable location. More powerful than any other format I have seen. You have two choices:

- make a new lldb_private::Location class and have DWARFExpression implement the pure virtuals you need
- convert PDB locations into DWARF

Personally the second sounds easier as the DWARF expressions are well documented and they are easy to construct. If you have the spec for the PDB locations and can point me at this, I can take a look to see how well things would map.

Variables that are in registers use a DWARF location expression like:

DW_OP_reg12

that means the value is register number 12.

DW_OP_addr(0x10000) means the value is a global variable whose value lives at "file address 0x10000 inside of the module from which it originates". We translate the file address into a load address if we are running, and if that resolves to a load address, we can read the variable value.

DW_OP_fbreg(32) means the value is 32 bytes off of the frame base register.

So the location expressions are often this simple: in a register, in .data at a file address, or on the stack. So depending on how complex locations are in PDB, it might be easier to just create a simple DWARF expression and be done with it.

Greg Clayton

See my other email. You can abstract this, but it doesn't seem worth it unless PDB has some really powerful way to express variable locations?

The only “spec” is the API that allows you to access the info. There’s no spec of the bit format. This is probably all you are actually looking for though:

The problem isn’t necessarily that one is more powerful than the other, it’s just that PDBs can get huge (on the order of gigabytes), and converting between formats is an unnecessary step that a) will be slow, b) might not map 1 to 1 between the formats, and c) it’s already trivial (on the order of a few lines of code) to just query the PDB for everything you need.

So we’re talking about potentially thousands of lines of code to do something that would take about 10 (as well as being more efficient) with a proper abstraction.

Feel free to abstract if you need to. The page you sent me to has _very_ simple locations that would convert to DWARF expressions very easily. Probably less than a hundred lines of code.

If you need to abstract, make a lldb_private::Location class and have DWARFExpression implement the needed pure virtuals. Then each thing that contains a DWARFExpression would instead contain a lldb_private::LocationSP, a shared pointer to a lldb_private::Location. DWARFExpression has grown over the years to contain a bunch of evaluate variants.

Just know that LLDB lazily parses things. We don't say "convert the entire PDB into the internal LLDB format now!". We say "get the line table for this one compile unit". Find the function for address "0x123000" and parse it. Later we will ask to get the function type and its args. Later, if we ever need to, we will lazily parse the blocks in the function. Nothing is parsed in full.

Let me know what you want to do. Abstraction is great, but comes at a cost of breaking things when our tests don't cover everything, so that is a worry on my end with any large changes...

How large of a change do you think it would be to abstract out the location information for the variable? As far as I can tell, our uses of this DWARFExpression on Variables are very limited:

  1. In ValueObjectVariable::UpdateValue and ClangExpressionDeclMap::GetVariableValue, if the location is a constant value, it refers to a host address and we just read the value out as a number.

  2. In EntityVariable::Materialize(), we check whether it is valid.

  3. In SymbolFileDWARF, we “evaluate” the expression.

  4. In a few places, we check whether an input address matches the location specified.

  5. We dump the location to stdout in a few places.

Everything else could just as easily be private methods, because that’s all that public users of DWARFExpression actually use.

This seems like an easy abstraction to create. #3 is irrelevant because that code is in SymbolFileDWARF, it could downcast from Location to DWARFLocation. #1, 2, 4, and 5 could easily be implemented directly against a PDB.

While I haven’t tried to actually do either approach yet, I like the idea of creating the abstraction because it provides the native / most optimized debugging experience no matter what you’re using. For example, I can easily imagine a scenario where I have to keep the PDB open in memory to query some types of information, but I have to do a conversion of location information for Variables, and the memory usage becomes unacceptable because everything is in memory twice (even though it’s lazily evaluated, the memory usage would double over time).

How large of a change do you think it would be to abstract out the location information for the variable? As far as I can tell, our uses of this DWARFExpression on Variables are very limited:

1. In ValueObjectVariable::UpdateValue and ClangExpressionDeclMap::GetVariableValue, if the location is a constant value, it refers to a host address and we just read the value out as a number.
2. In EntityVariable::Materialize(), we check whether it is valid.
3. In SymbolFileDWARF, we "evaluate" the expression.

Leave this one alone, don't abstract it since it is DWARF native.

4. In a few places, we check whether an input address matches the location specified.
5. We dump the location to stdout in a few places.

Everything else could just as easily be private methods, because that's all that public users of DWARFExpression actually use.

Sounds like it won't be too bad.

This seems like an easy abstraction to create. #3 is irrelevant because that code is in SymbolFileDWARF, it could downcast from Location to DWARFLocation. #1, 2, 4, and 5 could easily be implemented directly against a PDB.

While I haven't tried to actually *do* either approach yet, I like the idea of creating the abstraction because it provides the native / most optimized debugging experience no matter what you're using. For example, I can easily imagine a scenario where I have to keep the PDB open in memory to query some types of information, but I have to do a conversion of location information for Variables, and the memory usage becomes unacceptable because everything is in memory twice (even though it's lazily evaluated, the memory usage would double over time).

You will abstract the location only and that is fine. For everything else we do have lldb classes that will need to be created (compile units, functions, blocks, variables). Types are done via the TypeSystem subclasses, so you will need to convert all types there. So feel free to abstract the DWARFExpression for variable locations only.

I have no problem with the abstraction if you think it is needed. I personally think it will be much more work, but I won't be doing it so I don't mind.

Greg

It looks like i need to get type information working before variables, so I’ll work on that first and come back to this

Bleh, it looks like some abstraction will be needed at this level too, because ClangASTContext assumes a DWARFASTParser.

This doesn’t seem too bad, because the only code that actually assumes it’s a DWARFASTParser is in SymbolFileDWARF. So maybe creating a DebugInfoASTParser in lldb/Symbol and then making a PDBASTParser would be enough to get this working.

Seem reasonable?

Hi Greg,

could you clarify the difference between the functions ParseTypes, FindTypes, ResolveTypeUID, and CompleteType from the SymbolFile plugin?

Hi Greg,

could you clarify the difference between the functions ParseTypes,

SymbolFile::ParseTypes() is only used for debugging; don't worry about this one unless you need a way to say "parse all types in the supplied context" while debugging your implementation. It is only called by Module::ParseAllDebugSymbols(), which isn't exposed anywhere and was just used for initial bring-up. See Module::ParseAllDebugSymbols() for all the details, but no one calls it in normal operation.

FindTypes

FindTypes is for finding types by name:

virtual uint32_t
FindTypes (const SymbolContext& sc,
           const ConstString &name,
           const CompilerDeclContext *parent_decl_ctx,
           bool append,
           uint32_t max_matches,
           llvm::DenseSet<lldb_private::SymbolFile *> &searched_symbol_files,
           TypeMap& types);

The symbol context can limit the scope of your search if needed. Why? Because you might be stopped inside a "test" module that has 100 compile units, and you are stopped in compile unit "foo.cpp" in function "bar()". The symbol context can specify a scope in which to search. If a function or block is specified, you should find any types in the block; if you don't find one, proceed to the parent block. Keep going up the blocks, then search the class if "bar()" is a method, then search the compile unit for any types that match, and then fall back to the module. It is usually easiest to just look up the type, come up with N results, and pare down the results after you find them, as most times you will only find one type anyway.

If "parent_decl_ctx" is not NULL, you take all results that made it through the "sc" filter and then filter out any results that are not in this "parent_decl_ctx". The expression parser often says "I am looking for "basic_string" in the parent_decl_ctx that represents "namespace std".

You always check to see if "this" is in the searched_symbol_files set and only search if it isn't. This is for module debugging, where DWARF files refer to types from other DWARF files.

And there is one that is specific to module debugging:

    virtual size_t FindTypes (const std::vector<CompilerContext> &context, bool append, TypeMap& types);

You don't need to implement this one unless you have one PDB file that refers to a type in another...

ResolveTypeUID

Often you might have a variable whose type is "typedef FooType ...;". Because of this, you can give your lldb_private::Variable a lldb_private::Type that says "I am a typedef named 'FooType' to the type with UID 0x1234". We don't need to resolve the type right away, and if no one needs to explore the type, we never have to. If anyone does ask for the CompilerType from a lldb_private::Type, we can then do a "m_symfile->ResolveTypeUID(m_encoding_uid);". So it allows us to just say "your type is a type from a symbol file with a user ID of 0x1234" and we can resolve that lazily if and only if we ever need to. Another instance where this is useful is when you create a lldb_private::Function:

Function (CompileUnit *comp_unit,
          lldb::user_id_t func_uid,
          lldb::user_id_t func_type_uid,
          const Mangled &mangled,
          Type * func_type,
          const AddressRange& range);

Note that you don't need to parse the function type up front, you can just make a function with:

SymbolFilePDB::ParseFunction(...)
{
    lldb::user_id_t func_uid = 0x1234;
    Mangled mangled("_Z3foov", true);
    AddressRange func_range = ...;
    // Pass func_uid for both the function and its type, and nullptr for the
    // type itself; the type gets resolved lazily via ResolveTypeUID().
    Function *func = new Function(m_cu, func_uid, func_uid, mangled, nullptr, func_range);
}

Later if anyone calls:

    Type* Function::GetType();

You can see if can lazily resolve its function type using ResolveTypeUID:

Type*
Function::GetType()
{
    if (m_type == nullptr)
    {
        SymbolContext sc;
        
        CalculateSymbolContext (&sc);
        
        if (!sc.module_sp)
            return nullptr;
        
        SymbolVendor *sym_vendor = sc.module_sp->GetSymbolVendor();
        
        if (sym_vendor == nullptr)
            return nullptr;
        
        SymbolFile *sym_file = sym_vendor->GetSymbolFile();
        
        if (sym_file == nullptr)
            return nullptr;
        
        m_type = sym_file->ResolveTypeUID(m_type_uid);
    }
    return m_type;
}

, and CompleteType

    virtual bool SymbolFile::CompleteType (CompilerType &compiler_type);

This allows you to say "I have a class A type that is very complex, and until someone needs to know the details about this class I will have the type represented as 'class A;'." Once someone needs the details, they can lazily complete the type. This is very useful when creating types in clang::ASTContext objects because we can make a forward declaration that knows how to complete itself when you hook into the clang::ExternalASTSource. This is some of the refactoring you have already been doing. If you always just want to make all of your types complete when you hand them out, then you don't need to do this. But since clang types already have all the hooks needed to lazily complete types, we take advantage of this.

This way you can hand an expression a "class A;" as the type for a variable, and if clang ever needs to know the details inside of A, the clang::ExternalASTSource hooks will allow the type to lazily complete itself _only_ if ever needed. So if you have an expression like "A* a = GetA(); printf("%p\n", a);", clang never needs to know what is inside of A, so it never asks us to complete it. But if you do "A* a = ...; a->DoSomething();", clang will ask us to expand the type.

Another example: imagine you have a local variable in your frame whose type is "A* a;". If we display the variable in the debugger's variable view, we know this is a pointer, and unless someone clicks on the disclosure triangle to expand it, we don't need to know what is inside of "A". Variable display is done by using CompilerType methods like:

CompilerType t;

const bool omit_empty_base_classes = true;
uint32_t num_children = t.GetNumChildren (omit_empty_base_classes);
for (uint32_t i = 0; i < num_children; ++i)
{
    CompilerType child_type = t.GetChildCompilerTypeAtIndex (exe_ctx, i, ...);
}

When we call CompilerType::GetNumChildren(), we call into ClangASTContext::GetNumChildren(...), which knows we have clang types and it also knows that if the type isn't complete and it can complete itself, that we can complete the type:

uint32_t
ClangASTContext::GetNumChildren (lldb::opaque_compiler_type_t type, bool omit_empty_base_classes)
{
    uint32_t num_children = 0;
    clang::QualType qual_type(GetQualType(type));
    const clang::Type::TypeClass type_class = qual_type->getTypeClass();
    switch (type_class)
    {
        case clang::Type::Record:
            if (GetCompleteQualType (getASTContext(), qual_type))
            {

Note that we are smart and only try to complete types that we know can be completed, like Record types for clang. See the function GetCompleteQualType(). So the key is here: both LLDB code and the actual clang compiler itself, as it is compiling expressions, have the ability to deal with incomplete types and complete them only when necessary.

from the SymbolFile plugin?

Let me know if you have any questions.

Greg

Thanks Greg. It sounds like for now I will need to implement FindTypes and ResolveTypeUID. ParseTypes I can skip, and CompleteType can be re-used from the DWARF implementation (it doesn’t actually do anything DWARF-specific). FindTypes can probably just find the matching UIDs by name, with the real work done in ResolveTypeUID.

Sadly PDB has no good way to do filtered / scope limited searches except by name. So we will probably have no choice in any case except to do the name search, and hope that it’s sufficiently narrow as to not make the subsequent O(n) filtering a huge problem (we could of course re-index everything in memory, but you’d still get the one time hit).