ASTImporter usage

Hello,

I'm a graduate student at the University of Wisconsin Madison and am
beginning work on a static analyzer using clang. I've been looking at the
ASTImporter class of clang and noticed that it doesn't seem to be complete
and isn't used directly by any part of clang, but that LLDB does use it.
I was wondering if anyone could help me understand what parts of the
ASTImporter class LLDB uses and for what overall purpose. If anyone has
even a little bit of insight on this topic it'd be much appreciated.

Thank you

--henry abbey

FWIW clang has a static analyzer:

http://clang-analyzer.llvm.org

-eric

Henry,

I'd be happy to give you a quick overview.

In order to optimize compilation time and memory usage, Clang keeps Types, Decls, and Stmts in ASTContexts. Clang provides a placement operator new that allocates from an ASTContext's custom bump allocator, which ties these objects tightly to their ASTContext and allows a single free to destroy everything associated with that ASTContext.

As a side effect of this structure, ASTContexts are insert-only. This means that they will grow in size over time, and if you ever want to have "temporary" Types/Decls, they will need to live in their own ASTContext and be deleted as a unit.
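
To make the allocation scheme above concrete, here is a minimal sketch of a bump allocator reached through a placement operator new. It is not Clang's code: ToyContext, ToyDecl, and build_point are hypothetical names, and the fixed slab size and the lack of support for oversized objects are simplifying assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical stand-in for an ASTContext: it owns slabs of raw memory and
// hands out pieces of them by bumping an offset. Nothing is freed individually.
class ToyContext {
public:
  ToyContext() = default;
  ToyContext(const ToyContext &) = delete;
  ToyContext &operator=(const ToyContext &) = delete;

  // Destroying the context releases every slab at once. No per-node
  // destructors run, so nodes must be trivially destructible.
  ~ToyContext() {
    for (void *slab : slabs_) std::free(slab);
  }

  // `align` must be a power of two no larger than alignof(std::max_align_t).
  void *allocate(std::size_t size, std::size_t align) {
    assert(size <= kSlabSize && "toy allocator: object larger than a slab");
    std::size_t offset = (used_ + align - 1) & ~(align - 1);
    if (slabs_.empty() || offset + size > kSlabSize) {
      void *slab = std::malloc(kSlabSize);
      if (!slab) throw std::bad_alloc();
      slabs_.push_back(slab);
      offset = 0;
    }
    used_ = offset + size;
    ++count_;
    return static_cast<char *>(slabs_.back()) + offset;
  }

  std::size_t allocation_count() const { return count_; }
  std::size_t slab_count() const { return slabs_.size(); }

private:
  static constexpr std::size_t kSlabSize = 4096;  // assumed slab size
  std::vector<void *> slabs_;
  std::size_t used_ = 0;   // bytes used in the newest slab
  std::size_t count_ = 0;  // number of allocations served
};

// Placement form: `new (ctx) T{...}` allocates T inside `ctx`.
inline void *operator new(std::size_t size, ToyContext &ctx) {
  return ctx.allocate(size, alignof(std::max_align_t));
}
// Matching placement delete; only called if a constructor throws.
inline void operator delete(void *, ToyContext &) noexcept {}

struct ToyDecl {
  const char *name;
  ToyDecl *parent;
};

// Builds a record "Point" with fields "x" and "y"; returns the last field.
ToyDecl *build_point(ToyContext &ctx) {
  ToyDecl *record = new (ctx) ToyDecl{"Point", nullptr};
  new (ctx) ToyDecl{"x", record};
  return new (ctx) ToyDecl{"y", record};
}
```

Because nothing can be released individually, the context is insert-only: the only way to reclaim the three nodes created by build_point is to destroy the ToyContext that holds them.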

The Clang parser assumes that all declarations and types it deals with are in a single ASTContext.

LLDB loads debug information from many files and wants to be able to unload it as well. That means that we create an independent AST context for each symbol file, and create/delete them at will. This is great for memory usage, and our own type handling functions all know how to deal with types from different AST contexts.

We embed Clang into LLDB to do our expression parsing, though, and when an expression uses a Decl or a Type that lives in a symbol file, that Decl or Type needs to be moved to the ASTContext that Clang is using for parsing. This is a "deep copy" – everything that the Type or Decl refers to needs to be potentially moved into the parser's ASTContext.

That's where the ASTImporter comes in. The ASTImporter knows how to "deep copy" entities between ASTContexts. In addition, it has a "minimal" mode where it allows us to provide particular chunks of the abstract syntax graph on demand rather than deep copying everything. LLDB relies heavily on this "minimal" mode and when it doesn't work the Clang parser can crash or produce bizarre parse errors.
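
As an illustration of what a "deep copy" between contexts involves, here is a toy sketch. Node, Context, and Importer are hypothetical types, not the Clang classes, and the memoization map is an assumption about how one might keep cyclic references from recursing forever.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical AST node: a name plus references to other nodes
// living in the same context.
struct Node {
  std::string name;
  std::vector<Node *> refs;
};

// Hypothetical context: owns its nodes and only ever grows.
struct Context {
  std::vector<std::unique_ptr<Node>> nodes;

  Node *create(std::string name) {
    nodes.push_back(std::unique_ptr<Node>(new Node()));
    nodes.back()->name = std::move(name);
    return nodes.back().get();
  }
};

// Copies a node and everything it refers to into the destination context.
class Importer {
public:
  explicit Importer(Context &to) : to_(to) {}

  Node *import(const Node *from) {
    auto it = imported_.find(from);
    if (it != imported_.end()) return it->second;  // already copied

    Node *copy = to_.create(from->name);
    imported_[from] = copy;  // record before recursing so cycles terminate
    for (const Node *ref : from->refs)
      copy->refs.push_back(import(ref));
    return copy;
  }

private:
  Context &to_;
  std::unordered_map<const Node *, Node *> imported_;
};
```

For example, importing a node for "struct List { List *next; int value; }" creates exactly two nodes in the destination (List and int), and the copied List refers to itself rather than to the source List.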

LLDB actually has several types of ASTContexts:

(1) the ASTContexts for entities in symbol files, one per symbol file;
(2) the ASTContexts to parse an expression, one per expression; and
(3) the scratch ASTContexts, which contain types and Decls that were created by the user (for example, by declaring a type in an expression) but need to persist beyond the lifetime of the expression (for example, if the user created an expression result variable of that type).

LLDB's use of AST importers is managed by the ClangASTImporter class. It maintains origin information for all Decls in the ASTContexts it manages, and dynamically allocates Minions to handle copying between specific ASTContexts. A Minion is a custom subclass of ASTImporter defined inside the ClangASTImporter class. ClangASTImporter uses Import(Decl *) and Import(QualType) directly on its Minions, which call through to much of the rest of the ASTImporter interface. Each Minion also overrides Imported(), which is a callback that indicates what portions of the AST have been copied. When a Minion receives this callback, it updates the ClangASTImporter to keep the origin information up to date.
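
The shape of that arrangement can be sketched with toy classes. This is not LLDB's code: ToyImporter, ImportManager, Origin, and the lowercase imported() hook are hypothetical stand-ins, the copy is shallow to keep the example short, and following the chain back to the very first source Decl is an assumption about what "origin" should mean.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct Decl { std::string name; };

struct Context {
  std::vector<std::unique_ptr<Decl>> decls;
  Decl *create(const std::string &name) {
    decls.push_back(std::unique_ptr<Decl>(new Decl{name}));
    return decls.back().get();
  }
};

// Base importer: copies a Decl, then notifies a virtual hook.
class ToyImporter {
public:
  ToyImporter(Context &from, Context &to) : from_(from), to_(to) {}
  virtual ~ToyImporter() = default;

  Decl *import(Decl *from) {
    Decl *to = to_.create(from->name);
    imported(from, to);  // tell subclasses what was just copied
    return to;
  }
  Context &from_context() { return from_; }

protected:
  virtual void imported(Decl *, Decl *) {}

private:
  Context &from_;
  Context &to_;
};

struct Origin { Context *ctx; Decl *decl; };

// Manager: one Minion per (source, destination) pair, plus an origin table.
class ImportManager {
public:
  Decl *copy(Context &from, Context &to, Decl *decl) {
    return minion_for(from, to).import(decl);
  }
  // Returns {nullptr, nullptr} for Decls that were never imported.
  Origin origin_of(Decl *d) const {
    auto it = origins_.find(d);
    return it == origins_.end() ? Origin{nullptr, nullptr} : it->second;
  }

private:
  class Minion : public ToyImporter {
  public:
    Minion(ImportManager &owner, Context &from, Context &to)
        : ToyImporter(from, to), owner_(owner) {}
  protected:
    void imported(Decl *from, Decl *to) override {
      // Assumption: if `from` is itself a copy, keep its original origin.
      auto it = owner_.origins_.find(from);
      Origin origin = (it != owner_.origins_.end())
                          ? it->second
                          : Origin{&from_context(), from};
      owner_.origins_[to] = origin;
    }
  private:
    ImportManager &owner_;
  };

  Minion &minion_for(Context &from, Context &to) {
    std::unique_ptr<Minion> &slot = minions_[std::make_pair(&from, &to)];
    if (!slot) slot.reset(new Minion(*this, from, to));  // allocate on demand
    return *slot;
  }

  std::map<std::pair<Context *, Context *>, std::unique_ptr<Minion>> minions_;
  std::map<Decl *, Origin> origins_;
};
```

In this sketch, copying a Decl from a symbol-file context into an expression context and then on into a scratch context leaves both copies with the symbol-file Decl recorded as their origin.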

Sean

The ASTImporter is used to import types from one AST into another. When we debug in LLDB, we create one AST for each binary (executable, shared library, etc):

"a.out" has its own AST
"libc.so" has its own AST

And now you debug and stop somewhere and you want to run an expression:

(lldb) expr (int)printf("argv[0] = '%s'\n", argv[0])

When we evaluate an expression, we create a new AST for the expression itself, since it can define local variables and many other things. Think of an expression we can run with lldb like "int x=0; int y=12; x + y": it would not use any variables from the program, but it would define them in the expression AST only.

So then the following would happen:
- expression evaluation would find "argv" in the AST for "a.out", and import the definition into the expression AST.
- expression evaluation would find "printf" in the AST for "libc.so", and import the definition into the expression AST.

The importing is also done lazily, and the expansion happens on demand. If the "a.out" AST has a full definition for std::string, we don't need to import that full definition into the expression AST; we start by importing a forward decl that can be completed on demand. This helps when you have an expression like:

(lldb) expr std::string *p = NULL

So the ASTImporter does smart importing from one AST to another and always downgrades classes with full definitions into completable forward decls in the destination AST.
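
A toy model of that "forward decl now, definition later" behavior might look like the following. Record, Ast, and LazyImporter are hypothetical names, and representing a definition as a list of field names is a simplification.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

struct Record {
  std::string name;
  std::vector<std::string> fields;  // only meaningful once complete
  bool complete = false;
};

struct Ast {
  std::vector<std::unique_ptr<Record>> records;
  Record *add(const std::string &name) {
    records.push_back(std::unique_ptr<Record>(new Record()));
    records.back()->name = name;
    return records.back().get();
  }
};

class LazyImporter {
public:
  explicit LazyImporter(Ast &to) : to_(to) {}

  // Imports only a forward declaration and remembers where it came from.
  Record *import_forward(const Record *from) {
    auto it = forward_.find(from);
    if (it != forward_.end()) return it->second;
    Record *fwd = to_.add(from->name);
    forward_[from] = fwd;
    origin_[fwd] = from;
    return fwd;
  }

  // Fills in the definition on demand. Returns false if the origin is
  // unknown or is itself incomplete.
  bool complete(Record *fwd) {
    if (fwd->complete) return true;
    auto it = origin_.find(fwd);
    if (it == origin_.end() || !it->second->complete) return false;
    fwd->fields = it->second->fields;
    fwd->complete = true;
    return true;
  }

private:
  Ast &to_;
  std::unordered_map<const Record *, Record *> forward_;
  std::unordered_map<const Record *, const Record *> origin_;
};
```

An expression that only declares a pointer to the type can work with the result of import_forward alone. An expression that accesses a member would trigger complete first.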

Does this make sense?

Greg Clayton

Henry,

It is true that the separate ASTContexts for each symbol file can cause redundancies between different symbol files. However, we avoid rendering types from the debug information into the relevant AST context until these types are needed (either by direct request or through a deep copy). So although there is some memory bloat (from indexes, etc. in the actual DWARF parsing code, not in the ASTContext), we won’t necessarily actually create those redundancies.

However, if for example you hit a breakpoint in code in shared library A, access a local variable that has a type that’s defined in that library, and then continue and hit a breakpoint in shared library B, and access a local variable of the same type, it’s likely that we will use the definition from B.

A very nice feature we would like to see is the ability to generate hashes of types that would allow us to recognize that there is a type we have already loaded elsewhere that exactly matches what we’re about to load. That would thoroughly eliminate these redundancies and probably even allow us to get rid of the indexing bloat.
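
Purely as an illustration of the idea, and not something LLDB does according to this thread, a structural hash with an equality check could be sketched as follows. TypeDesc, structural_hash, and TypeCache are hypothetical, and describing a type as a name plus (field name, field type name) pairs is a strong simplification of real debug information.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Simplified description of a type as it might be read from debug info.
struct TypeDesc {
  std::string name;
  std::vector<std::pair<std::string, std::string>> fields;  // (name, type)

  bool operator==(const TypeDesc &other) const {
    return name == other.name && fields == other.fields;
  }
};

// Combines the hashes of the name and every field, in order.
std::size_t structural_hash(const TypeDesc &t) {
  std::hash<std::string> h;
  std::size_t seed = h(t.name);
  auto mix = [&seed](std::size_t v) {
    seed ^= v + 0x9e3779b9 + (seed << 6) + (seed >> 2);
  };
  for (const auto &field : t.fields) {
    mix(h(field.first));
    mix(h(field.second));
  }
  return seed;
}

class TypeCache {
public:
  // Returns the index of an already loaded type that exactly matches `t`,
  // or stores `t` and returns its new index.
  std::size_t intern(const TypeDesc &t) {
    const std::size_t hash = structural_hash(t);
    auto range = by_hash_.equal_range(hash);
    for (auto it = range.first; it != range.second; ++it)
      if (types_[it->second] == t) return it->second;  // guards collisions
    types_.push_back(t);
    by_hash_.emplace(hash, types_.size() - 1);
    return types_.size() - 1;
  }

  std::size_t size() const { return types_.size(); }

private:
  std::vector<TypeDesc> types_;
  std::unordered_multimap<std::size_t, std::size_t> by_hash_;
};
```

With such a cache, loading the same "Point" type from shared library A and then from shared library B would yield one stored entry, while a "Point" whose fields differ would get its own.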

Sean