[RFC][LLDB] Make PdbAstBuilder language-agnostic

I’d like to add downstream Swift support in NativePDB but have found that PdbAstBuilder is tightly coupled to TypeSystemClang. This RFC proposes a refactor which makes PdbAstBuilder an abstract base class and introduces PdbAstBuilderClang to take care of TypeSystemClang-specific functionality.

PdbAstBuilder’s public methods fall into one of four general camps:

  1. Used by SymbolFileNativePDB for its functionality.
  2. Called by SymbolFileNativePDB for internal side-effects in PdbAstBuilder.
  3. Called by UdtRecordCompleter (which is Clang-only).
  4. Purely internal.

For Group 1, the methods would remain on the base class, but change return values and parameters to non-Clang-specific wrapper types (for example: clang::DeclContext* → lldb_private::CompilerDecl).

Group 2 would get replaced in the public interface with void-returning methods (for example: GetOrCreateFunctionDecl → EnsureFunction), and the original methods would become private.

Group 3 would become public methods on PdbAstBuilderClang and UdtRecordCompleter would take that instead of the base class.

Group 4 would be private.
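
As a rough illustration of the split described above, here is a minimal, self-contained sketch. It is not the actual LLDB code: CompilerDecl, CompilerType, PdbId and FakeClangDecl are simplified stand-ins invented for this example, and only the names PdbAstBuilder, PdbAstBuilderClang, GetOrCreateType and EnsureFunction come from the proposal.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-ins for LLDB's language-neutral wrappers; the real
// lldb_private::CompilerDecl and CompilerType carry more state than this.
struct CompilerDecl { const void *opaque = nullptr; };
struct CompilerType { const void *opaque = nullptr; };

// Hypothetical stand-in for a PDB symbol or type identifier.
struct PdbId { unsigned value; };

// Hypothetical stand-in for a Clang AST node.
struct FakeClangDecl { std::string name; };

// Language-agnostic interface: all the symbol file would need to see.
class PdbAstBuilder {
public:
  virtual ~PdbAstBuilder() = default;
  // Group 1: results the symbol file consumes, expressed as wrapper types.
  virtual CompilerType GetOrCreateType(PdbId id) = 0;
  // Group 2: called only for its side effect, so it returns nothing.
  virtual void EnsureFunction(PdbId id) = 0;
};

class PdbAstBuilderClang : public PdbAstBuilder {
public:
  CompilerType GetOrCreateType(PdbId id) override {
    return CompilerType{GetOrCreateDecl(id)};
  }
  void EnsureFunction(PdbId id) override { GetOrCreateDecl(id); }

  // Group 3: Clang-specific entry point, public only on the Clang subclass
  // so that a Clang-only completer can use it.
  FakeClangDecl *GetOrCreateDecl(PdbId id) {
    std::unique_ptr<FakeClangDecl> &slot = m_decls[id.value];
    if (!slot)
      slot = std::make_unique<FakeClangDecl>(
          FakeClangDecl{"decl" + std::to_string(id.value)});
    return slot.get();
  }

private:
  // Group 4: purely internal state.
  std::map<unsigned, std::unique_ptr<FakeClangDecl>> m_decls;
};

// A caller that only knows the base interface, as the symbol file would.
bool ReturnsSameTypeTwice(PdbAstBuilder &builder) {
  builder.EnsureFunction(PdbId{1});
  const void *first = builder.GetOrCreateType(PdbId{1}).opaque;
  const void *second = builder.GetOrCreateType(PdbId{1}).opaque;
  assert(first != nullptr);
  return first == second;
}
```

The point of the sketch is only the shape: the caller never names a Clang type, while the Clang-specific accessor is still reachable for code that holds a PdbAstBuilderClang directly.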

Per-method breakdown here: Proposed PdbAstBuilder breakup.md · GitHub

Does this seem broadly acceptable?

cc: @adrian.prantl @charles-zablit @compnerd @Nerixyz

CC @ZequanWu

Making it easier to support other languages with the native PDB plugin sounds good to me.

Should UdtRecordCompleter get a new name as well (to indicate that it’s Clang specific)?

Looking through SymbolFileNativePDB, all uses of TypeSystemClang can be replaced with your proposed abstract base class. I suppose a good start could be to remove these uses by using the wrapper types (i.e. create group 1).

I’d like to add downstream Swift support in NativePDB but have found that PdbAstBuilder is tightly coupled to TypeSystemClang

Could you elaborate a bit on the pain-points you’ve encountered?

It takes and returns Clang-specific types, takes TypeSystemClang in the constructor, etc.

I had to do something similar when making the TypeSystemRust prototype.

I don’t have a huge stake in LLDB’s API design, but it seems like wrapping it in such a way that it adheres closely to the SymbolFileDWARF/DWARFASTParser API would make it the easiest for users. That’s more or less what I did in my implementation, and then just wrapped the existing PdbAstBuilder functions in those (though the exact code isn’t super well tested so be careful if you use any part of it).

Could you elaborate a bit on the pain-points you’ve encountered?

When implementing a custom language, the major differentiating factor is how you interpret the debug info, and by extension, what the in-memory representation of that interpreted debug info is. Using Clang’s types often requires a ton of hacks because the concepts of your language don’t always map to C/C++’s concepts. An example I often use is Rust’s references, which are borrowing pointers. In C++, references aren’t objects and aren’t guaranteed to occupy memory. That means no array-of-ref, no ref-to-ref, whereas those things are completely acceptable constructs in Rust.

There’s not a good way to get clang to represent what rust wants, so you have to add a bunch of hacks to account for it. Most of those hacks can be accomplished by the frontend (SB API) or via the debug info you output in the first place (e.g. outputting refs as typedefed pointers).
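
To make the mismatch concrete, here is a small C++ sketch of the reference restrictions mentioned above and of the "typedefed pointer" workaround. The alias name RustRef and the two helper functions are made up for this example; they only illustrate the idea and are not taken from any Rust or LLDB tooling.

```cpp
#include <cassert>
#include <type_traits>

// A C++ reference is an alias, not an object: sizeof reports the referent.
static_assert(sizeof(char &) == sizeof(char), "sizeof(T&) is sizeof(T)");

// A "reference to reference" collapses instead of nesting.
using IntRef = int &;
static_assert(std::is_same<IntRef &, int &>::value,
              "int& & collapses to int&");

// An array of references is ill-formed; uncommenting this fails to compile:
// int &refs[2];

// Workaround: describe a Rust-style shared reference as a typedef'd pointer.
// RustRef is a hypothetical name used only in this sketch.
template <typename T> using RustRef = const T *;

// Rust's &&i32 has a direct counterpart once references are pointers.
int ReadThroughRefToRef() {
  static const int value = 7;
  RustRef<int> r = &value;
  RustRef<RustRef<int>> rr = &r;
  return **rr;
}

// Rust's [&i32; 2] likewise becomes an ordinary array of pointers.
int SumArrayOfRefs() {
  static const int a = 1, b = 2;
  RustRef<int> refs[2] = {&a, &b};
  return *refs[0] + *refs[1];
}
```

The cost of this workaround is that the debugger sees plain pointers, so anything reference-specific has to be recovered from the typedef name or handled by the frontend.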

If you want it to be handled “natively” by LLDB, you need to add a TypeSystem, and the TypeSystem will require a bespoke LangType representation. If you were to make a TypeSystem that supports PDB, without changing SymbolFileNativePDB at all, you would essentially need your TypeSystem to call into TypeSystemClang and then reinterpret the objects you get from that into your LangType objects. It’s wasteful computationally, adds additional barriers to creating a type system (i.e. I don’t want to have to learn how clang’s types work to make my TypeSystem), and will always be “lossy” because you lose the debug info that TypeSystemClang decided not to care about/represent.

Should UdtRecordCompleter get a new name as well (to indicate that it’s Clang specific)?

Ideally it’d be great if it could be generalized to be useful for other languages (e.g. essentially re-exposes the raw PDB data in a way that’s more convenient to work with). At the absolute least, FieldListDeserializer should be made less awkward to use and/or documented in any way. It was baffling enough that I almost wrote my own code to parse the raw field list bytes.

I wrote an entire blog post about implementing PDB support for TypeSystemRust if you’re interested in some of the rationale and pain points. For example, one question that needs to be answered if SymbolFileNativePDB is to support other languages is: how do you associate a specific type with a specific language? Unlike DWARF, iirc the types are all just in a big global pile. They’re not super easy to associate with their compile unit with the way things are currently structured.

I agree that using the more generic types makes sense in general. What I was looking to understand is the extent to which you’ve implemented the Swift builder. Is the RFC motivated by “these seem like a good first step” or “we have a fully functioning PDB Swift AST builder and this is required for us to upstream it” or somewhere in-between? The risk with the former is that we end up with another layer of components that end up not getting used. Also, are you using the TypeSystemSwift on the swiftlang fork for this?

At most, you can use some heuristics. For example, the unique-name of structs/classes. But as you say, because there’s only one main type stream (with potentially deduplicated types from different languages), generally, it’s not possible to identify the language given a type.
For compile-units, we do have information about their language (from S_COMPILE3/2). So for types we create upon seeing them in a module’s symbols, we can resolve the language. This won’t work for FindType and probably others, where we only look through the type stream.
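
The limitation can be sketched as follows. Everything here is hypothetical (the Language enum, the identifier aliases and the LanguageResolver class do not exist in LLDB); the sketch only models the rule that a type's language is known when the type is reached through a compile unit's symbols and unknown when it is found directly in the type stream.

```cpp
#include <cassert>
#include <optional>
#include <unordered_map>

// Hypothetical subset of source languages for this sketch.
enum class Language { C, CPlusPlus, Rust, Swift };

// Hypothetical identifiers for compile units and type-stream entries.
using CompileUnitId = unsigned;
using TypeIndex = unsigned;

class LanguageResolver {
public:
  // Record the language read from a compile unit's S_COMPILE2/S_COMPILE3.
  void SetCompileUnitLanguage(CompileUnitId cu, Language lang) {
    m_cu_lang[cu] = lang;
  }

  // Called when a type is first reached through a symbol in a module,
  // which is the one moment its compile unit is known.
  void NoteTypeSeenInCompileUnit(TypeIndex ti, CompileUnitId cu) {
    auto it = m_cu_lang.find(cu);
    if (it != m_cu_lang.end())
      m_type_lang.emplace(ti, it->second); // first sighting wins
  }

  // A lookup that starts from the type stream has no compile unit, so it
  // can only answer for types that were seen via a symbol earlier.
  std::optional<Language> LanguageForType(TypeIndex ti) const {
    auto it = m_type_lang.find(ti);
    if (it == m_type_lang.end())
      return std::nullopt;
    return it->second;
  }

private:
  std::unordered_map<CompileUnitId, Language> m_cu_lang;
  std::unordered_map<TypeIndex, Language> m_type_lang;
};
```

Note that "first sighting wins" is an arbitrary choice made for the sketch; a type deduplicated across languages has no single correct answer.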

On the -msvc targets, unless you explicitly opt out, you’ll link to the C runtime. This pulls in some PDBs that include C++ debug info (I suppose the crt is implemented in C++ internally). Thus, if you’re debugging Swift or Rust executables, most likely, you’ll have C++ debug info in the final PDB as well. For us, this means that we will have to deal with multiple languages, even in a single SymbolFileNativePDB instance.

Imo the main reason the existing plugin components aren’t being used is because they can’t be used. There’s no public API, so people can’t write standalone plugins. You either have to fork LLDB or upstream, and upstreaming often isn’t very attractive for a number of reasons.

Fwiw, if Rust wanted a standalone or upstream TypeSystem (and many have expressed interest in that, though I’m iffy on how well it’ll work long-term), we would need this same change to PdbAstBuilder too.

It’s obviously a separate issue, and I can make an RFC post on it or whatever, but I’d be willing to drive some sort of effort towards formalizing a public interface for plugins or an lldb-c or something. I’d absolutely need some guidance on that, but if it’s something y’all want done, I can do it.

As I said in previous conversations, I am very much in support of working towards a plugin API surface that guarantees a rolling window of compatibility so we can evolve it more quickly than the SB API. I don’t have a concrete timeline, but my own plan for rolling this out is to develop a “light” Swift language plugin in tree on llvm.org. It would have no expression evaluator and thus no dependencies on the Swift compiler, and link only against the system’s Swift language runtime. We can find common ground between the APIs used by that plugin and @Walnut356’s plugin to iterate on a new public plugin API. Once the plugin API is in place, we could then have the swift.org toolchain build a standalone Swift language plugin (with expression evaluator and strongly coupled to the compiler) that can be loaded into any LLDB within the rolling compatibility window.

Somewhere in-between. I have a prototype (using TypeSystemSwift from the swiftlang fork, as you surmised) which supports most of fr v (frame variable) but not expressions yet.

Two notes on this:

  1. The status quo is that NativePDB in Swift crashes because of nullptr derefs on PdbAstBuilder.
  2. If the null derefs were fixed, Swift could probably get away with jamming another system on top of SymbolFileNativePDB downstream (since it really only needs GetOrCreateType from PdbAstBuilder). But as @Walnut356 points out, making a generic interface makes sense to support non-Clang TypeSystems in general.

Here’s a take on this: [LLDB][NativePDB] NFC: Add language-agnostic interface to PdbAstBuilder by speednoisemovement · Pull Request #173111 · llvm/llvm-project · GitHub though I’m not sure it stands on its own.

This sounds good to me. Whether we want the “light” Swift language plugin or the complete language plugin, making the PdbAstBuilder language-agnostic would be the first step toward it.

I think the current DWARF plugin’s API structure is a good reference for how the refactor could be done.

Thanks to everyone for the feedback and reviews! As of [LLDB][NativePDB] Introduce PdbAstBuilderClang by speednoisemovement · Pull Request #175840 · llvm/llvm-project · GitHub this is merged.