I had to do something similar when making the TypeSystemRust prototype.
I don’t have a huge stake in LLDB’s API design, but it seems like wrapping it so that it adheres closely to the SymbolFileDWARF/DWARFASTParser API would make it easiest for users. That’s more or less what I did in my implementation: I added that kind of wrapper and had it call the existing PdbAstBuilder functions (though the exact code isn’t super well tested, so be careful if you use any part of it).
Could you elaborate a bit on the pain-points you’ve encountered?
When implementing a custom language, the major differentiating factor is how you interpret the debug info and, by extension, what the in-memory representation of that interpreted debug info is. Using Clang’s types often requires a ton of hacks because the concepts of your language don’t always map onto C/C++ concepts. An example I often use is Rust’s references, which are borrowing pointers. In C++, references aren’t objects and aren’t guaranteed to occupy memory. That means no array-of-ref and no ref-to-ref, whereas both are completely acceptable constructs in Rust.
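To make the array-of-ref and ref-to-ref point concrete, here’s a minimal Rust sketch. The helper names (`sum_through_refs`, `read_ref_to_ref`) are made up for illustration; the only point is that a Rust reference is an ordinary pointer-sized value that can live in an array or sit behind another reference:

```rust
use std::mem::size_of;

/// Sums the values reached through an array of references.
/// (Hypothetical helper, for illustration only.)
fn sum_through_refs(refs: [&i32; 3]) -> i32 {
    refs.iter().map(|r| **r).sum()
}

/// Reads a value through a reference to a reference.
/// (Hypothetical helper, for illustration only.)
fn read_ref_to_ref(rr: &&i32) -> i32 {
    **rr
}

fn main() {
    let (a, b, c) = (1, 2, 3);

    // Array-of-ref: each element is a real value that occupies memory.
    let array_of_refs: [&i32; 3] = [&a, &b, &c];

    // Ref-to-ref: a reference whose pointee is itself a reference.
    let r: &i32 = &a;
    let ref_to_ref: &&i32 = &r;

    // A reference to a sized type is one pointer wide.
    assert_eq!(size_of::<&i32>(), size_of::<*const i32>());
    assert_eq!(size_of::<[&i32; 3]>(), 3 * size_of::<usize>());

    println!("sum = {}", sum_through_refs(array_of_refs));
    println!("through &&i32 = {}", read_ref_to_ref(ref_to_ref));
}
```

The equivalent declarations with C++ references (an array of `int&`, or a reference to a reference) are rejected by the compiler, which is why a Clang-backed representation needs some workaround here.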
There’s no good way to get Clang to represent what Rust wants, so you have to add a bunch of hacks to account for it. Most of those hacks can be handled by the frontend (SB API) or via the debug info you output in the first place (e.g. outputting refs as typedef’d pointers).
If you want it to be handled “natively” by LLDB, you need to add a TypeSystem, and the TypeSystem will require a bespoke LangType representation. If you were to make a TypeSystem that supports PDB without changing SymbolFileNativePDB at all, you would essentially need your TypeSystem to call into TypeSystemClang and then reinterpret the objects you get back into your own LangType objects. That’s computationally wasteful, adds barriers to creating a type system (i.e. I don’t want to have to learn how Clang’s types work to make my TypeSystem), and will always be “lossy” because you lose whatever debug info TypeSystemClang decided not to care about or represent.
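As a rough illustration of the “lossy” part, here’s a toy Rust model of the two routes. None of these types or functions are LLDB APIs: `RawRecord`, `CLikeType`, `LangType`, and the `extra` field are hypothetical stand-ins I’m assuming purely to show the shape of the problem, where anything the intermediate representation has no slot for cannot be recovered afterwards:

```rust
/// Toy stand-in for a record read from debug info (not a real PDB record).
#[derive(Debug, Clone, PartialEq)]
struct RawRecord {
    name: String,
    byte_size: u64,
    /// Some detail a C/C++-shaped consumer has no slot for (illustrative).
    extra: Option<String>,
}

/// Toy stand-in for a C/C++-shaped intermediate type.
#[derive(Debug, Clone, PartialEq)]
struct CLikeType {
    name: String,
    byte_size: u64,
}

/// Toy stand-in for a language-specific type (the "LangType" role).
#[derive(Debug, Clone, PartialEq)]
struct LangType {
    name: String,
    byte_size: u64,
    extra: Option<String>,
}

/// Step 1 of the indirect route: the intermediate model drops `extra`.
fn raw_to_clike(raw: &RawRecord) -> CLikeType {
    CLikeType { name: raw.name.clone(), byte_size: raw.byte_size }
}

/// Step 2 of the indirect route: nothing is left to rebuild `extra` from.
fn clike_to_lang(c: &CLikeType) -> LangType {
    LangType { name: c.name.clone(), byte_size: c.byte_size, extra: None }
}

/// Direct route: the language-specific type sees the raw record itself.
fn raw_to_lang(raw: &RawRecord) -> LangType {
    LangType {
        name: raw.name.clone(),
        byte_size: raw.byte_size,
        extra: raw.extra.clone(),
    }
}

fn main() {
    let raw = RawRecord {
        name: "Example".to_string(),
        byte_size: 8,
        extra: Some("language-specific detail".to_string()),
    };

    let indirect = clike_to_lang(&raw_to_clike(&raw));
    let direct = raw_to_lang(&raw);

    println!("indirect: {:?}", indirect);
    println!("direct:   {:?}", direct);
}
```

The indirect route also does two conversions per type instead of one, which is the “wasteful” half of the complaint.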
Should UdtRecordCompleter get a new name as well (to indicate that it’s Clang specific)?
Ideally it’d be great if it could be generalized to be useful for other languages (e.g. by essentially re-exposing the raw PDB data in a way that’s more convenient to work with). At the absolute least, FieldListDeserializer should be made less awkward to use and/or documented at all. It was baffling enough that I almost wrote my own code to parse the raw field list bytes.
I wrote an entire blog post about implementing PDB support for TypeSystemRust if you’re interested in some of the rationale and pain points. For example, one question that needs to be answered if SymbolFileNativePDB is to support other languages is: how do you associate a specific type with a specific language? Unlike in DWARF, iirc, the types are all just in one big global pile. They’re not super easy to associate with their compile unit given how things are currently structured.