AST for multiple files and copying the clang diagnostic system design

Hello everyone,

How can I build an AST that represents code in multiple translation units? I am working on a code generation tool whose DSL is full C++ plus a few extensions related to what the tool does (it adds new AST nodes). Some of the new declarations are namespace-like: they are redeclarable (they can be “re-opened” several times) and act as DeclContexts. I would prefer that each time one is reopened (in any file), the same Decl object gets populated.

One thing I could do is paste all the translation units into a single buffer to satisfy the need for a single “main file,” and keep a mapping of SourceLocations in the merged file back to their origin file.
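
A minimal sketch of that bookkeeping, assuming plain byte offsets rather than real SourceLocations. MergedBuffer, OriginLoc, addFile and resolve are hypothetical names made up for illustration; none of them are clang APIs.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper, not part of clang: where an offset originally came from.
struct OriginLoc {
  std::string file;    // name of the originating file
  std::size_t offset;  // byte offset within that file
};

// Hypothetical helper: concatenates several files into one buffer and
// remembers where each file starts.
class MergedBuffer {
public:
  // Appends one file's contents and records its starting offset.
  void addFile(std::string name, const std::string &contents) {
    starts_.push_back({buffer_.size(), std::move(name)});
    buffer_ += contents;
    buffer_ += '\n';  // separator so a token cannot span two files
  }

  const std::string &text() const { return buffer_; }

  // Maps an offset in the merged buffer back to (file, offset in file).
  OriginLoc resolve(std::size_t mergedOffset) const {
    assert(!starts_.empty() && mergedOffset < buffer_.size());
    // Find the first file that starts after the offset, then step back one.
    auto it = std::upper_bound(
        starts_.begin(), starts_.end(), mergedOffset,
        [](std::size_t off, const Entry &e) { return off < e.start; });
    --it;
    return {it->name, mergedOffset - it->start};
  }

private:
  struct Entry {
    std::size_t start;  // offset of the file's first byte in buffer_
    std::string name;
  };
  std::vector<Entry> starts_;  // sorted by start, in insertion order
  std::string buffer_;
};
```

Because files are only ever appended, the start offsets are already sorted, so the lookup is a binary search. A real version would have to translate between these offsets and clang's SourceLocation values, which this sketch does not attempt.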

It occurs to me that clang already has a lot of machinery for doing this, used by the include stack. I am wondering if anyone has examples or advice on how I could reuse that machinery; I’m guessing I’m not the first one to want to do this.

I’ll also explain the use case, which you may find interesting: it is an attempt to copy the design of clang’s diagnostic system, but as a “reusable library” for my other projects, and without using tablegen:

I like the idea of logging/error reporting by defining “diagnostics” in a simple declarative language, along with a tool that generates boilerplate C++ diagnostic classes. Part of my “library” is the actual library (which has the DiagnosticConsumer-like stuff in it), and part is the code generation tool. I cannot use tablegen – unlike the clang diagnostic system, diagnostics can be parameterized by any C++ type, and are printable as long as the type has an ostream operator<< overload. Because of how it works, it makes sense for the diagnostic declaration language to be a superset of C++, since parts of the code generator need to understand C++.
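
To illustrate what “parameterized by any C++ type” could look like on the library side, here is a rough sketch that assumes C++17 and clang-style “%0”, “%1” placeholders. The Diagnostic template, its render method and the Position type are hypothetical; the actual generated classes may look quite different.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <tuple>
#include <utility>

// Example user type: printable only because it has an operator<< overload.
struct Position {
  int line;
  int column;
};

inline std::ostream &operator<<(std::ostream &os, const Position &p) {
  return os << p.line << ':' << p.column;
}

// Hypothetical shape of a diagnostic class; the names are illustrative only.
template <typename... Args>
class Diagnostic {
public:
  Diagnostic(std::string format, Args... args)
      : format_(std::move(format)), args_(std::move(args)...) {}

  // Replaces each "%N" (single digit N) in the format with the N-th
  // argument, streamed through that argument's operator<<.
  std::string render() const {
    std::ostringstream out;
    for (std::size_t i = 0; i < format_.size(); ++i) {
      bool isPlaceholder = format_[i] == '%' && i + 1 < format_.size() &&
                           format_[i + 1] >= '0' && format_[i + 1] <= '9';
      if (!isPlaceholder) {
        out << format_[i];
        continue;
      }
      std::size_t wanted = static_cast<std::size_t>(format_[++i] - '0');
      assert(wanted < sizeof...(Args) && "placeholder index out of range");
      printArg(out, wanted);
    }
    return out.str();
  }

private:
  // Walks the stored arguments in order and prints the one at index `wanted`.
  void printArg(std::ostream &out, std::size_t wanted) const {
    std::size_t index = 0;
    std::apply(
        [&](const Args &...a) {
          ((index++ == wanted ? void(out << a) : void()), ...);
        },
        args_);
  }

  std::string format_;
  std::tuple<Args...> args_;
};
```

With these definitions, `Diagnostic<std::string, Position>("unexpected token in %0 at %1", "foo.diag", Position{3, 14}).render()` yields “unexpected token in foo.diag at 3:14”. The only requirement on an argument type is that `out << a` compiles, which matches the ostream operator<< constraint described above.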

I created this tool once already, using some macro weirdness. Diagnostic files were normal C++ headers, and the “diagnostic declarations” were just macros (DIAG, DIAG_LIBRARY, DIAG_CATEGORY). The macros would make “dummy” placeholder decls that encoded the info I needed, and a libtooling-based clang tool would find them and spit out the generated code, while leaving all “normal” declarations alone.
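
For readers unfamiliar with the placeholder-decl trick, the following is one way such macros could be written. The macro bodies, the diag_category_ and diag_placeholder_ prefixes, and the UnexpectedToken example are all assumptions made up for this sketch; the original definitions are not shown here and may have encoded things differently.

```cpp
#include <string>
#include <type_traits>

// Hypothetical definition: a category becomes an empty tag struct.
#define DIAG_CATEGORY(Name) struct diag_category_##Name {}

// Hypothetical definition: a diagnostic becomes a placeholder struct whose
// members carry the category, the message text, and the parameter list.
// A tool walking the AST can recognize the name prefix and read these back.
#define DIAG(Category, Name, Text, ...)        \
  struct diag_placeholder_##Name {             \
    using category = diag_category_##Category; \
    static constexpr const char *text = Text;  \
    static void params(__VA_ARGS__);           \
  }

// Example declarations as they might appear in a diagnostic header.
DIAG_CATEGORY(Parse);
DIAG(Parse, UnexpectedToken, "unexpected token in %0 at line %1",
     const std::string &file, int line);

// The placeholder is an ordinary declaration, so normal C++ can inspect it.
static_assert(std::is_same<diag_placeholder_UnexpectedToken::category,
                           diag_category_Parse>::value,
              "placeholder records its category");
```

The params function is declared but never defined or called; it exists only so that the parameter types and names show up in the AST for the tool to read.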

I would like to reimplement this tool as an actual fork of clang that adds a few custom Decl nodes (e.g., DiagDecl, DiagCategoryDecl, DiagLibraryDecl). Although it’s probably not worth the complexity over the earlier approach, I want to take my understanding of the clang AST to the next level anyway, and I think this will be a fun project for doing that. It will also make it possible to write a “diagnostic declaration” on more than one line without ending every line with a “\” the way you have to do with macros, which drives me crazy.

Thanks!
Ken