Using Clang libraries to query declarations

Hi all,

First of all, thank you to all Clang developers for this great product!

I'm thinking of building a tool to generate bindings to a C++ library, obviously using Clang to parse C++ header files, and would appreciate your help.

To keep things concrete and simple, let's say we want to write a tool that reads some header files, lets the user type in qualified names of C++ types and functions, and then reports what each component of the name refers to. For example, we might #include <string> and submit a query for "std::string::length", and the tool could tell us that

"std" is a namespace
"string" is a typedef which expands to std::basic_string<...>
and "length" is a method that is const taking no arguments, etc.
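
To make the intended behaviour concrete, here is a toy sketch of that query loop. It does not use Clang at all: the Decl struct, splitQualifiedName, describe and buildSampleScope are all made-up names, and the scope tree is filled in by hand where a real tool would consult the AST produced by the parser. It also cheats by hanging "length" directly off the typedef, whereas real name lookup would go through the typedef to the underlying class.

```cpp
#include <map>
#include <memory>
#include <string>
#include <vector>

// Toy stand-in for a parsed declaration (hypothetical; a real tool would
// hold Clang AST nodes here instead of strings).
struct Decl {
    std::string kind;    // e.g. "namespace", "typedef", "method"
    std::string detail;  // free-form extra description, may be empty
    std::map<std::string, std::shared_ptr<Decl>> members;
};

// Split "std::string::length" into {"std", "string", "length"}.
std::vector<std::string> splitQualifiedName(const std::string& name) {
    std::vector<std::string> parts;
    std::size_t start = 0;
    while (true) {
        std::size_t pos = name.find("::", start);
        if (pos == std::string::npos) {
            parts.push_back(name.substr(start));
            break;
        }
        parts.push_back(name.substr(start, pos - start));
        start = pos + 2;  // skip past the "::"
    }
    return parts;
}

// Walk the scope tree one component at a time and describe each step.
// Stops at the first component that cannot be found.
std::vector<std::string> describe(const Decl& root, const std::string& query) {
    std::vector<std::string> report;
    const Decl* scope = &root;
    for (const std::string& part : splitQualifiedName(query)) {
        auto it = scope->members.find(part);
        if (it == scope->members.end()) {
            report.push_back("\"" + part + "\" was not found");
            break;
        }
        scope = it->second.get();
        std::string line = "\"" + part + "\" is a " + scope->kind;
        if (!scope->detail.empty()) {
            line += " " + scope->detail;
        }
        report.push_back(line);
    }
    return report;
}

// Hand-built scope mirroring the std::string::length example above.
Decl buildSampleScope() {
    auto length = std::make_shared<Decl>();
    length->kind = "method";
    length->detail = "(const, no arguments)";

    auto str = std::make_shared<Decl>();
    str->kind = "typedef";
    str->detail = "for std::basic_string<...>";
    str->members["length"] = length;

    auto stdNs = std::make_shared<Decl>();
    stdNs->kind = "namespace";
    stdNs->members["string"] = str;

    Decl root;
    root.kind = "translation unit";
    root.members["std"] = stdNs;
    return root;
}
```

Calling describe(buildSampleScope(), "std::string::length") yields one line per component, in the same order as the list above. The point of the sketch is only the shape of the interface: the scope is built once, and each query is a cheap lookup against it.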

For inspiration, I skimmed through the LLDB source code to see how it parses an input C++ expression. It seems to put the user's expression in a temporary translation unit and then attempts to compile it. So it invokes the whole parser machinery, which makes sense given how complicated it is to parse C++.

If we have multiple queries against the same header files (included in the same order), we don't want to re-parse the headers for every query. From my cursory reading of the code, LLDB handles that by parsing the header files once and making them available as an "external AST source".

I wonder if it's possible to do the same using libclang? I suspect the answer is currently "no"; I didn't see anything of the sort in the Doxygen pages. I would consider extending libclang, although I'm a complete newbie here. Could you give me some pointers, and suggest what the best design for the API would be? I guess the end result is that we get some CXType or CXCursor for the desired declaration, which we could then walk through with the existing libclang API.

I also thought of putting the headers into a module and then importing the module, if that's easier. But I have never used modules in C++ before, and it's not clear to me whether that approach would work with libclang.

Thank you,
Steve

[snip]

Replying to myself: I did reach a positive resolution and wanted to share my findings, in case anybody is interested.

By using the C++ libraries rather than libclang, I found I could control clang::Parser precisely, so that it first parses the header files I need and then parses individual C++ qualified-ids and type-ids in isolation. I can then feed what comes out to clang::Sema.

I then realized that I only needed semantic information, and not so much the lexical information, from the C++ headers. So there's not much I could reuse from libclang anyway, since it is primarily designed around AST cursors. I originally wanted to use libclang because it seemed to be recommended for API stability, and because my frontend isn't able to link to LLVM's C++ libraries directly. But I can build C interfaces myself for the information I'm extracting with Clang.

Steve