Using libclang for simple per-file static analysis

I've recently tried to use libclang's Python bindings for a quick and
dirty script that would answer some questions about C++ code.

The details: I was trying to fix this Mozilla bug
https://bugzilla.mozilla.org/show_bug.cgi?id=798914 which meant a mass
rename across a code base big enough that I didn't want to do it by
hand. The nsMallocSizeOfFun typedef needed to be replaced everywhere
by a new MallocSizeOf, defined in namespace mozilla in a new
MemoryReporting.h header.

I wanted to ask libclang two questions:
1. For this .cpp file, does it contain a using namespace mozilla
directive? (so that I know whether to replace with
mozilla::MallocSizeOf or just MallocSizeOf)
2. For this .cpp file, which files does it lexically include? (so
that I can search for an appropriate location to insert the #include
for the new MemoryReporting.h header based on Mozilla's coding
guidelines)

I wanted to simply invoke my Python script on just the .cpp file,
without needing to point it to include files, which would have
required a far more complicated integration with Mozilla's build
system to create the list of directories that needed to be in the
include path for every .cpp.

What I noticed is that libclang doesn't give me back the info I wanted
when it parses incomplete files (even with the
clang.cindex.TranslationUnit.PARSE_INCOMPLETE option, which sounded
very promising).

For example, it wouldn't report a using namespace mozilla directive
unless it had seen a namespace mozilla {} before it. (With the
workaround of always adding namespace mozilla {} at the top of each
.cpp before passing it to libclang, I could answer question 1.)
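A minimal sketch of that workaround, not the exact script I used. The
function names are made up for illustration, and it assumes the clang
Python bindings and a matching libclang are installed. The stub goes on
a line of its own, so every line number libclang reports is shifted by
one.

```python
def with_namespace_stub(source, namespace="mozilla"):
    """Prepend an empty namespace definition so that a later
    'using namespace <namespace>;' refers to something already declared."""
    return "namespace %s {}\n" % namespace + source


def has_using_directive(path, namespace="mozilla"):
    """Report whether the file at `path` has a using directive for `namespace`.

    Hypothetical helper: imports clang.cindex lazily so the pure string
    helper above can be used without libclang.
    """
    from clang.cindex import CursorKind, Index, TranslationUnit

    with open(path) as f:
        patched = with_namespace_stub(f.read(), namespace)

    # Hand libclang the patched text in place of what is on disk.
    tu = Index.create().parse(
        path,
        args=["-x", "c++"],
        unsaved_files=[(path, patched)],
        options=TranslationUnit.PARSE_INCOMPLETE,
    )
    for cursor in tu.cursor.walk_preorder():
        if cursor.kind != CursorKind.USING_DIRECTIVE:
            continue
        # Skip directives that come from headers libclang did manage to find.
        where = cursor.location.file
        if where is None or where.name != path:
            continue
        for child in cursor.get_children():
            if child.kind == CursorKind.NAMESPACE_REF and child.spelling == namespace:
                return True
    return False
```

This also matches a using directive nested inside a function or another
namespace, which may or may not be what is wanted for the rename.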

And libclang would report some include files but it would stop after
failing to find one. I found no workaround, so for question 2 I gave
up and did some simplistic grepping, which would of course be wrong in
all sorts of cases, for example #includes inside comments.
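A slightly less simplistic grep could blank out comments before looking
for #include lines. The following is only a sketch with hypothetical
names (strip_comments, list_includes). It is still not a preprocessor:
it knows nothing about #if 0 blocks, macros that expand to an include,
or line continuations, and C++14 digit separators may confuse the
character-literal pattern.

```python
import re

# One pass over the text: comments are blanked, string and character
# literals are matched only so that comment markers inside them are ignored.
_TOKEN = re.compile(
    r'/\*.*?\*/'               # block comment
    r'|//[^\n]*'               # line comment
    r'|"(?:\\.|[^"\\\n])*"'    # string literal
    r"|'(?:\\.|[^'\\\n])*'",   # character literal
    re.DOTALL,
)

_INCLUDE = re.compile(
    r'^[ \t]*#[ \t]*include[ \t]*([<"])([^>"\n]+)[>"]', re.MULTILINE
)


def strip_comments(source):
    """Replace comments with whitespace, keeping newlines so that line
    numbers in the result match the original file."""
    def repl(match):
        text = match.group(0)
        if text.startswith("/"):
            return " " + "\n" * text.count("\n")
        return text  # literal: leave untouched
    return _TOKEN.sub(repl, source)


def list_includes(source):
    """Return (line_number, name, is_angle_bracket) for each #include
    directive that is not inside a comment."""
    cleaned = strip_comments(source)
    found = []
    for match in _INCLUDE.finditer(cleaned):
        line = cleaned.count("\n", 0, match.start()) + 1
        found.append((line, match.group(2), match.group(1) == "<"))
    return found
```

The line numbers make it possible to pick an insertion point for the
new #include relative to the existing ones.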

The general pattern seemed to be that too little is reported when
parsing incomplete files.

Did I miss anything? Is libclang suited to answering questions like 1
and 2 from just one file? If not, is that a temporary limitation
(nobody has done this yet) or a limitation by design (I'm using the
wrong tool and it's not meant to do this)?

Thank you,
Catalin Iacob

The sad truth is that it's not really possible to even parse C++ correctly
without the complete translation unit.

I recommend looking at generating a JSON compilation database, which is the
usual way to work around the issue:
<http://clang.llvm.org/docs/JSONCompilationDatabase.html>.
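To give an idea of what that buys you, here is a rough sketch of
looking up the recorded compiler arguments for one source file. The
function name compile_args_for is made up. It relies only on the
documented entry fields "directory", "file", and either "arguments" or
"command".

```python
import json
import os
import shlex


def compile_args_for(database_path, source_path):
    """Return the arguments recorded for source_path (without the
    compiler executable), or None if the file has no entry."""
    with open(database_path) as f:
        entries = json.load(f)

    wanted = os.path.realpath(source_path)
    for entry in entries:
        # "file" may be relative to the entry's working directory.
        recorded = os.path.realpath(
            os.path.join(entry["directory"], entry["file"])
        )
        if recorded != wanted:
            continue
        if "arguments" in entry:
            argv = list(entry["arguments"])
        else:
            argv = shlex.split(entry["command"])
        return argv[1:]  # drop the compiler executable itself
    return None
```

Before handing the result to libclang as parse arguments, you would
probably want to drop the source file name and any -c / -o options, and
parse relative to the entry's "directory".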

-- Sean Silva

Thanks, I missed this. It seems like a reasonable solution since the
Bear tool that generates the compilation database is very easy to
integrate with build systems.