libclang doesn't parse namespace functions

Hi All,

Newbie to clang here. I want to use clang to parse a C++ project and generate some basic call graphs. I chose clang because it has several advantages over other approaches, such as awareness of the namespace or parent class of a given method call. The project I’m working on (WebKit) is also compiled with clang++, so I can be fairly sure clang will produce accurate results.

From reading the various tutorials around, I gather that libclang (with its Python bindings) would be the best tool for the job. I have it set up and working with clang 3.3 from the Ubuntu repos, and I can reproduce basic demo snippets. However, I noticed that when parsing C++ code that uses namespaces, operations like getting the children of a cursor seem to pass over the contents of namespaces and any methods explicitly belonging to a class.

For now, I’m just working on writing code to parse the function definitions out of a single file (https://code.google.com/p/chromium/codesearch#chromium/src/third_party/WebKit/Source/core/dom/Node.cpp&q=node.cpp&sq=package:chromium&type=cs). Using the following code:

import clang.cindex

# Path setup
clang.cindex.Config.set_library_path("/usr/lib/llvm-3.3/lib/")

index = clang.cindex.Index.create()
tu = index.parse("/home/jon/src/third_party/WebKit/Source/core/dom/Node.cpp")
for c in tu.cursor.get_children():
    print c.kind, c.displayname

I get the following output:

CursorKind.TYPEDEF_DECL __int128_t
CursorKind.TYPEDEF_DECL __uint128_t
CursorKind.TYPEDEF_DECL __builtin_va_list
CursorKind.USING_DIRECTIVE
CursorKind.NAMESPACE WebCore
CursorKind.FUNCTION_DECL showTree(const int *)
CursorKind.FUNCTION_DECL showNodePath(const int *)

It seems that the WebCore namespace is passed over without listing its member functions. If I explicitly iterate over its children with the following code:

import clang.cindex

# Path setup
clang.cindex.Config.set_library_path("/usr/lib/llvm-3.3/lib/")

index = clang.cindex.Index.create()
tu = index.parse("/home/jon/src/third_party/WebKit/Source/core/dom/Node.cpp")
for c in tu.cursor.get_children():
    if c.kind == clang.cindex.CursorKind.NAMESPACE:
        for x in c.get_children():
            print x.kind, x.displayname
    else:
        print c.kind, c.displayname

I get the following output:

CursorKind.TYPEDEF_DECL __int128_t
CursorKind.TYPEDEF_DECL __uint128_t
CursorKind.TYPEDEF_DECL __builtin_va_list
CursorKind.USING_DIRECTIVE
CursorKind.VAR_DECL DEFINE_DEBUG_ONLY_GLOBAL
CursorKind.FUNCTION_DECL oldestShadowRootFor(const int *)
CursorKind.NAMESPACE
CursorKind.FUNCTION_TEMPLATE shouldInvalidateNodeListCachesForAttr(const unsigned int *, const int &)
CursorKind.FUNCTION_DECL appendTextContent(const int *, bool, bool &, int &)
CursorKind.FUNCTION_DECL appendAttributeDesc(const int *, int &, const int &, const char *)
CursorKind.FUNCTION_DECL traverseTreeAndMark(const int &, const int *, const int *, const char *, const int *, const char *)
CursorKind.FUNCTION_DECL parentOrShadowHostOrFrameOwner(const int *)
CursorKind.FUNCTION_DECL showSubTreeAcrossFrame(const int *, const int *, const int &)
CursorKind.FUNCTION_DECL tryAddEventListener(int *, const int &, int)
CursorKind.FUNCTION_DECL tryRemoveEventListener(int *, const int &, int *, bool)
CursorKind.FUNCTION_DECL eventTargetDataMap()
CursorKind.FUNCTION_DECL showTree(const int *)
CursorKind.FUNCTION_DECL showNodePath(const int *)

This exposes more of the functions in the file, but misses a lot of them, notably anything prefixed with Node::.
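One thing that may be worth ruling out (an assumption, not something confirmed here): every parameter type in the output above is printed as some form of `int`, which could mean the parser did not see the real declarations, for example because no include paths or defines were passed to `index.parse` through its `args` parameter. The translation unit exposes a `diagnostics` list that can be inspected for this. Below is a minimal sketch of such a check. `report_problems` and the stub tuples are hypothetical names used only so the sketch runs without libclang, and the sample message is a placeholder, not real clang output.

```python
from collections import namedtuple

# Hypothetical stand-ins for clang.cindex.Diagnostic and SourceLocation,
# used only so this sketch runs without libclang installed.
StubLocation = namedtuple("StubLocation", "file line")
StubDiagnostic = namedtuple("StubDiagnostic", "severity spelling location")


def report_problems(diagnostics, min_severity=3):
    """Return 'file:line: message' strings for serious diagnostics.

    In the libclang Python bindings, severity 3 is Diagnostic.Error and
    4 is Diagnostic.Fatal; lower values are warnings, notes and ignored.
    """
    lines = []
    for d in diagnostics:
        if d.severity >= min_severity:
            loc = d.location
            lines.append("%s:%s: %s" % (loc.file, loc.line, d.spelling))
    return lines


# Made-up diagnostics for illustration; the messages are placeholders.
demo_diagnostics = [
    StubDiagnostic(2, "example warning message", StubLocation("Node.cpp", 5)),
    StubDiagnostic(3, "example error message", StubLocation("Node.cpp", 12)),
]

for line in report_problems(demo_diagnostics):
    print(line)
```

With a real translation unit, the call would be `report_problems(tu.diagnostics)`. If it reports missing headers or unknown types, the declarations that depend on them may be absent or distorted in the cursor tree.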

How can I get clang to recursively parse each file, and return all of the function declarations? I’m also happy to receive any suggestions on how I might be able to achieve my overall goal of parsing a whole source tree and generating a call graph. Currently, I am parsing each file individually, which seems quite clumsy and inefficient.
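To make the question concrete, here is a rough sketch of the kind of recursive walk I have in mind. It visits every descendant of a cursor instead of only the direct children. The names `walk`, `collect` and `StubCursor` are hypothetical, the stub class stands in for `clang.cindex.Cursor` so that the sketch runs without libclang, and the demo tree is made up for illustration.

```python
class StubCursor(object):
    """Hypothetical stand-in for clang.cindex.Cursor (kind, displayname, get_children)."""

    def __init__(self, kind, displayname, children=()):
        self.kind = kind
        self.displayname = displayname
        self._children = list(children)

    def get_children(self):
        return iter(self._children)


def walk(cursor, depth=0):
    """Yield (depth, cursor) for the cursor and all descendants, depth-first."""
    yield depth, cursor
    for child in cursor.get_children():
        for item in walk(child, depth + 1):
            yield item


def collect(root, wanted_kinds):
    """Return display names of every visited cursor whose kind is in wanted_kinds."""
    return [c.displayname for _, c in walk(root) if c.kind in wanted_kinds]


# Made-up tree for illustration: a namespace containing a function and a
# nested anonymous namespace, plus one function at file scope.
demo_root = StubCursor("TRANSLATION_UNIT", "Node.cpp", [
    StubCursor("NAMESPACE", "WebCore", [
        StubCursor("FUNCTION_DECL", "oldestShadowRootFor(const int *)"),
        StubCursor("NAMESPACE", "", [
            StubCursor("FUNCTION_DECL", "eventTargetDataMap()"),
        ]),
    ]),
    StubCursor("FUNCTION_DECL", "showTree(const int *)"),
])

for name in collect(demo_root, ("FUNCTION_DECL", "CXX_METHOD")):
    print(name)
```

With libclang itself, `root` would be `tu.cursor` and `wanted_kinds` would hold `CursorKind` values such as `CursorKind.FUNCTION_DECL`, `CursorKind.CXX_METHOD` and `CursorKind.FUNCTION_TEMPLATE`. A full walk also descends into everything pulled in from included headers, so filtering on `c.location.file` would probably be needed to keep only the declarations from Node.cpp.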

Thanks in advance,
Jon