[libclang] Parsing an empty file.

Hi,

I've tried using libclang with the following code:

$ cat test.c
#include <stdio.h>
#include <clang-c/Index.h>

enum CXChildVisitResult
parse(CXCursor cursor, CXCursor parent, CXClientData client_data)
{
         enum CXCursorKind kind = clang_getCursorKind(cursor);
         CXString String = clang_getCursorKindSpelling(kind);
         CXString Spelling = clang_getCursorSpelling(cursor);

         (void)parent;
         (void)client_data;

         fprintf(stderr,
                 "%s %s\n",
                 clang_getCString(String),
                 clang_getCString(Spelling));

         /* CXStrings belong to the caller and must be released. */
         clang_disposeString(Spelling);
         clang_disposeString(String);

         return CXChildVisit_Recurse;
}

int
main(int argc, char **argv)
{
         if (argc != 2)
                 return 1;

         CXIndex Index = clang_createIndex(0, 0);
         CXTranslationUnit TU = clang_parseTranslationUnit(
                 Index,
                 argv[1],
                 NULL,   /* no extra command-line arguments */
                 0,
                 NULL,   /* no unsaved (in-memory) files */
                 0,
                 CXTranslationUnit_None);
         if (TU == NULL) {
                 clang_disposeIndex(Index);
                 return 1;
         }

         CXCursor Cursor = clang_getTranslationUnitCursor(TU);
         /* Kept out of assert() so the visit still runs under NDEBUG. */
         if (clang_visitChildren(Cursor, parse, NULL) != 0)
                 fprintf(stderr, "visit was interrupted\n");
         clang_disposeTranslationUnit(TU);
         clang_disposeIndex(Index);
         return 0;
}

Though this code is really simple, I'm getting unexpected results when running it against an __empty__ file:

$ touch empty.c
$ ./test empty.c
TypedefDecl __int128_t
TypedefDecl __uint128_t
StructDecl __va_list_tag
FieldDecl gp_offset
FieldDecl fp_offset
FieldDecl overflow_arg_area
FieldDecl reg_save_area
TypedefDecl __va_list_tag
StructDecl __va_list_tag
FieldDecl gp_offset
FieldDecl fp_offset
FieldDecl overflow_arg_area
FieldDecl reg_save_area
TypedefDecl __builtin_va_list
TypeRef __va_list_tag
IntegerLiteral

I would have expected nothing to be printed, since the file is empty. Trying to understand this behaviour, I noticed that dumping the AST of empty.c gives similar results:

$ clang -cc1 -ast-dump empty.c
typedef __int128 __int128_t;
typedef unsigned __int128 __uint128_t;
struct __va_list_tag {
     unsigned int gp_offset;
     unsigned int fp_offset;
     void *overflow_arg_area;
     void *reg_save_area;
};
typedef struct __va_list_tag __va_list_tag;
typedef __va_list_tag __builtin_va_list[1];

Is there any way to analyze only the code the programmer actually wrote?

WBR,
Cyril Roelandt.

I don't know if there's a better way, but what I do is compare the file of a declaration's (or similar cursor's) location with the input file. If they match, I process the declaration; otherwise I skip it.

The C++ API has a function for this, but there doesn't seem to be one in libclang, so I rolled my own implementation.

Here is an example in pseudocode:

foreach (cursor ; topLevel)
     if (cursor.location.spelling.file == inputFile)
         process(cursor);