How to use libtooling to parse multiple files at once and successfully find stddef.h?

I'm writing a tool to parse C-family source code, basically following these
tutorials:
[1] http://clang.llvm.org/docs/LibTooling.html#libtooling-builtin-includes
[2] http://kevinaboos.blogspot.com/2013/07/clang-tutorial-part-ii-libtooling.html

For now I'm trying to parse opencv <https://github.com/Itseez/opencv>, since it
uses CMake, which gives me a correct compile_commands.json for free.

My first problem: how do I tell the tool to parse thousands of files at once? At
first I thought I could give libtooling the path of compile_commands.json and it
would do the rest. However, if I run "$ mybinary -p . --" and use
CommonOptionsParser to process argc and argv, it complains "Not enough
positional command line arguments specified!", so I still have to specify
every file I want to parse.
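
A minimal sketch of one workaround, under stated assumptions: make the positional file list optional and fall back to every file in the database when none is given. In later LibTooling releases (I believe not the 3.4-era API used in this thread), `CommonOptionsParser` has an overload taking an `llvm::cl::NumOccurrencesFlag`, and passing `llvm::cl::ZeroOrMore` relaxes the positional-argument check. The helper below, `selectSourcePaths`, is hypothetical and uses only the standard library; it models just the selection logic, with `allDatabaseFiles` standing in for what `CompilationDatabase::getAllFiles()` returns.

```cpp
#include <string>
#include <vector>

// Hypothetical helper, not part of LibTooling: decide which source files to
// hand to the tool. If files were named on the command line, use those;
// otherwise fall back to every file the compilation database knows about
// (in LibTooling that list would come from CompilationDatabase::getAllFiles()).
std::vector<std::string> selectSourcePaths(
    const std::vector<std::string>& positionalArgs,
    const std::vector<std::string>& allDatabaseFiles) {
  if (!positionalArgs.empty()) {
    return positionalArgs;
  }
  return allDatabaseFiles;
}
```

With this shape, running the tool with only `-p <build dir>` processes the whole database, while naming files still restricts the run to those files.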

If I use CompilationDatabase::autoDetectFromDirectory to load
compile_commands.json and pass the result of getAllFiles() to ClangTool, it
does parse all the files, but it can't find headers like stdarg.h, stddef.h
and so on. The FAQ <http://clang.llvm.org/docs/FAQ.html> says that running
clang with the -### option shows all the options needed to fix this.

However, since I start parsing with CompilationDatabase::autoDetectFromDirectory,
where should these options go? If I pass them to CommonOptionsParser, how do I
start parsing, given that it always complains about missing positional
arguments? Besides, I want my tool to take only one argument, the path
containing compile_commands.json. How can my program get the options from
"clang -### -c _file_to_compile_"?
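
A hedged sketch for the last question: `clang -### -c file.c` prints the `-cc1` command line with every argument in double quotes, and the system include directories appear after `-internal-isystem` or `-internal-externc-isystem`. The hypothetical helpers `splitQuotedArgs` and `extractSystemIncludeFlags` below pull those directories out of one such line and re-spell them as ordinary `-isystem` flags that could be added to each command. They assume the quoting format just described, ignore the `extern "C"` semantics of `-internal-externc-isystem`, and leave out running `clang` and capturing its output.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Split one line of `clang -###` output into arguments. The driver prints
// each argument in double quotes; this sketch assumes that and treats a
// backslash inside quotes as escaping the next character.
std::vector<std::string> splitQuotedArgs(const std::string& line) {
  std::vector<std::string> args;
  std::string current;
  bool inQuotes = false;
  for (std::size_t i = 0; i < line.size(); ++i) {
    char c = line[i];
    if (inQuotes) {
      if (c == '\\' && i + 1 < line.size()) {
        current += line[++i];  // keep the escaped character literally
      } else if (c == '"') {
        args.push_back(current);  // closing quote ends the argument
        current.clear();
        inQuotes = false;
      } else {
        current += c;
      }
    } else if (c == '"') {
      inQuotes = true;
    }
    // Any other character outside quotes (normally a space) is skipped.
  }
  return args;
}

// Collect the system include directories the driver passes to -cc1 and
// return them as joined "-isystem<dir>" flags.
std::vector<std::string> extractSystemIncludeFlags(const std::string& line) {
  std::vector<std::string> flags;
  std::vector<std::string> args = splitQuotedArgs(line);
  for (std::size_t i = 0; i + 1 < args.size(); ++i) {
    if (args[i] == "-internal-isystem" ||
        args[i] == "-internal-externc-isystem") {
      flags.push_back("-isystem" + args[i + 1]);
      ++i;  // skip the directory we just consumed
    }
  }
  return flags;
}
```

Feeding this the `-cc1` line printed for a representative file yields flags that could be appended to every entry, which keeps the database independent of where the tool binary lives.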

Edwin Vane replied that the compilation database should be self-contained.
First, I need to make sure that compile_commands.json is generated using
clang, and that I can build opencv with clang.

I set these environment variables

export CC=/home/jcwu/repos/llvm-release/Release/bin/clang
export CXX=/home/jcwu/repos/llvm-release/Release/bin/clang++
export C_INCLUDE_PATH=/usr/local/include:/home/jcwu/repos/llvm-release/Release/lib/clang/3.4/include:/usr/include/x86_64-linux-gnu:/usr/include
# these are from clang -v -c files.cpp
export CPLUS_INCLUDE_PATH=/usr/local/include:/home/jcwu/repos/llvm-release/Release/lib/clang/3.4/include:/usr/include/x86_64-linux-gnu:/usr/include

Then I regenerated compile_commands.json. Now stddef.h is found, but a new
issue comes up:

[ 31%] Building CXX object modules/ts/CMakeFiles/opencv_ts.dir/src/ts.cpp.o
In file included from /home/jcwu/repos/opencv/modules/ts/src/ts.cpp:116:
/usr/include/setjmp.h:60:12: error: conflicting types for '__sigsetjmp'
extern int __sigsetjmp (struct __jmp_buf_tag __env[1], int __savemask) __THROWNL;
           ^
/usr/include/pthread.h:727:12: note: previous declaration is here
extern int __sigsetjmp (struct __jmp_buf_tag *__env, int __savemask) __THROW;
           ^
1 error generated.
make[2]: *** [modules/ts/CMakeFiles/opencv_ts.dir/src/ts.cpp.o] Error 1
make[1]: *** [modules/ts/CMakeFiles/opencv_ts.dir/all] Error 2
make: *** [all] Error 2

I can't build opencv with clang because of a type conflict between two system
header files, and I haven't figured out how to solve it yet.

Well, if you cannot use clang to build your code, you cannot use libtooling
to parse it (or am I misunderstanding something?)

Cheers,
/Manuel

I changed my target to GNU tools. Taking wget as an example, I can build it
with clang, and I use Bear <https://github.com/rizsotto/Bear> to generate
compile_commands.json.
When I use my code
<https://github.com/rayjcwu/clang.libtooling.annotator/blob/master/main.cpp>,
which does nothing meaningful except visit all AST nodes in the source code,
I get a lot of "header file not found" errors.
But sourceweb <https://github.com/rprichard/sourceweb>, which is listed among
the external Clang projects, parses wget with the same .json file without any
trouble. The errors persist even if I switch my llvm/clang to version 3.2.

Is there a quick solution for this before I start tracing through the
sourceweb source code?

This is because clang has some headers of its own.

Well, sourceweb finds the headers because it uses the Clang path to find the include directory (I guess). See the comment at https://github.com/rprichard/sourceweb/blob/master/clang-indexer/main.cc#L398
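
For context, link [1] at the top (LibTooling, "Builtin includes") says tools search for builtin headers the same way Clang does, relative to the tool binary: `$(dirname /path/to/tool)/../lib/clang/<version>/include`. A tool binary that lives outside the LLVM build tree therefore won't find stddef.h on its own. Below is a hypothetical helper, `builtinIncludeDir`, that derives that directory from a clang binary path, assuming the `<prefix>/bin` and `<prefix>/lib/clang/<version>/include` layout seen earlier in this thread.

```cpp
#include <filesystem>
#include <string>

// Hypothetical helper: derive Clang's builtin-header directory from the path
// of a clang binary, following the <prefix>/bin/clang ->
// <prefix>/lib/clang/<version>/include layout. The version string must match
// the Clang build; this sketch takes it as input rather than discovering it.
std::string builtinIncludeDir(const std::string& clangBinary,
                              const std::string& version) {
  std::filesystem::path prefix =
      std::filesystem::path(clangBinary).parent_path().parent_path();
  return (prefix / "lib" / "clang" / version / "include").string();
}
```

For the clang used above, `/home/jcwu/repos/llvm-release/Release/bin/clang` with version `3.4` maps to `/home/jcwu/repos/llvm-release/Release/lib/clang/3.4/include`, the same directory that appears in the C_INCLUDE_PATH setting earlier.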

You can manually specify the include directory (with the -I flag) for each source file in the compile_commands.json file.
On Linux the path to these headers is something like /usr/lib/clang/3.3/include or /usr/lib/llvm-3.4/include/clang/
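
Instead of editing the JSON by hand, a tool can add the flag to every command as it loads them. Later LibTooling releases offer `ClangTool::appendArgumentsAdjuster` together with `getInsertArgumentAdjuster` for this. The standard-library sketch below, `insertAfterCompiler` (hypothetical), only models the idea: insert extra flags right after the compiler name (argv[0]) of one command.

```cpp
#include <string>
#include <vector>

// Hypothetical helper modelling an "arguments adjuster": given one compile
// command (argv[0] is the compiler), insert extra flags right after the
// compiler name so they apply to that file's compilation.
std::vector<std::string> insertAfterCompiler(
    const std::vector<std::string>& command,
    const std::vector<std::string>& extraFlags) {
  if (command.empty()) {
    return command;  // nothing to adjust
  }
  std::vector<std::string> adjusted;
  adjusted.reserve(command.size() + extraFlags.size());
  adjusted.push_back(command.front());
  adjusted.insert(adjusted.end(), extraFlags.begin(), extraFlags.end());
  adjusted.insert(adjusted.end(), command.begin() + 1, command.end());
  return adjusted;
}
```

Inserting near the front means any -I flags already in the command are searched after the added directory; appending at the end would reverse that priority.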

Also look at this: http://clang.llvm.org/docs/FAQ.html#i-get-errors-about-some-headers-being-missing-stddef-h-stdarg-h

Hope that helps,

Cheers

I don't think the builtin headers are a problem here - config.h seems to be
a header in the project that's not found.
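
Since a project's config.h can be included either as "config.h" or as <config.h>, it may help to recall the lookup order GCC documents (and Clang follows): `#include "config.h"` first searches the directory of the including file, then `-iquote` directories, then the same list as `#include <config.h>`, which is the `-I` directories followed by the system directories. `resolveInclude` below is a hypothetical model of that order, with a `std::set` of paths standing in for the file system. Relative directories such as `-I.` are resolved against the working directory of the compile.

```cpp
#include <optional>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of header lookup order:
//   quoted == true  : includer's directory, then quoteDirs (-iquote), then angleDirs
//   quoted == false : angleDirs only
// angleDirs is expected to hold the -I directories followed by system ones.
// "existing" stands in for the file system: the set of files that exist.
std::optional<std::string> resolveInclude(
    const std::string& name, bool quoted, const std::string& includerDir,
    const std::vector<std::string>& quoteDirs,
    const std::vector<std::string>& angleDirs,
    const std::set<std::string>& existing) {
  std::vector<std::string> searchOrder;
  if (quoted) {
    searchOrder.push_back(includerDir);
    searchOrder.insert(searchOrder.end(), quoteDirs.begin(), quoteDirs.end());
  }
  searchOrder.insert(searchOrder.end(), angleDirs.begin(), angleDirs.end());
  for (const std::string& dir : searchOrder) {
    std::string candidate = dir + "/" + name;
    if (existing.count(candidate) != 0) {
      return candidate;  // first match wins
    }
  }
  return std::nullopt;  // "file not found"
}
```

In this model a "config.h" next to the source file is always found, whereas <config.h> is found only if some -I directory (for example `-I.` from the right working directory) points at it.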

I’d run your normal build in verbose mode, and compare the command lines you get to what you find in compile_commands.json.
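
A small helper for that comparison, as a sketch: `onlyIn` (hypothetical) lists arguments present in one command line but not in the other. It compares the arguments as a multiset, so it will not flag differences in order (such as the order of `-I` flags), and a separated pair like `-I dir` is compared as two independent tokens.

```cpp
#include <algorithm>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: arguments that appear in `a` but not in `b`.
// Run it both ways (make's verbose command vs. the compile_commands.json
// entry) to see what each side has that the other lacks.
std::vector<std::string> onlyIn(const std::vector<std::string>& a,
                                const std::vector<std::string>& b) {
  std::vector<std::string> sa(a), sb(b);
  std::sort(sa.begin(), sa.end());
  std::sort(sb.begin(), sb.end());
  std::vector<std::string> result;
  std::set_difference(sa.begin(), sa.end(), sb.begin(), sb.end(),
                      std::back_inserter(result));
  return result;
}
```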

I set the environment variables C_INCLUDE_PATH and CPLUS_INCLUDE_PATH to
/usr/local/include:/home/myaccount/repos/llvm/Release/lib/clang/3.4/include:/usr/include/x86_64-linux-gnu:/usr/include.
I'm not sure what the difference is between adding -I to each command in
compile_commands.json and only setting these two variables, but setting them
solves the "stddef.h not found" error (while using libtooling), so I guess
that is enough.
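
On the difference: as far as I know, the Clang driver reads C_INCLUDE_PATH and CPLUS_INCLUDE_PATH from the environment of the process running it, so with libtooling they take effect only when the variables are set in the shell that starts the tool. Flags written into compile_commands.json travel with the database, which is the sense in which it is self-contained. GCC documents these variables as searched like `-isystem` directories for the matching language. The hypothetical sketch `includePathToFlags` spells such a value out as explicit flags.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: turn a colon-separated C_INCLUDE_PATH /
// CPLUS_INCLUDE_PATH value into explicit joined "-isystem<dir>" flags that
// could be stored per command instead of relying on the environment.
// Empty entries are skipped here, although GCC treats an empty entry as
// "search the current working directory".
std::vector<std::string> includePathToFlags(const std::string& envValue) {
  std::vector<std::string> flags;
  std::stringstream stream(envValue);
  std::string dir;
  while (std::getline(stream, dir, ':')) {
    if (!dir.empty()) {
      flags.push_back("-isystem" + dir);
    }
  }
  return flags;
}
```

Appended to each command, these flags come after any -isystem already on the command line, which roughly matches where the environment-variable directories are searched.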

There are two kinds of config.h: one is <config.h> and the other is
"config.h". The built-in headers solution should be irrelevant to the "config.h" error.

Following Manuel Klimek's suggestion, I'm comparing the output generated by -v
with compile_commands.json. Not sure what I'll find...

I've been poking here and there, but I still can't figure out why.
I removed the environment variable C_INCLUDE_PATH. Taking tempname.c as an
example, if I use verbose mode in the makefile, I get

In short: if I edit compile_commands.json so that it contains only the files
that cause errors, my program successfully parses all the files listed in it
without any error.
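
Narrowing this down by hand-editing the JSON gets tedious. Here is a sketch of doing the same filtering in code, assuming the database has already been read into memory. `CompileCommandEntry` and `keepOnly` are hypothetical names; the fields mirror the `directory`, `file` and `arguments` keys of the JSON Compilation Database format. Running the tool repeatedly on different subsets this way could help find which combination of files triggers the errors.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical in-memory form of one compile_commands.json entry.
struct CompileCommandEntry {
  std::string directory;               // working directory of the compile
  std::string file;                    // main source file
  std::vector<std::string> arguments;  // full command line, argv[0] first
};

// Keep only the entries whose "file" is in `wanted`, preserving order.
std::vector<CompileCommandEntry> keepOnly(
    const std::vector<CompileCommandEntry>& entries,
    const std::set<std::string>& wanted) {
  std::vector<CompileCommandEntry> kept;
  for (const CompileCommandEntry& entry : entries) {
    if (wanted.count(entry.file) != 0) {
      kept.push_back(entry);
    }
  }
  return kept;
}
```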

I compared the arguments passed to the clang driver by make (in verbose mode)
with those in compile_commands.json but didn't find any difference. If I use
ClangTool to run the FrontendAction, I get a lot of error messages and a core
dump at the end. The error messages complain that some headers within this
project can't be found. But if I run the commands from compile_commands.json
in a shell, everything works fine.
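
One thing that may be worth ruling out, offered as a guess rather than a diagnosis: the JSON Compilation Database format says relative paths in an entry are relative to its `directory` field. A shell started in that directory resolves them as the build did, but anything that depends on the current working directory while the tool processes many files could behave differently. Rewriting relative `-I` directories to absolute paths removes that dependence; `absolutizeIncludeDirs` is a hypothetical sketch of that rewrite.

```cpp
#include <cstddef>
#include <filesystem>
#include <string>
#include <vector>

// Hypothetical helper: make relative include directories in one compile
// command absolute, resolving them against the entry's "directory" field.
// Handles both "-Idir" and "-I" "dir" spellings; other arguments pass
// through unchanged.
std::vector<std::string> absolutizeIncludeDirs(
    const std::string& directory, const std::vector<std::string>& args) {
  namespace fs = std::filesystem;
  auto absolute = [&](const std::string& dir) {
    fs::path p(dir);
    return p.is_absolute()
               ? dir
               : (fs::path(directory) / p).lexically_normal().string();
  };
  std::vector<std::string> out;
  for (std::size_t i = 0; i < args.size(); ++i) {
    const std::string& arg = args[i];
    if (arg == "-I" && i + 1 < args.size()) {
      out.push_back(arg);                // separated form: -I dir
      out.push_back(absolute(args[++i]));
    } else if (arg.rfind("-I", 0) == 0 && arg.size() > 2) {
      out.push_back("-I" + absolute(arg.substr(2)));  // joined form: -Idir
    } else {
      out.push_back(arg);
    }
  }
  return out;
}
```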

I'm using llvm/clang r194059 on Ubuntu 12.04, trying to parse Apache 2.4.7. I
generated compile_commands.json with https://github.com/rizsotto/Bear. The
error messages look like this: