Extracting the include tree from a source file

Hi all,

For a small program [1] I'm working on to analyze physical properties of a
C++ project, I'm interested in integrating clang. The first (and I think
relatively simple) step I'd like to take is using clang to extract the
included headers.

My program already has an internal project graph containing source files,
object files, executables and (external) libraries. I want to complete this
graph with the headers.

So my question is: what is the easiest way, using clang programmatically,
to extract the include tree given a path to a source file, a list of include
directories (i.e. the ones passed to the compiler, not internal paths) and a
list of defines (again, only the defines passed to the compiler)?

I attached some initial code I'm playing with (not integrated with the rest
of my program, just to get a feeling for how to use clang). Currently it
somewhat works, but it doesn't find system includes (i.e. the headers that
the compiler normally resolves internally), and if something goes wrong, e.g.
a header is not found, it just plain crashes. Also, it's not clear to me how
to set defines and finally, most importantly, how to actually get the include
tree after (or during?) preprocessing.

Any help is appreciated.

Cheers,

Bertjan

[1] http://gitorious.org/cpp-dependency-analyzer

main.ml.cpp (2.21 KB)

Hi,

Call InitializePreprocessor to add defines. To fix the crash, add
calls to BeginSourceFile()/EndSourceFile() (see
https://github.com/nico/clangtut/blob/master/tut03_pp.cpp ).
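On the defines part of the question: definitions given on a compiler command
line as NAME or NAME=VALUE conventionally correspond to #define lines that are
processed before the main file. The sketch below is a minimal, stdlib-only
illustration of that mapping. definesToPredefines is a hypothetical helper, not
a clang function, and how such definitions are actually handed to
InitializePreprocessor depends on the clang version and is not shown here.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of clang): turn command-line style
// definitions ("NAME" or "NAME=VALUE") into a block of #define lines.
std::string definesToPredefines(const std::vector<std::string>& defines) {
    std::string out;
    for (const std::string& def : defines) {
        const std::string::size_type eq = def.find('=');
        if (eq == std::string::npos) {
            // A bare NAME is conventionally treated as "#define NAME 1".
            out += "#define " + def + " 1\n";
        } else {
            // NAME=VALUE becomes "#define NAME VALUE".
            out += "#define " + def.substr(0, eq) + " " +
                   def.substr(eq + 1) + "\n";
        }
    }
    return out;
}
```

Called with the list {"FOO", "BAR=2"}, this yields one #define line per entry,
in the order given.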

Take a look at include/clang/Lex/PPCallbacks.h – you probably want to
derive a class from that and set it on the preprocessor to be notified
of includes and defines in the code.

Nico
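To make the "get the include tree during preprocessing" idea concrete, here is
a small stdlib-only model of the bookkeeping a PPCallbacks subclass could do. It
does not use clang at all: IncludeTreeBuilder, enterFile and exitFile are
hypothetical names, and the assumption is that the real callbacks tell you when
the preprocessor starts and finishes reading a file, so that they can be
forwarded to something like this.

```cpp
#include <cstddef>
#include <memory>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// One node per file entered during preprocessing.
struct IncludeNode {
    std::string path;
    std::vector<std::unique_ptr<IncludeNode>> children;
};

// Hypothetical model, not clang API: builds a tree from enter/exit events.
class IncludeTreeBuilder {
public:
    // Call when the preprocessor starts reading `path`.
    void enterFile(const std::string& path) {
        auto node = std::make_unique<IncludeNode>();
        node->path = path;
        IncludeNode* raw = node.get();
        if (stack_.empty()) {
            root_ = std::move(node);  // the first file is the main source
        } else {
            stack_.back()->children.push_back(std::move(node));
        }
        stack_.push_back(raw);  // this file is now the innermost open file
    }

    // Call when the preprocessor finishes the current file.
    void exitFile() {
        if (!stack_.empty()) stack_.pop_back();
    }

    const IncludeNode* root() const { return root_.get(); }

private:
    std::unique_ptr<IncludeNode> root_;
    std::vector<IncludeNode*> stack_;  // currently open files, outermost first
};

// Render the tree with two spaces of indentation per nesting level.
void printTree(const IncludeNode& node, std::ostream& os, std::size_t depth = 0) {
    os << std::string(depth * 2, ' ') << node.path << '\n';
    for (const auto& child : node.children) printTree(*child, os, depth + 1);
}
```

The stack mirrors the chain of files that are currently open, so each newly
entered file becomes a child of whichever file included it.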

Hi,

Nico Weber wrote:

Thanks for the quick reply!

Hi,

Call InitializePreprocessor to add defines. To fix the crash, add
calls to BeginSourceFile()/EndSourceFile() (see
https://github.com/nico/clangtut/blob/master/tut03_pp.cpp ).

Yes, I'm a step closer now, it indeed doesn't crash anymore when an error
occurs. However, I still have the problem that system includes are not
found. In this particular case stddef.h. I am somewhat assuming that it
shouldn't be needed to add paths like:

/usr/include/linux (for stddef.h)
/usr/lib/gcc/x86_64-pc-linux-gnu/4.4.4/include

clang++ (the command line tool) seems to have these paths available, as it
is able to preprocess the same file I'm testing with without complaining
about not finding stddef.h.

Take a look at include/clang/Lex/PPCallbacks.h – you probably want to
derive a class from that and set it on the preprocessor to be notified
of includes and defines in the code.

Will do, thanks for the pointer.

Bertjan

main.cpp (2.99 KB)

Hi,

Nico Weber wrote:

All these are added in InitHeaderSearch::AddDefaultCIncludePaths()
and friends in lib/Frontend/InitHeaderSearch.cpp. Maybe step through
that code for your binary and for clang and check which branches are
taken in both cases.

Found two problems. The first was obvious: I hadn't enabled any language in
LanguageOptions -> langOption.CPlusPlus = true; Doh!

The second: I have a recent Gentoo system and the following path is missing in
InitHeaderSearch.cpp:

    // Gentoo amd64 gcc 4.4.4
    AddGnuCPlusPlusIncludePaths(
        "/usr/lib/gcc/x86_64-pc-linux-gnu/4.4.4/include/g++-v4",
        "x86_64-pc-linux-gnu", "32", "", triple);

Cheers,

Bertjan

Replying to myself: the latter doesn't explain the problem, since clang++ does
work on this system. The first change results in a lot of "ignoring nonexistent
directory" messages, so I'm at least a step closer, but there must be something
else I'm overlooking. Maybe more languageOptions must be enabled?

Cheers,

Bertjan
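A long run of "ignoring nonexistent directory" messages is consistent with a
fixed list of candidate include directories being tried, with the ones missing
on this machine dropped. The sketch below models that filtering step only, under
that assumption. buildSearchList and SearchList are hypothetical names, not
clang's implementation, and the existence check is passed in as a function so
the example stays independent of the file system.

```cpp
#include <functional>
#include <string>
#include <vector>

struct SearchList {
    std::vector<std::string> dirs;      // directories kept, in search order
    std::vector<std::string> messages;  // one diagnostic per dropped directory
};

// Hypothetical helper: keep only the candidates that exist, and record a
// message for each one that does not.
SearchList buildSearchList(
    const std::vector<std::string>& candidates,
    const std::function<bool(const std::string&)>& dirExists) {
    SearchList result;
    for (const std::string& dir : candidates) {
        if (dirExists(dir)) {
            result.dirs.push_back(dir);
        } else {
            result.messages.push_back(
                "ignoring nonexistent directory \"" + dir + "\"");
        }
    }
    return result;
}
```

In a real tool the dirExists argument could wrap a file-system query. The point
is that such messages on their own only show that candidates were tried, not
that the directory holding stddef.h ever made it into the list.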

Nico Weber wrote:

Hm, this _might_ be because clang comes with a few standard headers,
see clang/lib/Headers. Maybe clang adds these to the search path
somehow. Adding a gentoo-specific include path is the correct fix in
this case.

Yes, it is because clang needs these standard headers installed in
/usr/lib/clang/2.8/include (on Linux of course; I don't know about other
platforms). From lib/Driver/Driver.cpp (line 76 and further in 2.8):

// Compute the path to the resource directory.
llvm::sys::Path P(Dir);
P.eraseComponent(); // Remove /bin from foo/bin
P.appendComponent("lib");
P.appendComponent("clang");
P.appendComponent(CLANG_VERSION_STRING);
ResourceDir = P.str();

I don't really understand why. Moreover, I don't understand why
AddGnuCPlusPlusIncludePaths in InitHeaderSearch.cpp is implemented as it is.
For example, take:

AddGnuCPlusPlusIncludePaths(
    "/usr/lib/gcc/x86_64-pc-linux-gnu/4.4.4/include/g++-v4",
    "x86_64-pc-linux-gnu", "32", "", triple);

The parent directory (i.e. /usr/lib/gcc/x86_64-pc-linux-gnu/4.4.4/include/)
also contains the headers which are in /usr/lib/clang/2.8/include. So I
wondered why AddGnuCPlusPlusIncludePaths doesn't add this path too, so that
the clang include directory is not required.

This might be Gentoo-specific, meaning that for other distros the current
code *does* find the standard headers and therefore doesn't need the clang
include path. If so, please tell me. In that case, for Gentoo, adding a
line like:

AddPath( "/usr/lib/gcc/x86_64-pc-linux-gnu/4.4.4/include/"
         , System, true, false, false );

would remove the need for custom apps to add the clang include dir.

Cheers,

Bertjan

The headers in lib/clang/2.8/include are clang's headers and the ones in the
gcc directory are gcc's headers; the clang ones should be used with clang.
That function only adds the C++ header path, which is actually for
libstdc++, not part of gcc. The other headers are part of gcc and
shouldn't be messed with, AFAIK. As you said, the driver computes the
path to clang's headers relative to itself; perhaps this should be
split out into a separate function.

Paul Davey wrote:

It uses the path calculation for a good reason: you can place the
clang executable anywhere, and as long as those files are in the same
relative location it will work. This is something that should happen.
I understand this can cause problems for other applications that use
clang as a library, but they can either place these files at the same
path relative to themselves (with a convenience function provided), or
InitHeaderSearch could get a function that adds the standard install
location, for use by third-party apps.
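The relative-path computation quoted from Driver.cpp earlier in the thread can
be modelled with std::filesystem. This is only a sketch of the same steps
(drop the last component of the executable's directory, then append
lib/clang/<version>), not the driver's code. resourceDirFor and
builtinIncludeDirFor are hypothetical names, and the version string is passed in
by the caller.

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper mirroring the quoted steps: remove the last
// component of the executable's directory (e.g. "bin"), then append
// lib/clang/<version>. Purely lexical; the file system is not consulted.
fs::path resourceDirFor(const fs::path& executableDir,
                        const std::string& version) {
    fs::path p = executableDir.parent_path();  // foo/bin -> foo
    p /= "lib";
    p /= "clang";
    p /= version;
    return p;
}

// The headers discussed in the thread sit in the "include" subdirectory
// of the resource directory.
fs::path builtinIncludeDirFor(const fs::path& executableDir,
                              const std::string& version) {
    return resourceDirFor(executableDir, version) / "include";
}
```

A third-party tool that ships those headers at the same position relative to
its own binary could pass its own directory here and add the result as a system
include path.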