standalone tool: best way to find built-in includes?

Hi guys,

I've developed a standalone tool for analyzing source code. It uses
CommonOptionsParser and ClangTool in what I think are the standard
ways.

My tool analyzes some other code ("foo.c") we have. We build foo.c
with the system-wide installed version of clang. Thanks to
cmake, we also produce a compilation database for that build of foo.c.

When my analysis tool runs, I'd like it to simulate, as closely as
possible, the way we normally build foo.c. In particular, I'd like to
be sure it's using the same builtin headers and gcc-provided headers.

Unfortunately, I can't easily copy my analysis tool's executable into
the same directory as the clang which we use to build foo.c.

So here's my question: Is there a good way for me to force my tool to
search the same include directories, in the same order, as our normal
copy of clang does when it's building foo.c?

I've tried running "clang -### ..." in the build system for foo.c, so
that (I think) I get explicit information about the flags being passed
to the front-end. However, I haven't found a way to pass those flags
to my analysis tool in a way that CommonOptionsParser and/or ClangTool
find acceptable. For example, they reject "-cc1".

Thanks,
Christian

You can set CLANG_RESOURCE_DIR in CMake, or pass -resource-dir (the default resource dir is created by llvm::sys::path::append(<exe-path>, "..", "lib", "clang", CLANG_VERSION_STRING)).

Unfortunately, this logic is duplicated in more than one place, e.g. in CompilerInvocation::GetResourcesPath (lib/Frontend/CompilerInvocation.cpp) and Driver::Driver (lib/Driver/Driver.cpp); there’s also some nastiness in CIndexer::getClangResourcesPath. It needs some refactoring.

– Sean Silva
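
To make the -resource-dir suggestion concrete, here is a minimal, standalone sketch of adding that flag to a compile command taken from a compilation database. The helper name `withResourceDir` is hypothetical and not part of clang. In a real tool the same transformation would presumably be hooked in through ClangTool's arguments-adjuster mechanism, whose exact interface has varied between clang releases, so treat this as an illustration of the idea rather than the actual API.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not a clang API): returns a copy of a compile
// command with "-resource-dir=<dir>" inserted right after the compiler
// name (args[0]). If the command already names a resource dir, it is
// returned unchanged so that an explicit setting still wins.
std::vector<std::string>
withResourceDir(const std::vector<std::string> &args,
                const std::string &resourceDir) {
  for (const std::string &arg : args) {
    if (arg.rfind("-resource-dir", 0) == 0)
      return args; // already specified, leave the command alone
  }

  std::vector<std::string> result;
  result.reserve(args.size() + 1);
  if (!args.empty())
    result.push_back(args.front()); // keep the compiler name first
  result.push_back("-resource-dir=" + resourceDir);
  for (std::size_t i = 1; i < args.size(); ++i)
    result.push_back(args[i]);
  return result;
}
```

For example, calling `withResourceDir({"clang", "-c", "foo.c"}, dir)` yields the same command with the flag placed directly after `clang`. Whether pointing a tool at another clang's resource dir is a good idea is a separate question, discussed further down the thread.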

+1 to needs some refactoring, but generally the trick is: you don’t want to use the builtin includes of the clang that produced the compilation database, but the ones that match the clang version your tool was built against.

Thus, optimally you’ll install your tool into some/bin and the builtin headers into some/lib/clang/.

Cheers,
/Manuel
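
The install layout matters because of how the default resource dir is derived from the executable's location. The following standalone sketch mirrors that rule using std::filesystem (C++17) instead of llvm::sys::path. It assumes that "<exe-path>" means the directory containing the executable; the function name `defaultResourceDir` and the example paths are hypothetical.

```cpp
#include <filesystem>
#include <string>

// Sketch of the default lookup described in this thread: start from the
// directory holding the executable, go up one level, then descend into
// lib/clang/<version>. Assumes exePath is the full path to the binary.
std::string defaultResourceDir(const std::filesystem::path &exePath,
                               const std::string &clangVersion) {
  std::filesystem::path dir = exePath.parent_path();
  dir /= "..";
  dir /= "lib";
  dir /= "clang";
  dir /= clangVersion;
  // Collapse the "bin/.." component purely lexically (no disk access).
  return dir.lexically_normal().generic_string();
}
```

With a tool installed as /opt/mytool/bin/mytool and a version string of 3.6.0, this resolves to /opt/mytool/lib/clang/3.6.0, which is why a tool dropped into an arbitrary directory will not find any builtin headers by default.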

Hi Manuel,

What really matters is that when my tool analyzes the target program's
source, it assumes the same standard headers that are normally used to
build that target program.

For example, the target program might normally be built with clang
3.4, but my analysis tool is built on clang 3.6. When my tool
analyzes the source code of the target program, I want the analysis to
be as though the target program #include'd the 3.4 builtins, not the
3.6 builtins, because my goal is to obtain the same AST as the one
created during the target program's normal (3.4) build process.

If I understand your suggestion, my analysis would give me an AST
based on the clang 3.6 builtins, not based on the 3.4 builtins. Is
that correct?

Thanks,
Christian

The builtin headers are implementation details of the compiler. If you compile a file with a tool based on clang 3.6 and builtin headers of clang 3.4 the probability that you don’t get a correct AST is higher.

If you want to get the same AST, you have to use the same version of clang. The question is, why is it important to you whether it’s the same AST?

> The question is, why is it important to you whether it's the same AST?

I'm also generating program traces, which report source locations
based on the DWARF info. I want to be able to reliably relate the
trace events to the AST. I want to minimize the risk that the two
don't properly match.

Then I would suggest you compile the code with the same compiler version that your tool is built on…

I had been hoping to avoid that as a requirement, but it sounds like I
can't. Thanks very much for the help.

- Christian
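
If matching compiler versions becomes a hard requirement, a tool could check it up front. Below is a small sketch, assuming the version banners look like "clang version 3.4.2 (...)" as printed by `clang --version`; vendor builds may print a different banner, in which case this simply reports no match. The function names `parseClangVersion` and `sameClangRelease` are hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>

// Extracts (major, minor) from text such as "clang version 3.4.2 (...)".
// Returns std::nullopt if the expected marker or numbers are missing.
std::optional<std::pair<int, int>>
parseClangVersion(const std::string &text) {
  const std::string marker = "clang version ";
  std::size_t pos = text.find(marker);
  if (pos == std::string::npos)
    return std::nullopt;
  pos += marker.size();

  // Reads a run of decimal digits starting at pos; fails on an empty run.
  auto readNumber = [&](int &out) {
    std::size_t start = pos;
    int value = 0;
    while (pos < text.size() &&
           std::isdigit(static_cast<unsigned char>(text[pos]))) {
      value = value * 10 + (text[pos] - '0');
      ++pos;
    }
    if (pos == start)
      return false;
    out = value;
    return true;
  };

  int major = 0, minor = 0;
  if (!readNumber(major))
    return std::nullopt;
  if (pos >= text.size() || text[pos] != '.')
    return std::nullopt;
  ++pos;
  if (!readNumber(minor))
    return std::nullopt;
  return std::make_pair(major, minor);
}

// True only if both banners parse and agree on major.minor.
bool sameClangRelease(const std::string &buildCompilerBanner,
                      const std::string &toolBanner) {
  auto a = parseClangVersion(buildCompilerBanner);
  auto b = parseClangVersion(toolBanner);
  return a && b && *a == *b;
}
```

Comparing only major.minor is a design choice made for this sketch; a stricter check could also compare the patch level or the full revision string.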

> The builtin headers are implementation details of the compiler. If you
> compile a file with a tool based on clang 3.6 and builtin headers of clang
> 3.4 the probability that you *don't* get a correct AST is higher.

It's actually possible for it to fail to build with the wrong builtin
headers. E.g. an intrinsic was changed from using an __builtin_* function
to using a different one that is not present in the other compiler.

-- Sean Silva

Hi Sean:

> It's actually possible for it to fail to build with the wrong builtin headers. E.g. an intrinsic was changed from using an __builtin_* function to using a different one that is not present in the other compiler.

Are you basically reinforcing Manuel's warning, by saying there's a
real-world example of this happening?

Thanks,
Christian

Yes.

-- Sean Silva