C++ analysis with Clang?

Hi all -

Our group is considering using Clang for some program analysis, primarily aimed at bug-hunting. We're targeting C++, which I understand means the static analyzer isn't an option just yet, but it looks like there's still plenty there that could be useful - so much that I'm not sure what to start experimenting with.

The plugin tutorial uses a PluginASTAction and links to an example with a RecursiveASTVisitor, but of course we'd like to avoid reinventing the wheel (at coding time or at run time) as much as possible. For instance, are CFGs pre-constructed somewhere? Does the dataflow framework in Analysis/FlowSensitive work with C++? Is this Sema of any use outside the path-sensitive analyzer? And is there anything in Clang that could be useful in aggregating results together in whole program analysis?

Thanks!

Hi all -

Our group is considering using Clang for some program analysis,
primarily aimed at bug-hunting. We’re targeting C++, which I understand
means the static analyzer isn’t an option just yet, but it looks like
there’s still plenty there that could be useful - so much that I’m not
sure what to start experimenting with.

There are basically 2 different ways to go here, depending on what you’re after:

  • finding single-TU analyzable bugs at compile time: you want clang plugins, run as part of your normal build; if the check is not very project-specific, consider contributing warnings to clang instead
  • finding cross-TU analyzable bugs: you’ll have a hard time doing that at compile time, as clang is inherently TU-focused; for cross-TU stuff you’ll need to go through a TU-independent layer; if you’re after this, you can use either LibClang or the Tooling infrastructure, depending on how much control over the AST you want (for a more in-depth comparison see http://clang.llvm.org/docs/Tooling.html).

The plugin tutorial uses a PluginASTAction and links to an example with
a RecursiveASTVisitor, but of course we’d like to avoid reinventing the
wheel (at coding time or at run time) as much as possible. For
instance, are CFGs pre-constructed somewhere? Does the dataflow
framework in Analysis/FlowSensitive work with C++? Is this Sema of any
use outside the path-sensitive analyzer? And is there anything in Clang
that could be useful in aggregating results together in whole program
analysis?

I can only answer the last of your questions:
No, not to my knowledge. But we’ve not needed that so far: what we do is output strings of (key, value) pairs from the analysis and later fold those via an outside script. You can use Python or somesuch for the post-processing; we happen to use the MapReduce framework :-)
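A minimal sketch of that fold step, for illustration only. It assumes each per-TU analysis run writes one tab-separated "key<TAB>value" line per finding; that line format and the function names fold_pairs and summarize are hypothetical, not something Clang or the Tooling infrastructure defines.

```python
from collections import defaultdict


def fold_pairs(lines):
    """Group "key<TAB>value" lines by key, keeping values in input order."""
    folded = defaultdict(list)
    for line in lines:
        line = line.rstrip("\n")
        key, sep, value = line.partition("\t")
        if not sep:
            continue  # skip blank or malformed lines (no tab separator)
        folded[key].append(value)
    return dict(folded)


def summarize(folded):
    """Return one "key<TAB>count<TAB>distinct values" line per key, sorted by key."""
    return [
        f"{key}\t{len(values)}\t{','.join(sorted(set(values)))}"
        for key, values in sorted(folded.items())
    ]
```

Concatenating the output of all per-TU runs into one file and passing the open file object to fold_pairs gives the whole-program view; a MapReduce job does the same grouping at larger scale.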

Cheers,
/Manuel

Hi all -

Our group is considering using Clang for some program analysis,
primarily aimed at bug-hunting. We’re targeting C++, which I understand
means the static analyzer isn’t an option just yet,

but it looks like
there’s still plenty there that could be useful - so much that I’m not
sure what to start experimenting with.

The plugin tutorial uses a PluginASTAction and links to an example with
a RecursiveASTVisitor, but of course we’d like to avoid reinventing the
wheel (at coding time or at run time) as much as possible. For
instance, are CFGs pre-constructed somewhere?

CFG is part of the Analysis library and is used by both clang and the analyzer.
You can use ViewCFG and DumpCFG checkers to see how C++ statements are modeled (and what the deficiencies are):
clang -cc1 -analyze -analyzer-checker=debug.ViewCFG test.c

Does the dataflow
framework in Analysis/FlowSensitive work with C++?

The analyzer’s C++ support is a work in progress. You can run the analyzer’s path-sensitive checkers on C++ code; however, it does not reason about many C++ concepts.

Is this Sema of any
use outside the path-sensitive analyzer?

You can write non-path-sensitive checkers, which visit different AST nodes (the AST does have full C++ support). See AST Visitors in http://clang-analyzer.llvm.org/checker_dev_manual.html#ast.

And is there anything in Clang
that could be useful in aggregating results together in whole program
analysis?

The static analyzer does not currently support whole program analysis.

Hi all -

Our group is considering using Clang for some program analysis,
primarily aimed at bug-hunting. We’re targeting C++, which I understand
means the static analyzer isn’t an option just yet,

but it looks like
there’s still plenty there that could be useful - so much that I’m not
sure what to start experimenting with.

The plugin tutorial uses a PluginASTAction and links to an example with
a RecursiveASTVisitor, but of course we’d like to avoid reinventing the
wheel (at coding time or at run time) as much as possible. For
instance, are CFGs pre-constructed somewhere?

CFG is part of the Analysis library and is used by both clang and the analyzer.
You can use ViewCFG and DumpCFG checkers to see how C++ statements are modeled (and what the deficiencies are):
clang -cc1 -analyze -analyzer-checker=debug.ViewCFG test.c

Does the dataflow
framework in Analysis/FlowSensitive work with C++?

The analyzer’s C++ support is a work in progress. You can run the analyzer’s path-sensitive checkers on C++ code; however, it does not reason about many C++ concepts.

Actually, except for some of the new C++11 concepts like lambdas, most are currently handled, and the analyzer can already be very useful for C++ programs.

what we do is outputting strings of (key, value) pairs from the analysis
and later fold those via an outside script;

That's roughly what we were thinking about starting off with, which sounds like it should work fine with a plugin. (Unless I'm overlooking something?)

you can use python or somesuch for the post-processing

Speaking of which, there isn't official documentation for the LibClang Python bindings yet, is there?

You can use ViewCFG and DumpCFG checkers to see how C++ statements are
modeled (and what the deficiencies are):
*clang -cc1 -analyze -analyzer-checker=debug.ViewCFG test.c*

Hmm, my copy is still a bit out of date, but I'm getting nothing for ViewCFG; DumpCFG works fine, though.

Also, is either of these formats meant to be read back in? In case we end up needing more detail in our whole program analyses.

what we do is outputting strings of (key, value) pairs from the analysis

and later fold those via an outside script;

That’s roughly what we were thinking about starting off with, which sounds like it should work fine with a plugin. (Unless I’m overlooking something?)

you can use python or somesuch for the post-processing

Speaking of which, there isn’t official documentation for the LibClang Python bindings yet, is there?

You can use ViewCFG and DumpCFG checkers to see how C++ statements are

modeled (and what the deficiencies are):

clang -cc1 -analyze -analyzer-checker=debug.ViewCFG test.c

Hmm, my copy is still a bit out of date, but I’m getting nothing for ViewCFG; DumpCFG works fine, though.

debug.ViewCFG output is based on Graphviz: http://llvm.org/docs/ProgrammersManual.html#ViewGraph

Also, is either of these formats meant to be read back in? In case we end up needing more detail in our whole program analyses.

No. I don’t think we have any CFG serialization mechanisms.

Also, the Tooling infrastructure and the static analyzer are not currently integrated. The analyzer relies on the scan-build script to interpose itself on a build. Tooling is a new infrastructure which has been used for AST-analysis-based projects like refactoring.

$ pydoc clang.cindex

Do you want something more? I agree parts could be more descriptive. Maybe we could check in generated HTML into docs/?

It's a hard problem when you consider any higher-level docs inevitably rewrite libclang's documentation and thus would be better suited there. That's why the existing Python docs mostly assume knowledge of libclang and focus on Python specifics.

I'm open to suggestions for improving things.

Gregory

what we do is outputting strings of (key, value) pairs from the analysis
and later fold those via an outside script;

That’s roughly what we were thinking about starting off with, which sounds like it should work fine with a plugin. (Unless I’m overlooking something?)

Well, you usually want to run clang plugins as part of the build. At least for us, we usually want to run one build, and then run multiple global analysis passes over all the code in parallel.

Cheers,
/Manuel

Hi all -

Our group is considering using Clang for some program analysis,
primarily aimed at bug-hunting.

Karl,
What kinds of bugs are you planning to hunt?

–kcc

There is no need to serialize the CFGs, as they can be constructed directly from the ASTs (and depend on the ASTs).

What kinds of bugs are you planning to hunt?

Probably the quickest summary is that we're looking for security bugs involving misused libraries and assumptions about the environment, but this is still pretty early and our exact priorities may shift.

Do you want something more? I agree parts could be more descriptive.
Maybe we could check in generated HTML into docs/?

The most helpful thing I found was this blog post:

http://eli.thegreenplace.net/2011/07/03/parsing-c-in-python-with-clang/

But it's nearly a year old, and I don't know what's changed since then. So maybe something like that plus a link to the generated HTML?
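For a sense of what the bindings look like in practice, here is a small sketch in the spirit of that post, not an official example. It walks the AST of one translation unit and reports calls to a chosen set of functions. The helper names find_calls and scan_file and the function names passed in are illustrative assumptions. The cursor attributes used (kind, spelling, location, get_children) are the ones pydoc clang.cindex lists, but details may differ between versions of the bindings.

```python
def find_calls(cursor, names):
    """Yield (callee name, file name, line) for each call to a function in names."""
    if cursor.kind.name == "CALL_EXPR" and cursor.spelling in names:
        loc = cursor.location
        filename = loc.file.name if loc.file else "<unknown>"
        yield cursor.spelling, filename, loc.line
    # Recurse into child cursors; nested calls are reported as well.
    for child in cursor.get_children():
        yield from find_calls(child, names)


def scan_file(path, names):
    """Parse one C++ source file with libclang and list matching calls."""
    # Imported here so find_calls can be used without libclang installed.
    import clang.cindex

    index = clang.cindex.Index.create()
    tu = index.parse(path, args=["-x", "c++"])
    return list(find_calls(tu.cursor, names))
```

Calling scan_file("test.cpp", {"strcpy", "system"}) would return a list of tuples that can be printed as key/value lines for later aggregation; test.cpp and the two function names are placeholders.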