Our group is considering using Clang for some program analysis, primarily aimed at bug-hunting. We're targeting C++, which I understand means the static analyzer isn't an option just yet, but it looks like there's still plenty there that could be useful - so much that I'm not sure what to start experimenting with.
The plugin tutorial uses a PluginASTAction and links to an example with a RecursiveASTVisitor, but of course we'd like to avoid reinventing the wheel (at coding time or at run time) as much as possible. For instance, are CFGs pre-constructed somewhere? Does the dataflow framework in Analysis/FlowSensitive work with C++? Is this Sema of any use outside the path-sensitive analyzer? And is there anything in Clang that could be useful in aggregating results together in whole program analysis?
There are basically two different ways to go here, depending on what you’re after:
1. Finding single-TU analyzable bugs at compile time: you want clang plugins, run as part of your normal build. If the check is not very project-specific, consider contributing warnings to clang instead.
2. Finding cross-TU analyzable bugs: you’ll have a hard time doing that at compile time, as clang is inherently TU-focused. For cross-TU analysis you’ll need to go through a TU-independent layer; you can use either LibClang or the Tooling infrastructure, depending on how much control over the AST you want (for a more in-depth comparison, see http://clang.llvm.org/docs/Tooling.html).
The plugin tutorial uses a PluginASTAction and links to an example with
a RecursiveASTVisitor, but of course we’d like to avoid reinventing the
wheel (at coding time or at run time) as much as possible. For
instance, are CFGs pre-constructed somewhere? Does the dataflow
framework in Analysis/FlowSensitive work with C++? Is this Sema of any
use outside the path-sensitive analyzer? And is there anything in Clang
that could be useful in aggregating results together in whole program
analysis?
I can only answer the last of your questions:
No, not to my knowledge. But we’ve not needed that so far: what we do is output strings of (key, value) pairs from the analysis and later fold those via an outside script. You can use Python or some such for the post-processing (we happen to use the MapReduce framework).
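The folding step described above could look roughly like the sketch below. It is only an illustration under stated assumptions: the tab-separated `key<TAB>value` line format and the helper names `parse_pairs` and `fold` are invented here, not something Clang or the setup described above prescribes.

```python
from collections import defaultdict


def parse_pairs(lines):
    """Yield (key, value) tuples from 'key<TAB>value' lines (assumed format)."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines between per-TU outputs
        key, _, value = line.partition("\t")
        yield key, value


def fold(pairs):
    """Group all values emitted for the same key, across every TU."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)
```

For example, feeding the concatenated per-TU output files through `fold(parse_pairs(open(path)))` gives one entry per key with every location that reported it; a reduce step (counting, deduplicating, joining facts from different TUs) can then run on each entry.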
The plugin tutorial uses a PluginASTAction and links to an example with
a RecursiveASTVisitor, but of course we’d like to avoid reinventing the
wheel (at coding time or at run time) as much as possible. For
instance, are CFGs pre-constructed somewhere?
The CFG is part of the Analysis library and is used by both clang and the analyzer.
You can use ViewCFG and DumpCFG checkers to see how C++ statements are modeled (and what the deficiencies are): clang -cc1 -analyze -analyzer-checker=debug.ViewCFG test.c
Does the dataflow
framework in Analysis/FlowSensitive work with C++?
The analyzer’s C++ support is a work in progress. You can run the analyzer’s path-sensitive checkers on C++ code; however, it does not reason about many C++ concepts.
Actually, except for some of the new C++11 concepts like lambdas, most are currently handled, and the analyzer can already be very useful for C++ programs.
what we do is outputting strings of (key, value) pairs from the analysis
and later fold those via an outside script;
That's roughly what we were thinking about starting off with, which sounds like it should work fine with a plugin. (Unless I'm overlooking something?)
you can use python or somesuch for the post-processing
Speaking of which, there isn't official documentation for the LibClang Python bindings yet, is there?
You can use ViewCFG and DumpCFG checkers to see how C++ statements are
modeled (and what the deficiencies are):
*clang -cc1 -analyze -analyzer-checker=debug.ViewCFG test.c*
Hmm, my copy is still a bit out of date, but I'm getting nothing for ViewCFG; DumpCFG works fine, though.
Also, is either of these formats meant to be read back in? In case we end up needing more detail in our whole program analyses.
No. I don’t think we have any CFG serialization mechanisms.
Also, the Tooling infrastructure and the static analyzer are not currently integrated. The analyzer relies on the scan-build script to interpose itself on a build. Tooling is a new infrastructure which has been used for AST-analysis-based projects like refactoring.
Do you want something more? I agree parts could be more descriptive. Maybe we could check in generated HTML into docs/?
It's a hard problem when you consider any higher-level docs inevitably rewrite libclang's documentation and thus would be better suited there. That's why the existing Python docs mostly assume knowledge of libclang and focus on Python specifics.
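For anyone who wants a starting point in the meantime, here is a small sketch of the bindings in use. It assumes the `clang.cindex` module is installed and can locate a matching libclang shared library; the function names `call_sites` and `format_record`, and the choice to report call expressions, are illustrative only.

```python
def format_record(key, value):
    """Render one (key, value) pair as a tab-separated line (assumed format)."""
    return f"{key}\t{value}"


def call_sites(source_path, clang_args=("-std=c++11",)):
    """Yield (callee name, 'file:line') for each call expression in one TU."""
    # Imported here so the module still loads where the bindings are missing.
    from clang.cindex import CursorKind, Index

    index = Index.create()
    tu = index.parse(source_path, args=list(clang_args))
    for cursor in tu.cursor.walk_preorder():
        if cursor.kind == CursorKind.CALL_EXPR and cursor.spelling:
            loc = cursor.location
            file_name = loc.file.name if loc.file else "<unknown>"
            yield cursor.spelling, f"{file_name}:{loc.line}"
```

Something like `for callee, where in call_sites("test.cpp"): print(format_record(callee, where))` would then emit one line per call. Because the preorder walk covers the whole translation unit, calls inside included headers show up too, so a real checker would probably filter on `loc.file`.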
what we do is outputting strings of (key, value) pairs from the analysis
and later fold those via an outside script;
That’s roughly what we were thinking about starting off with, which sounds like it should work fine with a plugin. (Unless I’m overlooking something?)
Well, you usually want to run clang plugins as part of the build. At least for us, we usually want to run one build, and then run multiple global analysis passes over all the code in parallel.
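The "one build, then several passes in parallel" arrangement might be sketched as follows. This is only an outline under assumptions: `analyze_tu` is a hypothetical stand-in that counts textual mentions of two functions, where a real setup would invoke a Clang-based tool on each translation unit, and the inputs are source strings rather than file paths to keep the sketch self-contained.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def analyze_tu(source_text):
    """Hypothetical per-TU pass; a real one would run a Clang-based tool."""
    # Placeholder logic: count textual mentions of two risky C functions.
    names = ("strcpy", "system")
    return Counter({name: source_text.count(name + "(") for name in names})


def run_pass(sources, analysis=analyze_tu, workers=4):
    """Run one analysis over every TU in parallel and merge the results."""
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for partial in pool.map(analysis, sources):
            total.update(partial)  # Counter.update adds counts per key
    return total
```

Threads are enough in this sketch on the assumption that each real pass would mostly wait on an external tool process. Several different passes can reuse `run_pass` with a different `analysis` callable, without rebuilding the project in between.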
Probably the quickest summary is that we're looking for security bugs involving misused libraries and assumptions about the environment, but this is still pretty early and our exact priorities may shift.
Do you want something more? I agree parts could be more descriptive.
Maybe we could check in generated HTML into docs/?
The most helpful thing I found was this blog post: