Hi!
I’m new to this list and to Clang development, but I’ve been interested in the Clang Static Analyzer for a while. I’ve been using it on a large code base with a lot of success. So let me start by saying: thanks for this amazing piece of code!
But… Some time ago I realized there are hardly any C++-specific checkers in CSA. I was wondering if there’s any work going on in this area. I was thinking about checkers for use-after-free with STL containers like std::string, for example:
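#include <cstdio>
#include <string>

int main() {
    const char* x = NULL;
    {
        std::string foo("foo");
        x = foo.c_str();   // x points into foo's internal buffer
    }                      // foo is destroyed here, so x now dangles
    printf("%s", x); // boom: use-after-free
}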
There are also other common classes of C++ errors, like using an iterator after it has been invalidated. FYI, this one in particular is detected by cppcheck.
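To make the iterator case concrete, here is a minimal sketch of the pattern I have in mind. The function name remove_evens is made up for illustration; the buggy loop is shown only in the comment, and the code itself is the form that avoids the invalidated iterator.

```cpp
#include <vector>

// Hypothetical helper, only for illustration.
// The buggy form would be:
//     for (auto it = v.begin(); it != v.end(); ++it)
//         if (*it % 2 == 0) v.erase(it);   // 'it' is invalid after erase()
// The version below continues from the iterator that erase() returns.
void remove_evens(std::vector<int>& v) {
    for (auto it = v.begin(); it != v.end(); ) {
        if (*it % 2 == 0)
            it = v.erase(it);  // erase() returns the next valid iterator
        else
            ++it;
    }
}
```

A checker for this would have to notice that v.erase(it) invalidates it and flag any later use of the old value.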
So I decided to dig a bit to find out whether it is hard to write a checker for use-after-free like in the example with std::string. It looks like MallocChecker deals with a similar class of issues.
I was wondering whether it would be the right approach to try to “bend” MallocChecker to my needs (but it’s already 2.5k lines of code) or to start something new on my own.
Honestly, it took me some time even to detect a simple std::string constructor call, so the road looks rather long and bumpy…
Any hints, pointers? Any related work?
I have looked at this in the past, but that was about 18 months ago, so take my thoughts with that grain of salt. Also note that I’m not a regular or major contributor here. I’ve only done very minor patches, though I’m always hoping to do more. So here are my thoughts; take them as you will.
The MallocChecker is fine, but the problem is that libc++ is really hard to analyze. It is an efficient implementation, but that cleverness really stresses the analyzer. For example, std::string’s memory layout is a union of three different types (“long”, “short”, “raw” buffers). I think the static analyzer gives up on unions immediately.
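To give a feel for what I mean by the union, here is a heavily simplified picture. This is NOT the real libc++ definition: the type and field names (LongRep, ShortRep, RawRep, StringRep) are hypothetical, and the real code has more going on. It only shows the shape of the problem, which is three views of the same bytes.

```cpp
#include <cstddef>

// Simplified, hypothetical sketch; not the actual libc++ layout.

// Heap-allocated ("long") representation.
struct LongRep {
    std::size_t cap;
    std::size_t size;
    char*       data;
};

// In-place ("short") representation: characters live inside the object.
struct ShortRep {
    unsigned char size;
    char          data[sizeof(LongRep) - 1];
};

// Word-wise ("raw") view of the same storage.
struct RawRep {
    std::size_t words[sizeof(LongRep) / sizeof(std::size_t)];
};

// All three representations overlap in one union.
struct StringRep {
    union {
        LongRep  l;
        ShortRep s;
        RawRep   r;
    };
};
```

To follow c_str() through something like this, the analyzer has to know which member is currently active, and that is where I believe it loses track.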
The best way around this is to simplify what the analyzer sees. Here are two approaches.
One idea is to use “BodyFarm”, whose role is to synthesize alternate implementations for functions that should be simple to model. If you look here, you’ll see a bit about that: http://clang-analyzer.llvm.org/open_projects.html
Another idea is to actually implement a “simple libc++” and interpose that for analysis. For example, std::basic_string class would just be a pointer and two size_t’s, along with simple implementations of all the member functions and simple iterators. In the future, you could add other analysis hooks (for example, check for iterator invalidation).
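As a rough sketch of what such a simplified string could look like: everything here is hypothetical (the model namespace, the simple_string name, the member names), it covers only a constructor from a C string plus empty(), size() and c_str(), and copying is disabled just to keep it short.

```cpp
#include <cstddef>
#include <cstring>

namespace model {  // hypothetical namespace, deliberately not std

// A pointer and two size_t's, as described above.
class simple_string {
    char*       data_;
    std::size_t size_;
    std::size_t cap_;

public:
    // Copy the C string into a plain heap buffer.
    simple_string(const char* s)
        : data_(nullptr), size_(std::strlen(s)), cap_(size_ + 1) {
        data_ = new char[cap_];
        std::memcpy(data_, s, size_ + 1);  // includes the terminating '\0'
    }

    // The buffer dies with the object; a pointer obtained from c_str()
    // dangles after this runs.
    ~simple_string() { delete[] data_; }

    // Copying is left out of this sketch.
    simple_string(const simple_string&) = delete;
    simple_string& operator=(const simple_string&) = delete;

    bool        empty() const { return size_ == 0; }
    std::size_t size()  const { return size_; }
    const char* c_str() const { return data_; }
};

}  // namespace model
```

The hope is that a plain new[] in the constructor and delete[] in the destructor is much closer to what the existing checks can already follow than the real implementation is.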
I did play around a bit with this for BodyFarm, and I can forward you the code I wrote. I got a couple of constructors implemented, as well as “empty()” and “size()” for some very basic cases (strings initialized from string literals). However, it got a bit tedious and I’m not sure it would scale. I think the second approach is far more interesting and maintainable. But a “simple libc++” could be hard for its own reasons.
Anyway I’m happy to give you my sketches. I’ll email them off-list. Take them or ignore them however you like.