Hi!
In this letter, I’d like to describe a particular problem in the Clang StaticAnalyzer’s BugReporter class that I’d like to tackle in a Google Summer of Code project this summer. I’ll show real-world examples on some of its shortcomings, and propose a potential solution using static backward program slicing. At last, I’ll ask some specific questions. I’d love to hear any and all feedback on this!
This is a problem statement, not a proposal. I plan to send out a formal proposal by Friday (but not later then Saturday), that will contain more details on both the problem and the solution. I don’t introduce myself or my previous works within the project, that will also be detailed in my upcoming letter. I also plan to incorporate the feedback I’ll receive to this letter.
—=== BugReporter constructs bad reports ===—
What does the BugReporter do?
After the Static Analyzer found an error, the BugReporter receives an ExplodedNode, which, accompanied by its predecessors, contains all the information needed to reproduce that error. This bugpath is then shortened with a variety of heuristics, such as removing unrelated function calls, unrelated branches and so on. BugReporter by the end of this process will construct a PathDiagnostic object for each report, that is, ideally, minimal.
Example 1.
Consider the following code example:
1 // leak.cpp
2 int *global_ptr;
3
4 void save_ext(int storeIt, int *ptr) {
5 if (storeIt)
6 global_ptr = ptr;
7 }
8
9 void test(int b) {
10 int *myptr = new int;
11 save_ext(b, myptr);
12 delete global_ptr;
13 }
It’s clear that if test is invoked with b’s value set to true, there is no error in the program. However, should b be false, we’ll leak memory.
$ clang -cc1 -analyze -analyzer-checker=core,cplusplus leak.cpp
The Static Analyzer is able to catch this error, but fails to mention the call to save_ext entirely, despite the error only occurring because the analyzer assumed that storeIt is false. I’ve also attached the exploded graph leak.svg that demonstrates this.
Example 2.
Consider the following code example:
1 // divbyzero.cpp
2 void f() {
3 int i = 0;
4 (void) (10 / i);
5 }
6
7 void g() { f(); }
8 void h() { g(); }
9 void j() { h(); }
10 void k() { j(); }
Its clear that a call to f will result in a division by zero error.
$ clang -cc1 -analyze -analyzer-checker=core divbyzero.cpp
Again, the Static Analyzer is plenty smart enough to catch this, but the constructed bug report is littered with a lot of useless information – it would be enough to only show the body of f, and, optionally where f itself was called. For the sake of completeness, I’ve attached divbyzero.svg that contains the exploded graph for the above code snippet.
The above examples demonstrate that BugReporter sometimes reduces the bugpath too much or too little.
—=== Solving the above problem with static backward program slicing ===—
What is static backward program slicing?
A program slice consists of the parts of a program that (potentially) affect the values computed at some point of interest, called the slicing criterion. Program slicing is a decomposition technique that elides program components not relevant to the slicing criterion (which is a pair of (statement, set of variables)), creating a program slice[1][2]. Static slicing preserves the meaning of the variable(s) in the slicing criterion for all possible inputs[1]. Backward slices answer the question “what program components might affect a selected computation?”[1]
While statement-minimal slices are not necessarily unique[3], Weisel developed a popular algorithm that constructs one. In essence, his fix-point algorithm constructs sets of relevant variables for each edge in between node i and node j in a CFG graph, from which he constructs relevant statements. The fix-point of the relevant statements set is the slice itself.
One of the characteristic of his algorithm is that the resulting program slice will be executable. However, our problem doesn’t require the code to be executable, so we could use a more “aggressive” approach that creates a smaller slice. An improvement to his algorithm is presented in [4].
How does this relate to BugReporter?
We can show that using Weiser’s algorithm, issues raised in Example 1. and Example 2. can be improved upon.
For example 1., the statement-minimal program slice with the criterion (13, ) will contain the statements 5 and 6, and for example 2., the statement-minimal program slice with the criterion (4, i) won’t contain anything but statements 3 and 4. For the latter, we can even improve the algorithm to also contain statement 7, where a call to f is made.
The analyzer, as stated earlier, gives BugReporter an ExplodedNode, from which the slicing criterion must be constructed. The statement corresponding to this node, coupled with the interesting regions the checker itself marked could be used to construct this slicing criterion.
Challenges
While the algorithm Weiser developed along with the improvements made by others are interprocedural, I would imagine that in implementation, it would be a challenging step from an intraprocedural prototype.
Several articles also describe pointers, references, and dynamically allocated regions, as well as gotos and other tricky parts of the language, but I still expect to see some skeletons falling out of the closet when implementing this for C++, not only C.
Drawbacks
Static slicing, as an algorithm that doesn’t know anything about input values, suffers from the same issues that all static techniques do, meaning that without heuristics, it’ll have to do very rough guesses, possibly leaving a far too big program slice. However, with the symbolic values the analyzer holds, this could be improved, turning this into conditioned slicing, as described in [1]. This however is only planned as a followup work after GSoC.
For this reason, this project would be developed as an alternative approach to some of the techniques used in BugReporter, as an optional off-by-default analyzer feature.
—=== Questions ===—
What do you think of this approach?
Do you think that implementing this algorithm is achievable, but tough enough task for GSoC?
Would you prefer to see a general program slicing library, or an analyzer-specific implementation? Traversing the ExplodedGraph would be far easier in terms of what I want to achieve, but a more general approach that traverses the CFG (like llvm::DominatorTree[5]) could be beneficial to more developers, but possibly at the cost of not being able to improve the prototype with the symbolic value information the analyzer holds.
References, links
[1] Gallagher, Keith, and David Binkley. “Program slicing.” 2008 Frontiers of Software Maintenance.
[2] Tip, Frank. A survey of program slicing techniques. Centrum voor Wiskunde en Informatica, 1994.
[3] Weiser, Mark. “Program slicing.” Proceedings of the 5th international conference on Software engineering. IEEE Press, 1981.
[4] Binkley, David. “Precise executable interprocedural slices.” ACM Letters on Programming Languages and Systems (LOPLAS) 2.1-4 (1993): 31-45.
[5] http://llvm.org/doxygen/classllvm_1_1DominatorTree.html
Link to previous GSoC related letter I sent: http://lists.llvm.org/pipermail/cfe-dev/2019-February/061464.html
Cheers,
Kristóf Umann