[analyzer][GSoC] Problem Statement: Improving BugReporter with static backward program slicing

Hi!

In this letter, I’d like to describe a particular problem in the Clang StaticAnalyzer’s BugReporter class that I’d like to tackle in a Google Summer of Code project this summer. I’ll show real-world examples on some of its shortcomings, and propose a potential solution using static backward program slicing. At last, I’ll ask some specific questions. I’d love to hear any and all feedback on this!

This is a problem statement, not a proposal. I plan to send out a formal proposal by Friday (but not later then Saturday), that will contain more details on both the problem and the solution. I don’t introduce myself or my previous works within the project, that will also be detailed in my upcoming letter. I also plan to incorporate the feedback I’ll receive to this letter.

—=== BugReporter constructs bad reports ===—

What does the BugReporter do?

After the Static Analyzer found an error, the BugReporter receives an ExplodedNode, which, accompanied by its predecessors, contains all the information needed to reproduce that error. This bugpath is then shortened with a variety of heuristics, such as removing unrelated function calls, unrelated branches and so on. BugReporter by the end of this process will construct a PathDiagnostic object for each report, that is, ideally, minimal.

Example 1.

Consider the following code example:

1 // leak.cpp
2 int *global_ptr;
3
4 void save_ext(int storeIt, int *ptr) {
5 if (storeIt)
6 global_ptr = ptr;
7 }
8
9 void test(int b) {
10 int *myptr = new int;
11 save_ext(b, myptr);
12 delete global_ptr;
13 }

It’s clear that if test is invoked with b’s value set to true, there is no error in the program. However, should b be false, we’ll leak memory.

$ clang -cc1 -analyze -analyzer-checker=core,cplusplus leak.cpp

image.png

The Static Analyzer is able to catch this error, but fails to mention the call to save_ext entirely, despite the error only occurring because the analyzer assumed that storeIt is false. I’ve also attached the exploded graph leak.svg that demonstrates this.

Example 2.

Consider the following code example:

1 // divbyzero.cpp

2 void f() {
3 int i = 0;
4 (void) (10 / i);
5 }
6
7 void g() { f(); }
8 void h() { g(); }
9 void j() { h(); }
10 void k() { j(); }

Its clear that a call to f will result in a division by zero error.

$ clang -cc1 -analyze -analyzer-checker=core divbyzero.cpp
image.png

Again, the Static Analyzer is plenty smart enough to catch this, but the constructed bug report is littered with a lot of useless information – it would be enough to only show the body of f, and, optionally where f itself was called. For the sake of completeness, I’ve attached divbyzero.svg that contains the exploded graph for the above code snippet.

The above examples demonstrate that BugReporter sometimes reduces the bugpath too much or too little.

—=== Solving the above problem with static backward program slicing ===—

What is static backward program slicing?

A program slice consists of the parts of a program that (potentially) affect the values computed at some point of interest, called the slicing criterion. Program slicing is a decomposition technique that elides program components not relevant to the slicing criterion (which is a pair of (statement, set of variables)), creating a program slice[1][2]. Static slicing preserves the meaning of the variable(s) in the slicing criterion for all possible inputs[1]. Backward slices answer the question “what program components might affect a selected computation?”[1]

While statement-minimal slices are not necessarily unique[3], Weisel developed a popular algorithm that constructs one. In essence, his fix-point algorithm constructs sets of relevant variables for each edge in between node i and node j in a CFG graph, from which he constructs relevant statements. The fix-point of the relevant statements set is the slice itself.

One of the characteristic of his algorithm is that the resulting program slice will be executable. However, our problem doesn’t require the code to be executable, so we could use a more “aggressive” approach that creates a smaller slice. An improvement to his algorithm is presented in [4].

How does this relate to BugReporter?

We can show that using Weiser’s algorithm, issues raised in Example 1. and Example 2. can be improved upon.

For example 1., the statement-minimal program slice with the criterion (13, ) will contain the statements 5 and 6, and for example 2., the statement-minimal program slice with the criterion (4, i) won’t contain anything but statements 3 and 4. For the latter, we can even improve the algorithm to also contain statement 7, where a call to f is made.

The analyzer, as stated earlier, gives BugReporter an ExplodedNode, from which the slicing criterion must be constructed. The statement corresponding to this node, coupled with the interesting regions the checker itself marked could be used to construct this slicing criterion.

Challenges

While the algorithm Weiser developed along with the improvements made by others are interprocedural, I would imagine that in implementation, it would be a challenging step from an intraprocedural prototype.

Several articles also describe pointers, references, and dynamically allocated regions, as well as gotos and other tricky parts of the language, but I still expect to see some skeletons falling out of the closet when implementing this for C++, not only C.

Drawbacks

Static slicing, as an algorithm that doesn’t know anything about input values, suffers from the same issues that all static techniques do, meaning that without heuristics, it’ll have to do very rough guesses, possibly leaving a far too big program slice. However, with the symbolic values the analyzer holds, this could be improved, turning this into conditioned slicing, as described in [1]. This however is only planned as a followup work after GSoC.

For this reason, this project would be developed as an alternative approach to some of the techniques used in BugReporter, as an optional off-by-default analyzer feature.

—=== Questions ===—

What do you think of this approach?
Do you think that implementing this algorithm is achievable, but tough enough task for GSoC?
Would you prefer to see a general program slicing library, or an analyzer-specific implementation? Traversing the ExplodedGraph would be far easier in terms of what I want to achieve, but a more general approach that traverses the CFG (like llvm::DominatorTree[5]) could be beneficial to more developers, but possibly at the cost of not being able to improve the prototype with the symbolic value information the analyzer holds.

References, links

[1] Gallagher, Keith, and David Binkley. “Program slicing.” 2008 Frontiers of Software Maintenance.
[2] Tip, Frank. A survey of program slicing techniques. Centrum voor Wiskunde en Informatica, 1994.
[3] Weiser, Mark. “Program slicing.” Proceedings of the 5th international conference on Software engineering. IEEE Press, 1981.
[4] Binkley, David. “Precise executable interprocedural slices.” ACM Letters on Programming Languages and Systems (LOPLAS) 2.1-4 (1993): 31-45.
[5] http://llvm.org/doxygen/classllvm_1_1DominatorTree.html

Link to previous GSoC related letter I sent: http://lists.llvm.org/pipermail/cfe-dev/2019-February/061464.html

Cheers,
Kristóf Umann

Somehow the images and the attached files were left out, please find them here:

(Attachment divbyzero.svg is missing)

(Attachment leak.svg is missing)

Ahaaaaa, ooooookkkkk, iiiiiiii seeeeeeeeee!

Sounds fascinating, must-have and hard.

My primary question is how much do you think this will put stress on the checkers. Like, how much more difficult would it be to write checkers when we require them to include the slicing criterion with their bug reports? How much of it would we (i.e., BugReporter) be able to infer ourselves?

One more question about Example 1. Do you think this particular example could be solved by developing some sort of NoStoreFuncVisitor but for MallocChecker? I.e., in addition to emitting notes like "returning without initializing variable x", it could emit notes like "returning without escaping pointer p". If such visitor is developed, what would be the differences between your proposal and whatever behavior we’ll get with such visitor? In other words, is it possible to construct an example with the “use of uninitialized variable” checker that, like Example 1, has vital pieces of information missing, given that this checker already uses NoStoreFuncVisitor?

I mean, it’s most likely a terrible idea to develop such visitor for MallocChecker, because the only reason this visitor works more or less reliably is the existence of const-qualifiers. For MallocChecker they don’t exist, so it’s going to be hard to figure out if a function was supposed to escape the pointer.

Hi,

Ahaaaaa, ooooookkkkk, iiiiiiii seeeeeeeeee!

Sounds fascinating, must-have and hard.

My primary question is how much do you think this will put stress on the checkers. Like, how much more difficult would it be to write checkers when we require them to include the slicing criterion with their bug reports? How much of it would we (i.e., BugReporter) be able to infer ourselves?


Say, let’s take Example 1. You describe the slicing criterion as:

(13, )

The value of the dynamically allocated into object remains the same regardless of whether the object is stored in global_ptr or not, so the slice doesn’t need to include line 5 or 6. Therefore i think that the slicing criterion you proposed is not what we’re looking for, and the real slicing criterion we’re looking for is:

(13, <liveness of >)

Like, we’re mapping every dynamically allocated object to an imaginary heap location that represents its current liveness, and include that imaginary variable, rather than the object itself, in the slicing criterion. Line 6 would affect liveness of the object on line 13 (if it’s stored into a global, it’s alive; otherwise it’s not), therefore it’ll be part of the slice. So, am i understanding correctly that you’re proposing a checker API that’ll look like this:

BugReport *R = new BugReport(Node, …);
R->addLivenessBasedSlicingCriterion(HeapSym);

?

I guess we can infer the statement of the slicing criterion from the Node, but i’m not entirely sure; see also below.

I’d actually love it if you elaborate this example further because it’s fairly interesting. Like, we know that the assignment affects the liveness information, but how would the slicing algorithm figure this out? Do you have a step-by-step description of how the algorithm behaves in this case?

I think leak like slicing criterion will be on of the hardest part of this work. But leak related checks are likely to have isSuppressOnSink set to true. I would suggest to not to use slicing for such checks until we figure out the proper way. It would be really great if we found a way to manage this without relying on liveness analysis, e.g. we could have a special mode that preserves all the escapes of the value and the control dependencies of those escapes (since the value of the allocated int could be changed later on if its address was stored to the global ptr). I did not study the subject in depth yet so let me know if you disagree.


Let’s take another example i just came up with. Consider alpha.unix.PthreadLock - a checker that finds various bugs with mutexes, such as double locks or double unlocks. Consider code:

1 pthread_mutex_t mtx1, mtx2;
2
3 void foo(bool x1, bool x2) {
4 if (x1)
5 pthread_mutex_lock(&mtx1);
6 if (x2)
7 pthread_mutex_lock(&mtx1);
8 // …
9 }

In this example we’ll report a double lock bug due to a copy-paste error: line 7 should probably lock &mtx2 rather than &mtx1.

The whole program is relevant to the report. Without line 5, there would only be a single lock. It transitions the mutex object from state “unknown” to state “locked”.

I don’t think the Analyzer would currently do a bad job at explaining the bug; it’s pretty straightforward.

What would be the slicing criterion in this case? As far as i understand, it’ll be

(7, )

In this case “lockedness” is, again, an “imaginary heap location” that contains metadata for the mutex. More specifically, it is a GDM map value. How would a checker API look in this case? How would it even describe to the BugReporter what to look at, given that currently only the checker understands how to read its GDM values?

My random guess is:

BugReport *R = new BugReport(Node, …);
R->addGDMBasedSlicingCriterion(MutexMR);

I am hoping for a very easy to use interface based on interesting symbols/regions (i.e. some checks with visitors might work out of the box, checks that did not mark anything interesting will not trigger the slicing). I might be missing something, but if the user marked the memory region for the mutex interesting thus we used “value of” the mutex as slicing criterion, we should include both line 5 as the function might mutate the value of the mutex.

Of course, it would still be possible for a checker author to shoot herself in the foot by misusing interestingness. Maybe we could add some sanity checks based on the statements at the bug report and uniqueing location. But I think it is not possible to make such a feature bullet proof.

Yeah, this particular example would be fixed, but we do not know how noisy it would be in general. One way to make it a bit less noisy could be something like what the NoStoreFuncVisitor already does for IVars. We could also add some syntactic checks to see if the method/function actually has the possibility to escape the region.

I am not familiar with Obj-C, but I suspect it would be possible to trick NoStoreFuncVisitor, or at least the syntactic checks in potentiallyWritesIntoIvar .
I think the main value of slicing here could be to get rid of those fully syntactic heuristics and replace them by something smarter. Does this make sense?

The thing in NoStoreFuncVisitor for IVars looks like it’s just saying “let’s see if there’s a data dependency on at least one statement in this call”. Given that the statement found by such AST matcher was not executed on the bug’s execution path, there must be at least one control flow dependency in this function as well.

I think this pattern-matching may only have false negatives, i.e. it’s possible to write into a variable/field/ivar without producing an assignement-operator to a DeclRefExpr for that variable/field/ivar, but it’s not possible to write an assignment into a DeclRefExpr for that variable/field/ivar without modifying the variable/field/ivar in the process. A more sophisticated analysis may find situations when such write is always done via an aliasing pointer, but i cannot think of anything else. Well, there’s also the self-assignment cornercase, i.e. x = x, but that’s esoteric.

I just realized after reading your answer how ambiguous my email was. When I said it is possible to trick the NoStoreFuncVisitor what I meant was to prevent it from emitting notes and thus pruning interesting parts of the bug path. But I am glad that we are actually on the same page regarding the value of detecting control dependencies. The example 4 you provided is a very compelling one :slight_smile:

Thank you for giving this this much thought!!! I reply somewhat slowly because I’m trying to keep things fairly formal and precise, an area in which I still have to improve on.

So, in short, Example 1. was a poor choice. You’re right, the value of isn’t affected by what points to it – using the traditional Weiser algorithm wouldn’t contain statement 5 and 6. In a sense, your mutex code is similar to Example 1, we’re not interested, at least in this case, what the actual value of the mutex is.

For now, I’m only planning to use this technique to reason about values of certain variables. Example 1. wanted to demonstrate how BugReporter is prone to make a too short of a bugpath – however, the following example can show this as well:

asd.cpp

1 void useInt(int);
2
3 int getInt(int x) {
4 int a;
5
6 if (x > 0)
7 a = 3;
8 else
9 a = 2;
10
11 return a;
12 }
13
14 int g();
15
16 int main() {
17 int arr[10];
18
19 for (int i = 0; i < 3; ++i)
20 arr[i] = 0;
21
22 int x = g();
23 int n = getInt(x);
24 useInt(arr[n]);
25 }

image.png

The attached ExplodedGraph demonstrates that the analyzer assumed that n == 3, but the bugreport doesn’t mention that.

Alright, with my original point being made properly, let’s address the raised questions.

Q: My primary question is how much do you think this will put stress on the checkers. Like, how much more difficult would it be to write checkers when we require them to include the slicing criterion with their bug reports? How much of it would we (i.e., BugReporter) be able to infer ourselves?

(Attachment asd.svg is missing)

If i understand correctly (hmm, there seem to be image problems again), in this bug report it’s about doing trackNullOrUndefValue() over n whenever we track the uninitialized value back to a load from an array with index n - i think it easily gets us straight to a = 3 and it doesn’t require introducing any new analyses.

Let’s hope this works now:

Yup, works now!

What i just proposed should hopefully end up looking like this:

1 void useInt(int);
2
3 int getInt(int x) {
4 int a;
5
6 if (x > 0)
7 a = 3;
^ // (8) Assigning 3 to 'a’

8 else
9 a = 2;
10
11 return a;
12 }
13
14 int g();
15
16 int main() {
17 int arr[10];
18
19 for (int i = 0; i < 3; ++i)
**^~~~~~~~~~~~~~~~~~~~~~~~~~~ (1)–(6) Looping around the loop**

20 arr[i] = 0;
21
22 int x = g();
23 int n = getInt(x);
_**^~~~~~~~~ // (7) Calling getInt**_
_**^~~~~~~~~ // (9) Returning ‘3’*_
*^~~~~ // (10) Assigning ‘n’ to ‘3’

24 useInt(arr[n]);
^~ // (11) Loading uninitialized value at index ‘3’*
**
~^~~~ // (12) Use of uninitialized value!
*

25 }

And all we need is to add note (11) by pattern-matching the current statement in trackExpressionValue() (which is a pretty good start regardless) and then invoke another trackExpressionValue() recursively over the index expression.

I mean, assigning 3 to ‘n’ :slight_smile:

Instead we can also have a look at the execution path on the ExplodedGraph that goes through the true-branch and see if the value remains null when it exits the if-branch. Kristof, you’re saying that you plan to do the slicing over the ExplodedGraph - is that what you mean? This could work as long as the other branch is actually explored by the analyzer. If it isn’t, it’ll be almost impossible to detect, and i don’t know how often would this happen, but there’s a good chance it’s going to be much more rare than us having problems with such highlighting right now.

We can also come up with a completely separate CFG-based analysis for this purpose. This is probably the most precise thing to do, because the problem is essentially an all-paths problem (which is why loss of coverage in the ExplodedGraph would bite us). I cannot estimate how difficult such analysis would be to implement (probably pretty scary), but i think Kristof wasn’t looking in this direction.

I guess it depends on what we want to do.

Static slicing isn’t defined for the ExplodedGraph, I’ll have show that we can transform an ExplodedGraph into a CFG, or define the algorithm on it. From afar, it’s very hard to tell how difficult this would be. When looking for a long-term solution, I think it would be better to find an approach where we won’t have to worry about whether the analyzer explored everything or not. On the other hand, we could eventually turn this backward static slicing into a backward conditional slicing with the constraints the analyzer gathered, should we chose to go with an ExplodedGraph based algorithm.

So we’ll have to balance these two. Greater accuracy with a chance that the slice will be too big (CFG), or not as accurate but resulting in a smaller slice (ExplodedGraph). I am not yet sure which one I’d like to pursue, but the latter would probably fit our needs better.

Id suggest first starting with a syntax-based approach (1) because it sounds to me that the approach that we chose for identifying dependencies on paths that aren’t part of the bug path is orthogonal to actually making use of that information to produce notes. Once we learn how to make use of this information, we’ll have a taste of this medicine and see if it works. Then we’ll see if we need to transition to (2) or (3) depending on such evaluation.

I’ll be honest, I don’t really understand what you mean here. Would you like to see a more down-to-earth syntax-based solution as an exercise “for the big thing”? If so, I’m not sure what you mean specifically.

I kinda gathered some understanding of the problem and the solution and trying to come up with some sort of a plan, and for that purpose i’m trying to step back from “we should totally do this cool technique” towards a more “adult”, “boring” point of view that consists in asking, “what do we need to do?”.

For now i’m seeing that the work on this problem can be decomposed into three chunks:

(1) Collect slicing criteria from various sources,
(2) Perform slicing to discover data and control dependencies,
(3) Use information about dependencies to improve the bug report.

Chunk (3) sounds straightforward: we just enable or disable path pieces based on information we already have. I don’t fully understand how it’ll go, but i don’t see any unknowns here.

Chunk (1) sounds like a large amount of routine, repetitive work: convert sources of slicing criteria, such as checkers, to the new system one-by-one. There is certain amount of known unknowns here: some sources may be tricky to take advantage of (eg., liveness-related sources), some may put stress on the checkers.

Chunk (2) is the research-intensive part: it has an easy 80% solution (AST matching based detection that has already been implemented and proved useful for one of the checkers) and an idea to improve upon it by implementing a more sophisticated analysis.

Because chunk (2) is research-intensive, we need experimental data that we currently don’t have. In order to avoid spending a lot of time on something that isn’t going to work, we should try to gather that data as soon as possible in our plan.

Because chunk (1) is repetitive, we don’t need to implement it completely; we can pick one or two sources and get them right.

Therefore here’s the plan that emerges:
i. Pick one or two slicing criteria sources and implement them.
ii. Implement a simple AST-based algorithm for discovering dependencies.
iii. Apply the information produced by that algorithm to the bug report.
iv. Run the updated analyzer on real-world codebases and see what problems remain.
v. Implement solutions to these problems.

Solutions on step v. may or may not be about coding a more sophisticated slicing algorithm. They might as well be improvements in chunk (1), i.e. discovering more slicing criteria, or they may be about a “horizontal” improvement in the existing primitive algorithm (eg., cover more forms of control flow or different variants of data dependencies, which should have been a separate chunk).

That’s roughly how i imagine it.

This. Is. Amazing. Thank you so much for taking the time to assemble this mail! I agree with everything stated above – I think I’m as ready as I’ll ever be to put together a nice looking proposal.

I kinda gathered some understanding of the problem and the solution and trying to come up with some sort of a plan, and for that purpose i’m trying to step back from “we should totally do this cool technique” towards a more “adult”, “boring” point of view that consists in asking, “what do we need to do?”.

For now i’m seeing that the work on this problem can be decomposed into three chunks:

(1) Collect slicing criteria from various sources,
(2) Perform slicing to discover data and control dependencies,
(3) Use information about dependencies to improve the bug report.

Chunk (3) sounds straightforward: we just enable or disable path pieces based on information we already have. I don’t fully understand how it’ll go, but i don’t see any unknowns here.

Chunk (1) sounds like a large amount of routine, repetitive work: convert sources of slicing criteria, such as checkers, to the new system one-by-one. There is certain amount of known unknowns here: some sources may be tricky to take advantage of (eg., liveness-related sources), some may put stress on the checkers.

Chunk (2) is the research-intensive part: it has an easy 80% solution (AST matching based detection that has already been implemented and proved useful for one of the checkers) and an idea to improve upon it by implementing a more sophisticated analysis.

Because chunk (2) is research-intensive, we need experimental data that we currently don’t have. In order to avoid spending a lot of time on something that isn’t going to work, we should try to gather that data as soon as possible in our plan.

Umm, what do you mean under “data”? Like, proving how the examples I showed would be solved by this technique?

Umm, what do you mean under “data”? Like, proving how the examples I showed would be solved by this technique?

I roughly refer-ahead-to the output of step iv. - it should do the trick. See if we’ve already solved the problem at step iii. If not, take a few real-world reports that are still bad, try to understand what’s wrong with them. Eg., are we not stuffing enough slicing criteria into them? Or is a more sophisticated slicing algorithm necessary to discover some of the dependencies? Or are we instead discovering too many dependencies? Or is it a completely unrelated bug that’s suddenly causing havoc? Rinse and repeat until most of our reports look great :slight_smile:

Alright, cheers! I’ll take my time to put something really nice together by tomorrow.