[analyzer] Rough sketch of the algorithm for "Enhancing bug reporting with static backward program slicing"

Hi!

I’d like to share some thoughts about my GSoC project, “Enhancing bug reporting with static backward program slicing”[1].

My proposal is very specific about what I’m aiming to improve on, but vague on the how part of it. This mail aims to add clarify some of this.

Program slicing is essentially a technique of discovering data and control dependencies to the slicing criterion, which is a (statement, {set of variables}) pair. Fortunately, tools for control dependencies, namely, dominator set calculations are already implemented, but seems to be unstable with clang’s CFG. It would be a great tool if I were able to fix it.

While my proposal states that I plan to implement an AST-based solution, I’m actually not sure that this is the optimal approach. We could instead inspect CFG blocks in a backwards manner (coupled with the analyzer’s call stack), and gather relevant variable in the meanwhile.

What would also be really great is to assist this traversal with the information the analyzer already has, essentially only inspecting basic blocks the analyzer actually visits.

Here’s my idea for an algorithm (from the point of the slicing criterion already being constructed):

https://docs.google.com/document/d/1Lx867o3meyQsj0WKOSWMdosSBdw2MUq1aIxyPJM6iLU/edit?usp=sharing

Please note that this is a veeery rough sketch, I didn’t think about every edge case that exists, but I think it’s an okay start.

[1] https://docs.google.com/document/d/1ci1BCAKojPlqIxIw1J_K2dnATA3z01PuwR_vHJS55TI/edit

My primary question is, how do you plan to use the data that you obtain via your analysis?

Like, do you plan to backtrack the value of all relevant variables and/or expressions in the final bug report that were also encountered along the bug path? If yes, then is it possible that the slicing criterion gets updated in the process and you’ll have to re-run the CFG-based analysis to take it into account?

What would also be really great is to assist this traversal with the information the analyzer already has, essentially only inspecting basic blocks the analyzer actually visits.

You mean visits on the current bug path or visits on the equivalence class of bugs or visits in general during analysis?

Regardless, for many problems ideally we should traverse basic blocks that the analyzer didn’t visit (eg., if the function didn’t initialize the variable on the current path, we’re interested in parts of the code in which it did initialize the variable, even though we didn’t necessarily have time to visit them).

It actually sounds as if all problems that we’re trying to solve here can be classified into “must-have-happened” problems (eg., “variable must-have been declared without initialization”, “variable must-have been set to nullptr”, “memory must-have been allocated”) and “should-not-have-happened” problems (eg., “variable should-not-have been initialized”, “null value should-not-have been overwritten”, “pointer should-not-have escaped”).

For must-have-happened problems, i’m recently under an impression that we should suppress the bug report entirely when we fail to solve them (i.e., if fail to explain to the user where exactly does this happen, then the report is impossible to understand anyway). This is currently a very big problem for our null dereference checker on some codebases, especially because it uses this tracking for suppressions as well (aka inlined defensive checks), so when it fails to track the variable it’s likely a legit false positive as well, not simply a hard-to-understand report.

For should-not-have-happened problems i’m much more confused. We’re talking about looking for places where it could have happened and then trying to explain to the user why none of them have actually happened. I’m not sure what are the consequences of failing to explain to the user why didn’t a particular piece of code do something, because i’ve no idea how do users intuitively figure out which code could have done these things and which clearly couldn’t.

Please let me know if I didn’t address something you mentioned!

My primary question is, how do you plan to use the data that you obtain via your analysis?

Gather relevant statements. Keep ExplodedNodes in the bugpath for which PathDiagnosticLocation::getStmt() is in the set (or, something like that at least). But honestly, I didn’t think too much about that yet :^) I guess we could just gather ExplodedNodes rather than statements, we’ll see.

Like, do you plan to backtrack the value of all relevant variables and/or expressions in the final bug report that were also encountered along the bug path? If yes, then is it possible that the slicing criterion gets updated in the process and you’ll have to re-run the CFG-based analysis to take it into account?

No, the slicing criterion is what it is, and will not change. The set of relevant statements to the slicing criterion define the actual program slice, which is basically the thing we’re going for in this project. When it turns out that a value of a variable directly affects a variable in the slicing criterion (either through data or control dependency), we just add it to the set of relevant variables, and then if something in the set of relevant variables is affected (some statements or variables turn out to be a control/data dependencies to them), then it’s added to the respective relevancy sets. Should we only traverse the basic blocks the analyzer visited, I think we can pull off the slicing with a single pass.

What would also be really great is to assist this traversal with the information the analyzer already has, essentially only inspecting basic blocks the analyzer actually visits.

You mean visits on the current bug path or visits on the equivalence class of bugs or visits in general during analysis?

Regardless, for many problems ideally we should traverse basic blocks that the analyzer didn’t visit (eg., if the function didn’t initialize the variable on the current path, we’re interested in parts of the code in which it did initialize the variable, even though we didn’t necessarily have time to visit them).

Well I mean slicing by definition isn’t intended to do that. The entire point of it is to gather the smallest slice related to the slicing criterion, and my project aims to fill in the gaps where we don’t provide enough information.

It actually sounds as if all problems that we’re trying to solve here can be classified into “must-have-happened” problems (eg., “variable must-have been declared without initialization”, “variable must-have been set to nullptr”, “memory must-have been allocated”) and “should-not-have-happened” problems (eg., “variable should-not-have been initialized”, “null value should-not-have been overwritten”, “pointer should-not-have escaped”).

For must-have-happened problems, i’m recently under an impression that we should suppress the bug report entirely when we fail to solve them (i.e., if fail to explain to the user where exactly does this happen, then the report is impossible to understand anyway). This is currently a very big problem for our null dereference checker on some codebases, especially because it uses this tracking for suppressions as well (aka inlined defensive checks), so when it fails to track the variable it’s likely a legit false positive as well, not simply a hard-to-understand report.

I think this set calculating approach inherently gathers far more information, allowing us to make better judgement on whether we should suppress the report.

For should-not-have-happened problems i’m much more confused. We’re talking about looking for places where it could have happened and then trying to explain to the user why none of them have actually happened. I’m not sure what are the consequences of failing to explain to the user why didn’t a particular piece of code do something, because i’ve no idea how do users intuitively figure out which code could have done these things and which clearly couldn’t.

Im really at a loss here :slight_smile: Can you provide some example as to what a problematic “must-have-happened” bug report would look like, and how would you like to see it improved? Same for “should-not-have-happened”. Because as I understand it, and I might be wrong, you have problems with this:

int a; // declaring a without initializing

if (rand()) // assuming that the condition is false
a = 5;

print(a); // a is undef

And prefer to see this:

int a; // declaring a without initializing

if (rand()) // should this condition be false, a’s value will be indeterministic
a = 5; // writing to a skipped

print(a); // a is undef

and I just don’t see how this would be possible with slicing at all. Also, I can’t see how this would scale to real-world production code.

Please let me know if I didn’t address something you mentioned!

My primary question is, how do you plan to use the data that you obtain via your analysis?

Gather relevant statements. Keep ExplodedNodes in the bugpath for which PathDiagnosticLocation::getStmt() is in the set (or, something like that at least). But honestly, I didn’t think too much about that yet :^) I guess we could just gather ExplodedNodes rather than statements, we’ll see.

Like, do you plan to backtrack the value of all relevant variables and/or expressions in the final bug report that were also encountered along the bug path? If yes, then is it possible that the slicing criterion gets updated in the process and you’ll have to re-run the CFG-based analysis to take it into account?

No, the slicing criterion is what it is, and will not change. The set of relevant statements to the slicing criterion define the actual program slice, which is basically the thing we’re going for in this project. When it turns out that a value of a variable directly affects a variable in the slicing criterion (either through data or control dependency), we just add it to the set of relevant variables, and then if something in the set of relevant variables is affected (some statements or variables turn out to be a control/data dependencies to them), then it’s added to the respective relevancy sets. Should we only traverse the basic blocks the analyzer visited, I think we can pull off the slicing with a single pass.

We’ll see about that single pass thing though. I’m gathering some tricky examples in the document I shared and working on a neat algorithm.

Ok, i think i have a little bit of clarity.

Let’s first talk about must-have-happened problems. For instance,

int *foo() {
return coin() ? new int : nullptr; // needs note
}
void bar() {
int *x = foo();
*x = 1; // warning: null dereference
}

In this case trackExpressionValue solves the “must-have-happened” problem of “x must-have been assigned a null pointer value” by displaying a note when foo() returns nullptr. This currently more or less works correctly - there are a lot of bugs but the overall idea behind trackExpressionValue is correct.

Example 1 in your document is another example of a must-have-happened problem: in order for the report to be valid, we need to show that ‘flag’ must-have-changed between line 17 and line 13. That’s something that the Static Analyzer currently cannot handle and you plan to improve upon it.

Hʏᴘᴏᴛʜᴇsɪs 1. All “must-have-happened” problems should be solved with a bug visitor. We don’t need any new analysis algorithm for that.

In particular, i believe that Example 1 can be solved by extending trackExpressionValue() with a bug visitor that detects control-flow-dependencies via this approach of yours:

“Check whether the statement is a control dependency of Statement 18 with dominator sets: 18 doesn’t post dominate 17, but every statement in between them does, meaning that 17 is a control dependency of 18.”

I.e., while ascending through ExplodedGraph, we pick interesting statements and perform this domination check on their predecessor nodes until we reach the control dependency, then we track the control dependency recursively by adding more visitors.

I think this is the first thing we should try on our GSoC.

The reason why i believe that Hypothesis 1 is true is that all the information that we need is fully contained in the bug path. If something must have happened on the path, then it’s probably in the path and we can figure it out by looking at the path. If something must have happened on the path and it’s not in the path, then why did we emit a warning in the first place?

  • Ádám Balogh

Hi!

I wanted to share some of my points as well but had little time to do so.

Artem identified two kinds of issues. One of which has all the information available in the bug path (in his terminology: must-have-happened) and the other which has not ( should-not-have-happened). The goal is the same in both of the cases, identify all dependencies including control and data dependencies. The main difference is that in one case we could use all the information available in the bug path while in the other we cannot.

  1. must-have-happened:
    Since the bug-path contains all the information we need, I do agree with Artem using bug path visitors might be sufficient. I do not know how reliable trackExpressionValue is in identifying data dependencies, but revising it could be one step in GSoC. For example, after quickly skimming through the implementation I am not sure if it is able to track the source of a constant value that was constant folded during the symbolic execution from different values. Only if we had a way to test this functionality in isolation quickly :slight_smile: I think making that possible would also be a valuable addition.

For control dependencies, it is great that LLVM already has some algorithms to calculate dominator sets and it might be possible to make that work for the Clang CFG.

In my opinion, if you only tackle these problems by the end of the summer your project already brought great value to the community.

  1. should-not-have-happened

This category cannot be solved only using visitors for sure. So we might end up with an implementation for this problem that is completely separate from the previous one (maybe except for getting control dependencies?). As Artem pointed out, we might query some information from the analyzer e.g. which functions were inlined. One option is to use a slicing algorithm here. However, we should make sure that the way a function is conservatively evaluated by the slicing algorithm and the analyzer is compatible. E.g.: if the analyzer does not invalidate a memory region during evaluating the call, the statement should not be considered as relevant for the variable in question. We have several challenges here, the analyzer reasons about memory regions while the slicing algorithm will probably not. Also, we need to do something with pointers, since we do not want to implement a full-fledged pointer analysis. We also need to decide how to represent arrays, structures etc.

I skimmed through the algorithm in the google docs, and I think there are some other details we need to work out. For example, you never remove anything from the set of relevant variables. Consider the following example:
x = 5;
y = x + 2;
y = 2;
z = y + 3;

Whit the slicing criterion (z = y + 3). Here, once y is overwritten with 2, it is no longer relevant, so x should not end up in the slice. The algorithm you proposed will not handle this correctly (yet). Also I am not sure if we actually need the relevant_variable_map or having a set will be sufficient.

Regards,

Gabor

Hi!

I wanted to share some of my points as well but had little time to do so.

Artem identified two kinds of issues. One of which has all the information available in the bug path (in his terminology: must-have-happened) and the other which has not ( should-not-have-happened). The goal is the same in both of the cases, identify all dependencies including control and data dependencies. The main difference is that in one case we could use all the information available in the bug path while in the other we cannot.

1. must-have-happened:
Since the bug-path contains all the information we need, I do agree with Artem using bug path visitors might be sufficient. I do not know how reliable `trackExpressionValue` is in identifying data dependencies, but revising it could be one step in GSoC. For example, after quickly skimming through the implementation I am not sure if it is able to track the source of a constant value that was constant folded during the symbolic execution from different values.

It should be able to, because that's how inlined defensive check suppressions work.

Hi!

I wanted to share some of my points as well but had little time to do so.

Artem identified two kinds of issues. One of which has all the
information available in the bug path (in his terminology:
must-have-happened) and the other which has not (
should-not-have-happened). The goal is the same in both of the cases,
identify all dependencies including control and data dependencies. The
main difference is that in one case we could use all the information
available in the bug path while in the other we cannot.

  1. must-have-happened:
    Since the bug-path contains all the information we need, I do agree
    with Artem using bug path visitors might be sufficient. I do not know
    how reliable trackExpressionValue is in identifying data
    dependencies, but revising it could be one step in GSoC. For example,
    after quickly skimming through the implementation I am not sure if it
    is able to track the source of a constant value that was constant
    folded during the symbolic execution from different values.

It should be able to, because that’s how inlined defensive check
suppressions work.

I recently finished (not yet published) my work on implementing control dependencies for clang’s CFG via LLVM’s IDFCalculator. Theoretically speaking, by simply tracking the variables in the block that is a control dependency of the block that is being tracked, we should be able to get a proper bug report for Example 1? From afar, it seems reasonable.

Allow me to elaborate.

Example 1.:

01 int flag;

02 bool coin();

03

04 void foo() {

05 flag = coin(); // no note

06 }

07

08 int main() {

09 int *x = 0; // x initialized to 0

10 flag = 1;

11 foo();

12 if (flag) // assumed false

13 x = new int;

14 foo();

15

16 if (flag) // assumed true

17 *x = 5; // warn

18 }

Example 2.: (note the change on line 15)

01 int flag;

02 bool coin();

03

04 void foo() {

05 flag = coin(); // no note

06 }

07

08 int main() {

09 int *x = 0;

10 flag = 1;

11 foo();

12 if (flag) // assumed false

13 x = new int;

14 foo();

15 x = 0; // x set to 0

16 if (flag) // assumed true

17 *x = 5; // warn

18 }

By tweaking trackExpressionValue a bit:

bool trackExpressionValue(const ExplodedNode *N,

const Stmt *S, BugReport &R,

bool IsArg=false,

bool EnableNullFPSuppression=true);

Should a block B in the bug path be a control dependency of N, track the expression value in the condition of B. With that, I believe we should get a note for statement 14. For statement 10, we still wouldn’t get a note saying that flag’s value was invalidated (I fear), but it’s more of a should-not-have-happened case. For Example 2 however, we’d get a pretty much perfect bug report.

Interesting.

Condition on line 12 would not be our control dependency - neither in example 1 nor in example 2, simply because there’s nothing interesting within either branches of that if-statement. The ‘new int’ assignment would have been interesting if we solved the should-not-have-happened problem of “x should not have been overwritten”, but for now we don’t do that.

At the same time, condition on line 16 would still be a control dependency of our report in both examples, because this is where the actual crash happens. So we will have a note on line 05 in both cases, even if it’s completely unnecessary in the second example.

We would indeed never have a note on line 10. This is, in fact, a dead store, this value is never used, so we wouldn’t be able to track anything back to it.

Interesting.

Condition on line 12 would not be our control dependency - neither in example 1 nor in example 2, simply because there’s nothing interesting within either branches of that if-statement. The ‘new int’ assignment would have been interesting if we solved the should-not-have-happened problem of “x should not have been overwritten”, but for now we don’t do that.

Thats my game plan exactly :slight_smile:

At the same time, condition on line 16 would still be a control dependency of our report in both examples, because this is where the actual crash happens. So we will have a note on line 05 in both cases, even if it’s completely unnecessary in the second example.

Why would it be? We’d never come across the nullptr dereference if it wasn’t for flag having its value reset on line 14.

We would indeed never have a note on line 10. This is, in fact, a dead store, this value is never used, so we wouldn’t be able to track anything back to it.

I messed up the numbering, I did in fact mean 11 when I wrote 10 :slight_smile:

This note is unnecessary because it’s not that surprising that the flag can be true in this case. It simply follows from the presence of the check: if the flag is always false, then this if-statement is dead code, which is a bug on its own. In example 1 the real thing that’s worth conveying is that the value of the flag changes between line 12 and line 16 (and also between line 10 and line 12). But i don’t see an immediate way to formalize this kind of reasoning.

Interesting.

Condition on line 12 would not be our control dependency - neither in example 1 nor in example 2, simply because there’s nothing interesting within either branches of that if-statement. The ‘new int’ assignment would have been interesting if we solved the should-not-have-happened problem of “x should not have been overwritten”, but for now we don’t do that.

Thats my game plan exactly :slight_smile:

At the same time, condition on line 16 would still be a control dependency of our report in both examples, because this is where the actual crash happens. So we will have a note on line 05 in both cases, even if it’s completely unnecessary in the second example.

Why would it be? We’d never come across the nullptr dereference if it wasn’t for flag having its value reset on line 14.

This note is unnecessary because it’s not that surprising that the flag can be true in this case. It simply follows from the presence of the check: if the flag is always false, then this if-statement is dead code, which is a bug on its own.

I somewhat disagree. I think the bug report is clearer when we argue about why we assume (and do not know) the value of flag. But I guess we’d really need to see the impact of this before reasoning about it. I don’t immediately see any major roadblocks that would prevent me from implementing this as an experiment within an hour or so, and gather some experience with it.

In example 1 the real thing that’s worth conveying is that the value of the flag changes between line 12 and line 16 (and also between line 10 and line 12).

But i don’t see an immediate way to formalize this kind of reasoning.

I hope it’ll work out with the combined solution to the “must-have-happened” and “should-not-have-happened” cases, but it’s hard to foresee it this early on.

Nice point, indeed this may be of interest. I think it’d be of much more interest if the Flag was initially initialized to 0. If we fail to justify that, our warning would still be well-justified, but in some cases it might be a nice piece of info to add indeded.

What i was actually trying to say is “hey, our approach is probably still not entirely perfect, but i’ve no idea what ‘perfect’ is and it shouldn’t really be preventing us from trying out your solution; we don’t have anything better anyway” :slight_smile: