Proposal for a statistical UncheckedReturnValue checker

Hi clang,

Currently, the UncheckedReturnValue checker checks whether certain functions drop privileges successfully. That is the most critical case for UncheckedReturnValue, but it seems the checker should cover many other cases as well, for example the return values of "malloc" and "pthread_mutex_lock".

There may be another aspect the UncheckedReturnValue checker should cover as well. Among user-defined functions, there are some whose return values should always be checked, but we can't tell whether a given return value needs checking without a user specification. So here is a proposal for a statistical UncheckedReturnValue checker: when most (e.g., 85%) of the return values of a function are checked, the checker flags the unchecked return values as defects.

For the second part, IMO we may need something like a simple symbol table to store how many of a function's return values are checked, along with the reference locations (is there an existing structure I can use, or should I store this in the GDM?), and then do the check and bug reporting in VisitEndAnalysis.
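
Roughly, the bookkeeping I have in mind looks like the following standalone sketch (all names are made up; the real thing would hook into the checker callbacks, and the table would presumably live in the GDM rather than a plain map):

#include <cstddef>
#include <cstdio>
#include <map>
#include <string>
#include <vector>

struct ReturnValueStats {
  unsigned checkedUses = 0;
  std::vector<std::string> uncheckedLocs; // locations of unchecked calls
};

class StatisticalReturnValueChecker {
  std::map<std::string, ReturnValueStats> Stats; // keyed by callee name

public:
  // Record one call site after deciding whether its result was inspected.
  void recordCall(const std::string &Callee, bool WasChecked,
                  const std::string &Loc) {
    ReturnValueStats &S = Stats[Callee];
    if (WasChecked)
      ++S.checkedUses;
    else
      S.uncheckedLocs.push_back(Loc);
  }

  // Analogue of reporting at end of analysis: flag unchecked uses of
  // functions whose return value is "usually" checked.
  void report() const {
    const double Threshold = 0.85; // flag when > 85% of uses are checked
    for (const auto &Entry : Stats) {
      const ReturnValueStats &S = Entry.second;
      std::size_t Total = S.checkedUses + S.uncheckedLocs.size();
      if (S.uncheckedLocs.empty() || Total == 0)
        continue;
      if (double(S.checkedUses) / Total > Threshold)
        for (const std::string &Loc : S.uncheckedLocs)
          std::printf("%s: return value of '%s' is usually checked "
                      "(%u of %zu) but ignored here\n",
                      Loc.c_str(), Entry.first.c_str(), S.checkedUses, Total);
    }
  }
};

Where this table should live so it outlasts a single function's analysis is exactly the part I'm unsure about.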

How does this sound?

> Hi clang,
>
> Currently, the UncheckedReturnValue checker checks whether certain functions drop privileges successfully. That is the most critical case for UncheckedReturnValue, but it seems the checker should cover many other cases as well, for example the return values of "malloc" and "pthread_mutex_lock".

Hi Lei,

We may wish to extend this check to accept a list of functions whose values need to be checked (and how they should be checked). This could possibly be provided by the user, or something we just '#include'. I disagree that checking the return value of 'malloc' is an invariant that all codebases care about, so some fine-grained control from the user might be necessary in order to make this useful.
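
As a strawman, such a list might be no more than the following (the format here is invented purely for illustration; nothing like it exists in clang today):

// Invented format for illustration only; not an existing clang interface.
struct CheckedFn {
  const char *Name;        // function whose result must be used
  const char *Requirement; // how the result should be checked
};

static const CheckedFn RequiredChecks[] = {
    {"setuid", "verify the call returned 0"},
    {"malloc", "compare the result against NULL before use"},
    {"pthread_mutex_lock", "test for a nonzero error code"},
};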

> There may be another aspect the UncheckedReturnValue checker should cover as well. Among user-defined functions, there are some whose return values should always be checked, but we can't tell whether a given return value needs checking without a user specification. So here is a proposal for a statistical UncheckedReturnValue checker: when most (e.g., 85%) of the return values of a function are checked, the checker flags the unchecked return values as defects.
>
> For the second part, IMO we may need something like a simple symbol table to store how many of a function's return values are checked, along with the reference locations (is there an existing structure I can use, or should I store this in the GDM?), and then do the check and bug reporting in VisitEndAnalysis.
>
> How does this sound?

I'm a huge fan of these kinds of statistical checks (for background, it was a big part of my thesis work in graduate school). I don't think the approach, however, is going to work without accumulating data across multiple functions (and most likely, multiple files) since your sample size will be way too small to draw statistically significant conclusions.

For example, without a sample size of at least 7 you won't even be able to flag anything, since 5/6 ≈ 0.83 but 6/7 ≈ 0.857. That means in order to get a frequency greater than 0.85 (but less than 1), you need to observe at least 7 uses within a single function (e.g., 6 checked out of 7). Except for very large functions, you're simply not going to get enough data. Right now 'VisitEndAnalysis' is only called when a GRExprEngine instance is done analyzing a single function; to properly do this check, you'd probably need to look at the functions analyzed across an entire codebase.
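
The arithmetic is easy to verify (threshold and loop bounds here chosen only for illustration):

#include <cstdio>

int main() {
  const double Threshold = 0.85;
  // Smallest n where one unchecked use out of n, i.e. (n-1)/n checked,
  // still exceeds the threshold: (n-1)/n > 0.85 <=> n > 1/0.15 ~= 6.67.
  for (int N = 2; N <= 8; ++N)
    std::printf("n = %d: (n-1)/n = %.3f%s\n", N, double(N - 1) / N,
                double(N - 1) / N > Threshold ? "  -> can flag" : "");
  return 0;
}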

In general, I also think sample size needs to be taken into account to accurately gauge statistical confidence. While 7 observations is probably good enough for most cases, it might not always be, and in other cases it might be far more data than you need to flag something as a warning (i.e., in cases where "coincidence" isn't very likely at all). By taking sample size into account (using either traditional statistical or Bayesian techniques), one can compute an accurate confidence measure of how likely it is that the deviation is truly a bug.
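
To sketch one simple frequentist version of this (the 0.5 base rate for "incidental" checking is a made-up assumption, and none of this is tied to any clang API):

#include <cmath>
#include <cstdio>

// Under a null hypothesis that checking is incidental at base rate P0,
// compute the binomial probability of seeing >= K checked uses out of N.
// A small tail probability means the observed checking pattern is unlikely
// to be coincidence, so an unchecked outlier is more credibly a bug.
double binomialTailAtLeast(int N, int K, double P0) {
  double Tail = 0.0;
  for (int I = K; I <= N; ++I) {
    double LogC =
        std::lgamma(N + 1) - std::lgamma(I + 1) - std::lgamma(N - I + 1);
    Tail += std::exp(LogC + I * std::log(P0) + (N - I) * std::log1p(-P0));
  }
  return Tail;
}

int main() {
  // 6 of 7 uses checked vs. 60 of 70: the same 0.857 frequency, but very
  // different confidence that the checking pattern is intentional.
  std::printf("n = 7:  P = %.4f\n", binomialTailAtLeast(7, 6, 0.5));
  std::printf("n = 70: P = %.2e\n", binomialTailAtLeast(70, 60, 0.5));
  return 0;
}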

Accurately reasoning about sample size, however, is mostly a matter of refinement. The immediate problem is collecting the data. Right now we have no good apparatus for doing whole-program checks, or even checks that aggregate results from analyzing all the functions in a translation unit. Both are things we want to be able to do, but it requires extending the current checking infrastructure.

Ted

Hi Ted,

2010/10/25 Ted Kremenek <kremenek@apple.com>

>> Hi clang,
>>
>> Currently, the UncheckedReturnValue checker checks whether certain functions drop privileges successfully. That is the most critical case for UncheckedReturnValue, but it seems the checker should cover many other cases as well, for example the return values of "malloc" and "pthread_mutex_lock".
>
> Hi Lei,
>
> We may wish to extend this check to accept a list of functions whose values need to be checked (and how they should be checked). This could possibly be provided by the user, or something we just '#include'. I disagree that checking the return value of 'malloc' is an invariant that all codebases care about, so some fine-grained control from the user might be necessary in order to make this useful.

Makes sense.

Thanks very much for your reply.

Statistical checkers are more complicated than I thought; I need to do more background research.

Based on your comments, I think we can split statistical checking into two steps: first run a statistical check over a single translation unit, then extend it to the entire codebase.

This resembles what inter-procedural analysis and whole-program (inter-file) analysis need. But a statistical check needs much less information than the latter (we don't need function summaries or a serialized AST, right?), so maybe we need a mechanism independent of the "inter" analyses?

In most cases the information for a statistical check can be acquired by doing intra-procedural analysis to sample the data (e.g., analyze functions individually and gather observations). The problem is that you need to post-process all that data after analyzing a translation unit or an entire codebase. So it isn't whole-program analysis in the traditional program-analysis sense (e.g., using summary-based analysis, etc.), just some data crunching afterwards. Note that this kind of post-processing doesn't have to be sophisticated. Prototyping something is simple: just write a script that post-processes some data that you store on the side. Fully integrating this into clang, however, has advantages (e.g., all error reporting goes through the same machinery).
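
For example (the record format and file layout here are invented for illustration), the side data could just be plain text lines that a small aggregator crunches afterwards:

#include <cstddef>
#include <fstream>
#include <iostream>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Aggregates lines of the form "callee checked|unchecked file:line",
// one log file per translation unit, across an entire codebase.
int main(int argc, char **argv) {
  std::map<std::string,
           std::pair<unsigned, std::vector<std::string>>> Tally;
  for (int I = 1; I < argc; ++I) {
    std::ifstream In(argv[I]);
    std::string Callee, Status, Loc;
    while (In >> Callee >> Status >> Loc) {
      if (Status == "checked")
        ++Tally[Callee].first;
      else
        Tally[Callee].second.push_back(Loc);
    }
  }
  for (const auto &Entry : Tally) {
    unsigned Checked = Entry.second.first;
    const std::vector<std::string> &Unchecked = Entry.second.second;
    std::size_t Total = Checked + Unchecked.size();
    // Require at least 7 observations, per the arithmetic above.
    if (!Unchecked.empty() && Total >= 7 &&
        double(Checked) / Total > 0.85)
      for (std::size_t J = 0; J < Unchecked.size(); ++J)
        std::cout << Unchecked[J] << ": return value of '" << Entry.first
                  << "' is usually checked\n";
  }
  return 0;
}

Having the analyzer emit those per-function records is the easy half; an aggregator like this is all the "whole-program" machinery a prototype needs.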

One step I'm interested in taking with the existing AnalysisConsumer and Checker objects is having Checker objects (potentially) live through the analysis of all functions (i.e., across multiple instances of GRExprEngine). Such checkers would then be able to accrue data across multiple functions and then do the kind of analysis you are talking about. This seems like a potentially essential step on the road to more sophisticated analyses, but the details are not fleshed out.

That would also make it fairly trivial to calculate cyclomatic complexity, Halstead complexity, and a few other metrics on a file. That could be interesting in some contexts. Right now, as an experiment, I'm doing that by printing the AST as XML and post-processing it, but it would obviously be far faster inline.
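
The per-function computation itself is trivial once the decision points are counted (a rough sketch; a real version would count these during an AST walk rather than take them as arguments):

#include <cstdio>

// Cyclomatic complexity as "decision points + 1": each if, loop, case
// label, '&&'/'||', and '?:' adds one independent path through a function.
unsigned cyclomatic(unsigned Ifs, unsigned Loops, unsigned Cases,
                    unsigned LogicalOps, unsigned Ternaries) {
  return 1 + Ifs + Loops + Cases + LogicalOps + Ternaries;
}

int main() {
  // A function with 3 ifs, 2 loops, 4 case labels, 2 '&&'/'||', 1 '?:'.
  std::printf("M = %u\n", cyclomatic(3, 2, 4, 2, 1)); // prints M = 13
  return 0;
}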

Andrew