Hi Edoardo,
What you are observing is something that is potentially easy to “fix”, but essentially requires additional domain-specific modeling in the static analyzer.
To understand what is going on, consider the following example:
void foo(int x) {
1 / x;
}
In this example we have a potential division-by-zero. If ‘foo’ is analyzed in isolation, we don’t know anything about the value of ‘x’. We can do one of two things:
(1) Warn that ‘x’ could potentially be zero.
(2) Do not emit a warning, but register there was a division and that ‘x’ cannot be zero after the division, and keep analyzing.
In both cases the value of ‘x’ is unconstrained, so we can look at #1 and #2 as two potential default behaviors. In the analyzer we do #2. Essentially, although the value of ‘x’ is unconstrained, we don’t warn here. The rationale is that the analyzer would emit a huge number of false positives for integers that are possibly perfectly safe. We take the same tactic for pointers, who all could hypothetically could be null.
Instead, we take the strategy in the analyzer to flag bugs like null pointers and division-by-zero if we have evidence belief there may be a problem. For example, if a pointer is checked against null, we have evidence to believe that the pointer could be null. Similarly, there may be some APIs that we know could return a null pointer, and thus we can possibly track that information and see if a pointer coming from one of these sources is dereferenced.
The same tactic can be taken with integers and divide-by-zero. Let’s look at one of your examples:
void f_fscanf()
{
int data;
fscanf(stdin, “%d”, &data);
/* POTENTIAL FLAW: Possibly divide by zero */
printIntLine(100 / data);
}
After the call to ‘fscanf’ the analyzer tracks that ‘data’ binds to some symbolic integer value. That value is considered to be underconstrained, just like in my example above. In this example however, we could track some additional information that it came from an unsafe API like fscanf. This information is something additional that we would need to model, as we probably would not want to do this for any arbitrary API call. Modeling this might not be too hard. For example, the symbolic integer value does have some information associated with it to tell us where it originated; we could possibly use this to consult an API list to determine if the integer value should be treated as unsafe. We’d then modify the divide-by-zero logic in the analyzer to emit a warning if an under constrained integer came from one of these sources. We’d probably could conjure up something fairly ad hoc at first for a few specific cases, and then generalize it. A while ago we experimented with generalized taint analysis in the analyzer, which is something that would also possibly be useful here.
Does that answer your question? Essentially we need a little additional modeling in the analyzer for these APIs.
Ted