I am trying to implement a checker for Clang Static Analyzer which needs to follow function calls. In some testcases, some calls can be nested to facilitate recursion. I wanted to understand how the Static Analyzer handles recursion (or cyclic function calls, for e.g. f1 calls f2, f2 calls f3, f3 calls back f1). Is there a limit to which the Static Analyzer will keep hopping into the next function before identifying a cycle and skipping the next function call to evaluate the rest of the function body?
There’s a limit of 4 nested calls. It doesn’t care whether it’s recursion or just nested calls. That’s mostly because our interprocedural analysis is explosive enough in complexity as it is to go any deeper. We’re also making an exception for “small” functions that don’t introduce any branches, these can go much deeper (branches are important because every branch doubles our analysis time as we have to explore both directions).
In that case, is there any way to skip to checking the other parts of the code, which are neglected because they are not reachable in the current nested call? For e.g. consider a recursive function f(int a), which calls f(a-1) somewhere in its body. The base case occurs when a = 1. Now, say we start the analysis with f(10), then because of the limit on depth of nesting, we will never reach the call to f(1), due to which static analyzer will rule out the base case as an unreachable part of the code, which is certainly not the case. Is there any work around to this?
One possible solution which I came up with was to check only the function definitions and not the calls themselves (so that we don’t bias any part of the code due to argument values. But, I have neither found any way to direct the Static Analyzer to prevent analyzing a function/method nor make it so that it checks a function/method twice.
1 #include "stdlib.h"
2
3 int fb(int a) {
4 if (a > 1) {
5 return fb(a-1);
6 }
7 char *b = (char *)malloc(sizeof(char));
8 free(b);
9 free(b);
10 return a;
11 }
12
13 int main() {
14 fb(10000);
15 return 0;
16 }
Recursion also reduces code coverage. Is there any way to ensure that lines 7-10 are executed, as they currently seem to be skipped resulting in a false negative?
Hi,
AFAIK there is no way around this unfortunately. Its on our radar and we plan to tackle this problem somewhere around Q4.
PS: one could fiddle with the internal max block visit count but it will hinder performance a lotand it still wouldnt guarantee you anything.
PS2: Ive seen similar examples with loops as well.
Yeah we literally simulate the code step by step so we can’t proceed until we’ve simulated it all. It’s theoretically possible to “widen” the state to continue the analysis, and that’s actually what we do at the call site if we run out of resources modeling recursion (so we’ll typically analyze return 0
in main()
even if we can’t simulate fb(10000)
). But inside the same function, the precision of an analysis that doesn’t understand what the initial half of the function is doing will probably be unacceptably low.
True, widen should be a good choice to start with. But for the recursion in this case, widen will detect the double-free error but in general failed to track the value of a. The precision may be unacceptably low like you said, if I got you right. I am wondering whether CSA should consider to provide a speical program point for recursion, which will be triggered when the recursive function is being called 4 times. Then at this program point, the checker writer could have more control over the execution of a recursive function. (tradeoff precision for code coverage)
The checker writer cannot have more control over that because all checkers operate on the same analysis graph; this decision has to be made for all checkers simultaneously.