Hi,
This year we plan to experiment with function summaries in CSA. We have a budge roughly for 2 people to work on this.
Short-term plans:
^^^^^^^^^^^^^^^^^
We’d like to be able to read and use hand-written function summaries. A good example of such hand-written summaries are in Cppcheck. The existing StdLibraryFunctionsChecker is the entry point for our upcoming changes. I have created a patch which eliminates the macros in that checker, this way the used classes like ValueRange
become extensible [https://reviews.llvm.org/D73897]. E.g. we can add an argument constraint that must be true on all branches, if the constraint fails then we warn [https://reviews.llvm.org/D73898]. (Note that these patches are still WIP.)
What could be in a summary?
- Branches
- E.g. for f(x), if x is in [‘0’, ‘9’] then ret is not 0.
- This is already handled in StdLibraryFunctionsChecker.
- Argument constraints (a contsraint that must be true on all branches)
- Range constraint: E.g. x must be in [0, 255] on all branches. This is the
case for instance with isalpha(x) where if x is out of the range then the
program is ill-formed (undefined behaviour) according to the C standard. - Size requirements
E.g.: asctime_s(char *buf, rsize_t bufsz, const struct tm *time_ptr);
buf
size must be at leastbufsz
. - Not-null
- Not-uninitalized
- Not-tainted (?)
- Return value properties (?)
- E.g.: the return value must be checked always.
void foo() {
void *P = aligned_alloc(4, 8);
if (P == NULL) { // OK
}
}
void bar() {
void *P = aligned_alloc(4, 8);
void *X = P + 1; // expected-warning{{Use of return value that was not checked}}
}
Storing this information in summaries could provide configuration for checkers like this https://reviews.llvm.org/D72705.
Also statistics based checkers could also benefit from this information.
Perhaps we should come up with a summary description file format (e.g. json or
whatever) that could be parsed by the future variant of the existing
StdLibraryFunctionsChecker.
After we have a strong foundation of handling existing summaries then we can
move on to generate summaries automatically. We do believe that summary-based
analysis could have a strong added value only if we are capable of
automatically synthesizing the summaries.
Mid-term plans:
^^^^^^^^^^^^^^^
Automatically create summaries based on the methods described in SMART and Microsoft PEX [Godefroid2007, Godefroid2008].
The basic idea is that we create summaries on-demand during the analysis and then later we try to reuse them if the calling-context allows that.
This is something like an advanced caching of the effects of function calls.
A summary is in the form [pre]f(x)[post], where pre
is the preconditions over function inputs and post
is postconditions over function outputs.
There are many open questions and concerns here. E.g. about performance: probably we should use an SMT solver to decide whether we can apply a summary.
Also, the mentioned publications do runtime execution to drive the symbolic execution towards certain paths.
Naturally we cannot do that, but we hope that the worklist based driving could have a similar role.
Long-term plans:
^^^^^^^^^^^^^^^^
Create summaries based on the ideas in Facebook Infer’s publications [FBInfer2005, FBInfer2020].
The basic idea here is to create summaries the are free from the calling-context.
This can be done only with a special data-flow computation and with a proofing system.
This solution would render the analysis to a two-phase analysis.
References:
^^^^^^^^^^^
[Godefroid2007] Godefroid, Patrice. “Compositional dynamic test generation.” Proceedings of the 34th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages. 2007.
[Godefroid2008] Anand, Saswat, Patrice Godefroid, and Nikolai Tillmann. “Demand-driven compositional symbolic execution.” International Conference on Tools and Algorithms for the Construction and Analysis of Systems. Springer, Berlin, Heidelberg, 2008.
[FBInfer2005] Berdine, Josh, Cristiano Calcagno, and Peter W. O’hearn. “Symbolic execution with separation logic.” Asian Symposium on Programming Languages and Systems. Springer, Berlin, Heidelberg, 2005.
[FBInfer2020] O’Hearn, Peter W. “Incorrectness logic.” Proceedings of the ACM on Programming Languages 4.POPL (2019): 1-32.
Thanks,
Gabor