[RFC][analyzer][StdLibraryFunctionsChecker] Parsing signatures?

Hi,

Here is an example of adding a function summary in the StdLibraryFunctionsChecker:
// ssize_t recv(int sockfd, void *buf, size_t len, int flags);
addToFunctionSummaryMap(
    "recv",
    Signature(ArgTypes{IntTy, VoidPtrTy, SizeTy, IntTy}, RetType{Ssize_tTy}),
    Summary(NoEvalCall)
        .ArgConstraint(ArgumentCondition(0, WithinRange, Range(0, IntMax)))
        .ArgConstraint(BufferSize(/*Buffer=*/ArgNo(1),
                                  /*BufSize=*/ArgNo(2))));

Instead, I’d like to have the following in the future:
addToFunctionSummaryMap(
    "recv",
    Signature("ssize_t recv(int sockfd, void *buf, size_t len, int flags);"),
    Summary(NoEvalCall)
        .ArgConstraint(ArgumentCondition(0, WithinRange, Range(0, IntMax)))
        .ArgConstraint(BufferSize(/*Buffer=*/ArgNo(1),
                                  /*BufSize=*/ArgNo(2))));

Why? This could considerably simplify the type matching code in the checker. Besides, once we reach the point where we can read summaries from e.g. YAML files (maybe when we merge with the TaintChecker), users could specify the signatures exactly as they would write them in C/C++, which seems like the ultimate convenience.

To achieve this I have to parse the string given to the Signature in the ASTContext of the TU that is being analyzed. I am considering two options to develop this:

  1. It seems that BodyFarm/ModelInjector does something similar (it reads function bodies from model files). However, I am not sure whether that solution is flexible enough. Gabor, what do you think: would it make sense to extend it in this direction, and could we handle C++ declarations as well? What other weak points or difficulties do you see?
  2. Maybe we could use the parser with a custom ExternalASTSource implementation. This is how LLDB does it: its implementation of the ExternalASTSource interface uses the ASTImporter under the hood. I am not sure whether the ASTImporter as a whole could be used for this, but maybe we could reuse some parts of it.

Thanks,
Gabor

Another use case could be to improve the CallDescriptionMap by using the same infrastructure. Currently we match by function name and by number of arguments, and this has already caused bugs.
Imagine this:
CallDescriptionMap FnDescriptions = {
    {{"FILE *fopen(const char *pathname, const char *mode)"}, // parse and match by the full signature
     {nullptr, &StreamChecker::evalFopen, ArgNone}},
};

Cheers,
Gabor

This sounds like a facility that can get fairly complicated and will never be completely reliable or do exactly what we want. I guess it could be taught to handle simple cases but the old approach will never really be going away.

Say, if we are to write such prototypes for C++ collection methods we’ll probably want to completely drop template arguments because we can’t list all possible arguments. This basically means that using the actual compiler to parse such prototypes will inevitably fail in this case. Another recurring problem with C++ is inline namespaces, say the inline namespace __1 that shows up in the libc++ method prototypes and should be actively ignored by any such system.

In C some standard functions are implemented as macros expanding to builtins and such builtins can potentially have more arguments than the function they implement (extra arguments automatically filled in by the macro).

I think it’s better to target only plain C functions with this and do a completely dumb custom parser for the prototypes. Probably also drop support for hard-to-parse types like function pointers. Anything beyond that sounds questionable to me.
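
To make the "completely dumb custom parser" idea concrete, here is a minimal sketch of what it could look like. Everything in it (ParsedPrototype, normalizeDecl, splitDeclarator, parsePrototype) is a hypothetical name made up for illustration, not existing analyzer code. It assumes plain C prototypes in which every parameter is named (apart from a lone `void` or `...`), and it simply refuses function pointers and array declarators. Types are compared as normalized strings here; mapping them to QualTypes would be a separate step.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical result type; not part of StdLibraryFunctionsChecker.
struct ParsedPrototype {
  std::string RetType;
  std::string Name;
  std::vector<std::string> ParamTypes;
};

bool isIdentChar(char C) {
  return std::isalnum(static_cast<unsigned char>(C)) || C == '_';
}

// Collapse whitespace runs and drop the space before '*', so that
// "void *buf" and "void* buf" both start with "void*".
std::string normalizeDecl(const std::string &S) {
  std::string Out;
  for (char C : S) {
    if (std::isspace(static_cast<unsigned char>(C))) {
      if (!Out.empty() && Out.back() != ' ')
        Out += ' ';
    } else if (C == '*') {
      if (!Out.empty() && Out.back() == ' ')
        Out.pop_back();
      Out += '*';
    } else {
      Out += C;
    }
  }
  while (!Out.empty() && Out.back() == ' ')
    Out.pop_back();
  return Out;
}

// Split "const char *mode" into {"const char*", "mode"}. The trailing
// identifier is taken to be the name; if nothing precedes it (e.g. "void"),
// the whole text is treated as the type.
std::pair<std::string, std::string> splitDeclarator(const std::string &Decl) {
  std::string D = normalizeDecl(Decl);
  std::size_t I = D.size();
  while (I > 0 && isIdentChar(D[I - 1]))
    --I;
  std::string Type = D.substr(0, I);
  std::string Name = D.substr(I);
  while (!Type.empty() && Type.back() == ' ')
    Type.pop_back();
  if (Type.empty())
    std::swap(Type, Name);
  return {Type, Name};
}

std::optional<ParsedPrototype> parsePrototype(const std::string &Proto) {
  std::size_t Open = Proto.find('(');
  std::size_t Close = Proto.rfind(')');
  if (Open == std::string::npos || Close == std::string::npos || Close < Open)
    return std::nullopt;
  std::string Params = Proto.substr(Open + 1, Close - Open - 1);
  // Deliberately give up on function pointers and array declarators.
  if (Params.find_first_of("()[]") != std::string::npos)
    return std::nullopt;

  auto [RetType, Name] = splitDeclarator(Proto.substr(0, Open));
  if (RetType.empty() || Name.empty())
    return std::nullopt;
  ParsedPrototype Result{RetType, Name, {}};

  std::string Norm = normalizeDecl(Params);
  if (Norm.empty() || Norm == "void")
    return Result; // no parameters

  std::size_t Begin = 0;
  while (true) {
    std::size_t Comma = Params.find(',', Begin);
    std::string Piece =
        Params.substr(Begin, Comma == std::string::npos ? std::string::npos
                                                        : Comma - Begin);
    std::string Type = splitDeclarator(Piece).first;
    if (Type.empty())
      return std::nullopt;
    Result.ParamTypes.push_back(Type);
    if (Comma == std::string::npos)
      break;
    Begin = Comma + 1;
  }
  return Result;
}
```

With this sketch, the recv prototype from the first message yields the name "recv", the return type "ssize_t" and the parameter types "int", "void*", "size_t", "int", while something like qsort (which takes a function pointer) is rejected. An unnamed multi-word parameter such as "unsigned int" would be misread as type "unsigned" with name "int", which is exactly the kind of corner a real implementation would have to decide about.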

I guess that kind of answers why CallDescription works the way it
does, huh. In InnerPointerChecker, std::string::clear() is matched with
the following: CallDescription{"std", "basic_string", "clear"}, which
will match anything from std::basic_string::clear() to
std::__cxx11::basic_string<T, N>::clear(). I wonder how we could make
CallDescription smarter without disrupting the matching capability it
has for C++ methods (a notable problem:
https://reviews.llvm.org/D81745).
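
To illustrate the kind of loose matching described above, here is a rough sketch on plain strings. It is not the actual CallDescription implementation; splitQualifiedName and matchesQualifiedName are hypothetical helpers, and the rule they encode (the last component must match exactly, the remaining components must appear in order among the enclosing scopes, template arguments are dropped) is only my reading of the behaviour described in this thread.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Split "std::__cxx11::basic_string<T, N>::clear" into
// {"std", "__cxx11", "basic_string", "clear"}, dropping template arguments.
std::vector<std::string> splitQualifiedName(const std::string &Q) {
  std::vector<std::string> Parts;
  std::string Cur;
  int Depth = 0; // nesting depth inside <...>
  for (std::size_t I = 0; I < Q.size(); ++I) {
    char C = Q[I];
    if (C == '<') {
      ++Depth;
      continue;
    }
    if (C == '>') {
      --Depth;
      continue;
    }
    if (Depth > 0)
      continue; // skip everything inside a template argument list
    if (C == ':' && I + 1 < Q.size() && Q[I + 1] == ':') {
      if (!Cur.empty())
        Parts.push_back(Cur);
      Cur.clear();
      ++I; // consume the second ':'
      continue;
    }
    Cur += C;
  }
  if (!Cur.empty())
    Parts.push_back(Cur);
  return Parts;
}

// The function name must match exactly; the pattern's scopes must occur in
// order among the actual scopes, so extra scopes such as inline namespaces
// are skipped.
bool matchesQualifiedName(const std::vector<std::string> &Pattern,
                          const std::string &Qualified) {
  std::vector<std::string> Parts = splitQualifiedName(Qualified);
  if (Pattern.empty() || Parts.empty() || Pattern.back() != Parts.back())
    return false;
  const std::size_t NumScopes = Pattern.size() - 1;
  std::size_t P = 0;
  for (std::size_t I = 0; I + 1 < Parts.size() && P < NumScopes; ++I)
    if (Parts[I] == Pattern[P])
      ++P;
  return P == NumScopes;
}
```

With the pattern {"std", "basic_string", "clear"}, this accepts both "std::basic_string::clear" and "std::__cxx11::basic_string<T, N>::clear", and rejects "std::vector<int>::clear". It also accepts a name like "std::detail::basic_string::clear", where the extra scope is an ordinary namespace rather than an inline one, which shows how this style of matching trades precision for robustness.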