Analyser - variadic arguments

If i was to write such checker, i'd try the following:

Map va_list-type variables (as memory regions) upon va_start(), map them to the number of read arguments (0 so far), and to the current StackFrameContext pointer, which represents a particular function call within which we are now (CheckerContext::getCurrentStackFrame()) (second program state map - or map to structure with two items).

You can retrieve [path-sensitive] function declaration and [path-insensitive] call expression from the stack frame context, and compute the number of variadic arguments passed based on that.

If va_start() is called upon a variable that is already tracked, warn.

If va_start() is called twice in the same StackFrameContext, warn.

Upon va_arg(), increment the number of reads for the respective variable. If the variable is not tracked, then the user has forgotten to call va_start(), hence warn. If the number exceeds the remembered total number of arguments, warn.

Upon va_copy(), copy the whole map item to the new variable. If the source variable is not va_start()'ed, warn.

Upon va_end(), erase the map item, which means that you no longer track the variable. If the variable is not va_start()'ed, warn.

Upon checkDeadSymbols, see if any of the tracked va_list variables is dead (in the sense of SymbolReaper::isLiveRegion()). If it is dead, then va_end() will never be called on it (and it wasn't so far, because the variable is still in the map). Hence warn on forgotten va_end().

You don't need to track end of function to warn on forgotten va_end() - checkDeadSymbols is already more frequent.

If you expect seeing a lot of passing va_list variables by pointers (which is insane but i guess possible), then consider the following.

Upon checkPointerEscape, see if any of the tracked va_list variables escapes. If so, mark it in the map as escaped: it might have va_end()'ed elsewhere, so we should not warn if it dies. Hence you cannot erase it from the program state - instead, you'd essentially need a separate trait flag (though you may put some magic constant into the existing trait).

If the memory region of the va_list variable has symbolic base (came to us by pointer, rather than variable declaration), then it might have already been initialized with va_start. Hence you shouldn't warn if va_arg()/va_end()/va_copy() is called upon it without prior va_start().

Hence the checker looks pretty similar to SimpleStreamChecker described in the video, and it's a very typical path-sensitive checker. I don't expect much problems on this path - it may sound like a lot of info, but basically it should be easy.

If you are only interested in argument count overflows, but not in other va_* API misuse, then probably you may simplify this "typestate machine". But you'd still need to catch va_start() and va_args() and most likely va_copy(), and once you do, you get all other checks for almost-free.

Thanks for your reply! I agree that va_start/copy/arg/end call ordering should be reasonably easy to check.

Unfortunately I’m still having trouble getting the number of arguments - I suspect this is because IPA is disabled for variadic functions at https://github.com/llvm-mirror/clang/blob/4ab9d6e02b29c24ca44638cc61b52cde2df4a888/lib/StaticAnalyzer/Core/ExprEngineCallAndReturn.cpp#L735

Unfortunately, the details are tracked at rdar://problem/12147064 which seems to be private so I don’t know the full details. Can anyone give me any pointers as to what the issue is, and whether it’s a task it would be feasible for a new developer to attempt?

Hi Artem!

I wanted to note that a checker about the va_api call odering is under review: https://reviews.llvm.org/D15227

Regards,

Gábor

Oh damn, i had a feeling it all sounds familiar! Now i'm bound to have a look :slight_smile:

Uhm.

You can enable IPA for variadic functions for your experiments, but i guess the problem was that it's not truly modeled - eg., va_arg() doesn't really return the variadic argument values. Which is bad, because it makes the analyzer make false assumptions (eg., suppose we inline scanf(), but do not understand that the place scanf() writes into is our local variable - we'd keep thinking that this variable is uninitialized and throw a false warning when it's used.

So you can just disable this check in your experiments (and deal with some false positives, which might not be too many), but a proper fix would need to be done before this check is disabled in the mainline analyzer.

In order to produce a proper fix, we'd need to model VAArgExpr's - in the ExprEngine or in Gabor's checker. In fact, a checker might be a good place for it, because it would already have all the necessary infrastructure (va_start() also needs to be modeled in the same place, perhaps through evalCall() to reduce invalidations). The modeling itself is not hard, but would require some understanding.