I do see the same bug reports in example_good (godbolt) and example_bad (godbolt):
<source>:27:5: warning: Untrusted data is passed to a system call (CERT/STR02-C. Sanitize data passed to complex subsystems) [alpha.security.taint.TaintPropagation]
system(&(src[0]));
^
<source>:43:9: note: Assuming 'inFile' is not equal to NULL
if (inFile==NULL)
^~~~~~~~~~~~
<source>:43:5: note: Taking false branch
if (inFile==NULL)
^
<source>:65:5: note: Taint originated here
fread(inBuf,1,inBufSize,inFile); //S33578
^
<source>:79:9: note: Assuming the condition is false
if (inBuf[inBufSize-1] == 0x55){
^~~~~~~~~~~~~~~~~~~~~~~~~~
<source>:79:5: note: Taking false branch
if (inBuf[inBufSize-1] == 0x55){
^
<source>:90:5: note: Calling 'someFoo'
someFoo(globInBuf, inBufSize);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
<source>:27:5: note: Untrusted data is passed to a system call (CERT/STR02-C. Sanitize data passed to complex subsystems)
system(&(src[0]));
^ ~~~~~~~~~
Can you provide a godbolt link that matches your configuration more closely?
Also, why was globInBuf derived from symbols conjured at statements S33699, S33668 in example_good.c and from data read by fread in example_bad.c before corresponding statements?
Again, I don’t see this on godbolt:
<source>:73:5: warning: initial value of global variable 'globInBuf' [debug.ExprInspection]
clang_analyzer_explain(globInBuf);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
<source>:77:5: warning: &SymRegion{reg_$18<char * globInBuf>} [debug.ExprInspection]
clang_analyzer_dump(globInBuf);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
What might be going on is that you disabled too many essential checkers. Static analyzer checkers actively impact each other, and running without the core package isn’t a supported configuration. You’ve definitely disabled some of the core checkers, because otherwise the analysis would find critical undefined behavior at
char *inBuf;
clang_analyzer_explain(inBuf);
and refuse to explore further. Without some other essential checkers, it may be unable to prove that functions like fread() don’t touch user globals, so it goes with the default overapproximation. In this case the conjured symbol in your dump corresponds to the symbolic value of all data potentially touched by fread(), and the derived symbol represents the portion of that data that corresponds to the global variable.
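To make the conjured/derived distinction concrete, here is a minimal sketch rather than a reduction of your actual files. The names globBuf, opaqueCall and globalSurvivesCall are hypothetical, and opaqueCall stands in for any call the analyzer cannot see through (the situation described above for fread() when it is not modeled). The dump calls need debug.ExprInspection enabled; the exact text printed depends on the analyzer version and checker set, so the comments only describe the kind of value I would expect.

```c
#ifdef __clang_analyzer__
/* Handled by debug.ExprInspection; the analyzer matches it by name. */
void clang_analyzer_dump(const void *);
/* Declaration only (hypothetical function): with no body visible, the
   analyzer has to assume the call may write to any global. */
void opaqueCall(void);
#else
/* Plain-compiler stubs so the file still builds and runs normally. */
static void clang_analyzer_dump(const void *p) { (void)p; }
static void opaqueCall(void) {}
#endif

char *globBuf; /* hypothetical stand-in for globInBuf */

/* Returns 1 if the global still holds the same pointer after the call. */
int globalSurvivesCall(void) {
    char *before = globBuf;

    /* Typically a plain region symbol: the initial value of the global. */
    clang_analyzer_dump(globBuf);

    opaqueCall();

    /* Typically a symbol derived from a conjured one: "whatever the call
       may have written", restricted to the part that is globBuf. */
    clang_analyzer_dump(globBuf);

    return before == globBuf;
}
```

At run time the stub does nothing, so the function returns 1. The analyzer has no such guarantee, which is why the second dump is expected to show a different symbol from the first.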
Since you appear to be studying how the static analyzer works: I usually recommend against doing that with ExprInspection. Instead I recommend dumping the entire analysis graph with -analyzer-dump-egraph and exploded-graph-rewriter.py, which are designed to answer every question you could possibly have (see also 2019 LLVM Developers’ Meeting: A. Dergachev “Developing the Clang Static Analyzer”, on YouTube).
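As a rough sketch of that workflow, here is a small file to try it on. The file name, the output file name and the function are hypothetical, and the commands in the comment are one way to invoke the tools, assuming a clang build plus the script from clang/utils/analyzer in the LLVM source tree. Adjust the checker selection to your own setup.

```c
/* egraph_demo.c (hypothetical file name)
 *
 * Dump the exploded graph while analyzing this file, then view it:
 *
 *   clang --analyze -Xclang -analyzer-dump-egraph=egraph.dot egraph_demo.c
 *   exploded-graph-rewriter.py egraph.dot
 */
#include <stddef.h>

/* The branches below make the graph fork, so each node shows the
   constraints and store contents that hold on that particular path. */
int lastByteIsMarker(const unsigned char *buf, size_t size) {
    if (size == 0)
        return 0;              /* nothing to inspect */
    if (buf[size - 1] == 0x55)
        return 1;              /* marker byte found at the end */
    return 0;
}
```

Unlike a single clang_analyzer_dump() call, the graph shows the full program state at every step, so you can see exactly where a symbol was introduced and what invalidated it.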