GenericTaintChecker - taint status examination

Here inBuf (example_good.c - Pastebin.com) is the local buffer, which was successfully tainted.
So passing the derived globInBuf to someFoo (which calls system) makes GenericTaintChecker emit a bug report.
That is all good for example_good.c. But when example_bad.c (example_bad.c - Pastebin.com) is passed to GenericTaintChecker, no bug reports are emitted.
As far as I understand, it is necessary to examine all regions from which a symbol passed to a taint sink can be derived.
So what is the best way to do that?
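
For readers who cannot open the pastebin links, the flow described above can be sketched roughly as follows. This is a reconstruction from the diagnostics quoted later in this thread, not the original example_good.c: the function name processFile, the fixed-size array and the buffer size are assumptions made for illustration, while someFoo, globInBuf, inBuf, inFile, fread and system come from the thread.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical reconstruction; only the identifiers named in the thread
   (globInBuf, someFoo, inBuf, inFile) are taken from the original. */
static char *globInBuf = NULL;

/* Taint sink: the buffer contents reach system(). */
static int someFoo(const char *src, size_t size)
{
    if (size == 0)
        return -1;
    return system(&(src[0]));
}

/* Reads untrusted bytes from `path` and forwards them through a global.
   Returns -1 if the file cannot be opened or is empty. */
int processFile(const char *path)
{
    enum { inBufSize = 64 };          /* assumed size, not from the thread */
    char inBuf[inBufSize];

    FILE *inFile = fopen(path, "rb");
    if (inFile == NULL)
        return -1;

    /* Taint source: data read from a file is untrusted. */
    size_t got = fread(inBuf, 1, inBufSize - 1, inFile);
    fclose(inFile);
    inBuf[got] = '\0';

    globInBuf = inBuf;                /* the global now aliases the tainted buffer */
    int rc = someFoo(globInBuf, got); /* tainted data reaches the sink */
    globInBuf = NULL;                 /* do not leave a dangling pointer behind */
    return rc;
}
```

The point of interest for the checker is the last step: the value passed to system is reached through globInBuf rather than through inBuf directly.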

Also, why was globInBuf derived from symbols conjured at statements S33699 and S33668 in example_good.c, and from data read by fread in example_bad.c, before the corresponding statements?

I do see the same bug reports in example_good (godbolt) and example_bad (godbolt):

<source>:27:5: warning: Untrusted data is passed to a system call (CERT/STR02-C. Sanitize data passed to complex subsystems) [alpha.security.taint.TaintPropagation]
    system(&(src[0]));
    ^
<source>:43:9: note: Assuming 'inFile' is not equal to NULL
    if (inFile==NULL)
        ^~~~~~~~~~~~
<source>:43:5: note: Taking false branch
    if (inFile==NULL)
    ^
<source>:65:5: note: Taint originated here
    fread(inBuf,1,inBufSize,inFile); //S33578
    ^
<source>:79:9: note: Assuming the condition is false
    if (inBuf[inBufSize-1] == 0x55){
        ^~~~~~~~~~~~~~~~~~~~~~~~~~
<source>:79:5: note: Taking false branch
    if (inBuf[inBufSize-1] == 0x55){
    ^
<source>:90:5: note: Calling 'someFoo'
    someFoo(globInBuf, inBufSize);
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
<source>:27:5: note: Untrusted data is passed to a system call (CERT/STR02-C. Sanitize data passed to complex subsystems)
    system(&(src[0]));
    ^      ~~~~~~~~~

Can you provide a godbolt link that matches your configuration more closely?

Also, why was globInBuf derived from symbols conjured at statements S33699 and S33668 in example_good.c, and from data read by fread in example_bad.c, before the corresponding statements?

Again, I don’t see this on godbolt:

<source>:73:5: warning: initial value of global variable 'globInBuf' [debug.ExprInspection]
    clang_analyzer_explain(globInBuf);
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
<source>:77:5: warning: &SymRegion{reg_$18<char * globInBuf>} [debug.ExprInspection]
    clang_analyzer_dump(globInBuf);
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

What might be going on is that you disabled too many essential checkers. Static analyzer checkers actively impact each other, and running without the core package isn’t a supported configuration. You’ve definitely disabled some of the core checkers, because otherwise the analysis finds critical undefined behavior at

    char *inBuf;
    clang_analyzer_explain(inBuf);

and refuses to explore further. Without some other essential checkers, it may be unable to prove that functions like fread() don’t touch user globals, so it goes with the default overapproximation. In this case the conjured symbol in your dump corresponds to the symbolic value of all data potentially touched by fread(), and the derived symbol represents the portion of that data that corresponds to the global variable.
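
To see this for yourself, something like the following sketch can be used to compare the value of the global before and after the call. The clang_analyzer_dump and clang_analyzer_explain hooks belong to debug.ExprInspection and have to be declared by the user; the `const void *` parameter type, the function name inspectGlobal and the buffer size are assumptions made for this example. The `__clang_analyzer__` guard only keeps the file buildable by a regular compiler.

```c
#include <stdio.h>

/* The ExprInspection hooks exist only for the analyzer. Outside of it,
   compile them away so the file still builds and links. */
#ifdef __clang_analyzer__
void clang_analyzer_dump(const void *);    /* parameter type is an assumption */
void clang_analyzer_explain(const void *);
#else
#define clang_analyzer_dump(x)    ((void)(x))
#define clang_analyzer_explain(x) ((void)(x))
#endif

char *globInBuf;   /* user global, as in the thread */

/* Returns the number of bytes read, or -1 if the file cannot be opened. */
long inspectGlobal(const char *path)
{
    char inBuf[32];                    /* assumed size */
    FILE *inFile = fopen(path, "rb");
    if (inFile == NULL)
        return -1;

    /* Before the call: expected to be the initial value of the global. */
    clang_analyzer_dump(globInBuf);

    size_t got = fread(inBuf, 1, sizeof inBuf, inFile);

    /* After the call: if the analyzer could not rule out that fread()
       writes to globals, this may show a symbol derived from a conjured one. */
    clang_analyzer_dump(globInBuf);
    clang_analyzer_explain(globInBuf);

    fclose(inFile);
    return (long)got;
}
```

If the two dumps differ, the call in between was treated as possibly modifying the global, which is the overapproximation described above.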

Since you appear to be studying how the static analyzer works, I usually recommend against doing that with ExprInspection. Instead I recommend dumping the entire analysis graphs with -analyzer-dump-egraph and exploded-graph-rewriter.py which are designed to answer every question you could possibly have (see also 2019 LLVM Developers’ Meeting: A. Dergachev “Developing the Clang Static Analyzer” - YouTube).

Thank you, Artem. I’ve already watched this video, and it helped me in learning CSA, as did “CSA-A Checker Developer Manual, 2016”.
The screenshot shows the options you used there.


How can I tell which checkers are required for proper operation (of GenericTaintChecker or any self-written checker)?

I didn’t use godbolt. My invocation was: clang -cc1 $MY_INCLUDES -analyze -analyzer-checker=alpha.security.taint.TaintPropagation, followed by example_good.c or example_bad.c.
But when I added the core package, the results didn’t change.
Now I will try to build the latest version of Clang; the previous one may have been changed during my experiments. I will check and write back.

Just use the default setup with clang --analyze, like I do in godbolt examples. Then you can pass extra frontend flags through -Xclang.

Or, ideally, start with invocations produced by scan-build like I did in the video (I didn’t type that command by hand!).

Godbolt is great because you can share your results through it, so that readers don’t need to figure out what setup you used and how to reproduce your problems. It provides a fresh clang as well as a selection of old clangs.

As I understand it, I should definitely use the core package. However, if the code being analyzed contains errors that the core checkers detect, the analysis stops with a bug report. That is, it never reaches the checks deeper in the code.

When I call CSA with silence-checkers=core, the check also fails (Compiler Explorer) due to UB in char *inBuf; clang_analyzer_explain(inBuf).

So how can I avoid interrupting the analysis while still performing modeling with the core package?

You can initialize the variable?
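
A minimal sketch of what that could look like, assuming the pointer is later filled by malloc (the thread does not show how inBuf is allocated). The function name inspectInitialized, the file name example.c in the comment and the `const void *` declaration of the ExprInspection hook are assumptions made for illustration.

```c
/* One possible default-setup invocation, with example.c as a placeholder name:
 *   clang --analyze \
 *     -Xclang -analyzer-checker=alpha.security.taint.TaintPropagation \
 *     -Xclang -analyzer-checker=debug.ExprInspection example.c
 */
#include <stdlib.h>
#include <string.h>

#ifdef __clang_analyzer__
void clang_analyzer_explain(const void *);  /* parameter type is an assumption */
#else
#define clang_analyzer_explain(x) ((void)(x))
#endif

/* Returns 0 on success, -1 if the allocation fails. */
int inspectInitialized(size_t inBufSize)
{
    char *inBuf = NULL;               /* defined value, so reading it is not UB */
    clang_analyzer_explain(inBuf);    /* no longer a use of an uninitialized variable */

    inBuf = malloc(inBufSize ? inBufSize : 1);
    if (inBuf == NULL)
        return -1;
    memset(inBuf, 0, inBufSize);

    clang_analyzer_explain(inBuf);    /* now describes the heap region */
    free(inBuf);
    return 0;
}
```

With the variable initialized, the core checkers have nothing to report at the first inspection call, so the path is not cut off there and the analysis can continue to the code that follows.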

Yes, of course. Null pointer is critical in this case. Thank you, Artem.