I am a fairly new entrant to Clang SA. I wanted to confirm the behavior of the following two analyzer-config options:
max-times-inline-large : When CSA is analyzing a compilation unit, at most max-times-inline-large “large” functions will be inlined during the analysis of the entire CU. In particular, it is not applied per-top-level function analyzed. Is this correct?
A test I did: Suppose a CU t.c contains a function foo() that has a null dereference, and several other functions. The command “clang -cc1 -analyze -analyzer-checker=core.NullDereference t.c -analyze-function=foo” detects the null deref bug while the command “clang -cc1 -analyze -analyzer-checker=core.NullDereference t.c” does not. When I increased the max-times-inline-large value, running CSA on the entire t.c did detect the null deref bug.
max-symbol-complexity=35 : What is the number 35 counting? Is it (approximately) the number of operators in a symbolic expression?
I’d really appreciate any clarification on these points.
Yup, max-times-inline-large controls how many times functions that have min-cfg-size-treat-functions-as-large or more Control Flow Graph blocks are “inlined” during analysis when called from other functions. This limit doesn’t apply to the functions from which the analysis starts (you can list them via -analyzer-display-progress).
But note that if a function was inlined, then it usually won’t be re-analyzed as a top-level function. This heuristic is based on the assumption that the reliability of warnings found by analyzing the function as a top-level function is usually much lower than that of warnings found within an inlined function, because when inlining, you have more context on how the function is actually used in the program.
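To make the "more context" point concrete, here is a minimal sketch in C. The names read_value and caller are hypothetical and are not taken from the t.c in the question. Analyzed on its own as a top-level function, read_value receives an unconstrained pointer, so there is nothing definite to report. Inlined into caller, the analyzer can follow the path where NULL is passed, and core.NullDereference would be expected to fire there.

```c
#include <stddef.h>

/* Hypothetical callee: dereferences p without checking it.
 * As a top-level function, p is just an unknown symbol to the analyzer. */
static int read_value(const int *p)
{
    return *p;
}

/* Hypothetical caller: passes NULL when flag is 0.
 * When read_value is inlined here, the path with flag == 0
 * reaches the dereference with p known to be NULL. */
int caller(int flag)
{
    int value = 42;
    const int *p = flag ? &value : NULL;
    return read_value(p);
}
```

Running the analyzer with -analyzer-display-progress on a file like this should show which of the two functions the analysis actually starts from.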
Yup, the complexity value that max-symbol-complexity limits is roughly the “length” of a symbolic expression. Every atomic symbol (SymbolData) has a complexity of 1, a constant has a complexity of 0, and every operation adds 1 to the combined complexity of its operands, unless both operands are non-constants (but this last part seems to be an accidental omission). Anyway, the point is, when a symbolic expression becomes too long, the Static Analyzer prefers to collapse it to UnknownVal in order to avoid performance problems. This is usually not terrible, because when a piece of code produces terribly complicated symbolic expressions, the user is more likely to forgive us for not understanding it perfectly.
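The counting rule described above can be sketched as a toy model in C. This is not the analyzer's real code: struct expr, complexity and exceeds_limit are hypothetical names made up for illustration, and the strict "greater than" comparison against the limit is an assumption, since the exact comparison isn't stated here.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a symbolic expression (hypothetical, not Clang's SymExpr). */
enum kind { ATOM, CONSTANT, BINOP };

struct expr {
    enum kind kind;
    const struct expr *lhs; /* used only when kind == BINOP */
    const struct expr *rhs; /* used only when kind == BINOP */
};

/* Complexity as described above: atomic symbol = 1, constant = 0,
 * an operation adds 1 to the sum of its operands' complexities,
 * except when both operands are non-constants (then nothing is added). */
static unsigned complexity(const struct expr *e)
{
    assert(e != NULL);
    switch (e->kind) {
    case ATOM:
        return 1;
    case CONSTANT:
        return 0;
    case BINOP: {
        unsigned sum = complexity(e->lhs) + complexity(e->rhs);
        int both_non_constant =
            e->lhs->kind != CONSTANT && e->rhs->kind != CONSTANT;
        return both_non_constant ? sum : sum + 1;
    }
    }
    return 0;
}

/* Assumption: an expression is given up on (UnknownVal) once its
 * complexity is strictly greater than the limit. */
static int exceeds_limit(const struct expr *e, unsigned limit)
{
    return complexity(e) > limit;
}
```

Under this model, x + 1 has complexity 2 (one symbol plus one operation with a constant operand), while (x + 1) * y has complexity 3, because the multiplication has two non-constant operands and so adds nothing of its own.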
Generally, these -analyzer-config tweaks are simply a way to avoid magic constants in the code. They help when trying to find the best value for a magic constant, but they aren’t recommended for everyday use. But if it suddenly turns out that some non-standard -analyzer-config values make the Static Analyzer perform significantly better on your code, please let us know.