exhaustiveness of CSA checkers

Hello cfe-dev,

In prototyping a custom checker for the Clang Static Analyzer, I’ve found that analysis terminates at some complexity limit. That is, when your target function exceeds some complexity bound, CSA stops path traversal and your checker does not receive callbacks for any remaining unvisited nodes. The two specific scenarios where I’ve run into this are high-iteration-count loops and complex conditionals (multiple short-circuiting && and || operators). I can work around the first by rephrasing the target loops or with options like -analyzer-max-loop, but I can’t find a way to affect the behavior of the second. To compound the situation, I cannot see how the checker can detect that path exploration was incomplete.

Is there a way to control the complexity limit enforced for conditionals? Or, failing that, to detect within the checker when path exploration was incomplete?

To give some more context, my checker is an experiment and not something I intend to upstream. Runtime is not an issue; I am fine with the analyzer taking multiple hours for a single run. That said, I understand why the existing CSA bounds were chosen, as most users do not want their compiler to run this long.

Please CC me in replies as I’m not subscribed.

Thanks,

Matthew

Hi!

The Clang Static Analyzer does not give you any guarantees regarding coverage or exhaustiveness. There is no way to ensure exhaustive analysis (such an analysis is likely to be unbounded for most non-trivial programs, so this is not only about runtime but also about termination). For this reason, all checkers have to be implemented with non-exhaustiveness in mind.

Could you share what you are trying to achieve? Maybe symbolic execution is not the right tool for that problem.

Cheers,
Gabor

Hi Gabor,

Thanks for your reply. The checker I’m implementing is similar to PthreadLockChecker. It knows the correct acquire/release patterns for certain primitives and checks for them. If analysis fails to reach the end of a function, the checker cannot warn about, e.g., unreleased locks.

This is a somewhat unorthodox case, as I know the target code to which this will be applied. All functions are under 500 LoC and the only loops are statically bounded. It can be seen statically that all functions terminate and that there are a finite number of paths.

I was hoping to use CSA for this because it handles path enumeration and constructing the exploded graph very nicely. Someone suggested to me I might have to move to KLEE, but that would be a shame because I’d need to introduce some code instrumentation/annotation to achieve what I want. Another option would be to use an AST visitor to enumerate the paths myself, but it would be nice to leverage LLVM’s existing functionality for this.

Thanks,

Matthew

(Adding Artem as he is very knowledgeable in this topic)

Oh, I see. In case it is known that you have a bounded number of paths it is not entirely unreasonable to use symbolic execution to achieve what you want.

Unfortunately, this is not a use case that the static analyzer was designed for. I think it should be possible to tweak it, but I have no idea how much work that would be.

But even though it might be possible to tweak the analyzer I am not sure if this would be the right thing to do. Some questions that might help:

  1. How much control-flow awareness do you need? Do you really need path sensitivity, or is flow sensitivity sufficient? Or maybe lexical scoping is enough?

You only need a path-sensitive check if you want to avoid false positives of the following form:

    if (cond)
      lock();
    // …
    if (cond)
      unlock();
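To make the distinction concrete, here is a toy C model of the snippet above. It is not how the Static Analyzer works internally, and all names in it are hypothetical. It enumerates the outcomes of the two if (cond) branches: treating them as independent gives four combinations, two of which look like lock misuse, while tracking that both branches test the same cond leaves only two feasible paths, neither of which is a bug.

```c
#include <stdbool.h>

/* Toy model: returns true if one execution of
 *     if (cond1) lock();  ...  if (cond2) unlock();
 * misuses the lock (unlock without lock, or lock still held at exit). */
static bool misuses_lock(bool cond1, bool cond2) {
    int held = 0;
    if (cond1)
        held++;                 /* lock() */
    if (cond2) {
        if (held == 0)
            return true;        /* unlock() without a matching lock() */
        held--;                 /* unlock() */
    }
    return held != 0;           /* lock still held at function exit */
}

/* Branch outcomes treated as independent: all 4 combinations are explored. */
int reports_ignoring_correlation(void) {
    int reports = 0;
    for (int c1 = 0; c1 < 2; c1++)
        for (int c2 = 0; c2 < 2; c2++)
            if (misuses_lock(c1, c2))
                reports++;
    return reports;
}

/* Path-sensitive view: both branches test the same cond, so only the
 * paths (true, true) and (false, false) are feasible. */
int reports_path_sensitive(void) {
    int reports = 0;
    for (int cond = 0; cond < 2; cond++)
        if (misuses_lock(cond, cond))
            reports++;
    return reports;
}
```

The two reports produced by the first function are exactly the false positives that path sensitivity avoids.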

It looks like you already have some constraints on the coding style in the code you want to check. So I guess there is a chance that users are not allowed to do locking using complex patterns like the one above. If that is the case, flow-sensitive analysis might be a better fit as it is easier to make that exhaustive and will perform much better.
Or, if RAII-style locking would be sufficient but you do not have destructors in C, you could write syntactic checks that enforce hand-written RAII-style resource management.
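For comparison, a flow-sensitive analysis visits each CFG block a bounded number of times instead of enumerating paths, which is why it is easier to make exhaustive. Below is a minimal C sketch of a forward "may the lock be held?" dataflow analysis over a hand-built CFG. The block layout, the lock_op encoding and the 64-block cap are assumptions made for illustration; this is not Clang's CFG API. Because the join is a logical OR, this analysis would report the cond/cond pattern shown earlier as a leak, which is the precision it trades for exhaustiveness.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_BLOCKS 64   /* arbitrary cap for this sketch */
#define MAX_SUCCS 2

enum lock_op { OP_NONE, OP_LOCK, OP_UNLOCK };

/* Hypothetical CFG block: one lock operation plus up to two successors.
 * Successor indices are assumed to be valid (< number of blocks). */
struct block {
    enum lock_op op;
    size_t nsuccs;
    size_t succs[MAX_SUCCS];
};

/* Forward "may be held" analysis. Block 0 is the entry. Returns true if
 * the lock may still be held on entry to exit_block. Every in[] and
 * reached[] flag can only flip from false to true, so the outer loop runs
 * at most 2 * n + 1 times, however many paths the CFG contains. */
bool may_hold_at_exit(const struct block *cfg, size_t n, size_t exit_block) {
    if (n == 0 || n > MAX_BLOCKS || exit_block >= n)
        return true;                      /* be conservative */
    bool in[MAX_BLOCKS] = {false};        /* may be held on block entry */
    bool reached[MAX_BLOCKS] = {false};
    reached[0] = true;

    bool changed = true;
    while (changed) {
        changed = false;
        for (size_t b = 0; b < n; b++) {
            if (!reached[b])
                continue;
            /* Transfer function for the block's lock operation. */
            bool out = in[b];
            if (cfg[b].op == OP_LOCK)
                out = true;
            else if (cfg[b].op == OP_UNLOCK)
                out = false;
            /* Join into each successor with logical OR. */
            for (size_t s = 0; s < cfg[b].nsuccs; s++) {
                size_t t = cfg[b].succs[s];
                if (!reached[t] || (out && !in[t])) {
                    reached[t] = true;
                    in[t] = in[t] || out;
                    changed = true;
                }
            }
        }
    }
    return in[exit_block];
}
```

For example, a CFG where block 0 locks and branches to blocks 1 and 2, block 1 unlocks, block 2 forgets to, and both continue to exit block 3 makes may_hold_at_exit return true; adding the unlock to block 2 makes it return false.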

  2. Do you need interprocedural analysis? If so, do you have recursion? Do you need context sensitivity? Can you add annotations to help guide the analysis?

  3. How complex is the task that you want to accomplish? Are locks reentrant? Do you have to support more complex try_lock-style APIs? Or is it sufficient to only check the order of the API calls?

In case you can add annotations and you do not need path sensitivity you could take a look at Thread Safety Analysis: https://clang.llvm.org/docs/ThreadSafetyAnalysis.html
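As a rough illustration of how Thread Safety Analysis handles try_lock-style APIs, here is a minimal C sketch. The lock type, the function names and the TSA macro are hypothetical; the attributes themselves (capability, try_acquire_capability, release_capability, guarded_by, no_thread_safety_analysis) are the ones described on the page linked above. When compiled with clang -Wthread-safety, the analysis treats the lock as held only on the branch where lock_try_acquire returned true. On other compilers the macro expands to nothing.

```c
#include <stdbool.h>

/* Expand to Clang thread-safety attributes only when compiling with Clang. */
#if defined(__clang__)
#define TSA(x) __attribute__((x))
#else
#define TSA(x)
#endif

/* Hypothetical non-reentrant lock type, declared as a capability. */
struct TSA(capability("mutex")) lock {
    bool held;
};

/* Returns true iff the lock was acquired. */
bool lock_try_acquire(struct lock *l) TSA(try_acquire_capability(true, l));
void lock_release(struct lock *l) TSA(release_capability(l));

/* Stand-in implementations that just flip a flag (a real lock would use
 * atomics), so thread-safety analysis of their bodies is disabled. */
TSA(no_thread_safety_analysis)
bool lock_try_acquire(struct lock *l) {
    if (l->held)
        return false;
    l->held = true;
    return true;
}

TSA(no_thread_safety_analysis)
void lock_release(struct lock *l) {
    l->held = false;
}

static struct lock g_lock;
static int g_counter TSA(guarded_by(g_lock));

/* With -Wthread-safety, Clang warns if g_counter is accessed on a path
 * where g_lock is not held, or if g_lock may still be held on return. */
bool try_increment(void) {
    if (!lock_try_acquire(&g_lock))
        return false;       /* acquisition failed: nothing to release */
    g_counter++;            /* OK: g_lock is held on this branch */
    lock_release(&g_lock);
    return true;
}
```

Deleting the lock_release call, or moving g_counter++ above the if, should produce a -Wthread-safety warning.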

Cheers,
Gabor

Thanks, Gabor. Sounds like this might be beyond CSA’s abilities. Answers to your very apt questions below.

> How much control-flow awareness do you need? Do you really need path sensitivity, or is flow sensitivity sufficient? Or maybe lexical scoping is enough?

It seems to me currently that we need path-sensitivity. I’ve been exploring some lexical approaches in parallel though, including an RAII-style rephrasing of the code.

> Do you need interprocedural analysis? If so, do you have recursion? Do you need context sensitivity? Can you add annotations to help guide the analysis?

No IPA necessary and there’s no recursion. In most cases we can ignore context, even to the extent of ignoring the content of a conditional and just noting there’s a branch in control flow (with the exception of when a branch condition depends on a lock acquisition result). Annotations are possible, but if it comes to this I would probably look at more drastic refactoring of the code. CSA would not be the only thing consuming this code, so it would still need to be correct and complete without the annotations.

> How complex is the task that you want to accomplish? Are locks reentrant? Do you have to support more complex try_lock-style APIs? Or is it sufficient to only check the order of the API calls?

The locks are not reentrant, but their only APIs are try_lock-style, so the analysis needs to understand whether lock acquisition succeeded or failed.

> you could take a look at Thread Safety

Interesting, I was not aware of this. It looks like maybe I can make this work for my purposes. Thanks for the pointer.

Yup, i mostly agree with Gabor. If you want to hack on the Clang Static Analyzer in order to push it past its default limits, you can try bumping the following flags (intended for hacking purposes only!):

    // Interrupts the analysis when a CFG block is visited that many times.
    -analyzer-max-loop=4

    // Interrupts the analysis when ExplodedGraph has that many nodes.
    -analyzer-config max-nodes=225000

and setting the following flags (intended for hacking purposes only!):

    // Allow unrolling loops indefinitely when the concrete bound is known (currently off by default).
    -analyzer-config unroll-loops=true

    // Disables function inlining (you said you don't need IPA).
    -analyzer-config ipa=none

and also removing the artificial loop-unrolling heuristics in LoopUnrolling.cpp that attempt to discover whether a loop is statically bounded. That would give you the most complete exploration the Static Analyzer could ever achieve. There may be more flags that i forgot about, but the above should be pretty good.

It is still impossible to achieve hardcore *verification* this way. The Analyzer will occasionally drop execution paths for many other reasons, and these reasons are fairly hard to enumerate. For example, it may encounter exotic language constructs that it doesn't understand yet, or simply become too confused to continue, or an execution path may look infeasible to the Static Analyzer because of a bug even though it can be taken in reality. In the end, you will never be sure that the program is definitely correct.

But if you simply want to find "most" of the bugs, for a certain definition of "most", the above should do the trick.

Thanks, Artem. Lots of useful flags. Unfortunately, I have already experimented with all of them.

Loop unrolling won't work on most loops if you don't remove the leashes in Static Analyzer's LoopUnrolling.cpp that i mentioned before. We don't have flags to control this yet (but i don't mind having some).

Other than that, it works for me most of the time when i debug false negatives. So your lack of success makes me curious about specifics. I might have forgotten something or it might be that something really curious is going on in your case.

The loop unrolling itself is not so much of a problem. However, I think the control flow in the body of the loop causes path explosion. A sample loop looks something like:

    for (size_t i = 0; i < SOME_CONSTANT; i++) {
      if (some condition) {
        call_noreturn_function();
      }
    }

However, loop unrolling is a secondary problem for me (I have other workarounds for the target loops). I think the real problem is the branching structure. Much of the function looks like:

    if (some condition)
      call_noreturn_function1();

    if (some other condition)
      call_noreturn_function2();

    ...

From a quick eyeball, there are roughly 20 such branches in the target function. Adding up the '&&' and '||' operators used in the branch conditions, there are about 30 of these, which no doubt compounds the problem. We are certainly talking about a lot of paths here, and I anticipated I would need to do some optimization, but I was surprised that (a) I can't seem to tune how early analysis terminates and (b) there is no way for my checker to observe this. Regarding (a), analysis seems to give up after ~19 calculated branches, and given that this is 2**19 (524,288) paths, maybe this is reasonable. Regarding (b), I can't find a callback or any other way to observe this truncation. I guess my checker could define a destructor that validates whether it ever saw the end of the function, but this seems pretty hacky.
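A back-of-the-envelope model of why this blows up: under short-circuit evaluation, a condition like a && b can be false in two distinct ways (a false, or a true and b false), and the analyzer presumably keeps these as separate paths because the resulting states carry different constraints. If every true branch ends in a noreturn call and the conditions are independent, the number of paths reaching the end of the function is roughly the product, over all the if statements, of the number of ways each condition can be false. The C sketch below computes that product; it is a simplified model under these assumptions, not a description of CSA's actual node accounting, and all function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Ways an N-operand chain x1 && ... && xN evaluates to false under
 * short-circuit evaluation: the first false operand can be any of N. */
uint64_t and_chain_false_ways(unsigned operands) {
    return operands;
}

/* Ways x1 || ... || xN evaluates to false: all operands must be false. */
uint64_t or_chain_false_ways(unsigned operands) {
    (void)operands;
    return 1;
}

/* Toy model of a sequence of "if (cond_i) noreturn_call();" statements:
 * paths taking a true branch end at the noreturn call, so the number of
 * paths reaching the end multiplies by the false-ways of each condition. */
uint64_t paths_reaching_end(const uint64_t *false_ways, size_t n) {
    uint64_t paths = 1;
    for (size_t i = 0; i < n; i++)
        paths *= false_ways[i];
    return paths;
}
```

With 19 conditions of the form a && b, this model gives 2^19 = 524,288 surviving paths. That already exceeds the default max-nodes budget of 225000 mentioned earlier in the thread, since each distinct path needs at least one ExplodedGraph node of its own.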

Given that, as I mentioned, some of the properties I'm trying to check may be provable structurally, I'm going to try an AST visitor approach. Presumably this will scale far better than a path-sensitive analysis, but I was hoping I could leverage the path enumeration logic.

Hi!

I am not sure what you are planning to do based on the AST, but you might want to look into the Clang CFG and the ExprSequence class in clang-tidy. They might prove useful for you.

Cheers,
Gabor