Analyze the code after the state machine in the loop

Is it possible to supply specific values for input variables in order to direct the analysis along a specific execution path? My goal is to keep the analyzer from getting stuck in the main file while analyzing the command-line options being passed, or in similar tasks such as analyzing a state machine in a loop. Analyzing command-line options and arguments in a loop leads to a huge number of states and does not allow the analysis to go deeper. Potential solutions:

  • start the analysis with the desired function (-analyze-function), located in the call stack after the parsing of command-line arguments
    (but will CTU analysis still work correctly?)

  • modify the interface so that part of the input data is presented as concrete instead of symbolic.

Which solution is correct? Or does a solution already exist?

The static analyzer defines the macro __clang_analyzer__ when it runs. You can use it to exclude code from analysis, or even to “mock” code to simplify analysis:

#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv) {
    struct arg_t *args = malloc(argc * sizeof(struct arg_t));

#ifndef __clang_analyzer__
    // Actual implementation.
    for (int i = 1; i < argc; ++i) {
        if (!strcmp(argv[i], "--help")) {
            args[i] = (struct arg_t){ .kind = HelpArgKind, /* ... */ };
        } else if (/* ... */) {
            // ...
        }
    }
#else
    // Mock implementation: forward-declare a fake function
    // with no definition (it's not like we're linking or anything)
    // and pretend to call it to initialize the 'args' array.
    void fake_parse_args(void *);
    fake_parse_args(args);
#endif

    // Do the rest of the stuff.
    return 0;
}

Of course, caution is advised. Typically the only safe way to use this macro is to exclude entire function definitions from analysis while leaving the forward declaration in place; the static analyzer is designed to behave reasonably well when a definition isn’t available. Whenever you exclude only parts of a function, there can be various unintended consequences, so the mock implementation should reflect what’s actually going on fairly accurately.
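Here is a minimal sketch of that safer pattern (parse_args and struct arg_t are hypothetical names, not part of any analyzer API):

struct arg_t;
// The declaration stays visible, so the analyzer treats calls to
// parse_args() as calls to an unknown function and stays conservative.
void parse_args(int argc, char **argv, struct arg_t *args);

#ifndef __clang_analyzer__
// The entire definition (e.g. a large option-parsing state machine)
// is hidden from the analyzer.
void parse_args(int argc, char **argv, struct arg_t *args) {
    /* ... real implementation ... */
}
#endif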

I don’t think we have stuff like this. At least, not in the way you would likely imagine it.
You could put a bunch of assert(...) calls in the code to introduce the necessary assumptions to the analyzer, which it can likely use to reduce the exploration space, even down to a single path in extreme situations. However, I bet that’s not what you wanted.
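For instance, a minimal sketch (handle is a hypothetical function):

#include <assert.h>

void handle(int argc, char **argv) {
    // The analyzer drops every path on which this condition is false,
    // so only the states where argc == 2 are explored further.
    assert(argc == 2);
    /* ... the rest is analyzed under that assumption ... */
}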

I’m not sure if I follow. We employ multiple heuristics to cut paths in the exploration graph. One of them is a limit on how many times we visit a CFG basic block, or how many nodes (aka. steps) we decide to spend from an entry point (aka. top-level function). But there is actually much more under the hood. Do we unroll loops with concrete bounds? Do we widen loops if we cannot model all iterations? Which functions do we attempt to inline?

You can read more about these internal options at https://github.com/llvm/llvm-project/blob/main/clang/include/clang/StaticAnalyzer/Core/AnalyzerOptions.def
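As a rough illustration (the exact option names and defaults are listed in that file and may shift between Clang releases), some of these knobs can be tuned through -analyzer-config; main.c here is just a placeholder translation unit:

clang --analyze \
  -Xclang -analyzer-config -Xclang max-nodes=450000 \
  -Xclang -analyzer-config -Xclang unroll-loops=true \
  main.c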

So, I’d say, it’s a bit complicated to get the desired outcome.
All of these affect how deep the exploration goes.

One more note: as of now, we prefer functions as entry points for which we don’t have call sites. The idea is that they will explore more stuff with the most context when they get to inline the other functions at their call expressions.
Any function that was not reached or analyzed as part of an entry point is considered an entry point later. Thus, we should at some point analyze it anyway, in some context. (If that’s not the case, then it’s probably a bug.)
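A toy sketch of what that means (parse is a made-up name):

// 'main' has no callers, so it is preferred as an entry point, and
// 'parse' is analyzed inlined into it, with full calling context.
static int parse(const char *s) { return s && s[0] == '-'; }

int main(int argc, char **argv) {
    // If 'parse' were never reached from here, it would later be
    // analyzed as its own top-level entry point, with symbolic
    // parameters and no knowledge of the caller.
    return (argc > 1) ? parse(argv[1]) : 0;
}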

That flag is for debugging purposes. I wouldn’t recommend using it for anything else, but it would probably do what you imagine it does.
It would start a single analysis from that function in a top-level context, aka. without knowing anything about the parameters or the global state.
CTU will likely work just fine even in this case, but I’m not sure how that is connected to this discussion.
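For reference, the flag is usually passed through the driver roughly like this (run is a placeholder function name; C++ functions need the fully qualified signature, e.g. "ns::run(int)"):

clang --analyze -Xclang -analyze-function=run main.c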

I’m not sure how you would do that. Can you give an example? Are you thinking of something like “mock” or “dummy” data? Mind that since we frequently don’t see concrete values, we usually don’t put much effort into handling them, so I’d say the benefit would be marginal or inconsistent at best.
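For what it’s worth, if you mean something like the sketch below (purely hypothetical, reusing the __clang_analyzer__ macro from above), it would indeed pin the analysis to one concrete configuration:

int run(int argc, char **argv);

int main(int argc, char **argv) {
#ifdef __clang_analyzer__
    // Hypothetical: replace the real input with fixed dummy data, so
    // the analyzer only explores this one concrete configuration.
    static char *fake_argv[] = { "prog", "--help", 0 };
    argc = 2;
    argv = fake_argv;
#endif
    return run(argc, argv);
}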

Unfortunately, I don’t have an answer for you. We have our limitations.
If you can craft a toy example, we can explore some ways of circumventing some of them, but it depends on the case.