Clang Static Analyzer conditional terminating call back

Hello all,
I want to check if a tainted value can affect the control flow of some sensitive functions. For example:

value = taint_source()
if (value < xxx) {
  sensitive_func()
}

The taint propagation in clang static analyzer fit part of my need. One approach I can think of is:
Whenever I encounter a branch condition (register checkBranchCondition() call back), I will push a tag(tainted or not) to a taintStack variable in ProgramState.
After the branch block closed, I will pop one tag.
If sensitive_function() get encountered, I will check all the tags in taintStack to see if any of them is tainted.

The problem is I did not find a callback like checkBranchCondition() which will be called every time exiting a branch block. Then what should be a good approach for this control flow checking?

Any suggestions would be appreciated.

Thank you,
Gavin

  • Artem because he knows everything about the analyzer and symbolic execution, + Balázs because he is currently working on TaintChecker.

My first instinct here would be to combine pathsensitive analysis with control flow analysis. In the header file clang/include/clang/Analysis/Analyses/Dominators.h you will find the class ControlDependencyCalculator. You could calculate the control dependencies of the block in which sensitive_func() is called (you can retrieve that through the current ExplodedNode) and find that the CFGBlock whose getLastCondition() is value < xxx is in fact a control dependency. Then, you could, in theory, check whether parts of this expression is tainted.

Artem, do you think this makes any sense?

@Gavin: I’m worried that you’re choosing a wrong strategy here. Branches with tainted conditions can be used for sanitizing the input, but it sounds like you want to ban them rather than promote them. That said, i can’t figure out what’s the right solution for you unless i understand the original problem that you’re trying to solve.

@Kristof: Do you think you can implement a checkBeginControlDependentSection / checkEndControlDependentSection callback pair on top of your control dependency tracking mechanisms, so that they behaved intuitively and always perfectly paired each other, even in the more complicated cases like for-loops and Duff’s devices? (there’s no indication so far that we really need them - scope contexts are much more valuable and might actually be helpful here as well - but i’m kinda curious).

@Gavin: I’m worried that you’re choosing a wrong strategy here. Branches with tainted conditions can be used for sanitizing the input, but it sounds like you want to ban them rather than promote them. That said, i can’t figure out what’s the right solution for you unless i understand the original problem that you’re trying to solve.

@Kristof: Do you think you can implement a checkBeginControlDependentSection / checkEndControlDependentSection callback pair on top of your control dependency tracking mechanisms, so that they behaved intuitively and always perfectly paired each other, even in the more complicated cases like for-loops and Duff’s devices? (there’s no indication so far that we really need them - scope contexts are much more valuable and might actually be helpful here as well - but i’m kinda curious).

I guess so. I’m seeing a couple things to keep track of (inlined function calls to name one), but nothing too bad.

It raises (haha) a question about exceptions, if we ever end up supporting them, what happens if an exception is raised? Also, just came to my mind, should any block with a non-noexcept function call have an edge to the exit block if we take exceptions into account?

Thanks for the help,
@Artem, I think the taint propagation is necessary for my problem. I want to analyze if an untrust input can somehow affect the control flow of some sensitive function (tainted source determine whether a sensitive function get executed or not). The untrusted input can taint other variables and eventually taint the branch condition expression. It still needs to be path sensitive. For example:

config_from_file = parse_config_file() // taint source
/* the tainted value may infect other variables (should_enc) in some paths*/
if (use_default) {
config = default_config // in this path, taint does not flow to condition expr
}
else {
config = config_from_file // taint flow to config
}
should_enc = (config.secure_level > 10) // taint flow to should_enc
if (should_enc) { // branch is tainted in one path
do_encrypt(data) // the execution of sensitive function is affected by taint source in one path.
}
else { // this block is also tainted if use_default

} // after exiting the block, everything should be fine.
other_sensitive_func(); // not affected by taint source in both paths

@Kristof, I think ControlDependencyCalculator might do the trick. I do not need to use a stack structure to track the blocks myself. Here’s what I might do:
-in checkPreStmt(const CallExpr *CE, CheckerContext &C) , check if the statement is a sensitive function call
-get cfg from C->ExplodedNode()->getCFG, and create cdc = ControlDependencyCalculator(cfg)
-get dependent blocks from cdc->getControlDependencies(C->ExplodedNode()->getCFGBlock())
-for each returned block, check if the condition expr is tainted in current state.

If ControlDependencyCalculator can correctly calculate the dependence, I think the above steps should work. I am not sure if the getLastCondition()s return from dependency blocks overlaps, but it will not affect the result.

Gavin

What about the following test cases? // (1): if (config.secure_level > 10) // not a control flow dependency of the sensitive call! should_enc = true; // concrete value, not tainted! else should_enc = false; // concrete value, not tainted! if (should_enc) // concrete true or false, not tainted! do_encrypt(data); // (2): if (config.secure_level > 10) do_encrypt(data); else do_encrypt(data); // encryption is done on both branches anyway! // (3): if (config.secure_level > 10) // tainted symbol collapsed to a constant! do_unrelated_stuff(); if (config.secure_level > 10) // concrete true or false, not tainted! do_encrypt(data); Basically i want to know not only about the bug you’re trying to find, but more about what your users are and what quality requirements do you have. If you’re writing a tool for yourself (eg., for doing a security audit of a specific project), you can get away with a high false positive rate. If you’re making a tool for automatic code review that’ll point out potential security breaches to other developers as they write new code, you’ll have to make sure your tool doesn’t prevent the developers from easily writing the secure code that they need to write, so a high false positive rate is unacceptable, and you’ll need to formulate precise rules in an as simple manner as possible instead of relying on an unpredictable emergent behavior. If you’re really paranoid about security, you should go for a verification tool that has high false positive rate and zero false negatives. If you can make your own APIs, you should probably make safer APIs that are either taking care of the security issues on the type system level or generally make life easier for static analysis. Also Static Analyzer is tweaked for finding very pinpointed bugs that can be proven by looking at a specific execution path without taking into account the surrounding code that didn’t get executed on the current path. Your question seems to be focused on the difference in behavior between the situations in which the branch is taken or not, which is already too much of a global reasoning. The condition expression is not an active expression at this point, so it doesn’t have a value at all in the current state. You’ll have to go back in time, to the moment of time where the condition was evaluated, in order to understand what its value was. Which is why your original approach was better. You may be able to store branch conditions in the program state for later use in an Environment-like map, i.e. ‘(Expr *, LocationContext *) → SVal’, clean it up as location contexts are destroyed, and get them overwritten when looping around in a loop. Or you can emit a bug on every sensitive function and attach a bug visitor to it that will suppress the report when it’s unable to find the tainted dependency. This is probably the easiest way to implement this right now - not sure about performance though.

Hi Artem,

I am writing a tool for myself. The propose of this tool is to help me prove such vulnerabilities may exist in some applications. (with a different threat model, some originally trusted value become untrusted/tainted) I will manually look at the report anyways, so a high false-positive rate is acceptable.

Thank you for pointing out those cases. I do not have a good solution for them yet. Maybe taint analysis with symbolic execution is not the best approach for my problem. But for the current stage, I just want to have a tool to list some potentially vulnerable code so that I can hopefully detect at least one true vulnerability.

You are right, I can not get the value of expression when it leaves the current context. But how can I intercept the moment a location contexts get destroyed?

Gavin

Ok! Yeah, that should get you something, at least :slight_smile: For now that’s basically checkEndFunction. Maybe we’ll add more location contexts in the future, with more fine-grained callbacks for them.