More static analysis...

I'm interested in looking at detecting "known bad" patterns, for example:

(<expr> & 0) == 0 (this example is stolen from FindBugs)

if(<non-boolean value>) (cause of recent OpenSSL vuln)

is there code that does this sort of thing already? Or something
related so I can get some hints?

Both of these would be easy to add. We've been mainly focusing on gradually enhancing the base symbolic reasoning of integer values, providing the substrate to write checks like these. We can talk about specifics if you are interested. I believe it wouldn't take more than a few lines of code to add checks for these.

I'm interested in looking at detecting "known bad" patterns, for example:

(<expr> & 0) == 0 (this example is stolen from FindBugs)

if(<non-boolean value>) (cause of recent OpenSSL vuln)

Both of these would be easy to add. We've been mainly focusing on gradually
enhancing the base symbolic reasoning of integer values, providing the
substrate to write checks like these. We can talk about specifics if you
are interested.

Yes, please. Though note that doing the second example properly
requires global analysis...

I believe it wouldn't take more than a few lines of code to
add checks for these.

Cool!

I'm interested in looking at detecting "known bad" patterns, for example:

(<expr> & 0) == 0 (this example is stolen from FindBugs)

Quick question. Did you literally want to look for this syntax in the AST (i.e., the integer literal '0') or did you want to use the dataflow analysis to see cases when an expression evaluates to '0' and then is bitwise-ANDed with <expr>? Is the motivation of this check is that it is trivially true?

if(<non-boolean value>) (cause of recent OpenSSL vuln)

This seems a little too vague to me. People check non-boolean values all the time in 'if' conditions. Were you thinking about cases when <non boolean value> was negative? Positive numbers seem completely legit to me (e.g., some flag was set in a bitmask, etc.).

In either case, both of these seem like "auditor" checks. For the first check if all you wanted to do is grep the syntax, we can easily add another check that gets called by AnalysisConsumer. Check out the functions "ActionXXX" in AnalysisConsumer.cpp to see all the places where the various checks are called. An ActionXXX can get called for each function, an entire Objective-C class, or the entire translation unit. For example:

static void ActionWarnDeadStores(AnalysisManager& mgr) {
   if (LiveVariables* L = mgr.getLiveVariables()) {
     BugReporter BR(mgr);
     CheckDeadStores(*L, BR);
   }
}

This action gets called for each function/method declaration. The action queries the AnalysisManager (the thing that runs the checks) for the live variables analysis result on the current function (which gets computed on-the-fly). The function 'CheckDeadStores' is then called with (a) the results of the live variables analysis and (b) and BugReporter object which is used to issue bug diagnostics to the user. Analyses get registered in the Analyses.def file, where the "scope" of the analysis is declaratively specified. For example:

ANALYSIS(WarnDeadStores, "warn-dead-stores",
          "Warn about stores to dead variables", Code)

Here "Code" means "code declaration"; that is a FunctionDecl or an ObjCMethodDecl.

A syntactic check can just walk the AST. A good example of a syntactic check is CheckObjCDealloc.cpp. Although this check will one day be revamped to do more real analysis, you can see how it walks the AST using iterators and does various pattern matching. Syntactic checks are fairly easy to implement in this way and hook up to AnalysisConsumer. The down-side is that they can be fairly dumb if the property you are checking requires any real smarts.

Another possible way to implement an "auditor" is to use the GRAuditor interface defined in GRAuditor.h:

template <typename STATE>
class GRAuditor {
public:
   typedef ExplodedNode<STATE> NodeTy;
   typedef typename STATE::ManagerTy ManagerTy;

   virtual ~GRAuditor() {}
   virtual bool Audit(NodeTy* N, ManagerTy& M) = 0;
};

Checks that subclass GRAuditor<const GRState*> can be registered with a GRExprEngine and have their "Audit" method called after the path engine evaluates all expressions of a given type. The Auditor is then free to inspect the analysis state at that point and flag errors. A good example of this kind of check is in BasicObjCFoundationChecks.cpp. The class in question is AuditCFNumberCreate, which flags misuses of the function CFNumberCreate (http://developer.apple.com/documentation/CoreFOundation/Reference/CFNumberRef/Reference/reference.html). That function just interposes on the path engine at every CallExpr and checks the arguments in the function call for illegal values.

This is probably enough for one email. I'm more than happy to dig into more details depending on what direction you decide to go in. Also, these interfaces are obviously evolving. We haven't written many checks, so any idea of how to refactor the interface, make it better, etc., are welcome.

I'm interested in looking at detecting "known bad" patterns, for
example:

(<expr> & 0) == 0 (this example is stolen from FindBugs)

Quick question. Did you literally want to look for this syntax in the AST
(i.e., the integer literal '0') or did you want to use the dataflow analysis
to see cases when an expression evaluates to '0' and then is bitwise-ANDed
with <expr>?

Not sure I'd care either way.

Is the motivation of this check is that it is trivially true?

Right - and probably incorrect.

if(<non-boolean value>) (cause of recent OpenSSL vuln)

This seems a little too vague to me. People check non-boolean values all
the time in 'if' conditions. Were you thinking about cases when <non
boolean value> was negative? Positive numbers seem completely legit to me
(e.g., some flag was set in a bitmask, etc.).

I'd be interested to see how often this actually comes up (as opposed
to <non-boolean value>&<bitmask>). The bug in OpenSSL was a function
that returned 1 for success, 0 for failure and -1 for error.
if(func(...)) obviously is trying to test for success but includes
error. Which is bad.

My claim is that if(x) is usually one of:

x is a pointer, and we are testing for non-NULL (which is fine).
x is boolean.
x is something else, and this is a bug.

I agree that people _may_ use if(x) as shorthand for if(x != 0), but I
am from the school of thought that says that asking people to make
their code clearer is not a sin, even if you are a static analyzer.

In either case, both of these seem like "auditor" checks. For the first
check if all you wanted to do is grep the syntax, we can easily add another
check that gets called by AnalysisConsumer. Check out the functions
"ActionXXX" in AnalysisConsumer.cpp to see all the places where the various
checks are called. An ActionXXX can get called for each function, an entire
Objective-C class, or the entire translation unit. For example:

static void ActionWarnDeadStores(AnalysisManager& mgr) {
if (LiveVariables* L = mgr.getLiveVariables()) {
BugReporter BR(mgr);
CheckDeadStores(*L, BR);
}
}

This action gets called for each function/method declaration. The action
queries the AnalysisManager (the thing that runs the checks) for the live
variables analysis result on the current function (which gets computed
on-the-fly). The function 'CheckDeadStores' is then called with (a) the
results of the live variables analysis and (b) and BugReporter object which
is used to issue bug diagnostics to the user. Analyses get registered in
the Analyses.def file, where the "scope" of the analysis is declaratively
specified. For example:

ANALYSIS(WarnDeadStores, "warn-dead-stores",
"Warn about stores to dead variables", Code)

Here "Code" means "code declaration"; that is a FunctionDecl or an
ObjCMethodDecl.

A syntactic check can just walk the AST. A good example of a syntactic
check is CheckObjCDealloc.cpp. Although this check will one day be revamped
to do more real analysis, you can see how it walks the AST using iterators
and does various pattern matching. Syntactic checks are fairly easy to
implement in this way and hook up to AnalysisConsumer. The down-side is
that they can be fairly dumb if the property you are checking requires any
real smarts.

Another possible way to implement an "auditor" is to use the GRAuditor
interface defined in GRAuditor.h:

template <typename STATE>
class GRAuditor {
public:
typedef ExplodedNode<STATE> NodeTy;
typedef typename STATE::ManagerTy ManagerTy;

virtual ~GRAuditor() {}
virtual bool Audit(NodeTy* N, ManagerTy& M) = 0;
};

Checks that subclass GRAuditor<const GRState*> can be registered with a
GRExprEngine and have their "Audit" method called after the path engine
evaluates all expressions of a given type. The Auditor is then free to
inspect the analysis state at that point and flag errors. A good example of
this kind of check is in BasicObjCFoundationChecks.cpp. The class in
question is AuditCFNumberCreate, which flags misuses of the function
CFNumberCreate
(http://developer.apple.com/documentation/CoreFOundation/Reference/CFNumberRef/Reference/reference.html).
That function just interposes on the path engine at every CallExpr and
checks the arguments in the function call for illegal values.

This is probably enough for one email. I'm more than happy to dig into more
details depending on what direction you decide to go in. Also, these
interfaces are obviously evolving. We haven't written many checks, so any
idea of how to refactor the interface, make it better, etc., are welcome.

Wow. OK, I'm going to have to play with code before I can comment, but
thanks for the detailed info!

I'd be interested to see how often this actually comes up (as opposed
to <non-boolean value>&<bitmask>). The bug in OpenSSL was a function
that returned 1 for success, 0 for failure and -1 for error.
if(func(...)) obviously is trying to test for success but includes
error. Which is bad.

If you look at Unix system calls, often the return value is 0 for
success, so if (foo()) {} may be checking correctly for failure...

My guess is you'd find lots of occurrences of this sort of thing in
the wild. Unless a style guide a project is using explicitly forbids
it, people will use it.

-Alexei

I found some time to take a look at this today, and I'm puzzled: I
can't see where auditors choose what type they are interested in?

2009/3/30 Ben Laurie <benl@google.com>

Ah, yes, thanks!

Hi Ben,

I just remembered that the auditor interface doesn't work for "branch" statements like IfStmt, ForStmt, WhileStmt, etc., since that uses a different "builder" interface. Also, what you are interested in auditing is the branch condition, not the branch itself. This can be done using the current auditor interface, with the auditor checking if a given expression is the condition for an IfStmt by using a ParentMap. I suggest doing this for an initial implementation. The problem is that "AddCheck" expects a StmtClass value, and there is no StmtClass value for "all expressions".

To me both these points indicate that the auditor interface needs to be evolved to make it more useful. I think Auditors needs a declarative way where they indicate what expressions they are interested in auditing. AddCheck would just take an auditor (and no StmtClass value) and then figure out when the auditor should be called by interrogating the auditor. This interface should potentially be rich enough to say "only audit expressions used as conditions for branches", etc. As I mentioned in my earlier email, the Auditor interface isn't extensively used yet, so it certainly can and should be evolved.

I just added a version of "AddCheck" that just takes an auditor an no StmtClass (this should help you Ben with implementing your check):

   http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20090330/014758.html

Auditors registered using this version of AddCheck will be called for every non-terminator statement/expression that appears within a basic block. As I said before, branches are not directly audited (because that part of the analysis engine literally just handles the branching of the paths *after* the branch condition has already been evaluated), but the branch conditions themselves will still get passed to the auditor.