Checking if a ParmVarDecl is null in a Checker

I’m attempting to add a Checker subclass that will warn on NSError-returning functions that return failure without having written to their outError parameter.

The bit I’m currently stuck on is this simple test input:

  - (BOOL)failWithError:(NSError **)outError;
  {
    if (outError) {
      *outError = [NSError errorWithDomain:@"domain" code:1 userInfo:nil];
    }
    (void)outError; // Added as a precaution in reference to <http://lists.llvm.org/pipermail/cfe-dev/2017-April/053510.html>
    return NO;
  }

In my `checkEndFunction` implementation, I can look up the ParmVarDecl for `outError`, but if I do something like:

    class Loc LV = State->getLValue(OutError, LocCtxt);
    ConditionTruthVal IsNull = State->isNull(LV);

The IsNull returned is marked under constrained when checking the path where outError was NULL.

Since this is my first attempt at writing a checker after reading <https://llvm.org/devmtg/2012-11/Zaks-Rose-Checker24Hours.pdf> and <https://clang-analyzer.llvm.org/checker_dev_manual.html>, I’m probably doing something horribly wrong (still stumbling around trying to figure out the difference between SVal/Loc/SymbolRef/MemRegion and their conversions…), so any pointers would be helpful.

Thanks!

-tim

I’m attempting to add a Checker subclass that will warn on NSError-returning functions that return failure without having written to their outError parameter.

+George because i heard that he was also tackling a similar idea.

The bit I’m currently stuck on is this simple test input:

  - (BOOL)failWithError:(NSError **)outError;
  {
    if (outError) {
      *outError = [NSError errorWithDomain:@"domain" code:1 userInfo:nil];
    }
    (void)outError; // Added as a precaution in reference to <http://lists.llvm.org/pipermail/cfe-dev/2017-April/053510.html>
    return NO;
  }

In my `checkEndFunction` implementation, I can look up the ParmVarDecl for `outError`, but if I do something like:

    class Loc LV = State->getLValue(OutError, LocCtxt);
    ConditionTruthVal IsNull = State->isNull(LV);

The IsNull returned is marked under constrained when checking the path where outError was NULL.

LV in your code would represent the address of variable "outError" on the stack. It will always be non-null, but that's not the value you're looking for. You need to load from the variable:

SVal RV = State->getSVal(LV, OutError->getType());

...or something like that.

Since this is my first attempt at writing a checker after reading <https://llvm.org/devmtg/2012-11/Zaks-Rose-Checker24Hours.pdf> and <https://clang-analyzer.llvm.org/checker_dev_manual.html>, I’m probably doing something horribly wrong (still stumbling around trying to figure out the difference between SVal/Loc/SymbolRef/MemRegion and their conversions…), so any pointers would be helpful.

You might find my old workbook moderately useful: https://github.com/haoNoQ/clang-analyzer-guide/releases/download/v0.1/clang-analyzer-guide-v0.1.pdf

Probably also http://lists.llvm.org/pipermail/cfe-dev/2017-June/054084.html because it's slightly more correct in some places.

LV in your code would represent the address of variable "outError" on the stack. It will always be non-null, but that's not the value you're looking for. You need to load from the variable:

  SVal RV = State->getSVal(LV, OutError->getType());

...or something like that.

Ah, I was wondering if that was the case. But trying this:

     class Loc LV = State->getLValue(OutError, LocCtxt);
     SVal RV = State->getSVal(LV, OutError->getType());
     llvm::errs() << " checking null on " << RV << "\n";
     ConditionTruthVal IsNull = State->isNull(RV);
      llvm::errs() << " IsNull.isUnderconstrained() = " << IsNull.isUnderconstrained() << "\n";

On the branch where outError is NULL, I get:

  checking null on &SymRegion{reg_$1<NSError ** outError>}
  IsNull.isUnderconstrained() = 1

You might find my old workbook moderately useful: https://github.com/haoNoQ/clang-analyzer-guide/releases/download/v0.1/clang-analyzer-guide-v0.1.pdf

Probably also http://lists.llvm.org/pipermail/cfe-dev/2017-June/054084.html because it's slightly more correct in some places.

Thanks — looks like there is lots of good background info there!

-tim

Hmm, ok, the problem may also be there:

(void)outError; // Added as a precaution in reference to <http://lists.llvm.org/pipermail/cfe-dev/2017-April/053510.html>

Because in this case you're still doing your check *after* the precaution code which is the last use of the variable.

You might want to move (or rather duplicate) your check into checkDeadSymbols(). Because once the symbol is dead, there's no way the program would be able to constrain it or initialize the memory it points to. So you can warn immediately when the symbol is dead, based on the information that's already available, assuming that we're still within the function of interest.

I believe that the check in checkEndFunction is still necessary in case the function of interest is also inlined during analysis, because in this case the symbol may remain alive for a separate reason (still referenced by the caller).

Note - this is a guess, i didn't actually look at how it works. The easiest way for you to understand that is to dump the "exploded graph" and see everything for yourself, as explained in http://clang-analyzer.llvm.org/checker_dev_manual.html#visualizing

I was guessing that the outError would never go out of scope since it is a parameter, but I guess I don’t know for sure that it wouldn’t do so as you suggest. I was starting to guess that the constraint might only be valid inside the two arms of the if but that might be misguided. Implementing the evalAssume call back, I can that it does mark outError a couple times as well, so that’d be a second option if checkDeadSymbols doesn’t pan out.

Thanks, this gives me a couple paths to try.

-tim

The constraint definitely is supposed to remain after the if. That’s the whole point of the analyzer. That’s essentially the difference between the exploded graph and your usual CFG. The constraint itself may disappear when the program no longer references the constrained symbol, but the two analysis paths are not merged until they lead to (program state, program point) pairs that are actually exactly identical in every possible aspect. I still strongly encourage you to have a look at the exploded graph. It’ll immediately explain the whole analysis step-by-step to you and show you everything that’s available on every step and you won’t have to blindly experiment with random callbacks anymore. Please ask if you’ll have problems understanding it.

For extremely simple examples, the exploded graph is sort of tractable, but in large cases it is difficult to figure out what is going on. Maybe I missing something but I’d find it much more useful if I could find a way to:

- Have my custom state (registered with REGISTER_MAP_WITH_PROGRAMSTATE) be logged in the nodes
- Have error nodes marked red
- Have newly added/updated/removed state called out with some style changes (bold blue for added, bold black for updated, red for removed?)
- Add log messages of findings in my checker callbacks on the ProgramState or ExplodedNode

These might be a project to tackle on their own, but if there are tips for getting some useful meaning out of the exploded graph, it would be helpful.

Thanks,

-tim

Yep, indeed, this interface could be improved dramatically with a bit of work.

I'm usually getting away pretty well by searching the graph in clever manners. If your dot viewer doesn't support text search across the graph (most don't), you might want to convert it to .svg (eg. dot -Tsvg graph.dot -o graph.svg) and open the svg in a web browser and then use the find-on-page feature.

I still strongly encourage you to have a look at the exploded graph. It'll immediately explain the whole analysis step-by-step to you and show you everything that's available on every step and you won't have to blindly experiment with random callbacks anymore. Please ask if you'll have problems understanding it.

For extremely simple examples, the exploded graph is sort of tractable, but in large cases it is difficult to figure out what is going on. Maybe I missing something but I’d find it much more useful if I could find a way to:

- Have my custom state (registered with REGISTER_MAP_WITH_PROGRAMSTATE) be logged in the nodes

This can be done by implementing the printState() callback. See PthreadLockChecker, MallocChecker, RetainCountChecker, etc. For now not many checkers implement it, but i wouldn't mind if all checkers implemented it, as long as they don't print anything when they don't track anything.

- Have error nodes marked red

You can use -trim-egraph and you'll only see error nodes and paths towards them. Most error nodes would in this case be at the end of the trimmed path (though if multiple non-fatal errors are produced on the same path, the one in the middle would still not be very apparent).

Also checker error nodes, like all nodes created by the checker, are marked with the checker's tag (at the bottom of the rectangle), so you can search for the checker name in the graph to find them.

- Have newly added/updated/removed state called out with some style changes (bold blue for added, bold black for updated, red for removed?)

Yep, that would have been great. I usually get away with searching for the specific symbol or region or binding that i'm interested in - it immediately highlights where a particular string has disappeared from the state.

- Add log messages of findings in my checker callbacks on the ProgramState or ExplodedNode

I didn't ever try that, but you could probably make nodes with custom ProgramPointTag's that contain arbitrary strings.

Yep, indeed, this interface could be improved dramatically with a bit of work.

I’m usually getting away pretty well by searching the graph in clever manners. If your dot viewer doesn’t support text search across the graph (most don’t), you might want to convert it to .svg (eg. dot -Tsvg graph.dot -o graph.svg) and open the svg in a web browser and then use the find-on-page feature.

Thanks for the hints!

I started printing the state pointers in my log output and that’s helping match up my log messages with where I am in the graph.

One other thing that occurred to me is that some of the large graphs I’m seeing are due to inlining across ObjC methods:

  • (BOOL)inner:(NSError **)outError; {…}
  • (BOOL)outer:(NSError **)outError; {
    return [self inner: outError];
    }

Here I’ll get something like:

begin function
begin call
begin function

end function
end call

end function

but I don’t need this sort of inlining for my checker. When processing -outer: I can assume that the call to -inner: does the right thing (writes to outError if its result is null, and leaves it undefined if it is non-null), and then -inner: can be independently checked.

But, if a method calls a block literal, I do need to let the inner function be processed. I haven’t yet arrived at a way using generateSink() that doesn’t just end the whole checking pass — is there a way to conditionally stop inline checking that I can use here?

void NSErrorWriteChecker::checkBeginFunction(CheckerContext &C) const {

if (!C.inTopFrame()) {
const LocationContext *LocCtxt = C.getLocationContext();
if (dyn_cast(LocCtxt->getDecl()) == NULL) {
// ???

return;
}
}


}

Thanks!

-Tim

The analyzer constructs only one exploded graph for every top-level function. It’s the same graph used by all checkers. This is why we have dozens of checkers enabled simultaneously without significant performance regressions - we are designed to scale really well in this sense. But it also means that different checkers cannot have different ideas of what functions they need to inline.

You really do want to inline as much as possible. Even if nothing interesting happens within the function with respect to your checker, the analyzer will be able to model everything else correctly, and the ultimate goal of that would be to make sure that the execution path on which a bug is found by your checker is indeed a valid path through the program. For instance, if the function does not ever return null on any path, the analyzer will not proceed to explore the path on which the return value of that function is assumed to be null; if it did explore such path, any bugs found on such path would immediately be classified as false positives.

Still, the analyzer cannot inline everything, because sometimes the source code of the function is not available, and in other cases it needs to make difficult choices on whether it’s worth it to inline a huge function and make the user wait forever for the analysis to complete, or skip the function and risk producing false positives. And the lack of inlining often does indeed requiring a special care on the checker side to carefully assume that the inlined function is indeed doing something sane (that’s why checkers are subscribing for “pointer escapes” and/or “invalidations”).

That said, i don’t think i understand your question. Controlling inlining by the checker is definitely not possible, but what makes you think it’s necessary? Because for now no checkers have ever needed that. Just in case - you might be thinking about a situation like this:

// calls the block and exits
void call_block(^(*block)());

void foo() {
call_block(^{});
}

In this case if call_block isn’t inlined, there’s no way for the analyzer to understand that the block will be called within call_block(). Letting the checker model such call_block() call by explaining to the analyzer core that this function does indeed call the block is currently an open problem for the analyzer. It should be possible, but nobody had it implemented. Some such functions are modeled with the “body farm” technique - eg. dispatch_once(), but there’s no way of doing it in the checker.