Reporting a bug found at EndPath at its correct location

I am trying to solve a problem LIKE "unused variables": A symbol is assigned
a value and the value is never used.

For such a problem I need to track:
  Assignments: mark LHS as "assigned a value".
  Statements: for each symbol used in the expression, remove its "assigned a
value" property.
  EndPath: report each symbol that is marked "assigned a value".

The problem is that the report appears at the end of the path and not at the
point of the assignment.

To solve this problem, I tried:
  Assignments: mark LHS as "assigned a value"; attach an ExplodedNode
marking the assignment location.
  Statements: for each symbol used in the expression, remove its "assigned a
value" property.
  EndPath: report each symbol that is marked "assigned a value", using the
previously attached ExplodedNode.
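
For concreteness, the scheme above can be sketched as a toy model over a single linear path. This is only an illustration and does not use any Clang API: `Event`, `Kind`, and `reportAtEndOfPath` are hypothetical names, and the `line` field stands in for the attached ExplodedNode.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy model only: none of these types are Clang's.
enum class Kind { Assign, Use };

struct Event {
  Kind kind;
  std::string symbol;
  int line;  // stand-in for the ExplodedNode attached at this point
};

// Walks one path and returns (symbol, line of assignment) for every symbol
// that is still marked "assigned a value" when the path ends.
std::vector<std::pair<std::string, int>>
reportAtEndOfPath(const std::vector<Event> &path) {
  std::map<std::string, int> assigned;  // symbol -> line of pending assignment
  for (const Event &e : path) {
    assert(e.line > 0 && "toy model expects 1-based line numbers");
    if (e.kind == Kind::Assign)
      assigned[e.symbol] = e.line;  // mark, and remember where it happened
    else
      assigned.erase(e.symbol);     // a use clears the mark
  }
  // Whatever is left was assigned and never used on this path.
  return std::vector<std::pair<std::string, int>>(assigned.begin(),
                                                  assigned.end());
}
```

The check still runs at the end of the path, but each report carries the remembered assignment location rather than the end-of-path location.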

I would like to hear your opinions about this reporting scheme:
- Would this scheme work?
- Is it expensive in memory?
- Is there a preferred scheme?

Thanks.

I hate to say this, but you’re not going to be able to do this as a usual path-sensitive check. The main analyzer checkers are good at finding “is X true on any path”, but not so good at “is X true on all paths”. There are plenty of cases where the analyzer might not explore all possible paths:

  • The function is too big.
  • There’s a construct the analyzer can’t model (try-catch).
  • For efficiency, simplifying assumptions have been made that won’t be true in real life.
  • For efficiency, not all functions are analyzed as top-level, which means certain paths might not be taken due to known argument values.

And there are cases where the general rule isn’t going to help:

  • The region being stored to isn’t a local variable.
  • The address of the region escapes but not the region itself.

I think what you want is something more like the current DeadStoresChecker, which uses custom transfer functions to walk the CFG and detect if a store ever goes unread. It’s flow-sensitive, not path-sensitive, but is there anything in particular that you want it to do that it doesn’t already?

Jordan

Sorry for the misunderstanding. It is just a bad choice of example; of course
I cannot correctly solve the issue of unused variables in a path-sensitive
check.

However, the issue I AM trying to solve IS path-sensitive:
I need to mark many statements in the path and emit my bug reports at the
end of the path.

My questions are about the scheme of creating and saving many ExplodedNodes
on the state, and then putting the bug reports on the relevant nodes:
- Would this scheme work?
- Is it expensive in memory?
- Is there a preferred scheme?

Thanks.

Fair enough. We've done a couple things like this before, the main one being leak reports (which need to report their allocation site). In these cases, we don't save the ExplodedNode for the leak event, but instead walk back up the path to the change of state that corresponds to the allocation site. I'm not sure if I've seen any real alternative strategies.
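
As a rough illustration of the "walk back up the path" idea, here is a toy sketch. It is not the analyzer's code: `Node` is a hypothetical stand-in for an ExplodedNode with its state, and `findIntroducingNode` is an invented helper name. It assumes each node has a single predecessor on the path being reported.

```cpp
#include <cassert>
#include <set>
#include <string>

// Toy stand-ins; these are not Clang's ExplodedNode or ProgramState.
struct Node {
  const Node *pred;               // previous node on this path (nullptr at entry)
  std::set<std::string> tracked;  // symbols marked in the state at this node
  int line;                       // source line this node corresponds to
};

// Starting at the end of the path, walk toward the entry while the
// predecessor's state still tracks `sym`. The node where the walk stops is
// the one whose state first gained the mark (the "allocation site").
// Returns nullptr if `sym` is not tracked at `end`.
const Node *findIntroducingNode(const Node *end, const std::string &sym) {
  assert(end != nullptr);
  if (!end->tracked.count(sym))
    return nullptr;
  const Node *n = end;
  while (n->pred && n->pred->tracked.count(sym))
    n = n->pred;
  return n;
}
```

Nothing is saved at the allocation site here; the state carries the mark, and the node is recovered only when a report is actually needed.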

But there's nothing inherently wrong with doing it your way, other than that you can't clear out your map of ExplodedNodes until the end of analysis (because you don't know all the paths they participate in). There are two caveats I can think of, though:

- Putting ExplodedNodes in the state is a bad idea (because it means two branches will never be able to re-merge), so instead you could store them in a side table on your checker. This is usually discouraged (you'll notice checker callbacks are const) but if you're careful there should be no problem.

- Make sure to generate a new node at the point you want to track, even if you don't have any state to change. The analyzer is allowed to recycle ExplodedNodes that have the same state as their predecessor (plus a few other conditions) unless it was explicitly created by a checker.
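
To illustrate the side-table caveat, here is a minimal toy sketch of a checker whose callbacks are const but which still records nodes in a `mutable` member. None of this is Clang's API: `ToyChecker`, `noteMarked`, and `siteFor` are hypothetical names, and the sketch assumes each symbol is marked at exactly one node.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

struct Node { int line; };  // hypothetical stand-in for an ExplodedNode

class ToyChecker {
  // Side table kept on the checker instead of in the program state.
  // `mutable` because the recording callback below is const.
  mutable std::map<std::string, const Node *> siteOf;

public:
  // Called from a const callback at the point worth remembering.
  void noteMarked(const std::string &sym, const Node *at) const {
    assert(at != nullptr);
    siteOf.emplace(sym, at);  // keeps the first node recorded for `sym`
  }

  // Called at end of path to find where `sym` was marked.
  const Node *siteFor(const std::string &sym) const {
    auto it = siteOf.find(sym);
    return it == siteOf.end() ? nullptr : it->second;
  }

  // Entries are never erased during analysis, so the table only grows.
  std::size_t size() const { return siteOf.size(); }
};
```

The table only grows in this sketch, which mirrors the point that entries cannot be cleared until the end of analysis.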

I'd see if the "go find the node retroactively" approach works for you before trying the map approach, though.

Does that help?
Jordan

Oops, one key aspect of this scheme that I forgot to mention: you do still put information into the state, and you use that information at the end-of-path nodes to see if there was a problem. You just don't store the nodes where the problem happened, because you know you can find them later. You can take a look at the example SimpleStreamChecker for the simple case, though it doesn't show where the allocation site happened.

Note, also, that a single ExplodedNode may be problematic on multiple paths. For example, I might remember to call free() for an allocated region on one path, but forget on two other early exits. It's not exactly appropriate, then, to report the error at the allocation site, since the problem is path-sensitive.

Jordan

Thanks Jordan.
This is very helpful.
I will reconsider the need and if I still think I need to report along the
path, will try your approach.

BTW, the example in SimpleStreamChecker reports the leak where the symbol
dies and not where it is opened.