Reference to an int symbol in the store from PostCall

I am currently working on a checker that is supposed to detect missing error handling.

For example int myFunc(int ** buf) has side-effects on buf and in case of an error myFunc returns -1. The Idea is that the checker throws a report if the returned int can still be -1 when buf is used for the first time after the call.

A small example:

char * buf;
// myFunc returns -1 on error
int res = myFunc(&buf);
...
buf[0] = 'x';

The checker is supposed to throw an error on the last line because buf is used for the first time and res could still be -1.

The checker currently has a map from the main symbol (buf) to it’s associated symbol (reg). It gets these symbols in the PostCall method.

REGISTER_MAP_WITH_PROGRAMSTATE(AssociatedSymbolMap, const MemRegion *, SVal)

Whenever the main symbol is encountered the checker asserts if the associated symbol can still be -1. However, running the checker on the following code it still throws an error thinking that the associated Symbol can be -1.

char * buf;
// myFunc returns -1 on error
int res = myFunc(&buf);
if (res == -1)
    res = 42;

buf[0] = 'x';

This code should not lead to a report from my checker as res cannot be -1. It still thinks res can be -1 though.

I have also tried to store the associated symbol (res) as a nonloc::ConcreteInt but that didn’t work either. Res is correctly tracked in the store i just don’t know how to store a reference to it in my checkers map.

Any help would be greatly appreciated :slight_smile:

Mind that the value of res is not used anymore after the equality check; hence it’s going to be garbage-collected by the SymbolReaper. And for garbage collected symbols the ConstraintManager won’t associate any constraints, hence appearing as if it was never constrained at the buffer access.

You can verify this theory by keeping the value of res alive after the buffer access. Basically, inserting the (void)(res) after your buffer access will do it.

Also mind that you will have two paths:

  1. where res == -1 was true
  2. where res != -1 was true

On the (1) path you will definitely encounter the buggy buffer access. To prevent that you will need to sink that path by e.g. returning. This would mimic the error handling on that path, preventing the execution from reaching the buffer access.

About the symbol reclamation (aka. garbage collection):
This is a mark-and-sweep algorithm:

  1. The SymbolReaper will ask every checker for the set of symbols that they want to keep alive. You gotta make sure you keep your interesting symbol alive here. This is done by the checkLive callback.
  2. After all checker was asked, the SymbolReaper determines what symbols can be reclaimed and then notifies each checker to clean their state up. This is done by the checkDead callback. In your case, you should simply remove all entries from your map referring to such dead symbols.
1 Like

While @steakhal is most likely correct about symbol reaper, I think you’re missing a bigger point here:

Even though value of variable res cannot be -1 anymore, the symbol returned by myFunc() may still be equal to -1.

That’s what a symbol is. It’s a named value. When the variable is overwritten with a new value, the symbol still refers to the old value.

We’re basically approaching static analysis like a math problem for kids. Suppose the problem goes like “There were some apples on the plate, then Johnny ate 3 of them, and there were 4 left, how many apples there were initially?”. Then you start solving it by saying “Let X be the number of apples on the plate”. Then when Johnny eats 3 apples, X no longer represents the number of apples on the plate. The number of apples on the plate has changed to X - 3, but X itself didn’t change, it still refers to the original amount of apples. In this example “X” is what we call a symbol, whereas “number of apples on the plate” acts like a variable. This is what makes the symbol useful, as we can now have two names for the same value: X - 3 = 4, which gives us an equation that we can solve for X.

Static analyzer works the same way. It gives names to unknown values, then taking different branches in the program gives us equations over these values (“we took false branch, therefore the symbolic expression for the branch condition is equal to zero” - an equation!). If this system of equations has at least one solution, then the state is non-null and therefore the branch is worth exploring. Or, if we’re at division by symbolic value V, and adding an extra equation “V is non-zero” causes the system to have no solution, then there’s division by zero.


Now, to your specific problem. Even though your code makes sure that res isn’t equal to -1, this really isn’t important to your problem at all. You want to make sure that the user checks for errors returned from myFunc before using the buffer. But overwriting the variable in which the error code is stored isn’t a valid way to check for error in the first place! Overwriting the variable doesn’t undo the harm of using the buffer, it’s just an ostrich burrowing his head in the sand and thinking he becomes invisible this way. The code doesn’t squeeze 42 back into myFunc(), it simply throws away the actual error code and acts as if the error didn’t happen. So you really don’t care whether the code under analysis overwrites the variable or not. This is why symbols are useful and work exactly as intended!

Basically, in the following example:

char * buf, *buf2;
int res;

res = myFunc(&buf);
res = myFunc(&buf2);

if (res == -1)
  exit(1);

buf[0] = 'x';

even though variable res was checked for -1, the check for -1 wasn’t checking the right symbol, so it’s not a valid check and the warning should still be emitted. By tracking the map from region to symbol, as opposed to a map from region to variable, you achieve this automagically.

Now, back to your initial example, it depends on specifics whether it deserves a warning or not. The code clearly checks for -1, but it doesn’t accomplish anything to avoid the problem later on. Depending on what myFunc does, this may or may not be problematic. Say, if myFunc simply fills the buffer with garbage data, overwriting the buffer with some actual data isn’t bad at all, the code should be allowed to do that. But if myFunc calls free() on the buffer, then yeah, there’s no way to use it correctly ever.

So in the former case, where there are valid ways to use the buffer, you may want to settle on a weaker check that simply verifies that the error code was checked at all. This would be the check that doesn’t warn on your initial example. Then the question “Can that symbol be equal to -1?” becomes the wrong question to ask. The right question would be “Are there any equations over that symbol present in the state?”, regardless of their solutions. In order to answer such question, you can directly inspect the equations (the internal data structures of RangeConstraintManager), or you can subscribe to checkBranchCondition to find relevant branch conditions. If you go this way, you’ll sometimes notice that the symbol disappears from the state completely without branch conditions. If this happens, this means that there’s no way for the program to check the symbol anymore, so you can clean it up from your state and instead mark any future use of the buffer as unchecked. In this case checkDeadSymbols will helpfully inform you that the symbol is “garbage-collected” from the state, and you can refer to @steakhal’s answer for more insights on that.

In the latter case, you’re asking the right question with “Can that symbol be equal to -1?” and your initial example is an example of a valid warning that should not be removed. But you can still need to handle dead symbols correctly; if the symbol is “dead”, then all equations it was involved in may also be forgotten. In this case checkDeadSymbols() is, again, the last moment of time where you need to make a decision about that symbol, and it’s also the best moment of time to do so, as it’s proven that you won’t get any more information about that symbol. So, again, you’ll need to erase it from your state, and instead warn on every use of the buffer from that point onward.

1 Like