I was toying with the Clang Static Analyzer (CSA) when I noticed something weird.
In the following example, CSA warns about a division by zero for both x and y, but if I replace malloc with any other function, the warning disappears. That includes other libc functions such as fopen or rand, as well as functions without a body (e.g. a bare void f(); declaration, to give CSA as little information as possible about what the function does); see the variant sketch after the listing.
```c
#include <stdlib.h>
#include <unistd.h>

int x = 0;
static int y = 0;

int main() {
  int z = 0;
  // free is only called to prevent a warning about un-freed memory
  free(malloc(sizeof(int)));
  if (x != 0) {
    // with malloc, warning here
    return 1 / 0;
  }
  if (y != 0) {
    // with malloc, warning here
    return 1 / 0;
  }
  if (z != 0) {
    return 1 / 0;
  }
  return 1;
}
```
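For concreteness, here is a minimal sketch of the bodyless-function variant (f is a placeholder name of mine; the program won't link without a definition, but clang --analyze doesn't need one):

```c
int x = 0;
static int y = 0;

void f(void); // declaration only: CSA knows nothing about this function

int main() {
  f(); // replaces free(malloc(sizeof(int))) from the original example
  if (x != 0) {
    return 1 / 0; // no warning with f() in place of malloc
  }
  if (y != 0) {
    return 1 / 0; // no warning here either
  }
  return 1;
}
```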
I would assume either that all external functions trigger the warning (considering that the variables are global and might be changed by those functions), or that none do (assuming that external functions don't change global variables defined in the current compilation unit).
I guess CSA went the second way, and it seems that something in how malloc is modeled is incorrect: the knowledge about x and y is dropped after the call.
If it’s actually not a bug then I’m curious to know why this is.
FYI I tried with clang 14.0.6 and 15.0.0, on x86_64-unknown-linux-gnu.
The default, fallback model for all functions is that they may, in theory, mutate your global variables. It avoids false positives but has obvious false negatives, which aligns with our overall preference of not bothering the user until we're certain there's a bug.
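To illustrate what that fallback means in principle (a minimal sketch of my own, not analyzer code): once an unknown function is called, any constraint the analyzer had on a global must be dropped, so a guarded division like the one below becomes reachable.

```c
int g;

void opaque(void); // no body available: the analyzer knows nothing about it

void demo(void) {
  g = 0;    // here the analyzer knows g == 0
  opaque(); // conservative fallback: opaque() may mutate g, so the
            // constraint g == 0 is invalidated at this point
  if (g != 0) {
    // reachable after invalidation: opaque() might have set g,
    // so a division-by-zero report can appear here
    (void)(1 / 0);
  }
}
```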
But we may choose among roughly five other ways to model a function's behavior when we have extra information: we have hardcoded perfect models for some functions like malloc(); we behave differently when we have access to the source code; for some functions we provide "fake" source code (or, to be more precise, a fake Clang AST); for some functions we have "partial" models; and so on.
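As an illustration of the "fake" source code approach (a rough sketch of my own; the actual synthesized AST, e.g. what BodyFarm produces for dispatch_once, may differ in its details):

```c
// Simplified stand-ins for the real libdispatch types:
typedef long dispatch_once_t;
typedef void (*dispatch_block_t)(void); // really a block, simplified here

// The kind of body the analyzer could substitute for dispatch_once,
// so that it can see the block being invoked at most once:
void dispatch_once(dispatch_once_t *predicate, dispatch_block_t block) {
  if (*predicate != ~0L) { // has this once-token fired yet?
    *predicate = ~0L;      // mark it as fired
    block();               // run the initializer, at most once
  }
}
```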
That said, I'd already expect us to know that library functions typically don't mutate user-defined variables, especially static variables (unless those variables are passed into them directly). If that doesn't work, it's a bug.
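To sketch the distinction (lib_fn is a hypothetical library function of my own):

```c
static int y = 0;

void lib_fn(int *p); // a library function with no visible body

void no_escape(void) {
  lib_fn(0);  // y is never handed to the library here, so the
              // analyzer can keep trusting y == 0 across the call
}

void escapes(void) {
  lib_fn(&y); // y is passed in directly, so invalidating it
              // after this call is legitimate
}
```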
> The default, fallback model for all functions is that they may, in theory, mutate your global variables.
This is what I initially assumed before trying the example above.
> we have hardcoded perfect models for some functions like malloc()
Well, in my example it seems to be the exact opposite then: malloc is assumed to be able to mutate global variables, and unknown functions are assumed not to…
Oh, right, ok, whew, most of the stuff works correctly then.
> perfect models for some functions like malloc()
Wait, never mind, the model for malloc() is partial. So, malloc() mostly works correctly, just like other functions.
I debugged this a bit deeper. It looks like the data doesn't disappear because we "invalidate globals"; it disappears because the relatively young subsystem of the memory model (RegionStore) that lets us trust global constant initializers when analyzing main() (but not in any other analysis entry point) suddenly forgets that it started in main(). Something special happening near malloc() triggers this; I didn't debug what exactly. Either way, this information shouldn't ever be forgotten, so the fix probably goes in RegionStore.
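To make that concrete (a sketch of my own illustrating the described behavior, not a reduced test case):

```c
int g = 0; // global with a constant initializer

void other_entry(void) {
  // Analyzed as its own entry point: arbitrary code may already have
  // run, so the initializer is not trusted and no warning is expected.
  if (g != 0)
    (void)(1 / 0);
}

int main(void) {
  // Analyzed from main(): nothing has run yet, so RegionStore may trust
  // the initializer, conclude g == 0, and treat the branch as dead.
  // The bug: something near malloc() makes it forget it started here.
  if (g != 0)
    return 1 / 0;
  return 0;
}
```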
One way or another, most of the stuff appears to be mostly correct, so, whew.