Static Analyzer and signed 64-bit constraints

Hello,

I’m attempting to write a checker to look for certain signedness issues. I have code similar to the following:

int32_t z = -1;
read (STDIN_FILENO, &z, sizeof(z));

if (z > 20) exit (EXIT_FAILURE);

memcpy (x, y, z);

Inside the checker I am printing the constraints on the SVal ‘z’ with a PreCall, filtering on memcpy:

ProgramStateRef State = C.getState();

State->getConstraintManager().print(State, llvm::outs(), "\n", " ");

I have two questions related to this:

  1. The constraint on the above value is printed as 'conj_$0{int32_t} : { [-2147483648, -1], [1, 20] }'. What happened to zero? My best guess is that the static analyzer realizes 'memcpy(x, y, 0)' is a no-op and prunes that program state?

  2. The above code, aside from the missing zero, makes sense when z is an int8_t, int16_t, or int32_t:

conj_$0{int8_t} : { [-128, -1], [1, 20] }

conj_$0{int16_t} : { [-32768, -1], [1, 20] }

conj_$0{int32_t} : { [-2147483648, -1], [1, 20] }

However, if I switch this to an int64_t I get:

conj_$0{int64_t} : { [1, 20] }

What happened to all the negative numbers?

Apologies if I’ve missed something obvious.

Cheers!
-yrp

> The constraint on the above value is printed as 'conj_$0{int32_t} : { [-2147483648, -1], [1, 20] }'. What happened to zero? My best guess is the static analyzer realizing that 'memcpy(x, y, 0)' is a nop and pruning that program state?

Yes, indeed, it is an effect of how we model 'memcpy'. I don't think it's the desired behavior, though. In particular, we don't display a warning on 'memcpy(NULL, y, 0)', even though it's UB. So, yeah, it sounds like a minor bug. It's not obvious to me how often paths with zero-size memcpys are actually feasible in real-world programs.

> However, if I switch this to an int64_t I get:
> conj_$0{int64_t} : { [1, 20] }

Hmm, weird, I'm not able to reproduce it. Could you provide more info? I'm trying:

$ cat test.c

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

void clang_analyzer_printState(void);

void foo(void *x, void *y) {
  int64_t z = -1;
  read(1, &z, sizeof(z));
  if (z > 20)
    exit(-1);
  memcpy(x, y, z);
  // triggers state dump via ExprInspection
  clang_analyzer_printState();
  // don't garbage-collect 'z' until now
  z;
}

$ clang --analyze test.c -Xclang -analyzer-checker -Xclang debug.ExprInspection

Store (direct and default bindings), 0x7fcf3114a5e8 :
(z,0,direct) : conj_$0{int64_t}
(GlobalInternalSpaceRegion,0,default) : conj_$1{int}
(GlobalSystemSpaceRegion,0,default) : conj_$2{int}

Expressions:
(0x7fcf30579920,0x7fcf3113c0c0) clang_analyzer_printState : &code{clang_analyzer_printState}

Ranges of symbol values:
conj_$0{int64_t} : { [-9223372036854775808, -1], [1, 20] }

Hi Artem,

Thanks for the reply and info. After minimizing, it appears this is related to sizeof + 64-bit, perhaps? I had to modify your test case slightly to demonstrate the behavior I'm seeing:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

void clang_analyzer_printState(void);

void foo(void) {
  int64_t z = -1;
  read(1, &z, sizeof(z));

  char x[20] = { 0 };
  char y[20] = { 0 };

  // if (z > 20)
  if (z > sizeof x)
    exit(-1);
  memcpy(x, y, z);
  // triggers state dump via ExprInspection
  clang_analyzer_printState();
  // don't garbage-collect 'z' until now
  z;
}

$ clang --analyze test.c -Xclang -analyzer-checker -Xclang debug.ExprInspection
...
conj_$0{int64_t} : { [1, 20] }

If I change the type of 'z' to int32_t I get:

conj_$0{int32_t} : { [-2147483648, -1], [1, 20] }

If I change the if guard on 'z' to the literal '20' and keep the type as int64_t I get the expected range:

conj_$0{int64_t} : { [-9223372036854775808, -1], [1, 20] }

Have I misunderstood some C behavior regarding sizeof and this is expected? I'll feel pretty silly if this is the case...

In case it's relevant: clang version 6.0.0 (trunk 318002) running on linux

Cheers!
yrp


Hmm, this looks correct. The type of sizeof(...) is size_t, which is uint64_t on that architecture, and the C conversion rules for a comparison of int64_t and uint64_t require converting both operands to uint64_t, so all negative values of 'z' turn out to be larger than 20. This is different from a comparison of, say, int16_t and uint16_t, where both operands would be converted to int for the purposes of comparison.

Like, the analyzer doesn't always do a good job around integral casts, but this case looks correct.

Arg, man, that's frustrating, but it makes sense when you point it out. Sorry for wasting your time with it, and thanks again for the clarification on the other point.

Cheers,
-yrp
