Handling of pointer difference in llvm-gcc and clang

Hi,

We are developing a bounded model checker for C/C++ programs (http://baldur.iti.kit.edu/llbmc/) that operates on LLVM's intermediate representation. While checking a C++ program that uses STL containers we noticed that llvm-gcc and clang handle pointer differences in disagreeing ways.

Consider the following C function:
int f(int *p, int *q)
{
     return q - p;
}

Here's the LLVM code generated by llvm-gcc (2.9):
define i32 @f(i32* %p, i32* %q) nounwind readnone {
entry:
   %0 = ptrtoint i32* %q to i32
   %1 = ptrtoint i32* %p to i32
   %2 = sub nsw i32 %0, %1
   %3 = ashr exact i32 %2, 2
   ret i32 %3
}

And here is what clang (2.9) produces:
define i32 @f(i32* %p, i32* %q) nounwind readnone {
   %1 = ptrtoint i32* %q to i32
   %2 = ptrtoint i32* %p to i32
   %3 = sub i32 %1, %2
   %4 = ashr exact i32 %3, 2
   ret i32 %4
}

Thus, llvm-gcc added the nsw flag to the sub, whereas clang didn't.

We think that clang is right and llvm-gcc is wrong: it could be the case that p and q point into the same array, that q is 0x80000000, and that p is 0x7FFFFFFE. Then the sub results in a signed overflow, i.e., sub with nsw is a trap value.

Is this a bug in llvm-gcc?

--Stephan

Hi Stephan,

We are developing a bounded model checker for C/C++ programs
(http://baldur.iti.kit.edu/llbmc/) that operates on LLVM's intermediate
representation. While checking a C++ program that uses STL containers
we noticed that llvm-gcc and clang handle pointer differences in
disagreeing ways.

Consider the following C function:
int f(int *p, int *q)
{
      return q - p;
}

Here's the LLVM code generated by llvm-gcc (2.9):
define i32 @f(i32* %p, i32* %q) nounwind readnone {
entry:
    %0 = ptrtoint i32* %q to i32
    %1 = ptrtoint i32* %p to i32
    %2 = sub nsw i32 %0, %1
    %3 = ashr exact i32 %2, 2
    ret i32 %3
}

And here is what clang (2.9) produces:
define i32 @f(i32* %p, i32* %q) nounwind readnone {
    %1 = ptrtoint i32* %q to i32
    %2 = ptrtoint i32* %p to i32
    %3 = sub i32 %1, %2
    %4 = ashr exact i32 %3, 2
    ret i32 %4
}

Thus, llvm-gcc added the nsw flag to the sub, whereas clang didn't.

We think that clang is right and llvm-gcc is wrong: it could be the
case that p and q point into the same array, that q is 0x80000000, and
that p is 0x7FFFFFFE. Then the sub results in a signed overflow, i.e.,
sub with nsw is a trap value.

Is this a bug in llvm-gcc?

in llvm-gcc (and dragonegg) this is coming directly from GCC's gimple:

f (int * p, int * q)
{
   long int D.2718;
   long int D.2717;
   long int p.1;
   long int q.0;
   int D.2714;

<bb 2>:
   q.0_2 = (long int) q_1(D);
   p.1_4 = (long int) p_3(D);
   D.2717_5 = q.0_2 - p.1_4;
   D.2718_6 = D.2717_5 /[ex] 4;
   D.2714_7 = (int) D.2718_6;
   return D.2714_7;

}

Signed overflow in the difference of two long int (ptrdiff_t) values results in
undefined behaviour according to the GCC type system, which is where the nsw
flag comes from.

The C front-end generates this gimple in the pointer_diff routine. The above is
basically a direct transcription of what pointer_diff does.

In short, I don't know if this is right or wrong; but if it is wrong it seems
to be a bug in GCC's C frontend.

Ciao, Duncan.

Hi Duncan,

Hi Stephan,

> We are developing a bounded model checker for C/C++ programs
> (http://baldur.iti.kit.edu/llbmc/) that operates on LLVM's intermediate
> representation. While checking a C++ program that uses STL containers
> we noticed that llvm-gcc and clang handle pointer differences in
> disagreeing ways.
>
> Consider the following C function:
> int f(int *p, int *q)
> {
>
> return q - p;
>
> }
>
> Here's the LLVM code generated by llvm-gcc (2.9):
> define i32 @f(i32* %p, i32* %q) nounwind readnone {
>
> entry:
> %0 = ptrtoint i32* %q to i32
> %1 = ptrtoint i32* %p to i32
> %2 = sub nsw i32 %0, %1
> %3 = ashr exact i32 %2, 2
> ret i32 %3
>
> }
>
> And here is what clang (2.9) produces:
> define i32 @f(i32* %p, i32* %q) nounwind readnone {
>
> %1 = ptrtoint i32* %q to i32
> %2 = ptrtoint i32* %p to i32
> %3 = sub i32 %1, %2
> %4 = ashr exact i32 %3, 2
> ret i32 %4
>
> }
>
> Thus, llvm-gcc added the nsw flag to the sub, whereas clang didn't.
>
> We think that clang is right and llvm-gcc is wrong: it could be the
> case that p and q point into the same array, that q is 0x80000000, and
> that p is 0x7FFFFFFE. Then the sub results in a signed overflow, i.e.,
> sub with nsw is a trap value.
>
> Is this a bug in llvm-gcc?

in llvm-gcc (and dragonegg) this is coming directly from GCC's gimple:

f (int * p, int * q)
{
   long int D.2718;
   long int D.2717;
   long int p.1;
   long int q.0;
   int D.2714;

<bb 2>:
   q.0_2 = (long int) q_1(D);
   p.1_4 = (long int) p_3(D);
   D.2717_5 = q.0_2 - p.1_4;
   D.2718_6 = D.2717_5 /[ex] 4;
   D.2714_7 = (int) D.2718_6;
   return D.2714_7;

}

Signed overflow in the difference of two long int (ptrdiff_t) values
results in undefined behaviour according to the GCC type system, which is
where the nsw flag comes from.

The C front-end generates this gimple in the pointer_diff routine. The
above is basically a direct transcription of what pointer_diff does.

In short, I don't know if this is right or wrong; but if it is wrong it
seems to be a bug in GCC's C frontend.

Ciao, Duncan.

In section 6.5.6 (Additive operators) the C standard says that the result of
subtracting two pointers should have type ptrdiff_t, which is signed. I could
not find any clear statement about this particular case in the standard, but to
me it does not make much sense to treat the subtraction as signed just because
the result is. As Stephan mentioned above treating the subtraction as signed
declares the result of subtracting pointers smaller than 0x7F...F from
pointers larger than 0x80...0 as undefined. This can hardly be intendend.

Regards,
Florian

Hi Stephan,

> We are developing a bounded model checker for C/C++ programs
> (http://baldur.iti.kit.edu/llbmc/) that operates on LLVM's intermediate
> representation. While checking a C++ program that uses STL containers
> we noticed that llvm-gcc and clang handle pointer differences in
> disagreeing ways.
>
> Consider the following C function:
> int f(int *p, int *q)
> {
> return q - p;
> }
>
> Here's the LLVM code generated by llvm-gcc (2.9):
> define i32 @f(i32* %p, i32* %q) nounwind readnone {
> entry:
> %0 = ptrtoint i32* %q to i32
> %1 = ptrtoint i32* %p to i32
> %2 = sub nsw i32 %0, %1
> %3 = ashr exact i32 %2, 2
> ret i32 %3
> }
>
> And here is what clang (2.9) produces:
> define i32 @f(i32* %p, i32* %q) nounwind readnone {
> %1 = ptrtoint i32* %q to i32
> %2 = ptrtoint i32* %p to i32
> %3 = sub i32 %1, %2
> %4 = ashr exact i32 %3, 2
> ret i32 %4
> }
>
> Thus, llvm-gcc added the nsw flag to the sub, whereas clang didn't.
>
> We think that clang is right and llvm-gcc is wrong: it could be the
> case that p and q point into the same array, that q is 0x80000000, and
> that p is 0x7FFFFFFE. Then the sub results in a signed overflow, i.e.,
> sub with nsw is a trap value.
>
> Is this a bug in llvm-gcc?

in llvm-gcc (and dragonegg) this is coming directly from GCC's gimple:

f (int * p, int * q)
{
   long int D.2718;
   long int D.2717;
   long int p.1;
   long int q.0;
   int D.2714;

<bb 2>:
   q.0_2 = (long int) q_1(D);
   p.1_4 = (long int) p_3(D);
   D.2717_5 = q.0_2 - p.1_4;
   D.2718_6 = D.2717_5 /[ex] 4;
   D.2714_7 = (int) D.2718_6;
   return D.2714_7;

}

Signed overflow in the difference of two long int (ptrdiff_t) values results in
undefined behaviour according to the GCC type system, which is where the nsw
flag comes from.

The C front-end generates this gimple in the pointer_diff routine. The above is
basically a direct transcription of what pointer_diff does.

In short, I don't know if this is right or wrong; but if it is wrong it seems
to be a bug in GCC's C frontend.

Shouldn't we cc this over to the gcc mailing list for clarification then?
             Jack