[RFC] Introduce overflow builtins

Hi,

Currently C/C++ programmers write ad-hoc integer overflow checks.
For example,

void *malloc_array(size_t n, size_t size)
{
  if (size && n > SIZE_MAX / size)
    return NULL;
  return malloc(n * size);
}

Some programmers would write:

  size_t bytes = n * size;
  if (size && n != bytes / size)
    return NULL;

Or in some crazier way:

https://github.com/ivmai/bdwgc/commit/83231d0ab5ed60015797c3d1ad9056295ac3b2bb

#define SQRT_SIZE_MAX ((1U << (sizeof(size_t) * 8 / 2)) - 1)

  if ((size | n) > SQRT_SIZE_MAX /* fast test */
      && size && n > SIZE_MAX / size)
    return NULL;

Downsides of ad-hoc overflow checks.

1) Hard to read.

2) Error-prone --- even true for overflow checking libraries written
   by security experts (see http://blog.regehr.org/archives/593).

3) Performance. Below is the generated code of the first example.
   Neither GCC nor Clang optimizes away the division (blame instcombine?).

malloc_array: # @malloc_array
  .cfi_startproc
# BB#0: # %entry
  testq %rsi, %rsi
  je .LBB0_3
# BB#1: # %land.rhs
  movq $-1, %rax
  xorl %edx, %edx
  divq %rsi
  cmpq %rdi, %rax
  jae .LBB0_3
# BB#2: # %return
  xorl %eax, %eax
  ret
.LBB0_3: # %if.end
  imulq %rdi, %rsi
  movq %rsi, %rdi
  jmp malloc # TAILCALL
.Ltmp0:
  .size malloc_array, .Ltmp0-malloc_array
  .cfi_endproc

Let's consider adding overflow builtins to Clang, in the form:

  bool __overflow_*(T*, T, T);

With overflow builtins, programmers can implement the example
as follows.

void *malloc_array(size_t n, size_t size)
{
  size_t bytes;
  if (__overflow_umul(&bytes, n, size))
    return NULL;
  return malloc(bytes);
}

These builtins can be easily lowered to LLVM overflow intrinsics
llvm.*.with.overflow.*. In the generated code, there is only one
more jno instruction.

malloc_array: # @malloc_array
  .cfi_startproc
# BB#0: # %entry
  movq %rdi, %rax
  mulq %rsi
  jno .LBB0_2
# BB#1: # %return
  xorl %eax, %eax
  ret
.LBB0_2: # %if.end
  movq %rax, %rdi
  jmp malloc # TAILCALL
.Ltmp0:
  .size malloc_array, .Ltmp0-malloc_array
  .cfi_endproc

The patch is available at:

https://github.com/xiw/clang/compare/xiw:master...xiw:builtin-overflow

Also see

http://llvm.org/bugs/show_bug.cgi?id=12290

- xi

Hi,

Currently C/C++ programmers write ad-hoc integer overflow checks.
For example,

I definitely agree that it makes sense to have builtins for these.

Downsides of ad-hoc overflow checks.

1) Hard to read.

2) Error-prone --- even true for overflow checking libraries written
  by security experts (see http://blog.regehr.org/archives/593).

3) Performance. Below is the generated code of the first example.
  Neither GCC nor Clang optimizes away the division (blame instcombine?).

Regardless of whether we expose builtins, it would be nice for the optimizer to recognize common idioms, this is just general goodness for a wide range of already-written code.

Let's consider adding overflow builtins to Clang, in the form:

  bool __overflow_*(T*, T, T);

With overflow builtins, programmers can implement the example
as follows.

This sort of prototype makes sense to me, given that C doesn't support multiple return values well. The builtin should start with __builtin though. This is a pretty obvious set of functionality, is there any established practice in other compilers (e.g. MSVC, ICC, or even other more obscure ones?). If there is standing practice somewhere else, it would be best to follow that lead, must so that we don't have to eventually implement both.

-Chris

Regardless of whether we expose builtins, it would be nice for the optimizer to recognize common idioms, this is just general goodness for a wide range of already-written code.

I agree.

Let's consider adding overflow builtins to Clang, in the form:

  bool __overflow_*(T*, T, T);

With overflow builtins, programmers can implement the example
as follows.

This sort of prototype makes sense to me, given that C doesn't support multiple return values well. The builtin should start with __builtin though. This is a pretty obvious set of functionality, is there any established practice in other compilers (e.g. MSVC, ICC, or even other more obscure ones?). If there is standing practice somewhere else, it would be best to follow that lead, must so that we don't have to eventually implement both.

Just out of curiosity, why __builtin_*? We already have __sync_* and __atomic_*. :wink: Actually I used __builtin_saddo (__builtin_sadd_with_overflow is just too long). Since this "saddo" looked a little bizarre to me, I changed it to __overflow_sadd.

I was also wondering which would be better, __overflow_*(T*, T, T) or __overflow_*(T, T, T*).

I didn't find overflow intrinsics in msvc and icc. They do have something like "__int64 __emul(int a, int b)" though.

http://msdn.microsoft.com/en-us/library/26td21ds.aspx

http://software.intel.com/file/6373

- xi

I see two different options for the functions:

(1) Provide __overflow_ # op which derives the types from the arguments.
In that case it should be just add, sub, div, mul, rem.

(2) Provide __overflow_ # op # _ # type which is explicitly typed. Op is
still naturally add, sub, div, mul, rem.

Using "umul" doesn't add any value IMO. One practical issue is how
should div/rem work for 0?

Joerg

Assuming the point of these builtins is primarily performance of
library implementations (since most projects would need some sort of
fallback for compilers which don't provide this builtin), I'm not sure
it makes sense to have div/rem at all; there isn't any underlying
llvm.div.with.overflow to map it to...

-Eli

Signed division can overflow (INT_MIN / -1). There is also the question
of division by 0. It seems natural to consider it as overflow in which
case both div and rem make sense.

Joerg

Given that people are going to use these builtins to implement secure/safe code,
it would also be nice if __builtin_overflow_div() also _forced_ people to think about both the quotient and remainder at the same time. For example: __builtin_overflow_div(T *quotient, T *remainder, T value, T divisor).

davez

Regardless of whether we expose builtins, it would be nice for the optimizer to recognize common idioms, this is just general goodness for a wide range of already-written code.

I agree.

Let's consider adding overflow builtins to Clang, in the form:

  bool __overflow_*(T*, T, T);

With overflow builtins, programmers can implement the example
as follows.

This sort of prototype makes sense to me, given that C doesn't support multiple return values well. The builtin should start with __builtin though. This is a pretty obvious set of functionality, is there any established practice in other compilers (e.g. MSVC, ICC, or even other more obscure ones?). If there is standing practice somewhere else, it would be best to follow that lead, must so that we don't have to eventually implement both.

Just out of curiosity, why __builtin_*? We already have __sync_* and __atomic_*. :wink:

AFAIK, these came to GCC through a spec established by the Itanium ABI, not because they fit well :slight_smile:

Actually I used __builtin_saddo (__builtin_sadd_with_overflow is just too long). Since this "saddo" looked a little bizarre to me, I changed it to __overflow_sadd.

I don't think a long name is a bad thing. These builtins will be rarely used, it's just that they are important when they do get used.

I was also wondering which would be better, __overflow_*(T*, T, T) or __overflow_*(T, T, T*).

I don't have a preference one way or the other. Are there any precedents?

I didn't find overflow intrinsics in msvc and icc.

Ok.

-Chris

As Eli said, there's no such thing as llvm.div.with.overflow ---
I am not sure if there's any performance gain from compiler support.

I would rather to leave div/rem/shl/shr for overflow libraries.
It's straightforward to define a function or macro to perform
overflow checking for them.

- xi

Okay. Let's try

  __builtin_*_with_overflow(T, T, T*);

to keep the names consistent with the LLVM counterparts. Also it's probably more intuitive to let the output parameter come after the inputs.

Does this sound good?

- xi

There is precedence for the opposite order. For example, the __sync_*() APIs put the pointer first: T*, T, ...

This makes sense too. One shouldn't have to reorder "x = y + z" to "y, z, &x". That'll just confuse people in the long run.

davez

Hah, the signature of __overflow_* was actually copied from __sync_bool_compare_and_swap. :wink:

- xi

As Eli said, there's no such thing as llvm.div.with.overflow ---

Just because it hasn't been done...

I am not sure if there's any performance gain from compiler support.

At least for x86, the same argument applies here as for the other cases:
you can detect it with a trivial flag compare. An additional branch, if
you also want to handle divisor of 0.

Joerg

Which case did you mean? The same argument holds for signed division,
but probably not for unsigned division and shifts.

If the LLVM IR adds llvm.sdiv.with.overflow, we can definitely add
a corresponding builtin. I feel like it is better to leave the
rest cases to an overflow library, and have the compiler focus on
providing minimum necessary support for overflow detection (rather
than implementing a complete library itself).

- xi

As originally said, a large point of doing it in the compiler is making
sure it is low overhead. unsigned divisions can't overflow in the
traditional sense. There is still the point of divide by zero and
depending on where you come from, it is a form of overflow. I don't
believe merging division and remainder computation forcefully provides
any value -- modulo can't overflow and if you checked the division for
divide-by-0, the modulo works. The cost of the
overflow-or-divide-by-zero checking builtins on x86 is one post-branch
and in case of divide/remainder one pre-branch per instruction.

Joerg

INT_MIN % -1 in C is undefined.

-Eli

Hi,

I updated the overflow builtins to the form

    __builtin_*_with_overflow(T*, T, T)

as suggested by Chris and Dave.

The patch is still available at

https://github.com/xiw/clang/compare/master...builtin-overflow

- xi

Xi,

Thanks for the update! Have you tried inferring the sign of T* in addition to the size of T* in the API? That would let you simplify this:

  __builtin_sadd_with_overflow()
  __builtin_uadd_with_overflow()
  __builtin_ssub_with_overflow()
  __builtin_usub_with_overflow()
  __builtin_smul_with_overflow()
  __builtin_umul_with_overflow()

To this:

  __builtin_add_with_overflow()
  __builtin_sub_with_overflow()
  __builtin_mul_with_overflow()

Also, have you verified that -Wconversion does the right thing and warns when the sign of the parameters are inconsistent with each other (or the API if you do not make the above simplification)?

davez

Looking at this and more importantly the behavior of GCC compiled code
-- this definitely means that / and % should be covered by overflow
checks...

Joerg

Have you tried inferring the sign of T* in addition to the size of T* in the API? That would let you simplify this:

  __builtin_sadd_with_overflow()
  __builtin_uadd_with_overflow()
  __builtin_ssub_with_overflow()
  __builtin_usub_with_overflow()
  __builtin_smul_with_overflow()
  __builtin_umul_with_overflow()

To this:

  __builtin_add_with_overflow()
  __builtin_sub_with_overflow()
  __builtin_mul_with_overflow()

Sounds like a good idea. I am just a little worried that this is more error-prone than explicitly specifying s/u (e.g., if the programmer accidentally mixes signed and unsigned integers). -Wconversion would help (see below), but I guess the option is turned off in general.

Also, have you verified that -Wconversion does the right thing and warns when the sign of the parameters are inconsistent with each other (or the API if you do not make the above simplification)?

I tried a few examples and -Wconversion seemed to work. Below is an example.

#include <stdlib.h>

void *malloc_array_1(int n, size_t size)
{
        size_t bytes;
        if (__builtin_umul_with_overflow(&bytes, n, size))
                return 0;
        return malloc(bytes);
}

void *malloc_array_2(size_t n, size_t size)
{
        int bytes;
        if (__builtin_umul_with_overflow(&bytes, n, size))
                return 0;
        return malloc(bytes);
}

$ clang -c -Wconversion m.c
m.c:6:43: warning: implicit conversion changes signedness: 'int' to 'size_t' (aka 'unsigned long') [-Wsign-conversion]
        if (__builtin_umul_with_overflow(&bytes, n, size))
            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^
m.c:16:16: warning: implicit conversion changes signedness: 'int' to 'size_t' (aka 'unsigned long') [-Wsign-conversion]
        return malloc(bytes);
               ~~~~~~ ^~~~~
m.c:14:43: warning: implicit conversion loses integer precision: 'size_t' (aka 'unsigned long') to 'int'
      [-Wshorten-64-to-32]
        if (__builtin_umul_with_overflow(&bytes, n, size))
            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^
m.c:14:46: warning: implicit conversion loses integer precision: 'size_t' (aka 'unsigned long') to 'int'
      [-Wshorten-64-to-32]
        if (__builtin_umul_with_overflow(&bytes, n, size))
            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ^~~~
4 warnings generated.

- xi