Detecting undefined pointer arithmetic

I noticed that none of the sanitizers seems to support checking for
out-of-bounds pointer arithmetic, even though my understanding of
the C standard is that this is undefined behavior. In particular, I
believe the following trivial program has undefined behavior (assuming
malloc() succeeds), but none of the sanitizers flag any warnings:

#include <stdlib.h>
int main(void) {
   char *buf = malloc(1);
   if (buf) {
      char *this_is_ub = buf + 3;
      free(buf);
   }
}

Of course, I suspect this just has not been implemented yet, but
it still leaves me at a loss for how to track this form of UB down.
Is there a better solution than manual code review?

Sincerely,

Demi Obenour

This program does not have UB.
There’s nothing wrong with forming an “out-of-bounds” pointer.
If you use it for anything, then that is UB - and address sanitizer will find such usages for you.

Like this:

#include <stdlib.h>
int main(void) {
  int ret = 0;
  char *buf = (char *) malloc(1);
  if (buf) {
     char *this_is_ub = buf + 3;
     ret = *this_is_ub;
     free(buf);
  }
  return ret;
}

Dear All,

Actually, I think this is technically undefined behavior as (IIRC) C allows a pointer to point one element past the end of the referent memory object but not any further than that.

That said, I wouldn’t be surprised if the sanitizers do not catch this case because:

  1. Since the code doesn’t do anything useful, dead code elimination may remove the offending code before the sanitizers instrument the program.
  2. The out of bounds pointer is not used to read or write memory, so it can’t corrupt or leak memory. About the most danger it poses is that it may enable optimizations that the programmer is not expecting because the code is undefined.

I’ve lost track of all the different sanitizers in LLVM and how they work, but the original Address Sanitizer just checks for out of bounds loads and stores; it doesn’t place any checks on pointer arithmetic operations (like the LLVM gep instruction). That makes it faster at the cost of not catching all pointer arithmetic errors.

Regards,

John Criswell

We do actually have (at least) two sanitizers that catch some cases of this source of undefined behavior.

  1. We have a pointer overflow sanitizer that catches pointer arithmetic that wraps around the address space.

buf + very_large_number

  2. We have an array bounds sanitizer that catches pointer arithmetic that leaves the bounds of an array of known size.

int n = 7;
int arr[5];
int *p = arr + n;

Both checks are somewhat simplistic and could be extended to catch more cases:
(1) could be taught to catch pointer arithmetic that leaves the bounds determined by __builtin_object_size.
(2) could be taught to determine the known array bound in more cases, for example catching ‘&n + 2’ because (at least in C++) the language rules say that ‘n’ is treated as an array of bound 1 for the purposes of pointer arithmetic.

We would/should not exploit UB in such a case, at least not in the shown
example. The pointer computation might yield `poison` but that is it.
If you'd use the pointer afterwards, the situation is different though.

~ Johannes

The code pattern I see in practice is essentially:

    char *p = malloc(some_bytes);
    if (!p)
        goto fail;
    initialize(p, some_bytes);
    char *end = p + some_bytes;
    int offset = untrusted_user_input();
    if (offset < 0 || offset > (1 << 28))
        goto fail;
    if (p + offset > end)
        goto fail;

    /* assume that offset is in bounds from here on */

RPM uses this quite a bit.
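
One possible way to write the same check without ever forming `p + offset` is to compare integers instead of pointers. The sketch below assumes the buffer length is kept alongside the pointer; `offset_in_bounds` and `checked_advance` are hypothetical helper names used for illustration, not anything taken from RPM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns true if `offset` is a valid offset into a
 * buffer of `some_bytes` bytes. One-past-the-end is accepted, matching the
 * `p + offset > end` test in the pattern above. Only integers are compared,
 * so no out-of-bounds pointer is ever formed.
 */
static bool offset_in_bounds(size_t some_bytes, int offset) {
    if (offset < 0 || offset > (1 << 28))
        return false;
    return (size_t)offset <= some_bytes;
}

/* Returns p + offset if that pointer stays in bounds, NULL otherwise. */
static char *checked_advance(char *p, size_t some_bytes, int offset) {
    assert(p != NULL);
    if (!offset_in_bounds(some_bytes, offset))
        return NULL;
    return p + offset; /* safe here: 0 <= offset <= some_bytes */
}
```

With this shape the pointer addition only happens after the integer check has passed, so the arithmetic itself can never leave the allocation.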

Sincerely,

Demi

The problem is the following pattern in C:

#include <stdlib.h>
#include <stdint.h>
#include <arpa/inet.h>
#include <string.h>
#include <stdbool.h>

/* Struct for parsing results */
typedef struct some_struct some_struct;
/* Stream of untrusted data */
typedef struct untrusted_stream untrusted_stream;

/* Parse untrusted data into `output`.  Returns false on error */
bool parse(untrusted_stream *stream, some_struct *output);

/*
 * Read `len` bytes from `stream` into `ptr`.  Returns false if it cannot read
 * the full amount.
 */
bool read_data(untrusted_stream *stream, void *ptr, uint32_t len);

/* Maximum input length */
static const uint32_t MAX_INPUT = 1024 * 1024;

static bool extract_be_u32(const uint8_t *ptr, const uint8_t *end, uint32_t *out);

/**
 * Parse some untrusted data of length `len`.  We are guaranteed that `ptr` points
 * to `len` bytes of data.
 * @param ptr Pointer to `len` bytes of untrusted data.
 * @param len The length of the untrusted data.
 */
bool process_untrusted_input(const uint8_t *ptr, const size_t len, some_struct *bar) {
   const uint8_t *end = ptr + len; /* end is now one-past-the-end */
   uint32_t foo;
   /* some parsing code */
   if (!extract_be_u32(ptr, end, &foo))
      return false;
   /* foo is already in host byte order: extract_be_u32 applies ntohl */
   /* more parsing code follows that uses foo */
   return true;
}

bool extract_be_u32(const uint8_t *ptr, const uint8_t *end, uint32_t *out) {
   uint32_t res;
   /* this is the bug: it should be `end - ptr < sizeof res` */
   if (ptr + sizeof res > end)
      return false;
   memcpy(&res, ptr, sizeof res);
   *out = ntohl(res);
   return true;
}

/* Parse some data from `stream` into `output`.  The data is not trusted. */
bool parse(untrusted_stream *stream, some_struct *output) {
   uint32_t len;
   if (!read_data(stream, &len, sizeof len))
      return false;
   len = ntohl(len);
   /* Avoid DoS attacks */
   if (len > MAX_INPUT)
      return false;
   uint8_t *ptr = malloc(len);
   if (!ptr)
      return false;
   if (!read_data(stream, ptr, len)) {
      free(ptr); /* do not leak the buffer on a short read */
      return false;
   }
   bool res = process_untrusted_input(ptr, len, output);
   free(ptr);
   return res;
}

This code has undefined behavior if `process_untrusted_input` is passed fewer
than 4 bytes of input, which an attacker can often arrange. However,
I know of no way to detect this, or even test for it. Sanitizers and valgrind
won’t catch it, since the out-of-bounds pointer is never dereferenced. The
amount by which the pointer is out of bounds is also bounded, so it won’t wrap
in practice. And if Clang optimized away the bounds check, the result would be
a security vulnerability.
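
For comparison, here is a sketch of the check suggested in the comment inside `extract_be_u32`: it compares the number of remaining bytes against the size needed, rather than forming `ptr + sizeof res`. The name `extract_be_u32_checked` is hypothetical, and the code assumes that `ptr` and `end` point into the same buffer with `ptr <= end`.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Reads a big-endian 32-bit value from [ptr, end) into *out.
 * Returns false if fewer than 4 bytes remain.
 * Assumed precondition: ptr and end point into the same buffer, ptr <= end.
 */
static bool extract_be_u32_checked(const uint8_t *ptr, const uint8_t *end,
                                   uint32_t *out) {
    uint32_t res;
    assert(ptr <= end);
    /* end - ptr stays within the buffer, so the subtraction is well defined */
    if ((size_t)(end - ptr) < sizeof res)
        return false;
    memcpy(&res, ptr, sizeof res);
    *out = ntohl(res);
    return true;
}
```

Because only `end - ptr` is computed, no pointer beyond one-past-the-end is ever formed, regardless of how short the input is.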

This isn’t just theoretical; I have seen similar bugs in the wild.
Fortunately, I haven’t seen any cases of them being miscompiled.

Will Clang miscompile code like this? Even though it is undefined behavior,
it happens often enough that I would prefer if Clang made it defined as
an extension. Dereferencing the out-of-bounds pointer would still be UB, of
course. I believe there is precedent for this, in that type-punning via unions
is undefined behavior in C++ but (to my knowledge) works on both GCC and Clang.
Also, GCC supports `-fwrapv-pointer`.

Sincerely,

Demi

Hi All,

There are two ASan instrumentations that are related to the subject:

  • pointer-subtract,
  • pointer-compare.

They check that a pointer operation of the given kind (subtraction or comparison) is not UB: both pointers must point into the same allocation, where the one-past-the-end pointer counts as part of it.

The option must be combined with […] […] By default the check is disabled at run time. To enable it, add detect_invalid_pointer_pairs=2 to the environment variable ASAN_OPTIONS. Using detect_invalid_pointer_pairs=1 detects invalid operation only when both pointers are non-null.

https://gcc.gnu.org/onlinedocs/gcc-10.2.0/gcc/Instrumentation-Options.html
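
As a rough illustration of what these two checks look at, the sketch below isolates one pointer comparison and one pointer subtraction. The function names are hypothetical, and the build line in the comment is an assumption based on the GCC documentation linked above, not something verified against every GCC or Clang version.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Assumed GCC build and run lines:
 *   gcc -g -fsanitize=address,pointer-compare,pointer-subtract demo.c
 *   ASAN_OPTIONS=detect_invalid_pointer_pairs=2 ./a.out
 */

/*
 * Returns true if `a` comes before `b`. Well defined only when both
 * pointers point into (or one past the end of) the same object; calls that
 * mix pointers from different allocations are what pointer-compare is
 * meant to report.
 */
static bool comes_before(const char *a, const char *b) {
    assert(a != NULL && b != NULL);
    return a < b;
}

/*
 * Returns the distance from `first` to `last`. The same-object rule
 * applies; pointer-subtract instruments this operation.
 */
static ptrdiff_t distance(const char *first, const char *last) {
    assert(first != NULL && last != NULL);
    return last - first;
}
```

Note that neither check looks at pointer addition, so an expression like `buf + 3` on a one-byte buffer is not reported by itself; only a later comparison or subtraction involving pointers from different allocations is.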

That was the good news. The bad news:

Despite the above issues, these checks have found some bugs in my projects. I consider them useful if you develop in UB-free pedantic mode.

Have fun,
Paweł Bylica