annotating libc++ to catch buffer overflows in vector/string with asan

Hi Marshall, Howard,

[continuing our discussion with Marshall on llvm dev meeting]
Consider the following test case:

#include
#include <assert.h>
typedef long T;
int main() {
std::vector v;
v.push_back(0);
v.push_back(1);
v.push_back(2);
assert(v.capacity() >= 4);
T *p = &v[0];
return p[3]; // OOPS.
}

Here we have a vector with size==3 and capacity==4, and we are reading
the element with index 3 using raw pointer arithmetic.
Today we have no way to detect this bug. AddressSanitizer is silent because
we still access the heap-allocated buffer within bounds.
I propose to annotate the libc++ sources so that AddressSanitizer can catch these bugs.
Here is a prototype of the change.
I’ve only annotated __push_back_slow_path which is enough to handle the test case above.
More methods will need to be annotated for full functionality.
Similar annotations are possible for std::string.
__sanitizer_annotate_contiguous_container is declared in
compiler-rt/include/sanitizer/common_interface_defs.h

— include/vector (revision 195011)
+++ include/vector (working copy)
@@ -282,6 +282,15 @@

define _LIBCPP_ASSERT(x, m) ((void)0)

#endif

+#if __has_feature(address_sanitizer)
+extern “C” void __sanitizer_annotate_contiguous_container(void *, void *,

  • void *, void *);
    +# define _LIBCPP_ANNOTATE_CONTIGUOUS_CONTAINER(beg, end, old_mid, new_mid) \
  • __sanitizer_annotate_contiguous_container(beg, end, old_mid, new_mid)
    +#else
    +# define _LIBCPP_ANNOTATE_CONTIGUOUS_CONTAINER(beg, end, old_mid, new_mid)
    +#endif

Hi Marshall, Howard,

[continuing our discussion with Marshall on llvm dev meeting]
Consider the following test case:

#include <vector>
#include <assert.h>
typedef long T;
int main() {
  std::vector<T> v;
  v.push_back(0);
  v.push_back(1);
  v.push_back(2);
  assert(v.capacity() >= 4);
  T *p = &v[0];
  return p[3]; // OOPS.
}

Here we have a vector with size==3 and capacity==4, and we are reading
the element with index 3 using raw pointer arithmetic.
Today we have no way to detect this bug. AddressSanitizer is silent because
we still access the heap-allocated buffer within bounds.
I propose to annotate the libc++ sources so that AddressSanitizer can catch these bugs.
Here is a prototype of the change.
I've only annotated __push_back_slow_path which is enough to handle the test case above.
More methods will need to be annotated for full functionality.
Similar annotations are possible for std::string.
__sanitizer_annotate_contiguous_container is declared in
compiler-rt/include/sanitizer/common_interface_defs.h

--- include/vector (revision 195011)
+++ include/vector (working copy)
@@ -282,6 +282,15 @@
# define _LIBCPP_ASSERT(x, m) ((void)0)
#endif

+#if __has_feature(address_sanitizer)
+extern "C" void __sanitizer_annotate_contiguous_container(void *, void *,
+ void *, void *);
+# define _LIBCPP_ANNOTATE_CONTIGUOUS_CONTAINER(beg, end, old_mid, new_mid) \
+ __sanitizer_annotate_contiguous_container(beg, end, old_mid, new_mid)
+#else
+# define _LIBCPP_ANNOTATE_CONTIGUOUS_CONTAINER(beg, end, old_mid, new_mid)
+#endif
+
#if !defined(_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER)
#pragma GCC system_header
#endif
@@ -1525,7 +1534,12 @@
     // __v.push_back(_VSTD::forward<_Up>(__x));
     __alloc_traits::construct(__a, _VSTD::__to_raw_pointer(__v.__end_), _VSTD::forward<_Up>(__x));
     __v.__end_++;
+ if (data())
+ _LIBCPP_ANNOTATE_CONTIGUOUS_CONTAINER(
+ data(), data() + capacity(), data() + size(), data() + capacity());
     __swap_out_circular_buffer(__v);
+ _LIBCPP_ANNOTATE_CONTIGUOUS_CONTAINER(data(), data() + capacity(),
+ data() + capacity(), data() + size());
}

Running:
% clang++ -g -O -fsanitize=address -I$HOME/llvm/projects/libcxx/include vect_oob.cc -fno-exceptions; ./a.out

==20838==ERROR: AddressSanitizer: contiguous-container-buffer-overflow on address 0x60300000eff8 at pc 0x465455 bp 0x7fffc0e9a9f0 sp 0x7fffc0e9a9e8
READ of size 8 at 0x60300000eff8 thread T0
    #0 0x465454 in main /tmp/vect_oob.cc:11

0x60300000eff8 is located 24 bytes inside of 32-byte region [0x60300000efe0,0x60300000f000)
allocated by thread T0 here:
    #0 0x44f8fe in operator new(unsigned long)
    #1 0x46559d in std::__1::allocator<long>::allocate(unsigned long, void const*)
    #2 0x46559d in std::__1::allocator_traits<std::__1::allocator<long> >::allocate(std::__1::allocator<long>&, unsigned long)
    #3 0x46559d in std::__1::vector<long, std::__1::allocator<long> >::size() const
    #4 0x46559d in void std::__1::vector<long, std::__1::allocator<long> >::__push_back_slow_path<long const>(long const&)

Do you like the idea?

Yes - very much so.

Any comments about the prototype patch?

Yes.

1) I would implement it with inline functions other than macros.

Like this:

extern "C" void __sanitizer_annotate_contiguous_container(void *, void *, void *, void *);
void inline __annotate_contiguous container
  ( const void *__beg, const void *__end, const void *__old_mid, const void *new_mid )
#if __has_feature(address_sanitizer)
  { __sanitizer_annotate_contiguous_container(beg, end, old_mid, new_mid); }
#else
  {}
#endif

2) I'd like more information about what __sanitizer_annotate_contiguous_container does :wink:

Any suggestion how to test these annotations in libc++ (we need some kind of DEATH tests)?

We already have failing tests in libc++.
they are named XXXX.fail.cpp.

Getting them to fail only under ASAN will be trickier.

-- Marshall

Marshall Clow Idio Software <mailto:mclow.lists@gmail.com>

A.D. 1517: Martin Luther nails his 95 Theses to the church door and is promptly moderated down to (-1, Flamebait).
        -- Yu Suzuki

> Do you like the idea?

Yes - very much so.

> Any comments about the prototype patch?

Yes.

1) I would implement it with inline functions other than macros.

Like this:

extern "C" void __sanitizer_annotate_contiguous_container(void *, void *,
void *, void *);
void inline __annotate_contiguous container
        ( const void *__beg, const void *__end, const void *__old_mid,
const void *new_mid )
#if __has_feature(address_sanitizer)
        { __sanitizer_annotate_contiguous_container(beg, end, old_mid,
new_mid); }
#else
        {}
#endif

Makes sense, see below.
We''l probably want to move __annotate_contiguous container into a separate
file so that both string and vector use it.
But let's do the whole thing with vector before moving to string.

2) I'd like more information about what
__sanitizer_annotate_contiguous_container does :wink:

In short, it [un]poisons the parts of shadow memory that correspond to the
container's memory.
More details in compiler-rt files:
include/sanitizer/common_interface_defs.h and lib/asan/asan_poisoning.cc

> Any suggestion how to test these annotations in libc++ (we need some
kind of DEATH tests)?

We already have failing tests in libc++.
they are named XXXX.fail.cpp.

These are tests that fail to compile.
for asan annotations we need tests that build and run, and then either fail
at run-time or check the state of shadow.
Like these:
compiler-rt/lib/asan/lit_tests/TestCases/contiguous_container.cc
compiler-rt/lib/asan/lit_tests/TestCases/contiguous_container_crash.cc

Suggestions?

--kcc

--- include/vector (revision 195116)
+++ include/vector (working copy)
@@ -288,6 +288,16 @@

_LIBCPP_BEGIN_NAMESPACE_STD

+extern "C" void __sanitizer_annotate_contiguous_container(
+ const void *, const void *, const void *, const void *);
+void inline __annotate_contiguous_container
+ (const void *__beg, const void *__end, const void *__old_mid, const
void *__new_mid)
+#if __has_feature(address_sanitizer)
+ { __sanitizer_annotate_contiguous_container(__beg, __end, __old_mid,
__new_mid); }
+#else
+ {}
+#endif

Attached is an intermediate patch that passes my tests (also attached). Few questions:

The patch gets a bit too verbose and it has a few repetitions, e.g. we have this snippet repeated multiple times:

  • __annotate_contiguous_container(data(), data() + capacity(),
  • data() + size(), data() + size() + 1);

WDYT about adding a private method to vector like this (and a couple more)?

void __annotate_size_increase(size_t __n) {

__annotate_contiguous_container(data(), data() + capacity(),
data() + size(), data() + size() + __n);

}

In order to test the annotation properly I need to do something like this as often as possible:

template void TestVector(std::vector &v) {
if (v.capacity() == 0) return;
char beg = (char)&v.front();
char *end = beg + v.capacity() * sizeof(T);
char *mid = beg + v.size() * sizeof(T);
for (char *p = beg; p < mid; p++)
assert(!__asan_address_is_poisoned(p));
for (char *p = mid ; p < end; p++)
assert(__asan_address_is_poisoned(p));
}

Is it possible to add code like this as another private method of vector (under some ifdef)?

–kcc

vector_asan_annotations_test.cc (2.06 KB)

vec.patch (6.94 KB)

Marshall,
Here is a more complete patch for asan vector annotations.It passes all tests in libcxx/test/containers/sequences/vector, but doesn’t yet work on larger programs (I tried Chrome).
Two questions:

  1. Does it look like something potentially committable?
  2. Would you suggest some bigger test suite specifically targeted to test std::vector?

Thanks,

–kcc

vector_asan.patch (8.31 KB)

Here is another version; this time I’ve been able to build and run Chromium with libc++ and these annotations.

–kcc

vector_asan.patch (7.46 KB)

Marshall,
Any chance to start the review this month?
Slightly updated patch attached.
FYI: we applied the same approach to libstdc++, ran various tests and found dozens of bugs.
These annotations also help to achieve more precise leak detection with LeakSanitizer.
https://code.google.com/p/address-sanitizer/wiki/ContainerOverflow

If this patch has any chance to get committed eventually, I’d like to understand your preference regarding the testing.
One approach is to test it with asan: then all you need is to run all existing vector tests built with

clang -fsanitize=address -D_LIBCPP_ADDRESS_SANITIZER_ANNOTATIONS
But this will tie the testing process to clang, I am not sure if that is acceptable for you (totally ok for me).
Another approach is to provide a fake standalone implementation of __sanitizer_annotate_contiguous_container;
it will be a slightly less accurate testing, but will not depend on clang.

Thanks,

–kcc

vector_asan.patch (7.47 KB)

Marshall,
Any chance to start the review this month?
Slightly updated patch attached.
FYI: we applied the same approach to libstdc++, ran various tests and found dozens of bugs.
These annotations also help to achieve more precise leak detection with LeakSanitizer.
https://code.google.com/p/address-sanitizer/wiki/ContainerOverflow

Kostya —

First off, I’m sorry for the slow response.

Second, I’m very much in favor of adding these capabilities to the standard library.

Third, looking over this code, it seems to me to be a lot of changes. You seem to tracking all the changes for the storage in vector, discriminating between insertions, deletions, growth and shrinkage. That seems overly complicated to me.

Thinking about invariants; we know that std::vector maintains a block of memory
bounded by [data(), data()+(sizeof(value_type)*capacity())),
of which the block [data(), data()+(sizeof(value_type)*size))
is actually in use.

Why not just turn off “inside the block” ASAN checking upon entry to routines that modify the vector, and reestablish it upon return?

So, for push_back() (as an example), we could write:

template <class _Tp, class _Allocator>
inline _LIBCPP_INLINE_VISIBILITY
void
vector<_Tp, _Allocator>::push_back(const_reference __x)
{
__turn_off_ASAN_internal_for(data());
if (this->_end != this->__end_cap())
{
__alloc_traits::construct(this->__alloc(),
_VSTD::__to_raw_pointer(this->_end), __x);
++this->_end;
}
else
__push_back_slow_path(__x);
__turn_on_ASAN_internal_for(data(), data()+size()); // or whatever parameters ASAN needs
}

[ In actuality, I wouldn’t do it this way, I’d put the calls in the constructor/destructor of an object; that way fast returns and exceptions would not interfere, but you get the idea. ]

If this patch has any chance to get committed eventually, I’d like to understand your preference regarding the testing.
One approach is to test it with asan: then all you need is to run all existing vector tests built with

clang -fsanitize=address -D_LIBCPP_ADDRESS_SANITIZER_ANNOTATIONS
But this will tie the testing process to clang, I am not sure if that is acceptable for you (totally ok for me).
Another approach is to provide a fake standalone implementation of __sanitizer_annotate_contiguous_container;
it will be a slightly less accurate testing, but will not depend on clang.

I think to test this we’ll have to augment the libc++ test process.
It currently has two kinds of tests - ones that fail to compile, and ones that compile and run successfully.
We’ll either need a third kind of test, one that is supposed to fail at runtime, -or-, an ASAN hook to catch ASAN errors and record them.
I’m leaning towards the first one - I don’t think we want to deliberately invoke undefined behavior (which is what this is), and expect anything further from the test.

What do you think?

— Marshall

Hi Marshall,

Marshall,
Any chance to start the review this month?
Slightly updated patch attached.
FYI: we applied the same approach to libstdc++, ran various tests and
found dozens of bugs.
These annotations also help to achieve more precise leak detection with
LeakSanitizer.
https://code.google.com/p/address-sanitizer/wiki/ContainerOverflow

Kostya —

First off, I’m sorry for the slow response.

np, I was having some other fun :slight_smile:

Second, I’m very much in favor of adding these capabilities to the
standard library.

Third, looking over this code, it seems to me to be a lot of changes.

This is sadly so...

You seem to tracking all the changes for the storage in vector,
discriminating between insertions, deletions, growth and shrinkage. That
seems overly complicated to me.

Thinking about invariants; we know that std::vector maintains a block of
memory
bounded by [data(), data()+(sizeof(value_type)*capacity())),
of which the block [data(), data()+(sizeof(value_type)*size))
is actually in use.

Correct.

Why not just turn off “inside the block” ASAN checking upon entry to
routines that modify the vector, and reestablish it upon return?

Because asan-off and asan-on have complexity O(capacity()).
At every point of time the vector looks like GGGGGGPBBB,
where G is good 8-bytes, B is bad 8-bytes and P is a single partially good
8-byte word.
The status of every 8 bytes in application is encoded in a single byte in
shadow.
turn-off or turn-on operation will require to write capacity()/8 bytes to
the shadow.

So, for push_back() (as an example), we could write:

template <class _Tp, class _Allocator>
inline _LIBCPP_INLINE_VISIBILITY
void
vector<_Tp, _Allocator>::push_back(const_reference __x)
{
    __turn_off_ASAN_internal_for(data());
    if (this->__end_ != this->__end_cap())
    {
        __alloc_traits::construct(this->__alloc(),
                                  _VSTD::__to_raw_pointer(this->__end_),
__x);
        ++this->__end_;
    }
    else
        __push_back_slow_path(__x);
    __turn_on_ASAN_internal_for(data(), data()+size()); // or whatever
parameters ASAN needs
}

[ In actuality, I wouldn’t do it this way, I’d put the calls in the
constructor/destructor of an object; that way fast returns and exceptions
would not interfere, but you get the idea. ]

If this patch has any chance to get committed eventually, I'd like to
understand your preference regarding the testing.
One approach is to test it with asan: then all you need is to run all
existing vector tests built with
clang -fsanitize=address -D_LIBCPP_ADDRESS_SANITIZER_ANNOTATIONS
But this will tie the testing process to clang, I am not sure if that is
acceptable for you (totally ok for me).
Another approach is to provide a fake standalone implementation of
__sanitizer_annotate_contiguous_container;
it will be a slightly less accurate testing, but will not depend on clang.

I think to test this we’ll have to augment the libc++ test process.
It currently has two kinds of tests - ones that fail to compile, and ones
that compile and run successfully.
We’ll either need a third kind of test, one that is supposed to fail at
runtime, -or-, an ASAN hook to catch ASAN errors and record them.
I’m leaning towards the first one - I don’t think we want to deliberately
invoke undefined behavior (which is what this is), and expect anything
further from the test.

Some tests could be "ones that compile and run successfully"
We can validate the shadow of the vector using
__asan_memory_region_is_poisoned.
I have a test that does various operations on vector and then calls this:

extern "C" bool __asan_address_is_poisoned(void *addr);

template <class T> void TestVector(std::vector<T> &v) {
  if (v.capacity() == 0) return;
  char *beg = (char*)v.data();
  char *end = beg + v.capacity() * sizeof(T);
  char *mid = beg + v.size() * sizeof(T);
  for (char *p = beg; p < mid; p++)
    assert(!__asan_address_is_poisoned(p));
  for (char *p = mid ; p < end; p++)
    assert(__asan_address_is_poisoned(p));
}

I'd also like to have at least a couple of tests that are supposed to fail
at runtime,
just like the lit-style tests we have for the rest of LLVM.

In both cases we will need to test libc++ with the fresh clang compiler
(which supports asan and asan's annotations)
Will this work?

--kcc

FTR: we have similar changes in our branch of libstdc++: http://gcc.gnu.org/viewcvs/gcc?view=revision&revision=207517
These annotations helped us detect ~15 bugs, and counting.

–kcc