Buildling with/without AddressSanitizer causes divergent execution behaviour

Hi,

# TL;DR

I've been building an application with and without the address
sanitizer (with gcc 5.3 and clang 3.7.1) and I've observed that the
application's behaviour changes (assertion hit/ not hit). I'm
wondering if this could be a bug in address sanitizer or if the
application I'm running is just buggy (e.g. doing bad things like
relying on memory layout, etc.). I'm also observing ASan reporting a
heap-use-after-free which Valgrind is not reporting, which makes me
wonder if it is a false positive.

Any hints on how I might determine this? Building with UBSan doesn't
turn up anything.

# Longer version (if you are interested in the specific details)

The application of interest is the Z3 constraint solver [1].

Much of what I'm going to say is covered in [2] which is a bug report
I opened (including a heap-use-after-free AddressSanitizer finds, I'm
not sure if this a false positive or not) but here are the basics of
what I found.

Build Z3 as follows

git clone https://github.com/Z3Prover/z3.git
cd z3

# Now apply the attached patch.
# Basically this makes it so that in ``examples/c/test_capi.c``
# the main() function only calls two functions.
# Note if you build without the patch when running ``c_example``
program AddressSanitizer
# reports a heap-use-after-free. I'm not sure if this a false positive
or not. Valgrind doesn't
# seem to think there's a problem.

# Build with ASan, Assertion will be hit when running the example
CXX=clang++ CC=clang CXXFLAGS="-fno-omit-frame-pointer
-fsanitize=address" LDFLAGS="-fsanitize=address" python
scripts/mk_make.py --debug --noomp --build build_asan
cd build_asan
make
make c_examples
LD_LIBRARY_PATH=`pwd` ./c_example
...
ASSERTION VIOLATION
File: ../src/sat/sat_clause.h
Line: 59

# Now build without ASan
cd ../
CXX=clang++ CC=clang python scripts/mk_make.py --debug --noomp --build
build_noasan
cd build_noasan
make
make c_example
LD_LIBRARY_PATH=`pwd` ./c_example

# No assertion is hit

Any insights/suggestions on how I could debug what I'm seeing further
would be appreciated.

[1] https://github.com/z3prover/z3
[2] AddressSanitizer is reporting problems running the c_example · Issue #436 · Z3Prover/z3 · GitHub

Thanks,
Dan.

patch_c_examples.patch (1.19 KB)

Hi Dan,

Hi,

# TL;DR

I've been building an application with and without the address
sanitizer (with gcc 5.3 and clang 3.7.1) and I've observed that the
application's behaviour changes (assertion hit/ not hit). I'm
wondering if this could be a bug in address sanitizer or if the
application I'm running is just buggy (e.g. doing bad things like
relying on memory layout, etc.).

We've experienced such situations as well. Usually this was caused by some wild memory write which was unnoticed before but started to corrupt something important due to changed stack/heap layout.

You can try to apply ASan selectively to parts of your program to isolate the problem.

Hi Kostya & Yury,

Thanks for the advice.

Hi Dan,

Hi,

# TL;DR

I've been building an application with and without the address
sanitizer (with gcc 5.3 and clang 3.7.1) and I've observed that the
application's behaviour changes (assertion hit/ not hit). I'm
wondering if this could be a bug in address sanitizer or if the
application I'm running is just buggy (e.g. doing bad things like
relying on memory layout, etc.). I'm also observing ASan reporting a
heap-use-after-free which Valgrind is not reporting, which makes me
wonder if it is a false positive.

Let us start from this heap-use-after-free report. The one in
https://github.com/Z3Prover/z3/issues/436 looks legitimate.
Unless the application does something extremely weird and tricky,
heap-use-after-free reports are usually true positives.

The developers tell me they are doing some unusual things and they
believe that this might the cause of the report.

Can you somehow verify that this heap-use-after-free is happening?
E.g. print all the pointer values coming from memory::allocate, coming into
memory::deallocate, and coming into sat::clause::operator[]

If curious, check what size of quarantine is required to catch this bug
(ASAN_OPTIONS=quarantine_size_mb=N, default=256)
Valgrind may have smaller default quarantine and thus misses this bug.

I was lazy and just told valgrind to execute the program (built by gcc
without ASan) with the largest quarantine it supported.

LD_LIBRARY_PATH=`pwd` valgrind --freelist-vol=10000000000 ./c_example

It didn't report any problems. This fills me with some confidence that
when the application is compiled without ASan that it probably doesn't
have a heap-use-after-free.

Does the application have threads? (If yes, did you run with TSan?)

Not yet but I've stumbled across an issue that looks interesting. See below

Did you try msan?

I just have and it immediately reported a problem. I took a closer
look and it looks like a false positive to me.

Here are the steps to reproduce

git clone https://github.com/Z3Prover/z3.git
cd z3
git checkout 9ed7dadc0251db992b44984edfa6c586aab20ecb
CC=clang CXX=clang++ CXXFLAGS="-fsanitize=memory -fPIE -pie
-fno-omit-frame-pointer -fsanitize-memory-track-origins"
LDFLAGS="-fsanitize=memory" python scripts/mk_make.py --build
build_msan_clang --noomp --debug
cd build_msan_clang
make
make c_example
LD_LIBRARY_PATH=`pwd` ./c_example
==26936==WARNING: MemorySanitizer: use-of-uninitialized-value
    #0 0x7fa7d906f3b0 in Z3_open_log
/home/dsl11/dev/klee/z3/z3_upstream/build_msan_clang/../src/api/api_log.cpp:33:13
    #1 0x55c0c03107f5 in main
/home/dsl11/dev/klee/z3/z3_upstream/build_msan_clang/../examples/c/test_capi.c:2794:5
    #2 0x7fa7d78f960f in __libc_start_main (/usr/lib/libc.so.6+0x2060f)
    #3 0x55c0c024d838 in _start
(/home/dsl11/dev/klee/z3/z3_upstream/build_msan_clang/c_example+0x1c838)

  Uninitialized value was created by a heap allocation
    #0 0x55c0c0253ea0 in __interceptor_malloc
(/home/dsl11/dev/klee/z3/z3_upstream/build_msan_clang/c_example+0x22ea0)
    #1 0x7fa7e2a6c120 in memory::allocate(unsigned long)
/home/dsl11/dev/klee/z3/z3_upstream/build_msan_clang/../src/util/memory_manager.cpp:276:16
    #2 0x7fa7d906f125 in Z3_open_log
/home/dsl11/dev/klee/z3/z3_upstream/build_msan_clang/../src/api/api_log.cpp:31:20
    #3 0x55c0c03107f5 in main
/home/dsl11/dev/klee/z3/z3_upstream/build_msan_clang/../examples/c/test_capi.c:2794:5
    #4 0x7fa7d78f960f in __libc_start_main (/usr/lib/libc.so.6+0x2060f)

SUMMARY: MemorySanitizer: use-of-uninitialized-value
/home/dsl11/dev/klee/z3/z3_upstream/build_msan_clang/../src/api/api_log.cpp:33:13
in Z3_open_log
Exiting

Side note using ``MSAN_OPTIONS="halt_on_error=0`` doesn't seem to do
anything, the application always exits when it hits the reported bug.
I'd like a way to catch this in gdb but I'm not sure how to do it.

I took a look. MSan is complaining about this code

std::ostream * g_z3_log = 0;

...

    Z3_bool Z3_API Z3_open_log(Z3_string filename) {
        if (g_z3_log != 0)
            Z3_close_log();
        g_z3_log = alloc(std::ofstream, filename);
        g_z3_log_enabled = true;
        if (g_z3_log->bad() || g_z3_log->fail()) {
            dealloc(g_z3_log);
            g_z3_log = 0;
            return Z3_FALSE;
        }
        return Z3_TRUE;
    }

It seems to be complaining that ``g_z3_log`` is uninitialised when
doing ``g_z3_log->bad() || g_z3_log->fail()``. This seems incorrect
because ``alloc()`` appears to be this macro (see
``src/util/memory_manager.h``).

#define alloc(T,...) new (memory::allocate(sizeof(T))) T(__VA_ARGS__)

With the macro expanded it looks like this

g_z3_log = new (memory::allocate(sizeof(std::ofstream)))
std::ofstream(filename);

I think this is calling the "placement" version of ``operator new``
and so constructs a new object (in this case ``std::ofstream``) using
the constructor but uses the memory allocated by
``memory::allocate(sizeof(T))``.

Therefore the memory in the pointer ``g_z3_log`` is initialised and
the "use-of-uninitialized-value" report is a false positive. Would you
agree with this assessment?

Thanks,
Dan.

> Can you somehow verify that this heap-use-after-free is happening?
> E.g. print all the pointer values coming from memory::allocate, coming
into
> memory::deallocate, and coming into sat::clause::operator[]
>
> If curious, check what size of quarantine is required to catch this bug
> (ASAN_OPTIONS=quarantine_size_mb=N, default=256)
> Valgrind may have smaller default quarantine and thus misses this bug.

I was lazy and just told valgrind to execute the program (built by gcc
without ASan) with the largest quarantine it supported.

There are some differences between Clang and GCC, such as left-to-right vs
right-to-left argument evaluation. This usually bites people who are moving
unique_ptr or vector and accessing it in the same expression. Given the
stack traces, I suspect that different orderings of refcount operations is
what's happening here. I would suggest building Z3 with Clang without ASan
and running valgrind on that.

> Did you try msan?

I just have and it immediately reported a problem. I took a closer
look and it looks like a false positive to me.

...

This is because your STL was not compiled with MSan. The std::ofstream
constructor is provided by libstdc++, the writes are not instrumented, and
MSan never sees the initialization. Unfortunately, MSan is not very useful
unless you recompile your *entire* application minus glibc with msan. =/

It is possible to build an MSan-ified libc++ and use it if you want to keep
trying, but it's involved.

Exact steps:
https://github.com/google/sanitizers/wiki/MemorySanitizerLibcxxHowTo

Exact steps:
https://github.com/google/sanitizers/wiki/MemorySanitizerLibcxxHowTo

**face palm**: I should take a closer look at the documentation.

Thanks.

> Can you somehow verify that this heap-use-after-free is happening?
> E.g. print all the pointer values coming from memory::allocate, coming
> into
> memory::deallocate, and coming into sat::clause::operator[]
>
> If curious, check what size of quarantine is required to catch this bug
> (ASAN_OPTIONS=quarantine_size_mb=N, default=256)
> Valgrind may have smaller default quarantine and thus misses this bug.

I was lazy and just told valgrind to execute the program (built by gcc
without ASan) with the largest quarantine it supported.

There are some differences between Clang and GCC, such as left-to-right vs
right-to-left argument evaluation. This usually bites people who are moving
unique_ptr or vector and accessing it in the same expression. Given the
stack traces, I suspect that different orderings of refcount operations is
what's happening here. I would suggest building Z3 with Clang without ASan
and running valgrind on that.

Yeah I tried building Z3 with Clang without ASan and running Valgrind.
It didn't report any problems.

I don't think this is a GCC vs Clang problem. When I build Z3 with GCC
or Clang with ASan enabled both of them report
the same heap-use-after-free issue.

The original problem I reported (the divergent behaviour when building
with and without ASan, i.e. the assert being hit/not hit respectively)
should be disregarded.
I was struggling to reproduce the issue just now and I gave up and
then I accidently hit it again. Then when I tried to run valgrind on a
binary (that wasn't supposed to be built
with ASan) built by Z3's build system it complained about ASan. I
think there's something with wrong with Z3's build system and reusing
the same binary directory for non-asan / asan
build is broken in some way. Lesson learnt, always use a fresh binary
build directory!

The heap-use-after-free issue is reproducible though. I'll report out
what I found to the Z3 devs and leave it for now.

> Did you try msan?

I just have and it immediately reported a problem. I took a closer
look and it looks like a false positive to me.

...

This is because your STL was not compiled with MSan. The std::ofstream
constructor is provided by libstdc++, the writes are not instrumented, and
MSan never sees the initialization. Unfortunately, MSan is not very useful
unless you recompile your *entire* application minus glibc with msan. =/

It is possible to build an MSan-ified libc++ and use it if you want to keep
trying, but it's involved.

Thanks for pointing out. Out of interest how is MSan able to work with
a non MSan-ified glibc?

Thanks,
Dan.

By intercepting almost the entire libc interface and annotating inputs and
outputs:
https://github.com/llvm-mirror/compiler-rt/blob/master/lib/sanitizer_common/sanitizer_common_interceptors.inc
It's a C ABI boundary, so this is hard but feasible. Interposing all of
libstdc++ is less feasible.

Hi,

Well I dug into Z3's codebase a little more and figured out what the
problem was. If you're curious see [1].

What worries me more is that prior to a heap-use-after-free being
reported there an out of bounds write occurs but ASan doesn't catch it
which seems like a bug to me. Note I'm using Clang 3.7.1

Here's how to reproduce (you need to build this on Linux on a x86_64 machine)

git clone -b asan_miss_out_of_bounds https://github.com/delcypher/z3-1.git src
cd src
CXX=clang++ CC=clang CXXFLAGS="-fno-omit-frame-pointer
-fsanitize=address" LDFLAGS="-fsanitize=address" python
scripts/mk_make.py --build build_clang_asan --noomp --debug
cd build_clang_asan
make -j4
make c_example
LD_LIBRARY_PATH=`pwd` ./c_example

You should see output like

```
...
About to do out of bounds access!
Did out of bounds access! If doing an ASan build do I get printed?

Hi,

Well I dug into Z3's codebase a little more and figured out what the
problem was. If you're curious see [1].

Neat bug. :slight_smile:

What worries me more is that prior to a heap-use-after-free being
reported there an out of bounds write occurs but ASan doesn't catch it
which seems like a bug to me. Note I'm using Clang 3.7.1
...
This seems like a bug to me. Thoughts?

m_segments is at the end of the clause_allocator object, which I'm assuming
is allocated in another object sls here:
        clause_allocator m_alloc; // clause allocator
        clause_vector m_bin_clauses; // binary clauses
The out-of-bounds access probably touches memory in m_bin_clauses.

One of ASan's limitations is that it can't currently catch intra-object
overflow:
https://github.com/google/sanitizers/wiki/AddressSanitizerIntraObjectOverflow
There's a prototype that adds padding to make it possible to catch this
kind of bug, but I haven't seen anyone pushing it forward for a while now.

For practical purposes this does not exist yet :frowning:

m_segments is at the end of the clause_allocator object, which I'm assuming
is allocated in another object sls here:
        clause_allocator m_alloc; // clause allocator
        clause_vector m_bin_clauses; // binary clauses
The out-of-bounds access probably touches memory in m_bin_clauses.

One of ASan's limitations is that it can't currently catch intra-object
overflow:
https://github.com/google/sanitizers/wiki/AddressSanitizerIntraObjectOverflow
There's a prototype that adds padding to make it possible to catch this kind
of bug, but I haven't seen anyone pushing it forward for a while now.

Wow thanks for the useful and fast response. It's a shame that ASan
can't catch this but in general this problem looks hard. The approach
of putting redzones betweens fields mentioned in that wiki page sounds
like it could still miss bugs as there will be some offsets that would
cause the redzones between fields to be missed. Doing that also
changes the size of objects which means you'll break ABI
compatibility. It's certainly better than nothing though.

Perhaps this limitation should be added to the Address Sanitizer FAQ?
I'd would add it myself but I don't have access.

Thanks,
Dan.