[RFC] [PATCH] Fix for sys::Process::GetMallocUsage() when using ptmalloc2 allocator in glibc

Hi,

This is a proposed fix for bug 16847 [1]. It is essentially the patch
provided by Martin Nowack in the bug report but I have added a test
case and an ifdef macro around the modified code in
Process::GetMallocUsage()

The issue seems to be that the ptmalloc2 allocator used in glibc (in
my case 2.19) does not include mmap()'ed memory in mallinfo.uordblks
and so mallinfo.hblkhd needs to be added to the value that
Process::GetMallocUsage() will return.

Another easy way to see this issue (other than running the unit test)
is to build and run the attached "malloc-test.c" program on a system
that uses glibc. You will see that after malloc'ing 256MiB of memory
uordblks is still zero but hblkhd has increased.

This is an RFC because I'm not very happy with my use of...

#if defined(__GLIBC__)

This will work in the common case where glibc is being used with its
standard allocator. However it will completely fail in the case where
LLVM is being used as a library and a developer chooses to use a
different allocator. For example if tcmalloc [2] is used then the
calculation is wrong[3] because mallinfo.uordblks includes mmap()'ed
memory and so we would count mmap()'ed memory twice!

Thoughts?

[1] http://llvm.org/bugs/show_bug.cgi?id=16847
[2] TCMalloc : Thread-Caching Malloc
[3] https://github.com/klee/klee/pull/116

Thanks,

0001-Try-to-fix-the-value-returned-by-Process-GetMallocUs.patch (1.86 KB)

malloc-test.c (503 Bytes)

The problem with tcmalloc seems like a real problem. I can’t think of any good workarounds. My best worst idea is to try to figure out if malloc is coming from libc with dlsym and dlopen, and then use that to decide whether we add these two numbers together.

Thanks for the reply.

The problem with tcmalloc seems like a real problem. I can't think of any
good workarounds. My best worst idea is to try to figure out if malloc is
coming from libc with dlsym and dlopen, and then use that to decide whether
we add these two numbers together.

I'm not sure we can always assume libc is a dynamic library. A user
can statically link to libc if they really want to. For the static
case something like libbfd could help provided the symbol table was
still around.

It seems to me whatever we do is going to incur some overhead. Calling
dlsym() sounds a little bit expensive so we might be best off doing
the test on the first call to sys::Process::GetMallocUsage() and
storing a static variable that indicates the result of the test so we
don't need to do it again on subsequent calls to GetMallocUsage().
This would mean if an application decided to change its allocator
during execution GetMallocUsage() might break but changing the
allocator during execution sounds insane anyway, so hopefully no one
does that.

I think the static libc case breaks the dlsym trick. The sanitizers, for
example, provide a statically linked malloc implementation.

It's probably OK to have one-time startup overhead. We only use this for
-time-passes. So... allocate 10mb and see how the counters update? =D

It's probably OK to have one-time startup overhead. We only use this for
-time-passes. So... allocate 10mb and see how the counters update? =D

That is an option but that itself has problems...

- The test would be dependent on the threshold used by the allocator
to decided whether or not to use mmap(). If we pick a too large amount
of memory to allocate users might get annoyed. If we pick too small
amount of memory to allocate then changes to the allocator might break
the test.

- Thread safety. In a multi-threaded application there is a small
chance that additional mallocs (other than our own) may happen between
the first read from mallinfo() and the second read. If those
additional mallocs internally use mmap() there's no problem
(mallinfo.uordblks won't increase) but if they use sbrk()
mallinfo.uordblks will increase and possibly invalidate the test.
Maybe this isn't worth worrying about...

Attached is a rough draft of what you are suggesting. 10MiB wasn't
enough on my system to trigger mmap() usage but 20MiB was (the flaws
in our approach are already showing!). I wouldn't suggest you commit
this but feel free to try out the patch to see how it performs. It
"works fine for me" but I have no idea about other systems.

We probably ought to get the glibc devs to try and fix this...

0001-Try-to-fix-llvm-sys-Process-GetMallocUsage-under-all.patch (3.77 KB)