How to tell if an address belongs to the heap?

Hi everyone,

I am looking for a way to tell whether a memory address belongs to the heap or not.

In other words, I would like to make sure that the address does not reside within any stack frame (even if the stack of the thread has been allocated in the heap) and that it’s not a global variable or instruction.

Checking whether it is a valid or correctly allocated address or a memory-mapped file or register is not a goal, so accessing it in order to decide, at the risk of causing a segmentation fault, is an accepted solution.

I have been thinking of manually checking the address against the boundaries of each active stack frame, the start and end of the instruction segment and the locations of all global variables.

However, I would like to ask whether there are better ways to approach this problem in LLDB.

Thank you very much in advance! :slightly_smiling_face:

― Vangelis

In general, getting this kind of information is pretty hard, so lldb does not offer you an out-of-the-box solution for it, but it does give you tools which you can use to approximate that.

If I wanted to do something like this, the first thing I’d try to do is run “image lookup -a 0xaddr”. If this doesn’t return anything, then the address does not correspond to any known module. This rules out code, global variables, and similar. Then you can run through all of the threads and do a “memory region $sp”, which will give you the bounds of the memory allocation around the stack pointer. If your address is in one of these ranges, then it’s a stack address. Otherwise, it’s probably heap (though you can never be 100% sure of that).
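The two checks above can be sketched with LLDB's Python API. This is only a rough illustration, not a definitive implementation: the function names `classify_address` and `classify_with_lldb` and the result labels are made up for this example, and it assumes that the memory region containing each thread's stack pointer covers that thread's whole stack, which may not hold on every platform. The `lldb` module is imported lazily so that the range logic can be exercised outside a debugger session.

```python
def classify_address(addr, module_ranges, stack_ranges):
    """Classify addr against half-open (start, end) ranges.

    Returns "module", "stack", or "probably-heap" (hypothetical labels).
    """
    if any(start <= addr < end for start, end in module_ranges):
        return "module"
    if any(start <= addr < end for start, end in stack_ranges):
        return "stack"
    # Nothing matched: a heuristic answer, never a guarantee.
    return "probably-heap"


def classify_with_lldb(target, process, addr):
    """Same idea, driven by a live lldb.SBTarget / lldb.SBProcess."""
    import lldb  # only importable from LLDB's own Python environment

    # Rough equivalent of "image lookup -a": does addr resolve into a module?
    if target.ResolveLoadAddress(addr).GetModule().IsValid():
        return "module"

    # Rough equivalent of "memory region $sp" for every thread.
    stack_ranges = []
    for i in range(process.GetNumThreads()):
        thread = process.GetThreadAtIndex(i)
        sp = thread.GetFrameAtIndex(0).GetSP()
        region = lldb.SBMemoryRegionInfo()
        if process.GetMemoryRegionInfo(sp, region).Success():
            stack_ranges.append((region.GetRegionBase(), region.GetRegionEnd()))

    return classify_address(addr, [], stack_ranges)
```

From the LLDB script interpreter this could be called as `classify_with_lldb(lldb.target, lldb.process, 0x7ffee000)`, where the address is just a placeholder.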

However, it’s not fully clear to me what it is that you’re trying to do here. Maybe if you explain the higher level problem that you’re trying to solve, we can come up with a better solution.

pl

Thank you for your thorough and timely response, Pavel! :slightly_smiling_face:

Your suggestions might actually cover completely what I am attempting to achieve.

Unfortunately, I am not able to disclose the exact reason I need it, but I want to track all heap writes, in order to detect modifications in the heap and save both the old and the newly written value.

For now, this translates to tracking common x86 assembly instructions (mov{l, w, d, q}) for a single thread. Supporting more “exotic” instructions like SIMD, multiple architectures or threads is not currently a goal.

Another method could be an LLVM instrumentation pass; however, I would like to avoid recompiling and modifying the binary, so I am focusing on LLDB, even if I end up missing a few writes that way.
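As a rough illustration of what recognising such instructions might look like, here is a minimal sketch that decides from the disassembly text whether a mov-family instruction writes to memory. It assumes AT&T syntax (destination operand last), and the mnemonic set and the helper names `split_operands` and `is_mov_store` are assumptions made for this example, not something taken from LLDB.

```python
# Assumed set of mnemonics to track; extend as needed.
MOV_MNEMONICS = {"mov", "movb", "movw", "movl", "movq"}


def split_operands(operands):
    """Split an operand string on commas that are outside parentheses."""
    parts, current, depth = [], [], 0
    for ch in operands:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail:
        parts.append(tail)
    return parts


def is_mov_store(mnemonic, operands):
    """True if this looks like a mov whose destination is a memory operand."""
    if mnemonic.lower() not in MOV_MNEMONICS:
        return False
    parts = split_operands(operands)
    if len(parts) != 2:
        return False
    dest = parts[-1]  # AT&T syntax: the destination comes last
    # A plain register looks like "%rax"; "(...)" or "seg:" means memory.
    plain_register = dest.startswith("%") and "(" not in dest and ":" not in dest
    return not plain_register
```

Inside LLDB, the two strings could come from `SBInstruction.GetMnemonic(target)` and `SBInstruction.GetOperands(target)`. For example, `is_mov_store("movq", "%rax, 0x8(%rbx)")` is true, while `is_mov_store("movq", "0x8(%rbx), %rax")` is false because that is a load.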

I was initially looking for a more complete, cross-platform solution (see: http://lists.llvm.org/pipermail/llvm-dev/2019-November/136876.html), but the solution proved to be too time consuming for the timeframe I have available for my master’s (ending in March).

― Vangelis

Thanks for the explanation, Vangelis.

It sounds like binary instrumentation would be the best approach for this, as this is pretty much exactly what msan does. If recompilation is not an option, then you might be able to get something to work via lldb, but I expect this to be incredibly slow (like 1000x, or more). One thing I might consider in your place is some kind of an in-process solution. For instance, if you intercept mmap (via LD_PRELOAD or something), then you could set it to map all anonymous memory (aka heap) as read-only. This way you’ll get a SIGSEGV every time somebody tries to write to that address. You could intercept that signal and do your analysis there. Assuming heap writes are not very common, this might even give you reasonable performance.

But this is not going to be super easy either. The trickiest part here will be resuming the program – you’ll need to remap the page read-write, do a single step, and then set it to read-only again.

pl

One important thing I forgot to mention in my previous email (although I thought I had done so) is that I am using LLDB to execute the target in single-step mode, thus I am already incurring the 1000x slowdown. Given that, the extra processing comes practically for free.
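To make the single-step bookkeeping concrete, here is a minimal sketch of capturing the old and new value around one instruction step. It is deliberately independent of LLDB: `read_memory` and `step` are caller-supplied callables, and `WriteRecord` and `record_write` are hypothetical names. It also assumes the destination address and size of the store have already been worked out before the step.

```python
from dataclasses import dataclass


@dataclass
class WriteRecord:
    address: int
    old: bytes
    new: bytes


def record_write(read_memory, step, address, size):
    """Read `size` bytes at `address`, execute one step, and read again."""
    old = read_memory(address, size)
    step()
    new = read_memory(address, size)
    return WriteRecord(address, old, new)
```

With LLDB, `read_memory` could wrap `process.ReadMemory(addr, size, lldb.SBError())` and `step` could wrap `thread.StepInstruction(False)`; outside a debugger, any pair of callables with the same shape works.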

In addition, while I currently focus on Darwin on x86-64, I would prefer to make decisions that lead to a cross-{architecture, language, platform} solution, ideally without affecting the binary.

Regarding your mmap() interception suggestion, I had also considered it, but thought that it would require a kernel driver for handling the page faults of the process in order to function properly, since LD_PRELOAD / DYLD_INSERT_LIBRARIES wouldn’t work for programs that use syscalls directly or statically link with libc.

I believe that the initial solution, aka using “image lookup” and “memory region $sp”, would better fulfil my current requirements, so I am going to give that a try.

Last but not least, I would like to mention that I’ve found your insights extremely helpful and really appreciated your willingness to help me, so thank you one more time! :blush:

― Vangelis