ASan LLDB debugging extensions

Hi,

I’m trying to create better support for debugging ASan-enabled binaries in LLDB. I have already started by proposing an API for the ASan runtime library (http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-July/074656.html), which should let the debugger query the additional information the runtime can provide. Basically this means:

  • malloc/free traces - given a memory address, the ASan runtime can return recorded stack trace(s) of how that chunk of memory was allocated and/or freed.

  • shadow mapping information - this describes exactly how a memory address is mapped into the shadow memory and back.

  • locating a memory address - ASan tracks globals and stack variables, so it can provide a name (and size) given such a memory address; for heap addresses it can give out the starting address and size of that chunk.

  • gathering report information - when ASan detects an error, the reporting mechanism can provide additional information, e.g. what kind of bug was found (“heap-use-after-free”).
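
To make the shadow mapping item concrete, here is a minimal sketch of the usual ASan mapping, where one shadow byte describes eight application bytes. The offset below is an assumption (the default for x86_64 Linux); other platforms use different values, which is exactly why the debugger would ask the runtime instead of hard-coding it. The function names are hypothetical and not part of the proposed runtime API.

```cpp
#include <cstdint>

// Assumed constants: default x86_64 Linux layout. Other platforms differ,
// so a real debugger would query these from the ASan runtime.
constexpr unsigned kShadowScale = 3;  // 2^3 = 8 application bytes per shadow byte
constexpr std::uint64_t kShadowOffset = 0x7fff8000ULL;

// Application address -> address of the shadow byte that describes it.
constexpr std::uint64_t MemToShadow(std::uint64_t addr) {
    return (addr >> kShadowScale) + kShadowOffset;
}

// Shadow address -> start of the 8-byte application granule it describes.
constexpr std::uint64_t ShadowToMem(std::uint64_t shadow) {
    return (shadow - kShadowOffset) << kShadowScale;
}
```

Note that the reverse mapping only recovers the start of the 8-byte granule, not the exact original address.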

For the malloc/free stack traces, it seems the best way to add this feature would be to extend the ValueObject class with a generic API to retrieve a list of HistoryThread objects, with some additional enum/constant-string to tell the type of individual threads. Something like:

ThreadList &
ValueObject::GetStackTraces() { … }

The API for this should be reusable by other libraries and tools; for example, malloc_history could provide very similar information. Since I want this to be available in the SB API as well, Python scripting does not seem to be the way to go.

The goal is to have ASan-aware LLDB commands, such as:

(lldb) expr -x 0xf00f00
// prints out the value of the expression, and if it’s a pointer also
// prints the malloc and free stack traces
(lldb) memory read --shadow 0xf00f00
// prints out the corresponding shadow memory instead
(lldb) memory locate 0xf00f00
// says it’s a stack variable with name “foo”, size, starting address
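
As a rough illustration of what “memory locate” would have to do, the lookup is essentially a search over the regions the runtime knows about. The sketch below is hypothetical: the Region type, the RegionKind enum, and the Locate function are made up for illustration, and in practice the region data would come from the ASan runtime rather than from a local table.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical classification of a tracked memory region.
enum class RegionKind { Global, Stack, Heap };

// Hypothetical description of one tracked region.
struct Region {
    RegionKind kind;
    std::string name;     // variable name; empty for heap chunks
    std::uint64_t start;  // starting address
    std::uint64_t size;   // size in bytes
};

// Returns the region containing addr, or std::nullopt if none does.
std::optional<Region> Locate(const std::vector<Region> &regions,
                             std::uint64_t addr) {
    for (const Region &r : regions) {
        // The subtraction form avoids overflow in start + size.
        if (addr >= r.start && addr - r.start < r.size)
            return r;
    }
    return std::nullopt;
}
```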

I’ll send patch(es) shortly, but do you have any comments/hints on the idea in general?

Thanks,
Kuba

Hi,
I have to say I am not so sure this API makes complete sense on a ValueObject. I do see the rationale: use the ValueObject containing the pointer as the source of the pointer value to then look up.
However, it’s entirely possible to craft VOs from scratch, so there is no additional guarantee that you will get a valid result back just because a VO claims to be a Foo* at 0x123! Given that it doesn’t buy us anything, a VO is a pretty heavyweight tool and it seems you wouldn’t use anything but an address out of it. So why not just use an address as input?

I think Jason has been working on a SystemRuntime component for this kind of “underlying execution environment, tell me things” behavior.
It might be a good design here to sync up with him and see if it makes sense to add these ASanInfo calls to the SR, possibly taking an lldb::addr_t (or an lldb_private::Address, if you feel you might need that kind of detailed info) instead of a ValueObject. Since an address is what you really care about anyway, it makes everyone’s life easier not to have to trade in a complicated creature like a VO.

These would be APIs on SBValue, not necessarily ValueObject. The problem is that you have to hang them off something that the SB APIs expose. We aren't exposing the SystemRuntime to the SB APIs. For instance, in the stuff Jason did for Queues, this just got hung off of threads. I think in general the APIs will be clearer if we, like Jason did, hang information off the objects it naturally belongs to rather than expose to the SB layer "how we got them..."

So we could either do something like:

size_t
SBProcess::GetNumHistoryFramesForAddress(SBAddress, const char *type);

size_t
SBProcess::GetHistoryFrameAtIndexForAddress(SBAddress, const char *type, size_t idx);

etc. Then you would get the history list passing in the address each time. That's kind of gross. We could make this a little better by returning an SBThreadList (what we ain't got yet).

Or we could hang it off of an address directly. But we want this to be convenient for Xcode to use as well, and it deals mostly with SBValues, so making it grab out an address seems awkward.

So since the SBValue is the thing that will have an allocation history, it seems reasonable to ask it directly. If it was made up to point to a random address, and that address has ASAN history, I'm not sure why it would be wrong to return that. And if the address had no history it would be fine to say so.

Jim

> These would be APIs on SBValue, not necessarily ValueObject.

That's fine. I was mostly concerned about the internal layer. It just didn't seem to me that a ValueObject would be the guy to ask that kind of question to.

> The problem is that you have to hang them off something that the SB APIs expose. We aren't exposing the SystemRuntime to the SB APIs. For instance, in the stuff Jason did for Queues, this just got hung off of threads. I think in general the APIs will be clearer if we, like Jason did, hang information off the objects it naturally belongs to rather than expose to the SB layer "how we got them..."
>
> So we could either do something like:
>
> size_t
> SBProcess::GetNumHistoryFramesForAddress(SBAddress, const char *type);
>
> size_t
> SBProcess::GetHistoryFrameAtIndexForAddress(SBAddress, const char *type, size_t idx);
>
> etc. Then you would get the history list passing in the address each time. That's kind of gross. We could make this a little better by returning an SBThreadList (what we ain't got yet).

It would probably be easy to make an SBThreadList, for that matter.
Also, what do you mean by type here? malloc/free? If we need it at all, we should probably have an enum here to describe a "memory management event" rather than a char*.

In general, I don't hate this process-based scheme. Address spaces are a property of a process so it seems to fit neatly within a textbook description of memory management at the process level.
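
To sketch what I mean by an enum plus a list-returning, process-level lookup: everything below is hypothetical (MemoryEvent, HistoryTrace, and ProcessHistory are illustration-only names, not existing LLDB classes), and plain strings stand in for real stack frames. The point is only the shape: the input is a bare address and an event kind, and the output is a possibly empty list.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical: the kind of memory-management event a trace records.
enum class MemoryEvent { Allocation, Deallocation };

// Hypothetical stand-in for a history thread: an event kind plus its frames.
struct HistoryTrace {
    MemoryEvent event;
    std::vector<std::string> frames;  // innermost frame first
};

// Hypothetical process-level store, keyed by plain addresses.
class ProcessHistory {
public:
    void Record(std::uint64_t addr, HistoryTrace trace) {
        traces_[addr].push_back(std::move(trace));
    }

    // Returns every trace of the requested kind recorded for addr.
    // An address with no history simply yields an empty list.
    std::vector<HistoryTrace> GetHistory(std::uint64_t addr,
                                         MemoryEvent event) const {
        std::vector<HistoryTrace> result;
        auto it = traces_.find(addr);
        if (it == traces_.end())
            return result;
        for (const HistoryTrace &t : it->second) {
            if (t.event == event)
                result.push_back(t);
        }
        return result;
    }

private:
    std::map<std::uint64_t, std::vector<HistoryTrace>> traces_;
};
```

Returning the whole list in one call also avoids the count-then-index pattern, where the address has to be passed in again for every frame.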

> Or we could hang it off of an address directly. But we want this to be convenient for Xcode to use as well, and it deals mostly with SBValues, so making it grab out an address seems awkward.

Not sure if it's exposed to SBValue, but we do have a GetPointerValue() API on ValueObject that tends to do the right thing. But, again, I was more interested in why the API would hang off our internal ValueObject rather than the public SBValue.

> So since the SBValue is the thing that will have an allocation history, it seems reasonable to ask it directly. If it was made up to point to a random address, and that address has ASAN history, I'm not sure why it would be wrong to return that.

It wouldn't. My argument was more along the lines of "don't use a ValueObject just because you hope it won't ever contain a bogus address". Whatever source you use to acquire it (random number generators included) if you give a valid address, you should get valid info back.

> And if the address had no history it would be fine to say so.

Yes, totally.

(resend to include lldb-dev)

i don’t know if you are the person to ask, but git says you wrote TestMemoryHistory.py, so i thought i’d start with you.

when i run unit test TestMemoryHistory.py on OSX, the unit test won’t compile/link and complains that libclang_rt.asan_osx_dynamic.dylib can’t be found

if i remove "-fsanitize=address " (added from the Makefile in that directory) from the compile/link, the unit test builds fine

is there something i need to do to make this unit test work on OSX?

were you able to get this unit test to build on OSX?

is the install/build of compiler-rt a special step needed just for OSX?

i don’t seem to need a special build of compiler-rt to get TestMemoryHistory.py to compile on ubuntu.

also, http://lldb.llvm.org/build.html does not seem to list this as one of the requirements needed to build lldb

since lldb does not appear to assume that asan is present, maybe the unit test should be skipped if the asan library is not installed/present

doug

in the long run, that makes sense.

currently, i’m trying to figure out why TestMemoryHistory.py is not working on Ubuntu. I was trying to see how TestMemoryHistory.py functioned on OSX for comparison. for now, i may try to install compiler-rt on OSX, so that unit test will compile on OSX

on Ubuntu, TestMemoryHistory.py is compiling and running, but does not seem to be relaunching the process to insert its library. the first “thread list” after the initial “run” is reporting that the “stop reason = breakpoint”, not “stop reason = exec”

doug