Avoiding crash from dereferencing description in unknown state of object or memory

Greetings,

I am attempting to get the description of a register’s value from a valid frame and thread, if one exists, as suggested here, using frame.FindRegister("x4").description. This works quite well if there is definitely no description, or if there is a real description on a valid object. There is an edge case where the process exits with status = 9 (0x00000009) Terminated due to signal 9, and I think it is the case described by Jim Ingham in which the value isn’t a valid object and dereferencing it enters unknown behavior. This also happens if I get the register value using frame.EvaluateExpression (including with options.SetCoerceResultToId(False) set) and then check the result for validity. I do get something I can call description on, but there is a rare in-between behavior between “this definitely isn’t an object, no description” and “this is an object and here is its description”. I don’t think it’s about volatility because it’s totally deterministic for a known register in a known procedure, but I think it may be a memory management side effect or similar.

Something I have noticed is that if, instead of using my python script, I break on this same procedure where it crashes using the CLI lldb only, I can successfully p this crashing procedure’s x4 but po will deterministically crash it. This makes sense to me given the symptoms.

My question is whether there is anything I can test the register for before calling description in my lldb python script that will avoid this crash. Checking it for None is sufficient to cause the crash. The goal of the part of the python script I’m debugging is to get a register’s description when there is one and it’s safe to attempt. Thank you for any assistance.

Register values are returned from the SB API as SBValues. There are two ways you might get an error from an API that returns an SBValue. If the request is ill-formed in such a way that we can’t meaningfully give you an answer, then SBValue.IsValid() will return false. That is what you get back when you ask for a register that isn’t part of the current register set:



>>> var = lldb.frame.FindRegister("no such register")
>>> var.IsValid()
False


It’s undetermined what happens if you ask questions of an invalid SBValue, though nothing should crash. I couldn’t make lldb crash poking at an invalid value, so if you have a way to do so, please file that as an issue with a reproducer.

The other way you tell whether the SBValue was able to realize the value you asked it for is to check my_sb_value.error.success. If that is false, then my_sb_value.error.description will tell you why the value couldn’t be constructed, but that’s all we can tell you about it.
You can ask questions of this value. If you ask for its children you might get something back - for instance if the type was fine but the memory was unreadable, you can fetch its children, but they will be in the error state as well.

This is slightly orthogonal, but having IsValid was, in hindsight, a bad idea. Since SBValues have a way to report their error, we should always make valid SBValues, with the error set when that’s appropriate.
At some point it would be good to go through lldb and have us never return invalid SBValues, so IsValid is always true even for a default constructed SBValue, but its error description is “invalid”. It’s awkward to have to check two ways to see if your SBValue is good.
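
Until then, the two checks can be folded into one helper. This is only a minimal sketch: the name value_problem is made up for illustration, and it relies solely on IsValid(), GetError(), Success() and GetCString(). It reports whether the SBValue itself could be realized; it does not make a later GetObjectDescription() call safe.

```python
def value_problem(value):
    """Return None if `value` is usable, otherwise a short reason string.

    `value` is expected to behave like an lldb.SBValue. Only IsValid() and
    GetError() are used, so no live debug session is needed to exercise it.
    """
    # First check: the request itself was ill-formed (e.g. unknown register).
    if value is None or not value.IsValid():
        return "invalid SBValue (request was ill-formed)"

    # Second check: the request made sense but the value could not be built.
    error = value.GetError()
    if not error.Success():
        return error.GetCString() or "unknown error"

    return None
```

A caller would go on to read the value only when value_problem(v) returns None.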


Hi @jingham ,

Thank you for your answer. Here is the function I’m passing x4 to, for a case where “p $x4” works in the REPL. You can see both validity checks in it:

import lldb


def print_register_description(frame, reg_name):
    options = lldb.SBExpressionOptions()
    options.SetCoerceResultToId(False)

    result = frame.EvaluateExpression(f"${reg_name}", options)

    if result.IsValid() and result.GetError().Success():
        print(f"Value: {result.GetValue()}")
        print(f"Type: {result.GetType().GetName()}")
        print(f"Summary: {result.GetSummary()}")
        print(f"Description: {result.GetObjectDescription()}")
    else:
        print(f"Error: {result.GetError()}")

I’ve tried it with and without the SetCoerceResultToId. I also have a version just directly using FindRegister.

Many other register values with no description are passed to this and simply return a nil description, but this one, and occasionally others, cause the process to exit. I’ve never seen any of the registers passed not be valid or not have a True error.success state in thousands of passes.

As you say, GetObjectDescription() works by calling the object’s description method, which is actually running code in the target. For instance, for ObjC, we directly call _NSPrintForDebugger, which ends up calling [(id) <register_value> description]. But of course, most of the time <register_value> isn’t an ObjC or CF object, so that code is going to do something unpredictable. If the unpredictable thing it does is crash, that should be fine: lldb can clean up after the crash and reset the state as it was before. But register values can also hold recently dead objects (left over from a previous call), so they might get far enough to reach a component that decides the thing you passed is in bad enough shape that it should abort or call exit directly, which lldb can’t clean up after. I don’t think there’s any way you can detect “pointer to some memory that has a valid ISA in the first slot and utter garbage after that.”

We probably should surround the calls to the Object Description code by setting breakpoints on exit and abort and some others that we can catch before too much damage is done, in which case we could clean up after them.

I don’t know that we would want to do that for all expressions; you might want to trigger a signal handler so you can debug into it, for instance. But it’s pretty clear no one would want an object description method to exit.


OK, thanks for the clarity on it and taking the time! It sounds like my options are living with the exits (not a terrible alternative), building in more resiliency to auto-resume debugserver after an exit while blacklisting that register and procedure for calling description on (since it’s deterministic), or skipping the description call entirely (but when it works, it’s potentially rich info, so that would be a loss). I doubt I can do something to handle a signal 9.
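
For the deny-list option, a rough sketch of what I have in mind follows. Everything in it is hypothetical (the class name, the JSON layout, and keying on a function name plus a register name); none of it is an lldb API. The idea is to write the pair to disk before the risky call and clear it afterwards, so that a pair still marked as in flight on the next start gets denied from then on.

```python
import json
import os


class DescriptionDenyList:
    """Remembers (function, register) pairs whose description call ended the run."""

    def __init__(self, path):
        self.path = path
        self.denied = set()
        self.in_flight = None
        if os.path.exists(path):
            with open(path) as f:
                state = json.load(f)
            self.denied = {tuple(pair) for pair in state["denied"]}
            # A leftover in-flight entry means the previous run never got
            # as far as end(), so treat that pair as unsafe.
            if state["in_flight"] is not None:
                self.denied.add(tuple(state["in_flight"]))
        self._save()

    def _save(self):
        state = {"denied": sorted(self.denied), "in_flight": self.in_flight}
        with open(self.path, "w") as f:
            json.dump(state, f)

    def allows(self, function_name, reg_name):
        return (function_name, reg_name) not in self.denied

    def begin(self, function_name, reg_name):
        # Persist before the risky call so the record survives an abrupt exit.
        self.in_flight = (function_name, reg_name)
        self._save()

    def end(self):
        # The call returned normally, so clear the in-flight marker.
        self.in_flight = None
        self._save()
```

The script would check allows() first, then call begin() just before asking for the description and end() right after it returns.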

What I’ve described here is my approach for approximately replicating po. Just to confirm my understanding while I get to discuss it with you, what would be the equivalent in this code of calling p on the registers?

p is currently an alias for dwim-print, which tries to look up the value in the local variables (basically what v does) and then, if that fails, does expr on the expression. So p $x0 is just a slow way to do lldb.frame.FindRegister("x0").

po is “take the result of p and call GetObjectDescription() on the resultant SBValue”.
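
In script terms, the distinction might look like the sketch below. The two function names are invented for illustration; the only calls assumed are FindRegister(), IsValid(), GetValue() and GetObjectDescription().

```python
def p_register(frame, reg_name):
    """Rough analogue of `p $reg`: look the register up, run no target code."""
    value = frame.FindRegister(reg_name)
    if not value.IsValid():
        return None
    return value.GetValue()


def po_register(frame, reg_name):
    """Rough analogue of `po $reg`: same lookup, then ask for the description.

    GetObjectDescription() executes code in the target, so this is the call
    that can end the process when the register holds a stale pointer.
    """
    value = frame.FindRegister(reg_name)
    if not value.IsValid():
        return None
    return value.GetObjectDescription()
```

Only the second function runs code in the target, which matches p succeeding where po ends the process.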

I’m not sure about blacklisting a particular register. The GetObjectDescription only cares about the value, it doesn’t care where that value came from. However, if you can figure out the value that is causing this exit to happen, you could replay what GetObjectDescription does for ObjC by running:

(lldb) break set -n __exit
(lldb) expr -i 0 -u 0 -- (char *) _NSPrintForDebugger((id) <VALUE>)

That should show you who is causing the object description function to exit, and might help you figure out how to avoid this. At least you could find the name of the class with the errant description method and then you could use that to blacklist the values that end up being objects of that class.


Hi @jingham ,

Thanks again for the pointers and confirmation that my understanding of what p is currently doing is accurate. I couldn’t break on the exit because it was apparently a system termination, but your suggestion prompted me to think about whether there were heuristics for the types of procedures this was happening with, and I was able to avoid about 95% of the problematic calls to description in the script by not doing them for anything that could be ascertained in advance to be a Swift metadata lookup/type demangle, or a thunk or mostly-thunk-like procedure (all of which have the virtue of being uninteresting for the purposes of this specific dynamic analysis as well). I don’t mind the occasional unplanned exit from this tool if most of them are sidestepped in advance. Appreciate your talking it through with me!
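
For anyone who lands here later, the filter is roughly of the following shape. This is an illustrative sketch, not my exact list: the marker strings are assumptions about how such procedures tend to show up in demangled names, and should_skip_description is just a name chosen for this example.

```python
# Assumed markers, matched case-insensitively against a demangled function
# name. They are examples only and will need tuning for a given binary.
SKIP_MARKERS = (
    "type metadata accessor for",
    "thunk",
    "swift_getTypeByMangledName",
    "__swift_instantiateConcreteTypeFromMangledName",
)


def should_skip_description(function_name):
    """Return True when the current procedure looks like one to leave alone."""
    if not function_name:
        # No symbol name available, so the heuristic has nothing to go on.
        return False
    lowered = function_name.lower()
    return any(marker.lower() in lowered for marker in SKIP_MARKERS)
```

In the script, the name would come from something like frame.GetFunctionName(), and the description call is skipped whenever this returns True.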