Stepping in a breakpoint callback with Python

Hello,

I’m trying to figure out how to write a simple tracer. Specifically, I want to
perform some 'step’s after a breakpoint callback fires. How do I do that in
async mode?

Here’s my attempt:
https://gist.github.com/nkaretnikov/6ee00afabf73332c5a89eacb610369c2

The problem is that pc is not updated (pc: 0xffffffffffffffff) when the loop
executes the second time.

I’ve also attempted to refactor this code based on:
http://llvm.org/svn/llvm-project/lldb/trunk/examples/python/process_events.py

My idea was to have a similar loop as the one that’s after ‘listener = …’. It
would sit there waiting for events, and I would somehow broadcast an event from
the callback. Maybe with ‘state == lldb.eStateStopped’ and a global variable
allowing me to distinguish this event type from the rest.

But I can’t find a way to do this. Either I get the same behavior as without
events or my events are ignored completely (likely because I use them
incorrectly).

I haven’t published the code of the event version, but I can do that later
unless someone proposes a working solution right away.

Thanks,
Nikita

lldb doesn’t currently support breakpoint callbacks that try to control running the target in the callback. I’m not sure how easy it would be to support that reliably, but anyway, that doesn’t work at present.

My model for this sort of tracing activity is that you are writing a fancy kind of “step” operation. You would write a fancy step & record plan that would proceed along however you wanted and log at each stop. Then your breakpoint callback would just queue up this step-and-trace plan as its last act. That’s what the Scripted ThreadPlans are for:

https://lldb.llvm.org/use/python-reference.html#using-the-python-api-to-create-custom-stepping-logic

I’m not quite sure what you are describing with the process events approach. Are you trying to do this while also running in the lldb driver, or can you write a stand-alone tool to do this? If the latter, then it should be enough to program everything at the event loop level. You will know when your breakpoint is hit, then you can issue the steps one by one and do whatever logging you want each time you stop. I’m not sure why this would need events sent from the breakpoint callback.

Jim

Jim,

My model for this sort of tracing activity is that you are writing a fancy kind
of “step” operation. You would write a fancy step & record plan that would
proceed along however you wanted and log at each stop. Then your breakpoint
callback would just queue up this step-and-trace plan as its last act. That’s
what the Scripted ThreadPlans are for:

https://lldb.llvm.org/use/python-reference.html#using-the-python-api-to-create-custom-stepping-logic

How do I queue up the thread plan in the callback?

Here’s my attempt:
https://gist.github.com/nkaretnikov/6ee00afabf73332c5a89eacb610369c2#file-scripted_step-py

This is based on:
https://github.com/llvm/llvm-project/blob/master/lldb/examples/python/scripted_step.py

The original ‘SimpleStep’ class implementation works when run as:
thread step-scripted -C scripted_step.SimpleStep

Not sure if my current version of ‘SimpleStep’ is correct (I’ve made a few
changes since testing via ‘thread step-scripted’), but nothing happens (no
prints on ‘init’) when I add the class via ‘StepUsingScriptedThreadPlan’ in
the callback.

What’s the proper way to do this?

Or do I need to mimic one of the classes with a ‘thread_plan.QueueThreadPlan*’
method call? I couldn’t figure out how to trace with anything besides the
‘SimpleStep’ class when running via ‘thread step-scripted’, so I stuck with it.

I’m not quite sure what you are describing with the process events approach.
Are you trying to do this while also running in the lldb driver, or can you
write a stand-alone tool to do this? If the latter, then it should be enough
to program everything at the event loop level. You will know when your
breakpoint is hit, then you can issue the steps one by one and do whatever
logging you want each time you stop. I’m not sure why this would need events
sent from the breakpoint callback.

Nevermind the events thing, then. I just run the lldb command and then do:
command script import $FILE

Ideally, I want to keep it this way so I could debug manually if needed.

Thanks,
Nikita

Jim,

> My model for this sort of tracing activity is that you are writing a fancy kind
> of “step” operation. You would write a fancy step & record plan that would
> proceed along however you wanted and log at each stop. Then your breakpoint
> callback would just queue up this step-and-trace plan as its last act. That’s
> what the Scripted ThreadPlans are for:
>
> https://lldb.llvm.org/use/python-reference.html#using-the-python-api-to-create-custom-stepping-logic

How do I queue up the thread plan in the callback?

Here's my attempt:
https://gist.github.com/nkaretnikov/6ee00afabf73332c5a89eacb610369c2#file-scripted_step-py

This is based on:
https://github.com/llvm/llvm-project/blob/master/lldb/examples/python/scripted_step.py

The original 'SimpleStep' class implementation works when run as:
thread step-scripted -C scripted_step.SimpleStep

Not sure if my current version of 'SimpleStep' is correct (I've made a few
changes since testing via 'thread step-scripted'), but nothing happens (no
prints on '__init__') when I add the class via 'StepUsingScriptedThreadPlan' in
the callback.

What's the proper way to do this?

What happens? Did you look at the "step" log to see what lldb thought it was doing - i.e. "log enable -f /tmp/lldb-step-log.txt lldb step". Also, what is lldb doing then nothing happens?

One thing I've recently found (in some other work I've been doing) is that if you run the step asynchronously, then the call that ran the thread plan still holds the API lock, and any calls the scripted thread plan makes block against the other plan holding the lock.

We already have a workaround for getting the "run lock" in Python code run for internal purposes. That is a bit of a hack, using the notion that the internal state thread is always running code on behalf of the agent that had the API lock - even though it is running on a separate thread. So we make a separate run lock for the internal state thread, and use that for code running on that thread. We probably need to do the same thing for the Target API lock.

Or do I need to mimic one of the classes with a 'thread_plan.QueueThreadPlan*'
method call? I couldn't figure out how to trace with anything besides the
'SimpleStep' class when running via 'thread step-scripted', so I stuck with it.

Both the command version (step-scripted) and the Python API will queue the initial plan for you. You shouldn't need to do that.

Jim

Jim,

Not sure if my current version of ‘SimpleStep’ is correct (I’ve made a few
changes since testing via ‘thread step-scripted’), but nothing happens (no
prints on ‘init’) when I add the class via ‘StepUsingScriptedThreadPlan’ in
the callback.

What’s the proper way to do this?

What happens? Did you look at the “step” log to see what lldb thought it was
doing - i.e. “log enable -f /tmp/lldb-step-log.txt lldb step”. Also, what is
lldb doing then nothing happens?

Apologies, by “nothing” I meant that the class code doesn’t seem to be invoked.
Here’s the output:

(lldb) log enable -f /tmp/lldb-step-log.txt lldb step
(lldb) breakpoint delete
About to delete all breakpoints, do you want to do that?: [Y/n] y
All breakpoints removed. (1 breakpoint)
(lldb) command script import scripted_step.py
Process 37849 stopped

  • thread #1, stop reason = signal SIGSTOP
    frame #0: 0x000000010255e000 dyld_dyld_start dyld_dyld_start:
    → 0x10255e000 <+0>: pop rdi
    0x10255e001 <+1>: push 0x0
    0x10255e003 <+3>: mov rbp, rsp
    0x10255e006 <+6>: and rsp, -0x10
    (lldb) trace
    trace done
    Hello, world!
    Process 37849 exited with status = 0 (0x00000000)

Note that the “trace” messages are printed, but nothing from the class in
between.

I’m also attaching the lldb step log for this run. The SimpleStep class is
referenced in the log, but it’s hard for me to understand what’s going on since
I’m not familiar with the internals of lldb. Do you see the problem?

Have you tried reproducing this? debugserver just runs a “Hello, world!” program:

debugserver localhost:8000 main

One thing I’ve recently found (in some other work I’ve been doing) is that if
you run the step asynchronously, then the call that ran the thread plan still
holds the API lock, and any calls the scripted thread plan makes block against
the other plan holding the lock.

We already have a workaround for getting the “run lock” in Python code run for
internal purposes. That is a bit of a hack, using the notion that the internal
state thread is always running code on behalf of the agent that had the API
lock - even though it is running on a separate thread. So we make a separate
run lock for the internal state thread, and use that for code running on that
thread. We probably need to do the same thing for the Target API lock.

If it turns out to be relevant, will you be able to share this patch? If you
don’t want to add it to the tree, can you at least share it on the list?

Thanks,
Nikita

lldb-step-log.txt (6.71 KB)

Nikita,

There was a lot of other stuff in that scripted_step.py beyond the plan class itself. I didn't follow what that did altogether. But if I removed all that stuff, and changed your plan slightly to not rely on the global TARGET and START_ADDRESS (I attached my simplified version below), then I can do either:

> lldb main -o "command script import /tmp/scripted_step.py" -o "break set -n main" -o run
(lldb) target create "main"
Current executable set to 'main' (x86_64).
(lldb) command script import /tmp/scripted_step.py
(lldb) break set -n main
Breakpoint 1: where = main`main + 15 at main.c:5:5, address = 0x0000000100000f5f
(lldb) run
Process 93318 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
    frame #0: 0x0000000100000f5f main`main at main.c:5
   2
   3 int main()
   4 {
-> 5 printf("Hello, world!\n");
          ^
   6 return 0;
   7 }
   8
Target 0: (main) stopped.

Process 93318 launched: '/tmp/main' (x86_64)
(lldb) thread step-scripted -C scripted_step.SimpleStep
init

init done

should step

should stop

0x100000f66: mov al, 0x0 ; main:5

should stop done: False

should step

explains stop

explains stop done: True

should stop

0x100000f68: call 0x100000f7a ; main:5

should stop done: False

should step

explains stop

explains stop done: True

should stop

should stop done: True

Process 93318 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = Python thread plan implemented by class scripted_step.SimpleStep.
    frame #0: 0x0000000100000f7a main`printf
main`printf:
    0x100000f7a <+0>: jmpq *0x1080(%rip) ; (void *)0x0000000100000f90
    0x100000f80: leaq 0x1081(%rip), %r11 ; _dyld_private
    0x100000f87: pushq %r11
    0x100000f89: jmpq *0x71(%rip) ; (void *)0x00007fff69523184: dyld_stub_binder
Target 0: (main) stopped.

or using the Python API:

(lldb) script lldb.thread.StepUsingScriptedThreadPlan("scripted_step.SimpleStep")
init

init done

should step
...

with the same results. Does that simple use of the scripted plan also work for you?

Jim

scripted_step.py (2.02 KB)

Jim,

Does that simple use of the scripted plan also work for you?

No, that doesn’t work for me. My original script does more stuff, so I need to
pass the state via global variables.

However, I think I’ve figured out a way to do what I want:
https://gist.github.com/nkaretnikov/75c8a0b814246b322d562e7333eb4ba9

Let me know if you find a way to do this via the Python API.

Thanks,
Nikita

So the problem you are having is getting the scripted step started in "trace" with the SBThread::StepUsingScriptedStep? Did you try passing False for "resume_immediately" and then returning False from "trace".

The explanation for:

    # XXX: Doesn't seem to matter with 'thread step-scripted'.
    # return False # continue
    return True # stop

is that the return value from the breakpoint command only matters if you haven't already started the process running again, which you have with the "thread step-scripted" command. But if you pass resume_immediately -> False to StepUsingScriptedStep, then that will just queue the thread plan but not start the process going again. Then the restart will be in a more orderly place in lldb.

All the commands that can resume the process really need to have a "do it now" and a "request to do this" mode. In breakpoint commands and the like, they should use the latter mode (a) so we can do a better job of arbitrating "I hit two breakpoints, A wants to continue but B wants to stop" and (b) so we don't have to deal with the complexities of restarting when we aren't quite done with handling the last stop. But for now you have to be a little careful in breakpoint callbacks.

Jim

Jim,

So the problem you are having is getting the scripted step started in “trace”
with the SBThread::StepUsingScriptedStep? Did you try passing False for
“resume_immediately” and then returning False from “trace”.

Yes, I did. In fact, I tried all these combinations:

resume_immediately = True, return True
resume_immediately = True, return False
resume_immediately = False, return True
resume_immediately = False, return False

None of them produces the trace.

I think the ‘thread step-scripted’ hack should work for my usecase, but I yet
have to try it. I’ll let you know if I hit some limitation again.

Thanks,
Nikita