Lldb unable to break at _start

For example:

$ cat a.c
int main() { return 0; }
$ clang -static a.c -o a.out
$ lldb a.out -o "b _start" -o "r" 
(lldb) target create "a.out"
Current executable set to '/tmp/a.out' (x86_64).
(lldb) b _start
Breakpoint 1: where = a.out`_start, address = 0x0000000000401630
(lldb) r
Process 3302100 launched: '/usr/local/google/home/zequanwu/workspace/tmp/a.out' (x86_64)
Process 3302100 exited with status = 0 (0x00000000)

Comparing the gdb-remote packet log with the one from lldb a.out -o "log enable gdb-remote packets" -o "process launch --stop-at-entry" (which does stop at _start), the b _start run has the following extra packets at the end:

lldb             < 111> send packet: $vRun;2f7573722f6c6f63616c2f676f6f676c652f686f6d652f7a657175616e77752f776f726b73706163652f746d702f612e6f7574#4f
lldb             < 617> read packet: $T13thread:p3263b9.3263b9;name:a.out;threads:3263b9;thread-pcs:0000000000401630;00:0000000000000000;01:0000000000000000;02:0000000000000000;03:0000000000000000;04:0000000000000000;05:0000000000000000;06:0000000000000000;07:80d5ffffff7f0000;08:0000000000000000;09:0000000000000000;0a:0000000000000000;0b:0000000000000000;0c:0000000000000000;0d:0000000000000000;0e:0000000000000000;0f:0000000000000000;10:3016400000000000;11:0002000000000000;12:3300000000000000;13:0000000000000000;14:0000000000000000;15:2b00000000000000;16:0000000000000000;17:0000000000000000;18:0000000000000000;19:0000000000000000;reason:signal;#0b
lldb             <  16> send packet: $qProcessInfo#dc
lldb             < 182> read packet: $pid:3263b9;parent-pid:326398;real-uid:9d619;real-gid:15f53;effective-uid:9d619;effective-gid:15f53;triple:7838365f36342d2d6c696e75782d676e75;ostype:linux;endian:little;ptrsize:8;#a3
lldb             <   5> send packet: $?#3f
lldb             < 617> read packet: $T13thread:p3263b9.3263b9;name:a.out;threads:3263b9;thread-pcs:0000000000401630;00:0000000000000000;01:0000000000000000;02:0000000000000000;03:0000000000000000;04:0000000000000000;05:0000000000000000;06:0000000000000000;07:80d5ffffff7f0000;08:0000000000000000;09:0000000000000000;0a:0000000000000000;0b:0000000000000000;0c:0000000000000000;0d:0000000000000000;0e:0000000000000000;0f:0000000000000000;10:3016400000000000;11:0002000000000000;12:3300000000000000;13:0000000000000000;14:0000000000000000;15:2b00000000000000;16:0000000000000000;17:0000000000000000;18:0000000000000000;19:0000000000000000;reason:signal;#0b
(The process first stopped at _start, which is 0x401630, with the stop reason being a signal.)
...
lldb             <  15> send packet: $Z0,401630,1#41
lldb             <   6> read packet: $OK#9a
lldb             <  25> send packet: $qThreadStopInfo3263b9#64
lldb             < 617> read packet: $T13thread:p3263b9.3263b9;name:a.out;threads:3263b9;thread-pcs:0000000000401630;00:0000000000000000;01:0000000000000000;02:0000000000000000;03:0000000000000000;04:0000000000000000;05:0000000000000000;06:0000000000000000;07:80d5ffffff7f0000;08:0000000000000000;09:0000000000000000;0a:0000000000000000;0b:0000000000000000;0c:0000000000000000;0d:0000000000000000;0e:0000000000000000;0f:0000000000000000;10:3016400000000000;11:0002000000000000;12:3300000000000000;13:0000000000000000;14:0000000000000000;15:2b00000000000000;16:0000000000000000;17:0000000000000000;18:0000000000000000;19:0000000000000000;reason:signal;#0b
lldb             <  15> send packet: $z0,401630,1#61
lldb             <   6> read packet: $OK#9a
b-remote.async>  <  26> send packet: $vCont;s:p3263b9.3263b9#62
b-remote.async>  < 616> read packet: $T05thread:p3263b9.3263b9;name:a.out;threads:3263b9;thread-pcs:0000000000401632;00:0000000000000000;01:0000000000000000;02:0000000000000000;03:0000000000000000;04:0000000000000000;05:0000000000000000;06:0000000000000000;07:80d5ffffff7f0000;08:0000000000000000;09:0000000000000000;0a:0000000000000000;0b:0000000000000000;0c:0000000000000000;0d:0000000000000000;0e:0000000000000000;0f:0000000000000000;10:3216400000000000;11:4602000000000000;12:3300000000000000;13:0000000000000000;14:0000000000000000;15:2b00000000000000;16:0000000000000000;17:0000000000000000;18:0000000000000000;19:0000000000000000;reason:trace;#ab
intern-state     <  15> send packet: $Z0,401630,1#41
intern-state     <   6> read packet: $OK#9a
b-remote.async>  <  22> send packet: $vCont;c:p3263b9.-1#47
b-remote.async>  <  22> read packet: $W00;process:3263b9#94
Process 3302329 launched: '/tmp/a.out' (x86_64)
Process 3302329 exited with status = 0 (0x00000000)
(lldb) exit
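
The packets above can be decoded by hand. As a minimal sketch (the helper names are hypothetical, not part of lldb), the following Python pulls the signal number and pc out of a stop reply and recomputes a packet checksum. It assumes, as the log suggests, that register 0x10 is the pc on this x86_64 target and that register values are sent as little-endian hex. The two sample replies are abbreviated copies of the ones in the log.

```python
def checksum(payload: str) -> str:
    """Two-digit hex checksum: sum of the payload bytes modulo 256."""
    return format(sum(payload.encode("ascii")) % 256, "02x")


def parse_stop_reply(payload: str):
    """Split a 'T' stop reply into (signal number, {key: value})."""
    if not payload.startswith("T"):
        raise ValueError("not a T stop reply")
    signo = int(payload[1:3], 16)
    fields = {}
    for item in payload[3:].split(";"):
        if item:
            key, _, value = item.partition(":")
            fields[key] = value
    return signo, fields


def le_hex_to_int(hex_bytes: str) -> int:
    """Decode a register value sent as little-endian hex bytes."""
    return int.from_bytes(bytes.fromhex(hex_bytes), "little")


# Abbreviated copies of two stop replies from the log (most registers omitted).
first_stop = "T13thread:p3263b9.3263b9;10:3016400000000000;reason:signal;"
after_step = "T05thread:p3263b9.3263b9;10:3216400000000000;reason:trace;"

for reply in (first_stop, after_step):
    signo, fields = parse_stop_reply(reply)
    print(signo, hex(le_hex_to_int(fields["10"])), fields["reason"])
```

The first reply decodes to signal 19 at pc 0x401630 (the address of _start), and the second to signal 5 at pc 0x401632, i.e. the thread has already been stepped past the first instruction of _start before the breakpoint is re-inserted.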

It looks like this happens because lldb uses the step-over-breakpoint thread plan whenever it finds a breakpoint at the same address as the current thread’s $pc: llvm-project/lldb/source/Target/Thread.cpp at d8f1e5d2894f7f4edc2e85e63def456c7f430f34 · llvm/llvm-project · GitHub

If I change that line to push_step_over_bp_plan = GetStopReason() != eStopReasonSignal; so that lldb does not use the step-over-breakpoint plan when the thread stopped because of a signal, it is now able to break at _start. But this changes behaviour (a thread that stopped with a signal at a breakpoint address now needs two n commands to move $pc to the next instruction) in the following example:

$ cat signal.cpp
#include <csignal>

namespace {
    volatile std::sig_atomic_t gSignalStatus;
}

void signal_handler(int signal) {
    gSignalStatus = signal;
}

int main() {
    // Install a signal handler
    std::signal(SIGINT, signal_handler);
    std::raise(SIGINT);
}
$ clang++ -static -g signal.cpp -o signal.static
$ lldb signal.static -o "log enable gdb-remote packets" -o "b main" -o "r" -o "b -a 0x404f1c" -o "c" -o "n"
...
Process 3305968 resuming
Process 3305968 stopped
* thread #1, name = 'signal.static', stop reason = signal SIGINT
    frame #0: 0x0000000000404f1c signal.static`__pthread_kill_implementation.constprop.0 + 252
signal.static`__pthread_kill_implementation.constprop.0:
->  0x404f1c <+252>: movl   %eax, %ebx
    0x404f1e <+254>: negl   %ebx
    0x404f20 <+256>: cmpl   $0xfffff000, %eax ; imm = 0xFFFFF000
    0x404f25 <+261>: movl   $0x0, %eax
(lldb) n
b-remote.async>  <  26> send packet: $vCont;s:p3271f0.3271f0#56
b-remote.async>  < 629> read packet: $T05thread:p3271f0.3271f0;name:signal.static;threads:3271f0;thread-pcs:0000000000404f1c;00:0000000000000000;01:f071320000000000;02:1c4f400000000000;03:0200000000000000;04:f071320000000000;05:f071320000000000;06:b0d3ffffff7f0000;07:60d3ffffff7f0000;08:00d3ffffff7f0000;09:1c00000000000000;0a:0800000000000000;0b:4602000000000000;0c:78d5ffffff7f0000;0d:0200000000000000;0e:0100000000000000;0f:0100000000000000;10:1c4f400000000000;11:4602000000000000;12:3300000000000000;13:0000000000000000;14:0000000000000000;15:2b00000000000000;16:80734a0000000000;17:0000000000000000;18:0000000000000000;19:0000000000000000;reason:breakpoint;#01
intern-state     <  21> send packet: $x7fffffffd200,200#2d
intern-state     < 516> read packet: $00000000000000006728470000000000e0d3ffffff7f000050d3ffffff7f000048d3ffffff7f00003dc2a0010000000008804a00000000000100000000000000000000000000000000f52994343bdd840000000000000000984040000000000060174000000000000200000000000000000000000000000000000000000000000000000000000000672847000000000098834a000000000000be490000000000e07f4a0000000000538f306800000000d8d3ffffff7f000010ea420000000000000000000000000000be4900000000000000000000000000000000000000000000000000000000000000001000000000000000000000000048d3ffffff7f000000000000000000000000000000000000040000000000000000d3fff7ff7f0000e07f4a000000000078d2ffffff7f000074d2ffffff7f0000000000000000000000000000000000006728470000000000e0d3ffffff7f00000e4f40000000000048d3ffffff7f000000f52994343bdd8408804a00000000000200000000000000b0d3ffffff7f000078d5ffffff7f0000e8bd490000000000024140000000000088d5ffffff7f00009f174000000000000100000000000000641c40000000000000114000000000008017400000000000711540000100000078d5ffffff7f000088d5ffffff7f0000b5c7b90063d96e7a78d5ffffff7f0000e8bd490000000000#ec
lldb             <  16> send packet: $jThreadsInfo#c1
lldb             < 163> read packet: $[{"name":"signal.static","reason":"breakpoint","registers":{"16":"1c4f400000000000","6":"b0d3ffffff7f0000","7":"60d3ffffff7f0000"}],"signal":5,"tid":3305968}]]#66
lldb             <  15> send packet: $x404e00,200#93
lldb             < 518> read packet: $0000004883c4085b5d415c415d415e415fc3c3662e0f1f8400000000000f1f0041554189f5415455534883ec1864488b042528000000488944240831c064483b3c25100000000f84b40000004989e44889fb31ff41ba080000004c89e2488d356c370700b80e0000000f0531c0488dab04090000ba01000000f00fb155000f85ac00000080bb0109000000744b31db31c087450083f8010f8fa300000041ba0800000031d24c89e6bf02000000b80e0000000f05488b44240864482b0425280000000f85850000004883c41889d85b5d415c415dc30f1f008b9bd0020000e8edb800004489ea89c789deb8ea0000000f053d00f0ffff769589c3f7dbeb916690b8ba0000000f0589c3e8c2b800004489ea89de89c7b8ea0000000f0589c3f7db3d00f0ffffb8000000000f46d8eb85904889efe8a8fcffffe947ffffff0f1f004889efe848fdffffe950ffffffe88ec7000066662e0f1f8400000000000f1f00e9bbfeffff66662e0f1f8400000000008d46e083f8017608e9a3feffff0f1f00b816000000c3662e0f1f84000000000064488b042510000000c3660f1f440000488b07c705dba10900010000004889059ca10900c366662e0f1f840000000000488b07c705bba10900010000008905ada10900c366662e0f1f84000000000090488b07c7059ba109000100000048890554a10900c366662e0f1f840000000000#3e
Process 3305968 stopped
* thread #1, name = 'signal.static', stop reason = breakpoint 2.1
    frame #0: 0x0000000000404f1c signal.static`__pthread_kill_implementation.constprop.0 + 252
signal.static`__pthread_kill_implementation.constprop.0:
->  0x404f1c <+252>: movl   %eax, %ebx
    0x404f1e <+254>: negl   %ebx
    0x404f20 <+256>: cmpl   $0xfffff000, %eax ; imm = 0xFFFFF000
    0x404f25 <+261>: movl   $0x0, %eax
(The first `n` doesn't move $pc because it's also a breakpoint address)
...
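
The trade-off between the two behaviours can be sketched with a toy model. This is only an illustration under stated assumptions, not lldb's actual logic: the function names are hypothetical, stop reasons are plain strings, and every instruction is taken to be 2 bytes long because that happens to be true of both instructions in the logs above.

```python
INSN_LEN = 2  # assumption: both instructions in the logs above are 2 bytes long


def push_step_over_plan(pc, stop_reason, breakpoints, skip_on_signal):
    """Decide whether resuming first steps the thread past a breakpoint at pc."""
    if pc not in breakpoints:
        return False
    if skip_on_signal and stop_reason == "signal":
        return False  # the experimental change: do not step over after a signal
    return True


def traps_immediately(pc, stop_reason, breakpoints, skip_on_signal):
    """True if resuming executes the breakpoint instruction sitting at pc."""
    return pc in breakpoints and not push_step_over_plan(
        pc, stop_reason, breakpoints, skip_on_signal
    )


def single_step(pc, stop_reason, breakpoints, skip_on_signal):
    """Return (new_pc, new_stop_reason) after one instruction step."""
    if traps_immediately(pc, stop_reason, breakpoints, skip_on_signal):
        return pc, "breakpoint"  # pc does not move; the breakpoint is reported
    return pc + INSN_LEN, "trace"


START, KILL = 0x401630, 0x404F1C
for skip in (False, True):
    print("skip_on_signal =", skip)
    print("  b _start is hit:", traps_immediately(START, "signal", {START}, skip))
    print("  first n:", single_step(KILL, "signal", {KILL}, skip))
```

With skip_on_signal off, the breakpoint at _start is silently stepped over, which is the original bug. With it on, _start is hit, but the first n at 0x404f1c reports a breakpoint without moving $pc, and only a second n (now with a breakpoint stop reason) advances.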

@jingham Do you have any thoughts on this?

@jingham asked me to look at a related problem, where (in general) when we are stopped at a breakpoint, we declare that the breakpoint has been hit, regardless of whether it has actually been executed or not yet. For instance,

(lldb) dis -c 3 -s $pc
a.out`main:
->  0x100003f08 <+28>: adrp   x0, 0
    0x100003f0c <+32>: add    x0, x0, #0xf9b ; "HI1"
    0x100003f10 <+36>: bl     0x100003f8c    ; symbol stub for: puts

(lldb) br s -a 0x100003f0c
 <  18> send packet: $Z0,100003f0c,4#33

(lldb) si
 <  18> send packet: $vCont;s:1003e7#52
 < 914> read packet: $T05thread:1003e7;threads:1003e7;thread-pcs:100003f0c; [...] metype:6;mecount:2;medata:1;medata:0;

{instruction step has completed, we are at pc 0x100003f0c}

* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 2.1

The actual reason we stopped was instruction-step. But when we see a breakpoint at the stopped pc value, that stop reason overrides whatever actual stop reason there was, and the thread claims that’s why it stopped. Why do this? You can see me adding this behavior in one of our codepaths in February here: [lldb] Correctly annotate threads at a bp site as hitting it by jasonmolenda · Pull Request #82709 · llvm/llvm-project · GitHub. The problem is that any thread sitting at a BreakpointSite will automatically have that BreakpointSite stepped over when we resume execution, as if we hit the breakpoint. In my PR, we had a multi-threaded program that hit a breakpoint on thread A, while thread B was sitting at the breakpoint site and hadn’t hit the breakpoint yet. When we resume execution, both thread A and thread B are at BreakpointSites, so we step over the breakpoints and resume execution. The test was counting how many times a breakpoint was hit, and the breakpoint on thread B was never “hit”, so my fix was to synthesize the breakpoint-hit stop reason for that thread. There are a number of places in lldb where we implement that behavior.
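
The override described here amounts to a very small rule. A minimal sketch, assuming stop reasons are plain strings and using a hypothetical function name (this is not lldb's API), applied to the stepi example above:

```python
def reported_reason(actual_reason, pc, breakpoints, synthesize_hit):
    """Stop reason shown to the user for a thread stopped at pc.

    With synthesize_hit=True this mirrors the behaviour described above:
    a BreakpointSite at pc overrides whatever actually stopped the thread.
    """
    if synthesize_hit and pc in breakpoints:
        return "breakpoint"
    return actual_reason


BP = 0x100003F0C  # the address given to `br s -a` in the example above

# The stepi finished at the breakpoint address without executing it.
print(reported_reason("trace", BP, {BP}, synthesize_hit=True))
print(reported_reason("trace", BP, {BP}, synthesize_hit=False))
```

With the override on, the instruction-step stop is reported as a breakpoint hit; with it off, the real reason survives.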

What Jim wants me to do is change this behavior so we record threads that have ACTUALLY hit a breakpoint when we stop, and the pc of the breakpoint it hit. When we resume execution, only threads that are still at the pc of the breakpoint that they hit will step over the breakpoint site and then resume execution.

This does mean that in your case, where you have a breakpoint on _start, you attach and the process is stopped at _start with a signal, the breakpoint has not yet been hit. When we resume execution, we will hit the breakpoint immediately and stop again. You’ll have to next twice here – the same thing you saw. This isn’t going to make for a happy user, but it does seem correct – we stopped because of a signal (it could have been an instruction step like above), we have not yet hit the breakpoint at this pc, and if we change lldb to stop making up breakpoint-hit stop reasons, this double-next will be the fallout of that.

To state it more briefly, the current lldb behavior is:

  1. When a thread is at a BreakpointSite and we are resuming, instruction-step past the breakpoint and re-insert it silently.

  2. When a thread stops at a BreakpointSite, set the reason it stopped to breakpoint-hit.

What Jim has suggested, and I’m working on a bit right now, is

  1. When a thread is at a BreakpointSite, and we recorded hitting the breakpoint at this address when we stopped, instruction step past the BreakpointSite and re-insert it silently.

  2. Only set the stop-reason to breakpoint-hit when we hit the breakpoint at a pc, leave the stop reason unmodified if there is a breakpoint at the pc but we haven’t executed it yet. (this is a little tricky because the stop pc value is the same for “breakpoint was hit” and “sitting at a breakpoint instruction” - the pc is “unwound” when the kernel reports a breakpoint hit normally making them look the same) When a thread stopped with a breakpoint-hit reason, record the address of the breakpoint when it was hit.

We’ll be recording the pc of the breakpoint that was hit at a thread to handle cases where a user changes the pc manually while stopped, possibly changing it to a BreakpointSite. And to handle the case where a user adds a breakpoint at the current pc value while a thread is stopped. In both cases, we haven’t hit that breakpoint yet, so when we resume execution we will hit that breakpoint and stop again.
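
The proposed bookkeeping can be sketched as follows. This is a simplified model under stated assumptions, not the actual patch: ThreadState, record_stop and needs_step_over are hypothetical names, and stop reasons are plain strings.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class ThreadState:
    pc: int
    stop_reason: str = "none"
    hit_bp_addr: Optional[int] = None  # set only when a breakpoint really fired

    def record_stop(self, pc: int, stop_reason: str) -> None:
        """Called at each stop; remembers where a breakpoint was actually hit."""
        self.pc = pc
        self.stop_reason = stop_reason
        self.hit_bp_addr = pc if stop_reason == "breakpoint" else None

    def needs_step_over(self, breakpoints: Set[int]) -> bool:
        """Step over silently only if we are still at the breakpoint we hit."""
        return self.pc in breakpoints and self.hit_bp_addr == self.pc


breakpoints = {0x1000, 0x2000}
t = ThreadState(pc=0)

t.record_stop(0x1000, "breakpoint")
print(t.needs_step_over(breakpoints))  # hit it and still there

t.pc = 0x2000                          # user moved pc to another BreakpointSite
print(t.needs_step_over(breakpoints))

t.record_stop(0x3000, "trace")         # stepped, then a breakpoint is added at pc
breakpoints.add(0x3000)
print(t.needs_step_over(breakpoints))
```

Only the first case steps over the breakpoint silently. In the other two the breakpoint at pc has not executed yet, so resuming stops there again, as does a thread that stopped at _start with a signal.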

The important goal here is that when we stop with another stop reason (we hit a watchpoint on aarch64, where we instruction step over the watchpoint to gather the value; or we instruction stepped; or the thread stopped asynchronously while at a BreakpointSite that hasn’t executed yet), we will report the true stop reason (or no stop reason), and not overwrite it with a “breakpoint hit” stop reason that isn’t accurate.

Instead of ‘-o “b _start” -o “r”’, try ‘-o “process launch -s”’. That will stop at the entrypoint.

What Jim has suggested, and I’m working on a bit right now, is

  1. When a thread is at a BreakpointSite, and we recorded hitting the breakpoint at this address when we stopped, instruction step past the BreakpointSite and re-insert it silently.
  2. Only set the stop-reason to breakpoint-hit when we hit the breakpoint at a pc, leave the stop reason unmodified if there is a breakpoint at the pc but we haven’t executed it yet. (this is a little tricky because the stop pc value is the same for “breakpoint was hit” and “sitting at a breakpoint instruction” - the pc is “unwound” when the kernel reports a breakpoint hit normally making them look the same) When a thread stopped with a breakpoint-hit reason, record the address of the breakpoint when it was hit.

It sounds like what you want to achieve here is to let lldb remember the real stop reason when the pc is the same as a breakpoint site, and report that to users instead of the reason always being breakpoint. Is that correct?

Back to the problem of being unable to stop at _start: it’s caused by lldb using the step-over-breakpoint thread plan if the pc is the same as a breakpoint site. The process is already stopped at _start when launched. Because of the rule of stepping over the breakpoint at pc, it steps over the first instruction of _start. Making lldb remember the real stop reason doesn’t seem to solve it. It looks like just a super special case that lldb needs to take care of. Maybe do not step over the breakpoint site at pc if the process was just launched.

Oh, I see. Once we are able to get the real stop reason, whether it is a breakpoint hit or something else, then when the thread is stopped at a pc that has the same address as a breakpoint, we only let it step over the breakpoint if it has indeed hit that breakpoint.

Are you working on this now?

Yeah, it’s going more slowly than I’d like so it’s a bit of a “background task”, but I spent a good chunk of yesterday afternoon working on a tricky bit in StopInfoMachException as a part of this change. The first thing that needs to be done is to change any place where we set the stop reason to breakpoint-hit just because we’re at a BreakpointSite. There are a bunch of places that do this. The next part will be changing the resume codepath that pushes the StepOverBreakpoint thread plan, so that it only does that when we’ve actually hit the breakpoint.

That does work, but it means the user has to know about this quirk. The reason we’re looking into this is because a user tried to do b _start, expecting that it would stop at the entry point – but it did not.

To be clear, I expect the work I’m doing right now will handle the issue Zequan is mentioning here. We stopped with a signal stop-reason, and we are at a BreakpointSite but the breakpoint instruction has not yet executed. My current plan is: (1) we’ll say the stop reason is a signal, and (2) when you resume execution, we will then hit that breakpoint and report a stop-reason breakpoint-hit and the user will need to resume execution again.

Jim and I briefly discussed a use case where someone next’s to a pc, then does break set -a $pc and then continues - with my design, this will immediately hit the breakpoint that they added at $pc. Whereas lldb’s current behavior is that we will silently step past that breakpoint.

You can imagine trying to capture “was there a BreakpointSite at $pc when this thread stopped”, but what if someone steps, then does p/x $pc = <addr-of-a-breakpoint> to change the pc value. We stopped originally with no BreakpointSite at our pc, but the pc was modified while stopped to an existing BreakpointSite. Old lldb behavior when the user continue’s - silently step past the breakpoint. My plans: stop and show that we hit the breakpoint.

I don’t think these cases are common, of course, but I was trying to work through all the different possible fallout from making a change like this. Zequan’s example is another one where we will be changing the behavior from how it is today, but losing stop-reasons is a real problem so it’s worth it, IMO.

(The original motivating bug was that Jim had an AArch64 test program with a watchpoint, and a breakpoint directly after the instruction that triggers the watchpoint. So lldb receives the watchpoint-hit trap, disables the watchpoint, instruction-steps, re-inserts the watchpoint and is now sitting at a BreakpointSite. lldb’s logic is “pc at BreakpointSite means we hit a breakpoint”, so we lost the watchpoint-hit stop reason. I showed an example with stepi earlier where the failure is easier to see, but the watchpoint one was a more serious misbehavior.)