Weird stop stack while hitting breakpoint

Hi,

Our IDE (which wraps lldb using Python) works fine on Linux for simple hello-world cases. While trying a real-world case, I found that whenever we set a source-line breakpoint and then trigger the code path, lldb sends a stopped-state process event with thread.GetStopReason() being None and a weird call stack. Any idea why I get this stop stack (the code is listed at the end)? I have verified that triggering the same code path without the breakpoint set does not generate this stop event.

bt

* thread #1: tid = 952490, 0x00007fd7cb2daa83 libc.so.6`__GI_epoll_wait + 51, name = 'biggrep_master
  * frame #0: 0x00007fd7cb2daa83 libc.so.6`__GI_epoll_wait + 51
    frame #1: 0x000000000271189f biggrep_master_server_async`epoll_dispatch(base=0x00007fd7ca970800, arg=0x00007fd7ca62c1e0, tv=) + 127 at epoll.c:315
    frame #2: 0x000000000270f6d1 biggrep_master_server_async`event_base_loop(base=0x00007fd7ca970800, flags=<unavailable>) + 225 at event.c:524
    frame #3: 0x00000000025f9378 biggrep_master_server_async`folly::EventBase::loopBody(this=0x00007fd7ca945180, flags=0) + 834 at EventBase.cpp:335
    frame #4: 0x00000000025f900b biggrep_master_server_async`folly::EventBase::loop(this=0x00007fd7ca945180) + 29 at EventBase.cpp:287
    frame #5: 0x00000000025fa053 biggrep_master_server_async`folly::EventBase::loopForever(this=0x00007fd7ca945180) + 109 at EventBase.cpp:435
    frame #6: 0x0000000001e24b72 biggrep_master_server_async`apache::thrift::ThriftServer::serve(this=0x00007fd7ca96d710) + 110 at ThriftServer.cpp:365
    frame #7: 0x00000000004906bc biggrep_master_server_async`facebook::services::ServiceFramework::startFramework(this=0x00007ffc06776140, waitUntilStop=true) + 1942 at ServiceFramework.cpp:885
    frame #8: 0x000000000048fe6d biggrep_master_server_async`facebook::services::ServiceFramework::go(this=0x00007ffc06776140, waitUntilStop=true) + 35 at ServiceFramework.cpp:775
    frame #9: 0x00000000004219a7 biggrep_master_server_async`main(argc=1, argv=0x00007ffc067769d8) + 2306 at BigGrepMasterServerAsync.cpp:134
    frame #10: 0x00007fd7cb1ed0f6 libc.so.6`__libc_start_main + 246
    frame #11: 0x0000000000420bfc biggrep_master_server_async`_start + 41 at start.S:122

Here is the event-handling code:

def _handle_process_event(self, event):
    # Ignore non-stopping events.
    if lldb.SBProcess.GetRestartedFromEvent(event):
        log_debug('Non stopping event: %s' % str(event))
        return

    process = lldb.SBProcess.GetProcessFromEvent(event)

    if process.state == lldb.eStateStopped:
        self._send_paused_notification(process)
    elif process.state == lldb.eStateExited:
        exit_message = 'Process(%d) exited with: %u' % (
            process.GetProcessID(),
            process.GetExitStatus())
        if process.GetExitDescription():
            exit_message += (', ' + process.GetExitDescription())
        self._send_user_output('log', exit_message)
        self.should_quit = True
    else:
        self._send_notification('Debugger.resumed', None)

    event_type = event.GetType()
    if event_type == lldb.SBProcess.eBroadcastBitSTDOUT:
        # Read stdout from inferior.
        process_output = ''
        while True:
            output_part = process.GetSTDOUT(1024)
            if not output_part or len(output_part) == 0:
                break
            process_output += output_part
        self._send_user_output('log', process_output)

Btw: the breakpoint I set is
"b BigGrepMasterAsync.cpp:171", which is not in any of the stack frames shown above.

You only show one thread in your example. Did another thread have a valid stop reason? lldb shouldn’t be stopping for no reason anywhere…

Jim

Hmm, interesting, I got the stop reason from the lldb.SBProcess.GetProcessFromEvent(event).GetSelectedThread().GetStopReason(). Is that thread not the one that stopped? But you are right, the breakpoint hits in another thread:

thread #87: tid = 1006769, 0x000000000042eacd biggrep_master_server_async`facebook::biggrep::BigGrepMasterAsync::future_find(this=0x00007f3ea2d74fd0, corpus=error: summary string parsing error, needle=error: summary string parsing error, options=0x00007f3e899fc7e0) + 51 at BigGrepMasterAsync.cpp:171, name = ‘BigGrep-pri3-32’, stop reason = breakpoint 1.1

You iterate over all the threads and ask each thread what its stop reason is.

On many platforms (OS X for sure) there's no guarantee that when you stop you will only have hit one breakpoint on one thread. On OS X in multithreaded programs, it is not at all uncommon to have many threads hit breakpoint(s) by the time the stop gets reported. So you just have to iterate over all the threads and see what their stop reasons are. Note that it isn't just breakpoints: you might have been stepping on thread A, and when you stop, thread A will have stopped with "plan complete" for the step operation, and thread B at some other breakpoint.

So when you get a stop event you have to iterate over the threads and see why they have stopped.
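As a rough illustration of that loop, the sketch below collects every thread whose stop reason is something other than "none". It relies only on SBProcess.GetNumThreads(), SBProcess.GetThreadAtIndex() and SBThread.GetStopReason(). The FakeThread and FakeProcess classes are hypothetical stand-ins so the sketch runs without a live debug session, and the numeric constants are assumed to mirror lldb.eStopReasonInvalid, lldb.eStopReasonNone and lldb.eStopReasonBreakpoint.

```python
# Assumed to match lldb's StopReason enum; with the lldb module available,
# use lldb.eStopReasonInvalid, lldb.eStopReasonNone, etc. instead.
STOP_REASON_INVALID = 0
STOP_REASON_NONE = 1
STOP_REASON_BREAKPOINT = 3


def threads_with_stop_reason(process):
    """Return every thread of `process` that reports a real stop reason."""
    stopped = []
    for index in range(process.GetNumThreads()):
        thread = process.GetThreadAtIndex(index)
        if thread.GetStopReason() not in (STOP_REASON_INVALID, STOP_REASON_NONE):
            stopped.append(thread)
    return stopped


# Hypothetical stand-ins for lldb.SBThread / lldb.SBProcess (illustration only).
class FakeThread(object):
    def __init__(self, tid, reason):
        self._tid = tid
        self._reason = reason

    def GetThreadID(self):
        return self._tid

    def GetStopReason(self):
        return self._reason


class FakeProcess(object):
    def __init__(self, threads):
        self._threads = threads

    def GetNumThreads(self):
        return len(self._threads)

    def GetThreadAtIndex(self, index):
        return self._threads[index]


# Mirrors the report: thread 952490 has no stop reason, 1006769 hit a breakpoint.
demo = FakeProcess([FakeThread(952490, STOP_REASON_NONE),
                    FakeThread(1006769, STOP_REASON_BREAKPOINT)])
print([t.GetThreadID() for t in threads_with_stop_reason(demo)])  # prints [1006769]
```

In a real handler the same function would be called with the SBProcess obtained from lldb.SBProcess.GetProcessFromEvent(event).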

LLDB will set one of the threads as the selected thread, using some heuristics (if you were stepping on thread A & threads A & B stopped with breakpoints, thread A will be the selected thread, etc…) So you could just show the selected thread, but really you want to figure out what all the threads are doing.

Jim

Thanks for the info. I understand the issue of multiple threads stopping at the same time. But I would think lldb should at least pick one stopped thread and set it as the selected thread, instead of some random thread with stop reason None. Also, in my repro case only one thread has a stop reason, so the heuristic for setting the selected thread to that one should be pretty trivial.
I have worked around this issue using the suggestion, but I think there is a bug (on Linux) here.

The selected thread should be getting set. You didn’t include the code for _send_paused_notification so I don’t know what that does, but if SBProcess::GetSelectedThread wasn’t returning a thread with a valid stop reason, then there’s some bug somewhere. That’s all done in generic code, however, so I’m not sure how that would happen.

Jim

It is really up to the IDE to decide this so the logic belongs in your IDE. We do things as follows:

If no thread was selected before, display the first thread that has a stop reason other than none. If no threads have stop reasons, select the first thread. If a thread was selected before, then see if that same thread is stopped with a reason the next time you stop, and select that one regardless of whether it is the first thread with a stop reason. The idea is that if you were stepping or doing something in a thread and then stop again, you don't want the IDE changing away from your current thread if this thread has a stop reason. If this thread doesn't have a stop reason, then select the first one that does. If no threads have stop reasons, then display the same thread as before.
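The selection rules in the paragraph above could be sketched roughly as follows. This is an illustration only, not lldb's actual implementation: choose_thread_to_display, has_stop_reason and FakeThread are hypothetical names, the numeric constants are assumed to mirror lldb's StopReason enum, and treating a previously selected thread that no longer exists as "no previous selection" is an assumption the paragraph does not cover.

```python
# Assumed to match lldb's StopReason enum (lldb.eStopReasonNone, etc.).
STOP_REASON_INVALID = 0
STOP_REASON_NONE = 1
STOP_REASON_BREAKPOINT = 3
STOP_REASON_PLAN_COMPLETE = 8


def has_stop_reason(thread):
    """True if the thread stopped for an actual reason."""
    return thread.GetStopReason() not in (STOP_REASON_INVALID, STOP_REASON_NONE)


def choose_thread_to_display(threads, previous_tid=None):
    """Pick the thread to show after a stop.

    `threads` lists the threads at this stop; `previous_tid` is the ID of the
    thread shown at the previous stop, or None if nothing was selected.
    """
    if not threads:
        return None
    first_with_reason = next((t for t in threads if has_stop_reason(t)), None)
    previous = next((t for t in threads if t.GetThreadID() == previous_tid), None)

    if previous is None:
        # No earlier selection (assumption: also when that thread has exited).
        return first_with_reason if first_with_reason is not None else threads[0]
    if has_stop_reason(previous):
        # Stay on the thread the user was working in.
        return previous
    # The previous thread has no stop reason: switch only if another thread has one.
    return first_with_reason if first_with_reason is not None else previous


# Hypothetical stand-in for lldb.SBThread (illustration only).
class FakeThread(object):
    def __init__(self, tid, reason):
        self._tid = tid
        self._reason = reason

    def GetThreadID(self):
        return self._tid

    def GetStopReason(self):
        return self._reason


main_thread = FakeThread(1, STOP_REASON_NONE)
worker = FakeThread(2, STOP_REASON_BREAKPOINT)
print(choose_thread_to_display([main_thread, worker]).GetThreadID())  # prints 2
```

With real SBThread objects, the result could then be passed to SBProcess.SetSelectedThread() if the IDE wants lldb's notion of the selected thread to follow its own choice.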

All this logic is handled in Process::HandleProcessStateChangedEvent (see around line 1215 in Process.cpp). You shouldn't have to reimplement the logic for setting the selected thread unless you don't like our heuristics. Note that this is in generic code, so I don't know why it wouldn't be working right on Linux.

Jim

If you send me a small repro case, I can try to look at why Linux is
different here.

Thanks guys. I tried our IDE against a sample multithreaded program on Mac, and it correctly switches the selected thread to the worker thread that triggers the breakpoint, while on Linux (CentOS release 6.7) it fails to do that. Repro code:

=========================Output=========================

Launch result: success
Listening Thread ID: 139749072582400
running_signal wait
stopped_signal wait
Target event: ModulesLoaded
Target event: ModulesLoaded
Target event: ModulesLoaded
Target event: ModulesLoaded
Target event: ModulesLoaded
Target event: ModulesLoaded
Target event: ModulesLoaded
Target event: ModulesLoaded
Non stopping event: <lldb.SBEvent; proxy of <Swig Object of type 'lldb::SBEvent *' at 0x7f19e0975990> >
Process event: StateChanged, Running
Stop reason: 1
Process event: Stdout, Running
Stop reason: 1
Stdout:
main() : creating thread, 0
Process event: StateChanged, Stopped

after wait_for_process_run_then_stop
frame #0: 0x00007fcb7259ceb0 ld-linux-x86-64.so.2`__GI__dl_debug_state
frame #1: 0x00007fcb725a0c53 ld-linux-x86-64.so.2`dl_open_worker + 499
frame #2: 0x00007fcb7259c286 ld-linux-x86-64.so.2`_dl_catch_error + 102
frame #3: 0x00007fcb725a063a ld-linux-x86-64.so.2`_dl_open + 186
frame #4: 0x00007fcb71963c60 libc.so.6`do_dlopen + 64
frame #5: 0x00007fcb7259c286 ld-linux-x86-64.so.2`_dl_catch_error + 102
frame #6: 0x00007fcb71963db7 libc.so.6`__GI___libc_dlopen_mode + 71
frame #7: 0x00007fcb71be0eec libpthread.so.0`pthread_cancel_init + 76
frame #8: 0x00007fcb71be104c libpthread.so.0`_Unwind_ForcedUnwind + 60
frame #9: 0x00007fcb71bdef60 libpthread.so.0`__GI___pthread_unwind + 64
frame #10: 0x00007fcb71bd9175 libpthread.so.0`__pthread_exit + 37
frame #11: 0x0000000000400ac0 threads`main + 195 at threads.cpp:31
frame #12: 0x00007fcb7185bd5d libc.so.6`__libc_start_main + 253
frame #13: 0x00000000004008f9 threads
Exiting listener thread

=========================Inferior=========================

#include <iostream>
#include <cstdlib>
#include <pthread.h>

using namespace std;

#define NUM_THREADS 1

void *PrintHello(void *threadid)
{
    long tid;
    tid = (long)threadid;
    cout << "Hello World! Thread ID, " << tid << endl;
    pthread_exit(NULL);
}

int main ()
{
    pthread_t threads[NUM_THREADS];
    int rc;
    int i;
    for( i=0; i < NUM_THREADS; i++ ){
        cout << "main() : creating thread, " << i << endl;
        rc = pthread_create(&threads[i], NULL,
                            PrintHello, (void *)i);
        if (rc){
            cout << "Error:unable to create thread," << rc << endl;
            exit(-1);
        }
    }
    pthread_exit(NULL);
}

=========================LLDB python automation code=========================
main.py

# Should be first for LLDB package to be added to search path.
from find_lldb import lldb
import sys
import os
import time
from sys import stdin, stdout
from event_thread import LLDBListenerThread
import threading

def wait_for_process_run_then_stop(running_signal, stopped_signal):
    print('running_signal wait')
    running_signal.wait()
    running_signal.clear()
    print('stopped_signal wait')
    stopped_signal.wait()
    stopped_signal.clear()

def do_test():
    debugger = lldb.SBDebugger.Create()
    debugger.SetAsync(True)
    executable_path = '~/personal/cpp/temp/threads'
    target = debugger.CreateTargetWithFileAndArch(executable_path, lldb.LLDB_ARCH_DEFAULT)
    target.BreakpointCreateByName('PrintHello')

    listener = lldb.SBListener('Event Listener')
    error = lldb.SBError()
    process = target.Launch(listener,
                            None,   # argv
                            None,   # envp
                            None,   # stdin_path
                            None,   # stdout_path
                            None,   # stderr_path
                            None,   # working directory
                            0,      # launch flags
                            False,  # Stop at entry
                            error)  # error
    print('Launch result: %s' % str(error))

    running_signal = threading.Event()
    stopped_signal = threading.Event()
    running_signal.set()
    event_thread = LLDBListenerThread(debugger, running_signal, stopped_signal)
    event_thread.start()

    wait_for_process_run_then_stop(running_signal, stopped_signal)

    print('after wait_for_process_run_then_stop')
    activeThread = process.GetSelectedThread()
    for frame in activeThread.frames:
        print(frame)

    event_thread.should_quit = True
    event_thread.join()

    lldb.SBDebugger.Destroy(debugger)
    return debugger

def main():
    debugger = do_test()

if __name__ == '__main__':
    main()

===========event_thread.py============

import lldb
from threading import Thread
from sys import stdout
try:
    import thread  # Python 2
except ImportError:
    import _thread as thread  # Python 3
import threading

target_event_type_to_name_map = {
    lldb.SBTarget.eBroadcastBitBreakpointChanged: 'BreakpointChanged',
    lldb.SBTarget.eBroadcastBitWatchpointChanged: 'WatchpointChanged',
    lldb.SBTarget.eBroadcastBitModulesLoaded: 'ModulesLoaded',
    lldb.SBTarget.eBroadcastBitModulesUnloaded: 'ModulesUnloaded',
    lldb.SBTarget.eBroadcastBitSymbolsLoaded: 'SymbolsLoaded',
}

process_event_type_to_name_map = {
    lldb.SBProcess.eBroadcastBitStateChanged: 'StateChanged',
    lldb.SBProcess.eBroadcastBitSTDOUT: 'Stdout',
    lldb.SBProcess.eBroadcastBitSTDERR: 'Stderr',
    lldb.SBProcess.eBroadcastBitInterrupt: 'Interrupt',
}

breakpoint_event_type_to_name_map = {
    lldb.eBreakpointEventTypeAdded: 'Added',
    lldb.eBreakpointEventTypeCommandChanged: 'Command Changed',
    lldb.eBreakpointEventTypeConditionChanged: 'Condition Changed',
    lldb.eBreakpointEventTypeDisabled: 'Disabled',
    lldb.eBreakpointEventTypeEnabled: 'Enabled',
    lldb.eBreakpointEventTypeIgnoreChanged: 'Ignore Changed',
    lldb.eBreakpointEventTypeInvalidType: 'Invalid Type',
    lldb.eBreakpointEventTypeLocationsAdded: 'Location Added',
    lldb.eBreakpointEventTypeLocationsRemoved: 'Location Removed',
    lldb.eBreakpointEventTypeLocationsResolved: 'Location Resolved',
    lldb.eBreakpointEventTypeRemoved: 'Removed',
    lldb.eBreakpointEventTypeThreadChanged: 'Thread Changed',
}

process_state_name_map = {
    lldb.eStateRunning: 'Running',
    lldb.eStateStepping: 'Stepping',
    lldb.eStateAttaching: 'Attaching',
    lldb.eStateConnected: 'Connected',
    lldb.eStateCrashed: 'Crashed',
    lldb.eStateDetached: 'Detached',
    lldb.eStateExited: 'Exited',
    lldb.eStateInvalid: 'Invalid',
    lldb.eStateLaunching: 'Launching',
    lldb.eStateStopped: 'Stopped',
    lldb.eStateSuspended: 'Suspended',
    lldb.eStateUnloaded: 'Unloaded',
}

class LLDBListenerThread(Thread):
    should_quit = False

    def __init__(self, debugger, running_signal=None, stopped_signal=None):
        Thread.__init__(self)
        self._running_signal = running_signal
        self._stopped_signal = stopped_signal
        process = debugger.GetSelectedTarget().process
        self.listener = debugger.GetListener()
        self._add_listener_to_process(process)
        self._add_listener_to_target(process.target)

        '''self.listener.StartListeningForEventClass(debugger, lldb.SBTarget.GetBroadcasterClassName(),
            lldb.SBTarget.eBroadcastBitBreakpointChanged |
            lldb.SBTarget.eBroadcastBitWatchpointChanged |
            lldb.SBTarget.eBroadcastBitModulesLoaded |
            lldb.SBTarget.eBroadcastBitModulesUnloaded |
            lldb.SBTarget.eBroadcastBitSymbolsLoaded)
        self.listener.StartListeningForEventClass(debugger, lldb.SBProcess.GetBroadcasterClassName(),
            lldb.SBProcess.eBroadcastBitStateChanged |
            lldb.SBProcess.eBroadcastBitSTDOUT |
            lldb.SBProcess.eBroadcastBitSTDERR |
            lldb.SBProcess.eBroadcastBitInterrupt)'''
        '''self.listener.StartListeningForEventClass(debugger, lldb.SBThread.GetBroadcasterClassName(),
            lldb.SBThread.eBroadcastBitStackChanged | lldb.SBThread.eBroadcastBitThreadSuspended |
            lldb.SBThread.eBroadcastBitThreadResumed | lldb.SBThread.eBroadcastBitSelectedFrameChanged | lldb.SBThread.eBroadcastBitThreadSelected)'''

    def _add_listener_to_target(self, target):
        # Listen for breakpoint/watchpoint events (Added/Removed/Disabled/etc).
        broadcaster = target.GetBroadcaster()
        mask = (lldb.SBTarget.eBroadcastBitBreakpointChanged |
                lldb.SBTarget.eBroadcastBitWatchpointChanged |
                lldb.SBTarget.eBroadcastBitModulesLoaded)
        broadcaster.AddListener(self.listener, mask)

    def _add_listener_to_process(self, process):
        # Listen for process events (Start/Stop/Interrupt/etc).
        broadcaster = process.GetBroadcaster()
        mask = (lldb.SBProcess.eBroadcastBitStateChanged |
                lldb.SBProcess.eBroadcastBitSTDOUT |
                lldb.SBProcess.eBroadcastBitSTDERR |
                lldb.SBProcess.eBroadcastBitInterrupt)
        broadcaster.AddListener(self.listener, mask)

    def run(self):
        print(' Listening Thread ID: %d' % thread.get_ident())
        while not self.should_quit:
            event = lldb.SBEvent()
            if self.listener.WaitForEvent(1, event):
                if lldb.SBTarget.EventIsTargetEvent(event):
                    self._handle_target_event(event)
                elif lldb.SBProcess.EventIsProcessEvent(event):
                    self._handle_process_event(event)
                elif lldb.SBBreakpoint.EventIsBreakpointEvent(event):
                    self._handle_breakpoint_event(event)
                elif lldb.SBThread.EventIsThreadEvent(event):
                    self._handle_thread_event(event)
                else:
                    self._handle_unknown_event(event)
        print(' Exiting listener thread')

    def _handle_target_event(self, event):
        event_type = event.GetType()
        print('Target event: %s' % target_event_type_to_name_map[event_type])

    def _handle_process_event(self, event):
        if lldb.SBProcess.GetRestartedFromEvent(event):
            print('Non stopping event: %s' % str(event))
            return
        process = lldb.SBProcess.GetProcessFromEvent(event)
        event_type = event.GetType()
        print('Process event: %s, %s' % (process_event_type_to_name_map[event_type], process_state_name_map[process.state]))
        if process.state == lldb.eStateExited:
            self.should_quit = True
        elif process.state == lldb.eStateStopped:
            if self._stopped_signal:
                self._stopped_signal.set()
        else:
            if self._running_signal:
                self._running_signal.set()

            thread = process.selected_thread
            print('Stop reason: %d' % thread.GetStopReason())
            if event_type == lldb.SBProcess.eBroadcastBitSTDOUT:
                print('Stdout:')
                while True:
                    output = process.GetSTDOUT(1024)
                    if output is None or len(output) == 0:
                        break
                    stdout.write(output)

    def _handle_breakpoint_event(self, event):
        breakpoint = lldb.SBBreakpoint.GetBreakpointFromEvent(event)
        event_type = lldb.SBBreakpoint.GetBreakpointEventTypeFromEvent(event)
        print('Breakpoint event: [%s] %s ' % (
            breakpoint_event_type_to_name_map[event_type],
            self._get_description_from_object(breakpoint)))

    def _handle_unknown_event(self, event):
        print('Unknown event: %d %s %s' % (
            event.GetType(),
            lldb.SBEvent.GetCStringFromEvent(event),
            self._get_description_from_object(event)))

    def _get_description_from_object(self, lldb_object):
        description_stream = lldb.SBStream()
        lldb_object.GetDescription(description_stream)
        return description_stream.GetData()

Ok, so the reason for this behavior seems to be that the process hits
two breakpoints simultaneously:
- the breakpoint you have set and you are expecting to hit
- an internal shared library breakpoint we use to get notified of new
shared libraries

* thread #1: tid = 33390, 0x00007ff7dcd9f970 ld-linux-x86-64.so.2`_dl_debug_state, name = 'a.out', stop reason = shared-library-event
  * frame #0: 0x00007ff7dcd9f970 ld-linux-x86-64.so.2`_dl_debug_state
    frame #1: 0x00007ff7dcda3b05 ld-linux-x86-64.so.2`___lldb_unnamed_symbol90$$ld-linux-x86-64.so.2 + 357
    frame #2: 0x00007ff7dcd9eff4 ld-linux-x86-64.so.2`___lldb_unnamed_symbol59$$ld-linux-x86-64.so.2 + 116
    frame #3: 0x00007ff7dcda33bb ld-linux-x86-64.so.2`___lldb_unnamed_symbol88$$ld-linux-x86-64.so.2 + 171
    frame #4: 0x00007ff7dc5de0f2 libc.so.6`do_dlopen(ptr=0x00007ffed953c6f0) + 66 at dl-libc.c:87
    frame #5: 0x00007ff7dcd9eff4 ld-linux-x86-64.so.2`___lldb_unnamed_symbol59$$ld-linux-x86-64.so.2 + 116
    frame #6: 0x00007ff7dc5de1b2 libc.so.6`__GI___libc_dlopen_mode + 47 at dl-libc.c:46
    frame #7: 0x00007ff7dc5de183 libc.so.6`__GI___libc_dlopen_mode(name=<unavailable>, mode=<unavailable>) + 35 at dl-libc.c:163
    frame #8: 0x00007ff7dc87da43 libpthread.so.0`pthread_cancel_init + 35 at unwind-forcedunwind.c:52
    frame #9: 0x00007ff7dc87dc0c libpthread.so.0`_Unwind_ForcedUnwind(exc=0x00007ff7dcf8ddf0, stop=(libpthread.so.0`unwind_stop at unwind.c:44), stop_argument=0x00007ffed953c7f0) + 60 at unwind-forcedunwind.c:129
    frame #10: 0x00007ff7dc87bd40 libpthread.so.0`__GI___pthread_unwind(buf=<unavailable>) + 64 at unwind.c:129
    frame #11: 0x00007ff7dc876535 libpthread.so.0`__pthread_exit + 37 at pthreadP.h:280
    frame #12: 0x00007ff7dc87651d libpthread.so.0`__pthread_exit(value=<unavailable>) + 13 at pthread_exit.c:29
    frame #13: 0x0000000000400b00 a.out`main + 186
    frame #14: 0x00007ff7dc4c9ec5 libc.so.6`__libc_start_main(main=(a.out`main), argc=1, argv=0x00007ffed953c8a8, init=<unavailable>, fini=<unavailable>, rtld_fini=<unavailable>, stack_end=0x00007ffed953c898) + 245 at libc-start.c:287
    frame #15: 0x0000000000400939 a.out`_start + 41

  thread #2: tid = 35166, 0x00000000004009fd a.out`PrintHello(void*), name = 'a.out', stop reason = breakpoint 1.1
    frame #0: 0x00000000004009fd a.out`PrintHello(void*)
    frame #1: 0x00007ff7dc875182 libpthread.so.0`start_thread(arg=0x00007ff7dbf8b700) + 194 at pthread_create.c:312
    frame #2: 0x00007ff7dc5a247d libc.so.6`__clone + 109 at clone.S:111

These internal breakpoints are normally ignored, but I can certainly
imagine that if you hit them concurrently with a regular breakpoint,
the thread-selecting machinery will get confused. I don't have time to
look into this more right now, but I'll get to it eventually if
someone doesn't beat me to it...

PS: Your inferior is quite strange. I am not sure you are allowed to
exit the main thread via pthread_exit(), especially when you still
have non-detached threads running...

pl

Thank you Pavel! That’s interesting.
It's not that urgent; I have a workaround for this issue. As for the inferior, I just copied some random multithreading code from the internet to demonstrate this issue; definitely a good way to exit the main thread :-).
Btw: I was surprised that on Mac/Linux the whole process is still alive after the main thread exits. I thought the main thread's C runtime would kill the whole process; maybe it is pthread_exit() that prevents the C runtime from killing the process. (Sorry, newbie to Mac/Linux.)

Jeffrey