LLDB deadlocks trying to unwind

LLDB, connected to a stub I’m building, is trying to produce a stack trace of my stopped program but deadlocks instead:

#0 __psynch_mutexwait ()
#1 pthread_mutex_lock ()
#2 lldb_private::Mutex::Lock() at lldb/source/Host/common/Mutex.cpp:281
#3 lldb_private::Mutex::Locker::Lock(lldb_private::Mutex&) at lldb/source/Host/common/Mutex.cpp:152
#4 lldb_private::Mutex::Locker::Locker(lldb_private::Mutex&) at lldb/source/Host/common/Mutex.cpp:112
#5 lldb_private::Mutex::Locker::Locker(lldb_private::Mutex&) at lldb/source/Host/common/Mutex.cpp:113
#6 lldb_private::Unwind::GetFrameCount() at lldb/include/lldb/Target/Unwind.h:51
#7 UnwindMacOSXFrameBackchain::DoGetFrameInfoAtIndex(unsigned int, unsigned long long&, unsigned long long&) at lldb/source/Plugins/Process/Utility/UnwindMacOSXFrameBackchain.cpp:59
#8 lldb_private::Unwind::GetFrameInfoAtIndex(unsigned int, unsigned long long&, unsigned long long&) at lldb/include/lldb/Target/Unwind.h:78
#9 lldb_private::StackFrameList::GetFramesUpTo(unsigned int) at lldb/source/Target/StackFrameList.cpp:304
#10 lldb_private::StackFrameList::ResetCurrentInlinedDepth() at lldb/source/Target/StackFrameList.cpp:110
#11 lldb_private::StackFrameList::CalculateCurrentInlinedDepth() at lldb/source/Target/StackFrameList.cpp:79
#12 lldb_private::Thread::ShouldStop(lldb_private::Event*) at lldb/source/Target/Thread.cpp:752
#13 lldb_private::ThreadList::ShouldStop(lldb_private::Event*) at lldb/source/Target/ThreadList.cpp:298
#14 lldb_private::Process::ShouldBroadcastEvent(lldb_private::Event*) at lldb/source/Target/Process.cpp:3707
#15 lldb_private::Process::HandlePrivateEvent(std::__1::shared_ptr<lldb_private::Event>&) at lldb/source/Target/Process.cpp:3936
#16 lldb_private::Process::ConnectRemote(lldb_private::Stream*, char const*) at lldb/source/Target/Process.cpp:3257
#17 CommandObjectProcessConnect::DoExecute(lldb_private::Args&, lldb_private::CommandReturnObject&) at lldb/source/Commands/CommandObjectProcess.cpp:1173
#18 lldb_private::CommandObjectParsed::Execute(char const*, lldb_private::CommandReturnObject&) at lldb/source/Interpreter/CommandObject.cpp:1038
#19 lldb_private::CommandInterpreter::HandleCommand(char const*, lldb_private::LazyBool, lldb_private::CommandReturnObject&, lldb_private::ExecutionContext*, bool, bool) at lldb/source/Interpreter/CommandInterpreter.cpp:1825
#20 lldb_private::CommandObjectRegexCommand::DoExecute(char const*, lldb_private::CommandReturnObject&) at lldb/source/Interpreter/CommandObjectRegexCommand.cpp:89
#21 lldb_private::CommandObjectRaw::Execute(char const*, lldb_private::CommandReturnObject&) at lldb/source/Interpreter/CommandObject.cpp:1064
#22 lldb_private::CommandInterpreter::HandleCommand(char const*, lldb_private::LazyBool, lldb_private::CommandReturnObject&, lldb_private::ExecutionContext*, bool, bool) at lldb/source/Interpreter/CommandInterpreter.cpp:1825
#23 lldb::SBCommandInterpreter::HandleCommand(char const*, lldb::SBCommandReturnObject&, bool) at lldb/source/API/SBCommandInterpreter.cpp:122
#24 Driver::HandleIOEvent(lldb::SBEvent const&) at lldb/tools/driver/Driver.cpp:1083
#25 Driver::MainLoop() at lldb/tools/driver/Driver.cpp:1556
#26 main at lldb/tools/driver/Driver.cpp:1727
#27 start ()

Frame #8 locks m_unwind_mutex, and frame #6 tries to lock it again on the same thread.

The code path is unconditional (from what I can see, calling GetFrameInfoAtIndex on an UnwindMacOSXFrameBackchain can only end in a deadlock), so I’m not sure how I can prevent this from happening.

Félix

Hmm, based on the stack trace you provided, it seems that m_unwind_mutex should be initialized as a recursive mutex, just like Process::m_thread_mutex.

Can you try changing the initialization of m_unwind_mutex in Unwind.h to match that of Process::m_thread_mutex, and see if it helps? If not, then some other thread is holding that mutex and more investigation is required…
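To illustrate why a recursive mutex avoids this particular deadlock, here is a minimal sketch of the re-entrant locking pattern in the trace. It assumes std::recursive_mutex as a stand-in for LLDB's own Mutex class, and UnwindSketch is a hypothetical, heavily simplified model of Unwind, not LLDB's code: the public methods take the lock and call the Do* hooks, and the hook calls back into GetFrameCount() while the lock is still held, as frames #8, #7 and #6 do.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical, simplified model of lldb_private::Unwind. Public entry points
// lock m_unwind_mutex and then call the plugin's Do* hooks.
class UnwindSketch {
public:
    uint32_t GetFrameCount() {
        // Second acquisition on the same thread (frame #6 in the trace).
        // With a non-recursive mutex this is undefined behavior; in practice
        // it deadlocks, as in the trace.
        std::lock_guard<std::recursive_mutex> guard(m_unwind_mutex);
        return DoGetFrameCount();
    }

    bool GetFrameInfoAtIndex(uint32_t idx, uint64_t &cfa, uint64_t &pc) {
        // First acquisition (frame #8 in the trace).
        std::lock_guard<std::recursive_mutex> guard(m_unwind_mutex);
        return DoGetFrameInfoAtIndex(idx, cfa, pc);
    }

private:
    uint32_t DoGetFrameCount() { return 3; }  // placeholder frame count

    bool DoGetFrameInfoAtIndex(uint32_t idx, uint64_t &cfa, uint64_t &pc) {
        // Like frame #7, the hook re-enters GetFrameCount() while the
        // caller still holds the lock.
        if (idx >= GetFrameCount())
            return false;
        cfa = 0x1000 + 0x100 * idx;  // placeholder values
        pc = 0x2000 + 4 * idx;
        return true;
    }

    // A recursive mutex may be locked again by the thread that owns it.
    std::recursive_mutex m_unwind_mutex;
};
```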

Cheers,
Dan

It's obviously the same thread: frame #8 acquired the mutex, and frame #6, further up the same stack, tries to acquire it again.

I changed it to a recursive mutex, and now it crashes with a stack overflow instead of deadlocking. Here's an excerpt of the stack, down to the frame that made the initial call (StackFrameList::GetFramesUpTo):

#12543 UnwindMacOSXFrameBackchain::DoGetFrameInfoAtIndex(unsigned int, unsigned long long&, unsigned long long&) at lldb/source/Plugins/Process/Utility/UnwindMacOSXFrameBackchain.cpp:59
#12544 lldb_private::Unwind::GetFrameInfoAtIndex(unsigned int, unsigned long long&, unsigned long long&) at lldb/include/lldb/Target/Unwind.h:78
#12545 lldb_private::StackFrameList::GetFramesUpTo(unsigned int) at lldb/source/Target/StackFrameList.cpp:332
#12546 lldb_private::StackFrameList::GetFrameAtIndex(unsigned int) at lldb/source/Target/StackFrameList.cpp:521
#12547 lldb_private::Thread::GetStackFrameAtIndex(unsigned int) at lldb/include/lldb/Target/Thread.h:357
#12548 UnwindMacOSXFrameBackchain::DoGetFrameCount() at lldb/source/Plugins/Process/Utility/UnwindMacOSXFrameBackchain.cpp:45
#12549 lldb_private::Unwind::GetFrameCount() at lldb/include/lldb/Target/Unwind.h:52
#12550 UnwindMacOSXFrameBackchain::DoGetFrameInfoAtIndex(unsigned int, unsigned long long&, unsigned long long&) at lldb/source/Plugins/Process/Utility/UnwindMacOSXFrameBackchain.cpp:59
#12551 lldb_private::Unwind::GetFrameInfoAtIndex(unsigned int, unsigned long long&, unsigned long long&) at lldb/include/lldb/Target/Unwind.h:78
#12552 lldb_private::StackFrameList::GetFramesUpTo(unsigned int) at lldb/source/Target/StackFrameList.cpp:332
#12553 lldb_private::StackFrameList::GetFrameAtIndex(unsigned int) at lldb/source/Target/StackFrameList.cpp:521
#12554 lldb_private::Thread::GetStackFrameAtIndex(unsigned int) at lldb/include/lldb/Target/Thread.h:357
#12555 UnwindMacOSXFrameBackchain::DoGetFrameCount() at lldb/source/Plugins/Process/Utility/UnwindMacOSXFrameBackchain.cpp:45
#12556 lldb_private::Unwind::GetFrameCount() at lldb/include/lldb/Target/Unwind.h:52
#12557 UnwindMacOSXFrameBackchain::DoGetFrameInfoAtIndex(unsigned int, unsigned long long&, unsigned long long&) at lldb/source/Plugins/Process/Utility/UnwindMacOSXFrameBackchain.cpp:59
#12558 lldb_private::Unwind::GetFrameInfoAtIndex(unsigned int, unsigned long long&, unsigned long long&) at lldb/include/lldb/Target/Unwind.h:78
#12559 lldb_private::StackFrameList::GetFramesUpTo(unsigned int) at lldb/source/Target/StackFrameList.cpp:304
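The excerpt is a seven-function cycle repeated until the stack runs out: GetFramesUpTo -> GetFrameInfoAtIndex -> DoGetFrameInfoAtIndex -> GetFrameCount -> DoGetFrameCount -> Thread::GetStackFrameAtIndex -> GetFrameAtIndex -> GetFramesUpTo. The recursive mutex removed the deadlock but not the underlying mutual recursion. As a hedged debugging aid, here is a sketch of a hypothetical RAII re-entrancy guard that turns such a cycle into an immediate error instead of a stack overflow. CycleModel is an invented model of the loop; the guard only makes the cycle visible, it does not make the backchain unwinder work.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical RAII guard: marks a function as "in progress" and throws if the
// same function is re-entered before it returns.
class ReentrancyGuard {
public:
    explicit ReentrancyGuard(bool &active) : m_active(active) {
        if (m_active)
            throw std::logic_error("re-entered while already in progress");
        m_active = true;
    }
    // Runs only for a successfully constructed guard, so the outermost call
    // clears the flag as the exception unwinds through it.
    ~ReentrancyGuard() { m_active = false; }
    ReentrancyGuard(const ReentrancyGuard &) = delete;
    ReentrancyGuard &operator=(const ReentrancyGuard &) = delete;

private:
    bool &m_active;
};

// Hypothetical model of the cycle: filling the frame list needs the frame
// count, and counting frames asks the frame list for a frame.
struct CycleModel {
    bool in_get_frames_up_to = false;

    uint32_t GetFramesUpTo(uint32_t end_idx) {
        ReentrancyGuard guard(in_get_frames_up_to);
        // Stands in for GetFrameInfoAtIndex -> DoGetFrameInfoAtIndex.
        return GetFrameCount() + end_idx;
    }

    uint32_t GetFrameCount() {
        // Stands in for DoGetFrameCount -> Thread::GetStackFrameAtIndex.
        return GetFramesUpTo(0);
    }
};
```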

Ah, I see. I have not seen such an overflow, although I do most of my work on Linux.

Jason is more experienced with the unwinder than I am. Jason, any thoughts?

Dan

I'll look at this a bit.

The fact that you're picking up UnwindMacOSXFrameBackchain is probably the source of the problem. This was an early unwinder written by Greg back before we had UnwindLLDB and RegisterContextLLDB; it hasn't been modified in a couple of years except for mechanical changes made across the source base.

What architecture are you debugging? Thread::GetUnwinder() should use UnwindLLDB for x86_64, i386, arm and thumb. It will use UnwindMacOSXFrameBackchain for any other architecture, but it should probably just fail instead.
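The selection described here amounts to a dispatch on the target architecture. A minimal sketch, with hypothetical Arch and unwinder types standing in for LLDB's (this is not the actual Thread::GetUnwinder() code):

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-ins for LLDB's architecture and unwinder types.
enum class Arch { X86_64, I386, Arm, Thumb, PPC, Other };

struct Unwinder {
    virtual ~Unwinder() = default;
    virtual std::string Name() const = 0;
};
struct UnwindLLDBSketch : Unwinder {
    std::string Name() const override { return "UnwindLLDB"; }
};
struct BackchainSketch : Unwinder {
    std::string Name() const override { return "UnwindMacOSXFrameBackchain"; }
};

// UnwindLLDB for the four architectures listed above; the old backchain
// unwinder for everything else.
std::unique_ptr<Unwinder> SelectUnwinder(Arch arch) {
    switch (arch) {
    case Arch::X86_64:
    case Arch::I386:
    case Arch::Arm:
    case Arch::Thumb:
        return std::make_unique<UnwindLLDBSketch>();
    default:
        return std::make_unique<BackchainSketch>();
    }
}
```

Under this scheme a PPC target falls through to the default branch, which matches the backtrace above; returning nullptr there would be one way to "just fail instead".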

I'm making a stub for an emulator that executes PPC code.

Félix

You'll want to use UnwindLLDB.

Your ABI plugin may need to define the following methods (used by UnwindLLDB and RegisterContextLLDB):

CallFrameAddressIsValid
CodeAddressIsValid
StackUsesFrames
FixCodeAddress
CreateDefaultUnwindPlan (*)
FunctionCallsChangeCFA
CreateFunctionEntryUnwindPlan (*)
RegisterIsVolatile (*)

The ones marked (*) are the important ones for unwinding.
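For the simpler, unmarked hooks, here is a sketch of what 32-bit PowerPC versions might check. These are hypothetical free functions, not LLDB's ABI method signatures. The 4-byte instruction alignment is a property of PowerPC; the 16-byte stack alignment and the treatment of the low pc bits are assumptions to verify against your ABI document. Checks like these presumably let the unwinder reject implausible frames instead of following garbage.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, free-standing versions of three simple ABI hooks for a
// 32-bit PowerPC target. NOT LLDB's real signatures.
namespace ppc_abi_sketch {

// PowerPC instructions are 4 bytes long and word-aligned.
inline bool CodeAddressIsValid(uint64_t pc) {
    return pc != 0 && (pc & 0x3) == 0;
}

// Assumption: the ABI keeps the stack pointer (and so the CFA) 16-byte aligned.
inline bool CallFrameAddressIsValid(uint64_t cfa) {
    return cfa != 0 && (cfa & 0xF) == 0;
}

// Assumption: the low two bits of a code address carry no meaning on this
// target, so they are simply cleared.
inline uint64_t FixCodeAddress(uint64_t pc) {
    return pc & ~uint64_t{0x3};
}

} // namespace ppc_abi_sketch
```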

For CreateDefaultUnwindPlan, the "default unwind plan" is how lldb will unwind a stack frame by default. For example, on architectures that use a frame pointer, dereferencing the frame pointer often gives you the calling frame's frame pointer, and one word past that address you'll find the caller's saved pc value.
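A minimal sketch of the frame-pointer walk described above, under that layout assumption (caller's fp at [fp], caller's saved pc one word past it, stack growing downward). Target memory is modeled as a std::map; WalkFramePointers and Frame are hypothetical names. The stopping conditions matter as much as the walk itself: without them a corrupt or circular chain never terminates.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

struct Frame {
    uint64_t fp;  // frame pointer
    uint64_t pc;  // pc in this frame
};

// Follows the fp chain: [fp] = caller's fp, [fp + word_size] = caller's pc.
// Stops at a null fp, an unreadable word, a non-increasing fp (assuming a
// downward-growing stack), or after max_frames frames.
std::vector<Frame> WalkFramePointers(const std::map<uint64_t, uint64_t> &mem,
                                     Frame start, uint64_t word_size,
                                     std::size_t max_frames) {
    std::vector<Frame> frames{start};
    Frame cur = start;
    while (frames.size() < max_frames && cur.fp != 0) {
        auto fp_it = mem.find(cur.fp);
        auto pc_it = mem.find(cur.fp + word_size);
        if (fp_it == mem.end() || pc_it == mem.end())
            break;  // memory not readable
        Frame caller{fp_it->second, pc_it->second};
        if (caller.fp != 0 && caller.fp <= cur.fp)
            break;  // chain must move toward the stack base
        frames.push_back(caller);  // a zero caller fp marks the outermost frame
        cur = caller;
    }
    return frames;
}
```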

For CreateFunctionEntryUnwindPlan, this is the unwind plan that tells lldb how to find the caller when the pc is sitting on the first instruction of a function. For example, lldb was single-stepping by instruction, stepped into a new function, and now needs to find the caller.
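For PowerPC the function-entry rule is simple, because bl leaves the return address in the link register: on the first instruction the prologue has not run yet, so LR holds the address where the caller resumes and r1 still holds the caller's stack pointer. (On x86-64, by contrast, the call instruction pushes the return address, so it is found at [rsp].) A sketch with hypothetical register and result types:

```cpp
#include <cassert>
#include <cstdint>

// Minimal register snapshot for a 32-bit PowerPC thread (hypothetical type).
struct PPCRegs {
    uint32_t pc;  // address of the current instruction
    uint32_t r1;  // stack pointer
    uint32_t lr;  // link register
};

struct CallerLocation {
    uint32_t pc;  // where the caller will resume
    uint32_t sp;  // caller's stack pointer
};

// Function-entry rule for PowerPC: nothing has been pushed or saved yet, so
// the caller's state is read directly from LR and r1.
CallerLocation UnwindAtFunctionEntry(const PPCRegs &regs) {
    return CallerLocation{regs.lr, regs.r1};
}
```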

RegisterIsVolatile tells lldb whether a register is volatile, i.e. whether its value may be clobbered by a function call rather than preserved across it. Different people use different terms for the same distinction: caller-saved vs. callee-saved, not preserved vs. preserved, volatile vs. non-volatile. Your ABI document will specify which registers are volatile. NB: on architectures that pass arguments in registers, the argument registers are "volatile" because they are assumed to be reused/overwritten on any function call.
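A sketch of RegisterIsVolatile for the PowerPC general-purpose registers plus LR and CTR. The register split is an ASSUMPTION for illustration (r1 and r13-r31 preserved; r0, r2-r12, LR and CTR volatile), and the numbering scheme is hypothetical too; take the real split from the ABI your emulated code follows. When unsure, marking a register volatile seems the safer error, presumably because it then shows as unavailable in caller frames rather than with a stale value.

```cpp
#include <cassert>

// Hypothetical register numbering: 0-31 are r0-r31, then LR and CTR.
enum PPCReg : unsigned { kR0 = 0, kR1 = 1, kR31 = 31, kLR = 32, kCTR = 33 };

// ASSUMED split: r1 (stack pointer) and r13-r31 are preserved across calls;
// r0 and r2-r12 (which include the argument/return registers r3-r10), LR,
// CTR and any other register are treated as volatile.
bool RegisterIsVolatile(unsigned reg) {
    if (reg == kR1)
        return false;  // stack pointer is preserved
    if (reg >= 13 && reg <= kR31)
        return false;  // callee-saved GPRs
    return true;       // clobbered by calls
}
```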

Both x86 and arm have UnwindAssembly implementations (e.g. source/Plugins/UnwindAssembly/x86/UnwindAssembly-x86.cpp). The UnwindAssembly class looks at the assembly instructions of a function and creates an UnwindPlan based on them. This is done statically, so control flow is ignored. It's not ideal, but the unwind instructions in the eh_frame section are not necessarily valid at non-call sites in a program, and we found that we can get more accurate debugger behavior by inspecting the instructions manually than by following eh_frame.
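To make static prologue inspection concrete, here is a toy sketch for PowerPC that recognizes just two instructions: mflr r0 (0x7C0802A6) and stwu r1,-N(r1), which allocates an N-byte frame (0x9421FFC0 encodes stwu r1,-64(r1)). It scans in address order and ignores branches, like the static approach described above. ScanPrologue and PrologueInfo are hypothetical; a usable implementation would also need to track where LR and the nonvolatile registers are stored, and handle epilogues.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

struct PrologueInfo {
    bool saves_lr = false;               // saw "mflr r0"
    std::optional<uint32_t> frame_size;  // N from "stwu r1,-N(r1)"
};

// Linear scan over instruction words; control flow is ignored.
PrologueInfo ScanPrologue(const std::vector<uint32_t> &insns) {
    PrologueInfo info;
    for (uint32_t insn : insns) {
        if (insn == 0x7C0802A6) {  // mflr r0
            info.saves_lr = true;
            continue;
        }
        uint32_t opcode = insn >> 26;  // primary opcode (stwu is 37)
        uint32_t rs = (insn >> 21) & 0x1F;
        uint32_t ra = (insn >> 16) & 0x1F;
        int16_t disp = static_cast<int16_t>(insn & 0xFFFF);  // signed D field
        if (opcode == 37 && rs == 1 && ra == 1 && disp < 0 && !info.frame_size)
            info.frame_size = static_cast<uint32_t>(-disp);  // first stwu r1,-N(r1)
    }
    return info;
}
```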

The UnwindAssembly-generated UnwindPlan is used when unwinding the current frame (frame 0) in a program.

Above frame 0, we are guaranteed to be at a call site and so we will prefer to use eh_frame instructions if they are available.
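The preference order in the last few paragraphs, plus the function-entry case, can be summarized as a small decision function. This is a simplified, hypothetical sketch (ChooseUnwindPlan and PlanSource are invented names), not LLDB's actual logic, which has more fallbacks:

```cpp
#include <cassert>

enum class PlanSource { AssemblyInspection, FunctionEntry, EhFrame, Default };

// Frame 0 may be stopped anywhere in a function, so it uses the
// function-entry plan on a function's first instruction and otherwise a plan
// derived from the instructions. Frames above 0 are at call sites, where
// eh_frame is reliable. Anything else falls back to the default plan.
PlanSource ChooseUnwindPlan(unsigned frame_index, bool at_function_entry,
                            bool has_assembly_plan, bool has_eh_frame) {
    if (frame_index == 0) {
        if (at_function_entry)
            return PlanSource::FunctionEntry;
        if (has_assembly_plan)
            return PlanSource::AssemblyInspection;
    } else if (has_eh_frame) {
        return PlanSource::EhFrame;
    }
    return PlanSource::Default;
}
```

For a PPC target with no UnwindAssembly implementation, frame 0 away from function entry falls through to the default plan in this sketch.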

For you, starting with CreateDefaultUnwindPlan and CreateFunctionEntryUnwindPlan should be enough to bootstrap. If your environment has eh_frame, using it would be a good idea. Implementing an UnwindAssembly for ppc may be some real work; I would not take that on initially.

You may want to just bootstrap with the super-simplistic UnwindMacOSXFrameBackchain, but it looks like you'll need to do a little cleanup to get it working again. I would not plan on living on UnwindMacOSXFrameBackchain for long: it is a very simple way of doing backtraces and it will get many complicated things wrong.
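For reference, a sketch of the PowerPC back-chain walk that a simple backchain unwinder is built on. Assumptions, all to be checked against your ABI: the word at a frame's stack pointer is the caller's stack pointer (the back chain), and a function saves its LR in its caller's frame at lr_save_offset (8 in the Darwin-style 32-bit linkage area, 4 in SysV). It also assumes every frame has already saved LR, which is exactly the kind of thing that goes wrong in leaf functions and prologues. BackchainPCs is a hypothetical name, and memory is modeled with a std::map.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Returns the pc of each frame, innermost first. Stops at a null back chain,
// unreadable memory, a back chain that does not move up the stack (assuming
// a downward-growing stack), or after max_frames frames.
std::vector<uint32_t> BackchainPCs(const std::map<uint32_t, uint32_t> &mem,
                                   uint32_t pc, uint32_t sp,
                                   uint32_t lr_save_offset,
                                   std::size_t max_frames) {
    std::vector<uint32_t> pcs{pc};
    while (pcs.size() < max_frames) {
        auto chain = mem.find(sp);
        if (chain == mem.end() || chain->second == 0)
            break;  // end of chain or unreadable memory
        uint32_t caller_sp = chain->second;
        if (caller_sp <= sp)
            break;  // corrupt or circular chain
        auto saved_lr = mem.find(caller_sp + lr_save_offset);
        if (saved_lr == mem.end())
            break;
        pcs.push_back(saved_lr->second);  // return address into the caller
        sp = caller_sp;
    }
    return pcs;
}
```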

Okay, I'll look into that. Thanks!

Félix