Improve single thread stepping

Thank you for the insights. This is incredibly helpful!

This is my first time delving into the ThreadPlan logic, and I’ve only done a brief read on single-thread stepping. Regarding the performance bottleneck, the hardest part is determining whether it is latency-bound or CPU-bound. I investigated a step operation that takes 8-10 seconds: it goes through 8 internal/private stops, 3 of which involve stopping other threads, while the other 5 require resuming all threads.

I’ve profiled LLDB during the stepping process using the Linux perf tool, which shows several CPU hot spots that together account for around 75% of the samples. The most critical path appears to be in ProcessGDBRemote::GetThreadStopInfoFromJSON(), called from ProcessGDBRemote::CalculateThreadStopInfo(). It seems that we enumerate each thread and look up its stop information by TID in m_jstopinfo_sp. When all threads are stopped, the m_jstopinfo_sp JSON can become quite large, so doing this lookup once per thread is potentially O(N^2) overall. There’s a similar issue in ThreadList::GetBackingThread(). I’ve made a quick prototype using a hash table to map <TID => stop_info>, and it seems to improve stepping performance by around 10-20%, although the gain is not as large as the one from single-thread stepping. I’ve also noticed various JSON parsing operations in ProcessGDBRemote::SetThreadStopInfo(). It seems that jstopinfo is much larger when stopping all threads.
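To make the hash table idea concrete, here is a minimal standalone sketch of it, not the actual prototype. It assumes the per-thread stop entries have already been parsed into a flat array; `ThreadStopEntry`, `FindLinear`, `BuildIndex`, and `FindIndexed` are hypothetical names for illustration and are not LLDB APIs. The index is built once per stop, after which each per-thread lookup is O(1) on average instead of a scan over all entries.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for one per-thread entry of the parsed stop info.
// In LLDB the data lives in a StructuredData object, not in this struct.
struct ThreadStopEntry {
  std::uint64_t tid;
  std::string reason;
};

// Baseline: scan the whole array for one TID. Doing this once per thread
// costs O(N) per call, so O(N^2) per stop for N threads.
const ThreadStopEntry *FindLinear(const std::vector<ThreadStopEntry> &entries,
                                  std::uint64_t tid) {
  for (const ThreadStopEntry &entry : entries)
    if (entry.tid == tid)
      return &entry;
  return nullptr;
}

// Index built once per stop: TID -> position in `entries`.
using StopInfoIndex = std::unordered_map<std::uint64_t, std::size_t>;

StopInfoIndex BuildIndex(const std::vector<ThreadStopEntry> &entries) {
  StopInfoIndex index;
  index.reserve(entries.size());
  for (std::size_t i = 0; i < entries.size(); ++i) {
    const bool inserted = index.emplace(entries[i].tid, i).second;
    // Assumption: each TID appears at most once in a stop reply.
    assert(inserted && "duplicate TID in stop info");
    (void)inserted; // keeps release builds (NDEBUG) warning-free
  }
  return index;
}

// Average O(1) lookup through the index; returns nullptr for unknown TIDs.
const ThreadStopEntry *FindIndexed(const std::vector<ThreadStopEntry> &entries,
                                   const StopInfoIndex &index,
                                   std::uint64_t tid) {
  const auto it = index.find(tid);
  return it == index.end() ? nullptr : &entries[it->second];
}
```

The index stores positions rather than pointers, so it stays valid as long as the entries array is not reordered; it would have to be rebuilt whenever a new stop reply replaces the array.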

Another possibility is that the slowness is latency-bound, as you’ve mentioned. Resuming/pausing 3000+ threads and synchronously waiting for them can indeed take some time.

I’ve also experimented with setting target.process.experimental.os-plugin-reports-all-threads to false, hoping it would reduce the size of jstopinfo, but I haven’t seen any performance improvement with this setting. Currently, it’s not entirely clear to me what is causing the slowness, but if you ask me, it’s most likely due to the latency issue mentioned above.

The second and third responses are quite intriguing. I might explore some of them if this turns out to be a high-priority issue. I had the same question in mind while examining the profile trace: do we really need to update the stop info during private/internal stops? However, I haven’t delved deep enough to propose any solutions. It’s great that you and Pavel have already explored this to some extent. I wonder whether setting target.process.experimental.os-plugin-reports-all-threads to false is sufficient to reduce the size of jstopinfo, or whether further optimizations are needed?

Snippet of the profile trace:

                          |--49.81%--lldb_private::ThreadList::ShouldStop(lldb_private::Event*)
                          |          |
                          |           --49.43%--lldb_private::Thread::GetStopInfo()
                          |                     |
                          |                      --49.35%--lldb_private::Thread::GetPrivateStopInfo(bool)
                          |                                |
                          |                                 --49.30%--lldb_private::process_gdb_remote::ThreadGDBRemote::CalculateStopInfo()
                          |                                           lldb_private::process_gdb_remote::ProcessGDBRemote::CalculateThreadStopInfo(lldb_private::process_gdb_remote::ThreadGDBRemote*)
                          |                                           |
                          |                                            --48.63%--lldb_private::process_gdb_remote::ProcessGDBRemote::GetThreadStopInfoFromJSON(lldb_private::process_gdb_remote::ThreadGDBRemote*, std::shared_ptr<lldb_private::StructuredData::Object> const&)
                          |                                                      |
                          |                                                      |--12.24%--lldb_private::process_gdb_remote::ProcessGDBRemote::SetThreadStopInfo(lldb_private::StructuredData::Dictionary*)
                          |                                                      |          |
                          |                                                      |           --12.15%--lldb_private::process_gdb_remote::ProcessGDBRemote::SetThreadStopInfo(unsigned long, std::map<unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<unsigned int>, std::allocator<std::pair<unsigned int const, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >&, unsigned char, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, unsigned int, std::vector<unsigned long, std::allocator<unsigned long> > const&, unsigned long, bool, lldb_private::LazyBool, unsigned long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&, lldb::QueueKind, unsigned long)
                          |                                                      |                     |
                          |                                                      |                     |--5.16%--lldb_private::ThreadList::GetBackingThread(std::shared_ptr<lldb_private::Thread> const&)
                          |                                                      |                     |
                          |                                                      |                     |--2.94%--lldb_private::ThreadList::FindThreadByProtocolID(unsigned long, bool)
                          |                                                      |                     |

                          |--24.73%--lldb_private::process_gdb_remote::ProcessGDBRemote::RefreshStateAfterStop()
                          |          |
                          |          |--19.68%--lldb_private::Process::UpdateThreadListIfNeeded()
                          |          |          |
                          |          |          |--11.50%--lldb_private::ThreadList::Update(lldb_private::ThreadList&)
                          |          |          |          |
                          |          |          |           --0.89%--lldb_private::Thread::GetBackingThread() const
                          |          |          |
                          |          |          |--3.08%--lldb_private::ThreadPlanStackMap::Update(lldb_private::ThreadList&, bool, bool)
                          |          |          |          |
                          |          |          |           --2.95%--lldb_private::ThreadList::FindThreadByID(unsigned long, bool)
                          |          |          |
                          |          |          |--2.76%--lldb_private::Thread::GetBackingThread() const