Lldb-server significant slowdown with 3000 threads

Hi,

We are noticing a significant performance issue while using lldb to debug an internal service. It turns out the issue is caused by the service using multiple thread pools, creating 4000~5000 threads (yeah, pretty bad :frowning: ). Benchmarking with and without a debugger shows that attaching lldb adds around 6~7 minutes of slowdown, while gdb incurs very minimal slowdown.

Further digging shows that the bottleneck is the following code in NativeProcessLinux::SigchldHandler, which calls waitpid() once per known thread. Since the handler runs for every thread event, the total work grows quadratically with the number of threads:

  bool checked_main_thread = false;
  for (const auto &thread_up : m_threads) {
    if (thread_up->GetID() == GetID())
      checked_main_thread = true;

    if (std::optional<WaitStatus> status = HandlePid(thread_up->GetID()))
      tid_events.try_emplace(thread_up->GetID(), *status);
  }

You can reproduce the slowdown with the code below:
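(The original snippet is not reproduced here; a minimal sketch along the following lines, which keeps a few thousand threads alive at the same time, shows the same behavior.)

#include <chrono>
#include <thread>
#include <vector>

int main() {
  // Keep a few thousand threads alive at once, so the debuggee's thread
  // list is large while thread creation/exit events keep arriving.
  std::vector<std::thread> threads;
  threads.reserve(4000);
  for (int i = 0; i < 4000; ++i)
    threads.emplace_back(
        [] { std::this_thread::sleep_for(std::chrono::seconds(1)); });
  for (auto &t : threads)
    t.join();
}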

Without a debugger it finishes in under 2 seconds, while with the debugger attached it takes more than 10 minutes.

This behavior seems to have been introduced in D116372 [lldb-server/linux] Fix waitpid for multithreaded forks. After reverting that change, the lldb wall time drops from 10 minutes to under 3 seconds.

@labath, what do you think is the best way to solve this? Thanks. cc @clayborg

Jeffrey

Hello Jeffrey,

thanks for bringing this to my attention. As I mentioned in D116372 [lldb-server/linux] Fix waitpid for multithreaded forks, there are basically two ways to solve the problem it addresses (multiple NativeProcess instances stealing waitpid events from one another). The patch implemented the second one because it made the code cleaner. I did not expect it to have this much of a performance impact, though in retrospect, I probably should have.

I think this means we need to implement the first option instead. There is nothing fundamentally hard there; it just requires redesigning some of the interfaces around the NativeProcess class so that we can listen for waitpid events centrally and then dispatch them to the appropriate process. The main thing which makes that complicated is that the notifications for clone child threads can’t be associated with a process without the corresponding notification on the parent thread (and the two can come in any order). This means one has to have some sort of central repository of threads that get assigned to a specific process once their parent is known. Nothing impossible – just work.
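To make that concrete, here is a rough sketch of the idea (the names WaitpidMonitor, AssignThread, and HandleThreadEvent are hypothetical, not lldb's actual interfaces): one monitor reaps every waitpid() event and parks events for threads whose owning process is not yet known, instead of each process polling all of its threads.

#include <sys/wait.h>

#include <map>
#include <unordered_map>

struct Process {
  // Stand-in for NativeProcessLinux; the real code would update the
  // corresponding NativeThread and report the state change.
  void HandleThreadEvent(pid_t tid, int status) { (void)tid; (void)status; }
};

class WaitpidMonitor {
public:
  // Called from the SIGCHLD handler: reap everything that is pending.
  void DrainEvents() {
    int status;
    pid_t tid;
    while ((tid = waitpid(-1, &status, __WALL | WNOHANG)) > 0) {
      auto it = m_tid_owners.find(tid);
      if (it != m_tid_owners.end())
        it->second->HandleThreadEvent(tid, status);
      else
        m_unassigned[tid] = status; // parent's clone notification not seen yet
    }
  }

  // Called once the PTRACE_EVENT_CLONE on the parent thread tells us which
  // process owns the new thread; delivers any event that arrived early.
  void AssignThread(pid_t tid, Process *owner) {
    m_tid_owners[tid] = owner;
    auto it = m_unassigned.find(tid);
    if (it != m_unassigned.end()) {
      owner->HandleThreadEvent(tid, it->second);
      m_unassigned.erase(it);
    }
  }

private:
  std::unordered_map<pid_t, Process *> m_tid_owners; // tid -> owning process
  std::map<pid_t, int> m_unassigned; // events waiting for an owner
};

The m_unassigned map is what deals with the ordering problem: a clone child's event can arrive before the parent's notification, and it simply waits there until AssignThread is called for it.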

I think I can take a stab at this next week, but if you want, you can try to implement something sooner.


@labath, makes sense. I am not an expert in ptrace or the code around this, so you would be a better person to fix it. Let me know how it goes.

Thanks
Jeffrey

Hello Jeffrey,

please check out D146977. It actually turned out easier than I expected since I could reuse the existing Factory class as the “central thread repository”. I haven’t tested the performance, but I would expect it to be roughly on par with the pre-D116372 world.

(Technically the algorithm is still quadratic, but now the only quadratic part is the iteration through the NativeProcessProtocol thread list. We could also fix that by using a different data structure to store the threads.)
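For illustration, a sketch of what that could look like (hypothetical, and not part of D146977): keying the thread list by tid, so that finding the thread for an event is a map lookup rather than a linear scan.

#include <cstdint>
#include <map>
#include <memory>

using tid_t = uint64_t; // stand-in for lldb::tid_t

struct NativeThread { // stand-in for NativeThreadProtocol
  explicit NativeThread(tid_t tid) : m_tid(tid) {}
  tid_t GetID() const { return m_tid; }
  tid_t m_tid;
};

struct ThreadList {
  // Instead of std::vector<std::unique_ptr<NativeThread>> m_threads:
  std::map<tid_t, std::unique_ptr<NativeThread>> m_threads;

  // O(log n) lookup instead of scanning the whole list per event.
  NativeThread *GetThreadByID(tid_t tid) {
    auto it = m_threads.find(tid);
    return it == m_threads.end() ? nullptr : it->second.get();
  }
};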
