As I was writing the description for PR #131404 I realized this change warrants a bigger discussion which is why I’m putting together this mini RFC.
Background & Problem Statement
TestDAP_breakpointEvents.py has been failing nondeterministically and we tracked it down to a race condition in lldb-dap: Issue #131242. The problem is that lldb-dap is doing multiple SB API calls, which individually are protected by a mutex, but lack atomicity as a whole. The result is that other threads (such as the event thread) can find LLDB in an incoherent state. The problem described above is not unique to lldb-dap. Anyone using the LLDB SB API risks running into this problem.
Solutions
The most obvious solution to this problem is to add locking at the DAP level. More generally, that means saying that this is the responsibility of the SB API’s clients. In a multithreaded environment, every SB API call, or group of SB API calls, needs to be protected by a mutex. The upside of this approach is that it keeps the SB API small and makes this the user’s responsibility.
An alternative approach is to reuse the Target API lock and expose it through the SB API. The problem we’re faced with here is similar to the one the SB API is faced with when calling lldb_private methods. When I said earlier that the individual calls are protected by a mutex, I was talking about the target API lock. Various SB APIs take the Target API lock to provide thread safety. The upside of this approach is that a bunch of SB API calls are already properly protected, and you only need to lock the API lock when you want to execute multiple API calls atomically.
I wonder if we could somehow also make this usable in Python? I guess the most idiomatic way approach for Python would be to implement the __enter__ and __exit__ methods, such that we can write something like
with my_target.lock():
# Do multiple actions as an atomic block, while holding the target lock
This may be my lack of understanding on some parts of the lldb internals, but how thread safe is SBDebugger? Does that also need a lock accessible from the SB API?
I am thinking about the use case of the ‘cancel’ DAP request. To support that in my prototype I had moved the lldb-dap input reader into its own thread to allow the DAP session to receive events into a queue and then execute items one at a time from the queue. To support cancelling an active request, when I see a ‘cancel’ request for the currently running command I run SBDebugger.RequestInterrupt() to make a best effort attempt at interrupting the current action. This would be coming from a different thread then the one processing requests.
In lldb-dap we do access and use the SBDebugger instance from multiple threads today (the request processing thread, an event monitoring thread and a progress event monitoring thread). I don’t think we have any locks on the lldb_dap::DAP today.
Most of the operation at the debugger level are thread safe, either because the underlying data structures they operate on are (e.g. OptionValue), because there’s a dedicated lock in lldb_private::debugger (e.g. the callbacks) or because it acquires the target’s API lock (e.g. SBDebugger::HandleCommand).
I think the answer to me is no, because there isn’t an equivalent to the Target API lock at the debugger level. That said, it does raise a good question: if we want to synchronize multiple SB API calls at the debugger level, we still need to synchronize using our own lock, or we could resort to using the dummy target’s API lock, which admittedly isn’t all that intuitive. If we’d like to go with the latter option, that is something we could expose at the debugger level.
This operation is thread safe and guarded by the IO handler stack mutex.