Thanks again for the detailed explanation of this dilly of a pickle. I
thought about this problem a little more. It would be really nice to
resolve this soon, since the different locking behaviour on MacOSX vs
other platforms is something I think we all want to excise as quickly as
possible.
My thoughts are inline below:
No, a python class gets instantiated when a process starts up and it gets
a chance to run python code in order to make up new thread lists and
relay them back to LLDB. The python code must use the public SB API.
Oh, that's somewhat surprising. Where is this class? I'm curious about
what it does that's done in Python. Aren't you worried about the overhead
the Global Interpreter Lock might add to normal operations like step-over?
I have no data to back that up, but putting python stuff on the critical
path seems like it can't be good for performance. Also, not that it's
important to us, but does this also mean that building a working LLDB
without Python support is no longer an option?
To confirm my understanding, it seems that "regular"
python code (like the test cases that drive the debugger) which uses SB*
ends up locking the "public" locks whereas "callback" code that I
gets invoked via OSPython ends up locking the "private" locks -- is this
Only when the private state thread calls into the python code which then
wants to use the public API to lookup variables, evaluate expressions,
etc in order to make the new thread list.
Are these two use-cases really running on different (main vs.
When the private state thread is doing stuff, the process run lock is
taken. Any public API access that requires a stopped process won't work
when being done from the private state thread since the process is
implementing some sort of run.
Right-O. This makes sense.
If so, once the locks are acquired, don't both of these use-cases end up
using the *same* shared Process data? Is there any coupling between the
data that the locks are protecting and the locks themselves? Based on
Process::GetRunLock() it seems like an arbitrary lock is chosen to protect
all of Process' private data.
No, the read/write locks don't protect the data, they just protect access
to the process to make sure it stays stopped while clients access things,
or make sure clients that want to read, stay locked out while process
control is going on. See below for more detail.
OK, here's where I think things start going sideways. It might be possible
to accomplish the synchronization that's needed between the public clients
and internal callers with two R/W locks, but I'm having trouble seeing how
they are to be used in tandem. I'll attempt to explain (what I think is) a
simpler approach below.
If the public/private run locks *are* coupled to the public/private states
of the process respectively, why is it that both write locks need to
be kept for long periods of time? I can understand the approach of
an internal (private) "run" write lock being held while the process is
running, but I'm having trouble conceptualizing why the public run lock
needs to be held for similar durations. It seems any synchronization we need
for the public state should be very short-held "critical section" type
locks while functions query the last-known process state, or just return
an error because the process is running. Any SB function that results in a
public lock being held for longer than the duration of that function seems
like a bug in the SB code -- does that make sense or am I way off?
Since we do async process control, when a user says "step over", this
might result in the process being stopped and started 100 times. Before
doing any process control, we take the public process run lock by
acquiring the write lock. This lets anyone currently in the process who
acquired the public run lock for reading get out first. Once
the public run lock is taken, we keep it until the process run control
action ("step over") is complete. We must lock down access to the process
for the entire duration because it wouldn't be a good time for someone to
come in and ask for the stack frames of a thread in between run 56 and
57. The stack frames would be meaningless from a public perspective.
Internally though, for any blessed sections of code (like
OperatingSystemPython) which we know will need access to the public API
from the private state thread, we allow them to not be locked out by
using the private run lock.
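
A minimal sketch of the public run lock behaviour described above, assuming
a hand-rolled lock built from a mutex and a condition variable. RunLockSketch,
SetRunning(), SetStopped() and GetStackFrames() are hypothetical names for
illustration only, not LLDB's actual classes or functions:

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <string>

// Hypothetical stand-in for the public run lock; not LLDB's real class.
class RunLockSketch {
public:
    // Write side: taken before process control starts and kept until the
    // whole run control action ("step over") is complete. Waits for any
    // readers that are currently inside to get out first.
    void SetRunning() {
        std::unique_lock<std::mutex> guard(m_mutex);
        m_cv.wait(guard, [this] { return m_readers == 0 && !m_running; });
        m_running = true;
    }

    // Releases the write side once the run control action is complete.
    void SetStopped() {
        {
            std::lock_guard<std::mutex> guard(m_mutex);
            assert(m_running);
            m_running = false;
        }
        m_cv.notify_all();
    }

    // Read side: public clients only get in while the process is stopped.
    bool TryReadLock() {
        std::lock_guard<std::mutex> guard(m_mutex);
        if (m_running)
            return false;
        ++m_readers;
        return true;
    }

    void ReadUnlock() {
        {
            std::lock_guard<std::mutex> guard(m_mutex);
            assert(m_readers > 0);
            --m_readers;
        }
        m_cv.notify_all();
    }

private:
    std::mutex m_mutex;
    std::condition_variable m_cv;
    int m_readers = 0;
    bool m_running = false;
};

// A public query that is locked out for the duration of a run control action.
std::string GetStackFrames(RunLockSketch &run_lock) {
    if (!run_lock.TryReadLock())
        return "error: process is running";
    std::string frames = "frame #0, frame #1";  // placeholder data
    run_lock.ReadUnlock();
    return frames;
}
```

With this shape, a query made between intermediate stops of a single "step
over" fails with an error instead of returning frames that are meaningless
from a public perspective.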
Hmm, OK. So, in the two R/W lock approach, there's places where both locks
need to be write-acquired though, right? I'm not sure I can think of all
the other interactions between the locks though. It might be equivalent to
the thing I propose below -- can you confirm if this is true or not?
If we make new threads for callbacks, then we would either be locked out
from accessing the process, or we would need to let clients access the
process in between intermediate steps of the current process run
control, which is not what anyone wants.
Agreed that those two things are bad, but I think a third option exists:
What we need is akin to a condition variable (that knows how to relinquish
a lock during wait) so that we don't "lock ourselves out" and a shared
lock to protect the "public" interfaces that do anything with the process'
state, with some extra smartness about "long running" operations that
require nobody mess around with the process state.
Here's what I'm thinking Process needs to have in order to solve this, in
a little more concrete terms:
1. One bool (m_state_in_transition) -- true when we have a long-running
operation in progress
2. One Condition Variable bool (m_callback_in_progress) -- this remains
true when a python thread that has SB access is launched
3. One Mutex (m_lock)* -- to protect every function inside process that
The rules of the game are:
- Every function in Process needs to acquire the m_lock before any
reads/writes happen. This is somewhat shitty because multiple reads would
be serialized, but more on this later*
- Any (non-const) function in Process (that modifies state like Resume(),
Step(), etc) needs to error out if m_state_in_transition == true. These
are forbidden in callbacks.
- Any const function in Process (that reads state but does not change it)
needs to acquire m_lock
- Any time an internal Process function needs to invoke some code that has
access to SB APIs, the procedure is:
-- set m_state_in_transition AND m_callback_in_progress to true
-- spawn a new thread that's going to use the public APIs
-- wait for m_callback_in_progress to turn false WITH m_lock,
relinquishing the lock so others can enter Process internals, BUT
m_state_in_transition prevents any badness.
- When any "long-running" function exits, m_state_in_transition is set to
false.
Aside: As an implementation detail, the first and the last steps can
probably be combined in a single scoped object.
In concert with the above approach, any "callback" (or, 'blessed' I think
you call it) thread that has access to SB but may run during one of the
Process internals needs to set m_callback_in_progress to false when done;
thereby joining itself with the Process internal function that invoked it.
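
The rules above can be sketched roughly as follows. This is only an
illustration under stated assumptions: ProcessSketch, StepOver(), and
GetResumeCount() are hypothetical names, a plain std::mutex stands in for
m_lock, and the "state" is reduced to a counter. It is not how Process is
implemented today:

```cpp
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>

// Hypothetical sketch of the proposal; not LLDB's Process class.
class ProcessSketch {
public:
    // Non-const operation: errors out while a long-running operation is
    // in progress, so it is effectively forbidden in callbacks.
    bool Resume() {
        std::lock_guard<std::mutex> guard(m_lock);
        if (m_state_in_transition)
            return false;
        ++m_resume_count;
        return true;
    }

    // Const operation: only needs m_lock.
    int GetResumeCount() const {
        std::lock_guard<std::mutex> guard(m_lock);
        return m_resume_count;
    }

    // Long-running operation that has to invoke code with SB access.
    void StepOver(const std::function<void()> &callback) {
        std::unique_lock<std::mutex> guard(m_lock);
        m_state_in_transition = true;
        m_callback_in_progress = true;

        // The callback thread clears m_callback_in_progress when done.
        std::thread worker([this, &callback] {
            callback();
            {
                std::lock_guard<std::mutex> inner(m_lock);
                m_callback_in_progress = false;
            }
            m_callback_done.notify_all();
        });

        // wait() relinquishes m_lock, so the callback can enter Process,
        // while m_state_in_transition keeps non-const calls out.
        m_callback_done.wait(guard,
                             [this] { return !m_callback_in_progress; });
        m_state_in_transition = false;
        guard.unlock();
        worker.join();
    }

private:
    mutable std::mutex m_lock;
    std::condition_variable m_callback_done;
    bool m_state_in_transition = false;
    bool m_callback_in_progress = false;
    int m_resume_count = 0;
};
```

In this sketch a callback can read through GetResumeCount() while StepOver()
is waiting, but a Resume() issued from the callback returns false.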
* Based on the existing design, it seems you want multiple reads to happen
concurrently (for Xcode performance reasons). To allow multiple readers to
access Process at once, upgrade the m_lock from a run-of-the-mill Mutex to
a R/W lock. The problem with this is that we need the condition variable
to play nice (i.e. unlock the R/W lock during wait().) Pthread doesn't
support this with the rw_lock type as-is, but I'm sure it can be done if
we implement ReadWriteLock the "hard" way with lower-level primitives. See
-with-rwlock for inspiration.
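
For illustration, here is one way a R/W lock built from a mutex and a
condition variable could give up the write lock during a wait. This is a
sketch under my own assumptions, not the approach from the link above:
ReadWriteLockSketch, WaitAsWriter() and DemoReaderRunsDuringWait() are
hypothetical names, and fairness between readers and writers is ignored:

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical R/W lock built "the hard way"; not LLDB's ReadWriteLock.
class ReadWriteLockSketch {
public:
    void ReadLock() {
        std::unique_lock<std::mutex> guard(m_mutex);
        m_cv.wait(guard, [this] { return !m_writer; });
        ++m_readers;
    }
    void ReadUnlock() {
        {
            std::lock_guard<std::mutex> guard(m_mutex);
            assert(m_readers > 0);
            --m_readers;
        }
        m_cv.notify_all();
    }
    void WriteLock() {
        std::unique_lock<std::mutex> guard(m_mutex);
        m_cv.wait(guard, [this] { return !m_writer && m_readers == 0; });
        m_writer = true;
    }
    void WriteUnlock() {
        {
            std::lock_guard<std::mutex> guard(m_mutex);
            assert(m_writer);
            m_writer = false;
        }
        m_cv.notify_all();
    }
    // Gives up the write lock, waits until done() is true and no reader
    // or writer is inside, then takes the write lock back.
    template <typename Predicate> void WaitAsWriter(Predicate done) {
        std::unique_lock<std::mutex> guard(m_mutex);
        assert(m_writer);
        m_writer = false;
        m_cv.notify_all();
        m_cv.wait(guard,
                  [&] { return done() && !m_writer && m_readers == 0; });
        m_writer = true;
    }

private:
    std::mutex m_mutex;
    std::condition_variable m_cv;
    int m_readers = 0;
    bool m_writer = false;
};

// A reader gets in only while the writer is waiting.
int DemoReaderRunsDuringWait() {
    ReadWriteLockSketch rw;
    std::atomic<bool> callback_done{false};
    int value = 7;
    int observed = 0;
    rw.WriteLock();
    std::thread reader([&] {
        rw.ReadLock();   // blocks until the writer starts waiting
        observed = value;
        callback_done = true;
        rw.ReadUnlock(); // wakes the waiting writer
    });
    rw.WaitAsWriter([&] { return callback_done.load(); });
    value = 8;           // the write lock is held again here
    rw.WriteUnlock();
    reader.join();
    return observed;
}
```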
The natural benefit of the above approach is that all the
locking/unlocking magic happens inside Process, and SB has to have 0
knowledge about Process locks.
Is it not possible to have both "driving" code and "callback"
code go through exactly the same code path the SB* layer as well as the
You mean using the same locks? I am confused by your question. All code
uses the same public API through the SB* layer. The SB layer locks you
out when it is dangerous to do things while the process is running. Many
of the issues we ran into stemmed from Xcode accessing the SB* API from
multiple threads and all of the locks (the target lock and the process
read/write (stopped/running) locks) are needed.
Sorry I was not too clear -- basically I'm worried about the complexity of
having two discrete R/W locks here, and separate behaviour based on
internal/external threads. Since processes have one state, it "feels" like
there should be one lock.
Let me know what you think after reading the above explanation. The locks
are needed and necessary and we haven't been able to come up with another
way to control access to the things we need to do from the various
threads. We currently have two clients:
- public clients
- private process state thread clients
public clients get locked out, as they should, when the process is
running (continue, or stepping) so we don't get useless answers.
Do you really want to evaluate the value for "x" when we are in the
middle of a single step? No, you want to get an error stating "error:
process is running".
Right, expression evaluation would be one of those "non-const" cases that
should error out if Process m_state_in_transition is true.
Do you really want stack frames for all threads in a process when the
process is in the middle of a single step? No, you want to get an error
stating "error: process is running".
The private clients though are limited to OperatingSystemPython right
now, but could be expanded to breakpoint actions in the near future. A
breakpoint might say it wants to have python run when the breakpoint gets
hit and it would be great to not have to do this on another thread just
so we can see that "x != 12", or "rax != 123".
I'm not sure another thread for the callback will add a lot of overhead
compared to the overhead of the Python GIL. But I'm open here -- maybe
there is a way to do the condition variable thing (or equivalent) without
spawning another thread. We'd need a mutex-unlock-and-invoke-function sort
of functionality, but the general principle would be the same; when
non-Process code is running, all locks should be released, but the Process
object should be in a state whereby it cannot be changed (i.e. certain
functions are forbidden).
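
A rough sketch of what that mutex-unlock-and-invoke-function variant might
look like, without a second thread. InlineCallbackProcess and
RunCallbackInline() are hypothetical names, and the state is again reduced
to a counter for illustration:

```cpp
#include <cassert>
#include <functional>
#include <mutex>

// Hypothetical sketch; not LLDB's Process class.
class InlineCallbackProcess {
public:
    // Non-const operation: forbidden while the callback is running.
    bool Resume() {
        std::lock_guard<std::mutex> guard(m_lock);
        if (m_state_in_transition)
            return false;
        ++m_resume_count;
        return true;
    }

    // Const operation: only needs m_lock.
    int GetResumeCount() const {
        std::lock_guard<std::mutex> guard(m_lock);
        return m_resume_count;
    }

    // Marks the state as in transition, releases m_lock, and runs the
    // callback on the calling thread. A real version would clear the
    // flag in a scoped object so it is also reset if callback throws.
    void RunCallbackInline(const std::function<void()> &callback) {
        {
            std::lock_guard<std::mutex> guard(m_lock);
            assert(!m_state_in_transition);
            m_state_in_transition = true;
        }
        callback();  // no Process lock is held while foreign code runs
        std::lock_guard<std::mutex> guard(m_lock);
        m_state_in_transition = false;
    }

private:
    mutable std::mutex m_lock;
    bool m_state_in_transition = false;
    int m_resume_count = 0;
};
```

The callback can still read from the object because m_lock is free while it
runs, but the forbidden functions keep erroring out until it returns.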
Another thing we have to watch out for is that there are limitations on
things that can be done on the private state thread. For example you
wouldn't want to try to evaluate an expression (run the target) while you
are currently running the target for another expression. There is special
code for this for one special case (calling "mmap" to allocate memory by
manually calling a function without using the expression parser), but
those don't use any of the special locking we have implemented.
Since evaluating an expression can modify the process (for example
'expression -- i++') it should be treated as non-const unless some IR
interpreter can guarantee otherwise (in a future optimization utopia.) It
seems safe to set the process m_state_in_transition to true during
evaluation of any expression, thereby preventing anything else from
modifying or reading from Process. Of course, internally, the
expression-evaluation would need to have its own UncheckedResume()
functionality that resumes the inferior without checking m_state_in_transition
like the API Resume() would.
Another part that complicates reviewing this code is the external
-- wouldn't it be way better to encapsulate all the locking/unlocking
inside the Process class? I like code that keeps the lock code close to
the data they protect; what's the benefit of having SB* API classes
acquire and maintain internal Process locks?
You might need to do multiple things. Like to get a backtrace for all
threads. Do you want to ask the process for thread at index 0, then
backtrace it. Then another thread resumes the process and it stops
somewhere else. Now you ask for thread at index 1 and you get a different
thread because the thread list has changed. Not a great experience. This
is why we externally lock.
Agreed; it would be a terrible bug.
I'd prefer to see this whole Process locking thing redesigned
with internal rather than external locking. I think this will help nail
down exactly what cases are causing the double-locking thing to
This works great for single access functions, but falls down for complex
actions (like backtracing all threads).
I don't think external locking is the only way to accomplish this. What's
wrong with a Process::BacktraceAllThreads() (or even
Process::BacktraceSomeThreads(ThreadIDs)) function that internally
acquires the run lock and does its business without fear of somebody
continuing the process in the meantime? In BacktraceAllThreads(), if the
process is in transition, that function should just fail. If the process
is stopped, any other non-const calls (to Resume() or
expression-evaluation) should block until BacktraceAllThreads() returns,
or error out. Glancing at the code that calls GetRunLock() in the SB
layer, it seems like it all belongs in the Process class. I understand the
hesitation to add more stuff to Process, since the class is already
getting kinda large, but I think we should worry about splitting it up
into more bite-sized components after this deadlock business is
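
As a rough illustration of the internally locked BacktraceAllThreads() idea,
assuming a single mutex and a thread list reduced to strings.
BacktraceProcessSketch, SetInTransition() and the output format are
hypothetical and only there to show the shape:

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch; not LLDB's Process class.
class BacktraceProcessSketch {
public:
    explicit BacktraceProcessSketch(std::vector<std::string> threads)
        : m_threads(std::move(threads)) {}

    // Stand-in for a long-running operation starting or finishing.
    void SetInTransition(bool in_transition) {
        std::lock_guard<std::mutex> guard(m_lock);
        m_state_in_transition = in_transition;
    }

    // Non-const call: waits on m_lock while a backtrace is in progress,
    // and errors out if the process is in transition.
    bool Resume() {
        std::lock_guard<std::mutex> guard(m_lock);
        if (m_state_in_transition)
            return false;
        m_state_in_transition = true;
        return true;
    }

    // Walks every thread under one lock acquisition, so the thread list
    // cannot change between index 0 and index 1. Fails if the process
    // is in transition.
    bool BacktraceAllThreads(std::vector<std::string> &out) const {
        std::lock_guard<std::mutex> guard(m_lock);
        if (m_state_in_transition)
            return false;
        out.clear();
        for (const std::string &name : m_threads)
            out.push_back("backtrace of " + name);
        return true;
    }

private:
    mutable std::mutex m_lock;
    bool m_state_in_transition = false;
    std::vector<std::string> m_threads;
};
```

The caller never sees the lock: it either gets a consistent set of
backtraces or an error.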
We are open to any ideas you have for robust locking. We have tossed
around a few ideas for the process control, but haven't come up with one
that works better than what we have. Having a multi-threaded API sure
does complicate things and is something other debuggers don't have to
Definitely does complicate things just a smidge. I suppose it's too late
to require that Xcode uses lldb in a more disciplined single-threaded