Locking issues on Windows

I'm trying to update the Windows branch to the latest and greatest and found these locking issues (not sure if they're relevant for posix too):

When I attach to a process (I only use the gdb remote) the first event I get is "stopped", which tries to unlock m_private_run_lock; however, that lock is never locked in the first place. Windows' write lock doesn't appreciate that. As a workaround I added a m_private_run_lock.WriteLock(); in Process' constructor, which seems to fix that.

The second issue occurs when trying to cause a "Stop" when the process is already paused on internal breakpoints; for me this happens during slow symbol loads. What happens is that the loading (which happens from within Process::ShouldBroadcastEvent) resumes the process, which then exits properly (triggering ShouldBroadcastEvent again); however:

ProcessEventData::DoOnRemoval(lldb_private::Event * event_ptr)
called by Listener::FindNextEventInternal.

The resume call is in this condition:
   if (state != eStateRunning)

Changing that to:
lldb::StateType state = m_process_sp->GetPrivateState();
if (state != eStateRunning && state != eStateCrashed && state != eStateDetached && state != eStateExited)

Seems to fix it, as there's no reason to try and resume a process that's not running in the first place (and since exiting doesn't unlock a process, this causes a deadlock).

The last issue is this:
void * Process::RunPrivateStateThread ()
calls m_public_run_lock.WriteUnlock(); when it's done. Finalize() also unlocks that same lock, which Windows crashes on.
Commenting that out, it seems to work stably.

Anyone see any issues with all of this? (It might make sense to apply this to trunk too; it's never good to have unbalanced locks/unlocks.)

I'm trying to update the Windows branch to the latest and greatest and found these locking issues (not sure if they're relevant for posix too):

When I attach to a process (I only use the gdb remote) the first event I get is "stopped", which tries to unlock m_private_run_lock; however, that lock is never locked in the first place. Windows' write lock doesn't appreciate that. As a workaround I added a m_private_run_lock.WriteLock(); in Process' constructor, which seems to fix that.

We need to fix this better by locking the private run lock when attaching if all goes well.

The second issue occurs when trying to cause a "Stop" when the process is already paused on internal breakpoints; for me this happens during slow symbol loads. What happens is that the loading (which happens from within Process::ShouldBroadcastEvent) resumes the process, which then exits properly (triggering ShouldBroadcastEvent again); however:

ProcessEventData::DoOnRemoval(lldb_private::Event * event_ptr)
called by Listener::FindNextEventInternal.

The resume call is in this condition:
if (state != eStateRunning)

Where is the above "if (state != eStateRunning)"?

Changing that to:
lldb::StateType state = m_process_sp->GetPrivateState();
if (state != eStateRunning && state != eStateCrashed && state != eStateDetached && state != eStateExited)

There are functions that indicate if the process is stopped or running. We should use those functions (search for "StateIsStopped").
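For illustration, a hedged sketch of that suggestion (StateIsStoppedState lives in lldb/Core/State.h; the exact signature, and whether eStateCrashed counts as "stopped" in a given revision, should be double-checked):

    // Hypothetical rework of the resume guard in
    // Process::ProcessEventData::DoOnRemoval(); illustrative only.
    lldb::StateType state = m_process_sp->GetPrivateState();
    // Only resume a process that is genuinely stopped; crashed,
    // detached, and exited processes must be left alone.
    if (StateIsStoppedState (state) && state != eStateCrashed)
        m_process_sp->PrivateResume ();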

Seems to fix it, as there's no reason to try and resume a process that's not running in the first place (and since exiting doesn't unlock a process, this causes a deadlock).

The last issue is this:
void * Process::RunPrivateStateThread ()
calls m_public_run_lock.WriteUnlock(); when it's done. Finalize() also unlocks that same lock, which Windows crashes on.
Commenting that out, it seems to work stably.

We need to build some smarts into our Read/Write locking class so it knows whether the read/write lock is taken, and only unlocks if the corresponding lock is actually held. I will make this change today.
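Something along these lines, perhaps (a hedged sketch of the idea, not the actual patch; the bookkeeping flag itself would need to be atomic or otherwise guarded in real code):

    #include <pthread.h>

    // Read/write lock wrapper that remembers whether the write side is
    // held, so a stray WriteUnlock() becomes a no-op instead of a crash
    // on Windows or undefined behavior on pthreads. Names mirror lldb's
    // ReadWriteLock, but this is illustrative only.
    class TrackedReadWriteLock
    {
    public:
        TrackedReadWriteLock () : m_write_locked (false)
        {
            ::pthread_rwlock_init (&m_rwlock, NULL);
        }

        ~TrackedReadWriteLock ()
        {
            ::pthread_rwlock_destroy (&m_rwlock);
        }

        bool WriteLock ()
        {
            if (::pthread_rwlock_wrlock (&m_rwlock) == 0)
            {
                m_write_locked = true;
                return true;
            }
            return false;
        }

        bool WriteUnlock ()
        {
            // Only unlock if the write lock is actually held.
            if (!m_write_locked)
                return false;
            m_write_locked = false;
            return ::pthread_rwlock_unlock (&m_rwlock) == 0;
        }

    protected:
        pthread_rwlock_t m_rwlock;
        bool m_write_locked;    // would need to be atomic/guarded for real use
    };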

On 17-4-2013 18:47, Greg Clayton wrote:

I'm trying to update the Windows branch to the latest and greatest and found these locking issues (not sure if they're relevant for posix too):

When I attach to a process (I only use the gdb remote) the first event I get is "stopped", which tries to unlock m_private_run_lock; however, that lock is never locked in the first place. Windows' write lock doesn't appreciate that. As a workaround I added a m_private_run_lock.WriteLock(); in Process' constructor, which seems to fix that.

We need to fix this better by locking the private run lock when attaching if all goes well.

The second issue occurs when trying to cause a "Stop" when the process is already paused on internal breakpoints; for me this happens during slow symbol loads. What happens is that the loading (which happens from within Process::ShouldBroadcastEvent) resumes the process, which then exits properly (triggering ShouldBroadcastEvent again); however:

ProcessEventData::DoOnRemoval(lldb_private::Event * event_ptr)
called by Listener::FindNextEventInternal.

The resume call is in this condition:
  if (state != eStateRunning)

Where is the above "if (state != eStateRunning)"?

Process.cpp

void
Process::ProcessEventData::DoOnRemoval (Event *event_ptr)

FWIW, the changes I mentioned make it balanced.

I am seeing a locking issue in Ubuntu too. I haven't debugged to
confirm whether it is the same as the Windows issue.

How to reproduce the issue:
1) /trunk/lldb hello_world_program
2) (lldb) run
3) Hang

Attached gdb and got the following backtrace.

(gdb) bt
#0  0x00007ffff344c53d in pthread_rwlock_wrlock () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x00007ffff4466135 in lldb_private::ReadWriteLock::WriteLock (this=0x6bfe50) at /dbc/pa-dbc1117/jacobs/llvm/tools/lldb/source/Target/../../include/lldb/Host/ReadWriteLock.h:77
#2  0x00007ffff445a07a in lldb_private::Process::PrivateResume (this=0x6bf500) at /dbc/pa-dbc1117/jacobs/llvm/tools/lldb/source/Target/Process.cpp:3201
#3  0x00007ffff4459f18 in lldb_private::Process::Resume (this=0x6bf500) at /dbc/pa-dbc1117/jacobs/llvm/tools/lldb/source/Target/Process.cpp:1657
#4  0x00007ffff3f20fac in CommandObjectProcessLaunch::DoExecute (this=0x6383d0, launch_args=..., result=...) at /dbc/pa-dbc1117/jacobs/llvm/tools/lldb/source/Commands/CommandObjectProcess.cpp:308
#5  0x00007ffff4166fd0 in lldb_private::CommandObjectParsed::Execute (this=0x6383d0, args_string=0x7ffff2d264d8 "", result=...) at /dbc/pa-dbc1117/jacobs/llvm/tools/lldb/source/Interpreter/CommandObject.cpp:969
#6  0x00007ffff4159896 in lldb_private::CommandInterpreter::HandleCommand (this=0x6285a0, command_line=0x7fffe4000e58 "process launch", lazy_add_to_history=lldb_private::eLazyBoolYes, result=..., override_context=0x0, repeat_on_empty_command=true, no_context_switching=false) at /dbc/pa-dbc1117/jacobs/llvm/tools/lldb/source/Interpreter/CommandInterpreter.cpp:1810
#7  0x00007ffff3e5b714 in lldb::SBCommandInterpreter::HandleCommand (this=0x7fffffffb600, command_line=0x7fffe4000e58 "process launch", result=..., add_to_history=true) at /dbc/pa-dbc1117/jacobs/llvm/tools/lldb/source/API/SBCommandInterpreter.cpp:122
#8  0x000000000040d286 in Driver::HandleIOEvent (this=0x7fffffffe2b0, event=...) at /dbc/pa-dbc1117/jacobs/llvm/tools/lldb/tools/driver/Driver.cpp:1072
#9  0x000000000040ebaa in Driver::MainLoop (this=0x7fffffffe2b0) at /dbc/pa-dbc1117/jacobs/llvm/tools/lldb/tools/driver/Driver.cpp:1537
#10 0x000000000040f37b in main (argc=2, argv=0x7fffffffe4b8, envp=0x7fffffffe4d0) at /dbc/pa-dbc1117/jacobs/llvm/tools/lldb/tools/driver/Driver.cpp:1702

Carlo, awesome work!

Are your fixes on the Windows branch? If not, could you provide a patch
with your changes? I'd love to try them out to see if it fixes the hangs
in question on Linux. Multiple people seem affected by the problem.

Dan

On 17-4-2013 19:34, Malea, Daniel wrote:

Carlo, awesome work!

Are your fixes on the Windows branch? If not, could you provide a patch
with your changes? I'd love to try them out to see if it fixes the hangs
in question on Linux. Multiple people seem affected by the problem.

They are on the Windows branch, but part of the same commit. I can extract them though. They're all in Process.cpp. Note: this isn't a proper patch, but the line info should be good; I had to edit it manually:

Index: C:/Projects/oxygene-nougat-llvm/lldb/source/Target/Process.cpp

So, it looks like the locks are going awry in several places.

Carlo, I can confirm that your fix resolves some of the hangs that
everyone is experiencing but not all. Specifically, the
TestInlineStepping.py seems to still deadlock on the acquisition of one of
the Process (public) locks during a Resume(). That said, toggling the lock
in the constructor doesn't seem like a sound workaround..

Greg,

179329 is the commit that seems to have made things go all sideways. After
that commit, no Debian users can install a package that doesn't deadlock
on startup, we have no visibility into the testing status on the
buildbots, and the commit itself seems scary as it exposes a reference to
one of two internal locks to users based on what thread they're running in.

After briefly studying the Process class, I'm a little worried about the
complexity of the design. Could you explain the reason 2 different R/W
locks are needed? I understand why one R/W lock makes sense in the class,
but two seem overly complicated.

You mentioned that you'll improve the R/W (scoped?) locking classes.. Any
reason to not use boost (or some other C++11 library) for this? If we do
have to roll our own in LLDB, the lack of tests is worrisome.

If the improvements to the R/W locker classes you've got in progress don't
allow the test suite to run to completion, could you please revert 179329
until we have something that allows us to run the tests? Lots of patches
are backed up atm due to the LLVM practice of not committing on top of a
broken trunk.

Dan

PS. The hanging buildbots to watch are:

http://lab.llvm.org:8011/builders/lldb-x86_64-darwin11/builds/1890
http://lab.llvm.org:8011/builders/lldb-x86_64-debian-clang

http://lab.llvm.org:8011/builders/lldb-x86_64-linux

So, it looks like the locks are going awry in several places.

Carlo, I can confirm that your fix resolves some of the hangs that
everyone is experiencing but not all. Specifically, the
TestInlineStepping.py seems to still deadlock on the acquisition of one of
the Process (public) locks during a Resume(). That said, toggling the lock
in the constructor doesn't seem like a sound workaround..

Agreed, this shouldn't be the fix we use. We should track when we are doing an attach and lock it when the attach starts.

Greg,

179329 is the commit that seems to have made things go all sideways. After
that commit, no Debian users can install a package that doesn't deadlock
on startup, we have no visibility into the testing status on the
buildbots, and the commit itself seems scary as it exposes a reference to
one of two internal locks to users based on what thread they're running in.

After briefly studying the Process class, I'm a little worried about the
complexity of the design. Could you explain the reason 2 different R/W
locks are needed? I understand why one R/W lock makes sense in the class,
but two seem overly complicated.

We currently need to avoid doing things while the process is running. There are two cases we care about:
- the public state tracking when we are running
- the private state tracking when we are running

The main reason we need this is the private process state thread handles some complex things for us when it is handling the process. One example is the OperatingSystemPlugins (like OperatingSystemPython) where it may get called from the private process state thread to update the thread list. A common thing to do in the OperatingSystemPython is to read a global list in the kernel that contains the thread list and follow a linked list. If we run and need to determine if we should stop, we often need to update our thread list. This update will happen on the private process thread. So the flow goes like this:

The old problem was:

1 - (main thread) user says "step over"
2 - (main thread) initiates the process control and the public process write lock is taken
3 - (private process thread) run and stop after each "trace" while doing the single step
4 - (private process thread) updates the thread list which calls into the OperatingSystemPython which wants to use the public LLDB API
5 - (private process thread) goto 3 until step is done

The problem is that step 4 fails because OperatingSystemPython uses lldb::SB APIs that require the public process write lock in order to evaluate expressions and use anything that requires that the process is stopped.

To get around this we introduced the private read/write process lock to track when the process state thread is stopped so we can actually use the public APIs. So the flow is now:

1 - (main thread) user says "step over"
2 - (main thread) initiates the process control and the public process write lock is taken
3 - (private process thread) lock private process write lock
4 - (private process thread) run and stop after each "trace" while doing the single step
5 - (private process thread) unlock private process write lock
6 - (private process thread) updates the thread list which calls into the OperatingSystemPython which wants to use the public LLDB API
7 - (private process thread) goto 3 until the step is done

This lets us use the public APIs by allowing the private process state thread to lock a different lock and manage when the private state thread is locked.

This is a problem for other things that use python during the lifetime of the process. For instance, we want to eventually have some python code that gets called when a process is about to resume, or just after it stops. We would like to simplify the code for breakpoints that have commands that get run when the breakpoint is hit (right now we defer any actions until the user consumes the public stop event).
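Schematically, the flow above nests the two locks like this (a hedged pseudo-C++ sketch, not actual LLDB code; the helper functions are hypothetical):

    // Lock nesting during a user-level "step over".
    void StepOver (Process &process)
    {
        process.m_public_run_lock.WriteLock ();       // public clients locked out
        while (!StepPlanComplete ())                  // hypothetical helper
        {
            process.m_private_run_lock.WriteLock ();  // about to run
            RunAndWaitForStop ();                     // hypothetical: one "trace" step
            process.m_private_run_lock.WriteUnlock ();// stopped again
            // The process is stopped here, so blessed code running on the
            // private state thread (e.g. OperatingSystemPython) can use
            // the public API without being locked out.
            UpdateThreadList ();                      // hypothetical helper
        }
        process.m_public_run_lock.WriteUnlock ();     // step is done; public
                                                      // clients may query again
    }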

You mentioned that you'll improve the R/W (scoped?) locking classes.. Any
reason to not use boost (or some other C++11 library) for this? If we do
have to roll our own in LLDB, the lack of tests is worrisome.

I am not a big fan of boost, as it bloats the C++ program debug info to be so large that it often makes debugging boost programs very difficult due to the sheer size of the debug info. Most of what we cared about from boost is now in C++11. Even if we did use boost, would it actually check to see if the lock was taken prior to trying to release it? The APIs on read/write locks are dead simple, so I don't see this as a reason to use boost.

If the improvements to the R/W locker classes you've got in progress don't
allow the test suite to run to completion, could you please revert 179329
until we have something that allows us to run the tests? Lots of patches
are backed up atm due to the LLVM practice of not committing on top of a
broken trunk.

Yes, I am trying to get us access to a linux machine that we can all use here at Apple so we can debug and fix the things we break.

I spent a large part of the weekend trying to get Ubuntu 12.04 (using Parallels Desktop (virtualization software)) building llvm/clang/lldb so that I can fix these issues. I wasn't able to get clang to build as the link stage would always get killed with a signal 9. Not sure why; maybe the virtualization software was running out of RAM or resources. The build instructions up on the web for Linux don't actually work on a fresh install of Ubuntu. I needed to install new packages for build essentials, install gcc-4.7, and figure out how to get LLVM to use these compilers to build with C++11; otherwise the build wouldn't even configure with gcc-4.6, with --enable-libcpp quickly stating one of the options wasn't supported by the compiler.

So the linux builds are frustrating to try and get working, but I do want everyone to know that I am trying.

What compiler do you build with on linux? Are there packages to install for a suitable version of clang? I finally gave up after many many hours of trying to get lldb to build.

Greg

> Are there packages to install for a suitable version of clang?

FYI Greg, you can download a freshly built packages for Debian or Ubuntu using the instructions at http://llvm.org/apt/. Basically, for Ubuntu 12.04, you can just add the following line to /etc/apt/sources.list:

deb http://llvm.org/apt/precise/ llvm-toolchain-precise main

and then run

sudo apt-get install clang-3.3

That should give you a tool-chain to build llvm, clang and lldb from source. Thanks in advance for all your efforts to get a Linux machine in operation. Say, is this something that you plan to include in pre-commit testing when operational?

> could you please revert 179329 until we have something that allows us to run the tests?

With your fixes in r179378, the test suite runs to completion on one of my test machines. However, both lldb buildbots for Linux continue to timeout when running the test suite. The buildbots are clearly helpful to identify the commits that introduce new regressions. We’ll look into the buildbots more tomorrow.

Let us know if you have any concerns with reverting the two lock-related commits (if needed) until we have something more stable.

Thanks!

- Ashok

> With your fixes in r179378, the test suite runs to completion on one of my test machines.

This worked consistently when using ssh over a slower VPN connection. The suite hangs consistently when working at the test machine. I see that the Darwin buildbot also reports hangs.

- Ashok

> Are there packages to install for a suitable version of clang?

FYI Greg, you can download a freshly built packages for Debian or Ubuntu using the instructions at http://llvm.org/apt/. Basically, for Ubuntu 12.04, you can just add the following line to /etc/apt/sources.list:
      deb http://llvm.org/apt/precise/ llvm-toolchain-precise main
and then run
      sudo apt-get install clang-3.3

Great, I will try this today.

That should give you a tool-chain to build llvm, clang and lldb from source. Thanks in advance for all your efforts to get a Linux machine in operation. Say, is this something that you plan to include in pre-commit testing when operational?

Not until we get a real machine we can use. Parallels desktop emulated linux is just too slow right now, but I am going to try and get a dedicated machine for our group.

> could you please revert 179329 until we have something that allows us to run the tests?

With your fixes in r179378, the test suite runs to completion on one of my test machines. However, both lldb buildbots for Linux continue to timeout when running the test suite. The buildbots are clearly helpful to identify the commits that introduce new regressions. We’ll look into the buildbots more tomorrow.

Let us know if you have any concerns with reverting the two lock-related commits (if needed) until we have something more stable.

No worries, this needed to be done to get things rolling again.

I am going to put 179329 back in, but I will #ifdef __APPLE__ it out so it doesn't affect other platforms until we can get the kinks worked out. I will add more DEBUG build assertions to make sure that no errors are returned from these functions, to hopefully get all of the ref counting done correctly.

Hi Greg, thanks for the explanation.

One part I am not sure of is where OperatingSystemPython uses the SB
APIs.. You're referring to user-code, right? I am not seeing anything in
the OperatingSystemPython.[cpp|h] (even the update thread list function)
that uses SB* stuff. To confirm my understanding, it seems that "regular"
python code (like the test cases that drive the debugger) which uses SB*
ends up locking the "public" locks whereas "callback" code that I imagine
gets invoked via OSPython ends up locking the "private" locks -- is this
really true? Are these two use-cases really running on different (main vs.
non-main) threads?

If so, once the locks are acquired, don't both of these use-cases end up
using the *same* shared Process data? Is there any coupling between the
data that the locks are protecting and the locks themselves? Based on the
Process::GetRunLock() it seems like an arbitrary lock is chosen to protect
all of Process' private data..

If the public/public run locks *are* coupled to the public/private states
of the process respectively, why is it that the both write locks need to
be kept for long periods of time? I can understand the approach of having
an internal (private) "run" write lock being held while the process is
running, but I'm having trouble conceptualizing why the public run lock
needs to be held for similar durations. It seems any synchronization we do
for the public state should be very short-held "critical section" type
locks while functions query the last-known process state, or just return
an error because the process is running. Any SB function that results in a
public lock being held for longer than the duration of that function seems
like a bug in the SB code -- does that make sense or am I way off?

Another question that arises is why does python code need to run on the
same main internal thread? I would see SB* callers as users of the Process
class and as such they should only be allowed to touch the "public" state
and never run on the internal thread. I would imagine that LLDB would
spawn new threads for callbacks instead of reusing internal monitoring
threads. Is it not possible to have both "driving" code and "callback"
code go through exactly the same code path the SB* layer as well as the
Process class? My thinking is that both of those use cases should be
treated the same and interact with only the "public" Process state, while
an internal thread does the lower level Process work of launching things
and responding to events from the process, etc.

Another part that complicates reviewing this code is the external locking
-- wouldn't it be way better to encapsulate all the locking/unlocking
inside the Process class? I like code that keeps the lock code close to
the data they protect; what's the benefit of having SB* API classes
acquire and maintain internal Process locks?

I agree completely about avoiding Boost-bloat; the concern I had there was
that there's no tests for any of the R/W locking support classes.. I
concede that on further studies of the Boost APIs, they won't really solve
the double-locking or double-unlocking problem you're trying to solve --
although I'd prefer to see this whole Process locking thing redesigned
with internal rather than external locking. I think this will help nail
down exactly what cases are causing the double-locking thing to deadlock.

Thanks for trying to get a Linux machine up and running; we definitely
appreciate it!!

I have never tried getting a build working in Parallels though, so you
might have a better time with Bootcamp. The segfault during linking
definitely sounds like an out-of-memory situation. As for Ubuntu, 12.04 is
a little out-dated now; I recommend using 12.10 as that has GCC 4.7 which
is compatible with both configure and cmake-based builds. If you do need
to use 12.04 and you're stuck on GCC 4.6, I recommend using cmake instead
of configure as that system is smart enough to pass the right C++11 flag
(-std=c++0x) to older versions of GCC as opposed to ./configure
--enable-cxx11 which blindly passes in (-std=c++11). Also, cmake+ninja
builds LLDB much faster (by about 1min) than the configure based system.

As Ashok mentioned, if you'd like to use clang, llvm.org/apt is the place
to get bleeding edge packages for Ubuntu and Debian. Personally, I use
both GCC and Clang to keep things sane across the board. Both buildbots
seem to build the project right now.

Cheers,
Dan

Hi Greg, thanks for the explanation.

One part I am not sure of is where OperatingSystemPython uses the SB
APIs.. You're referring to user-code, right?

No, a python class gets instantiated when a process starts up and it gets a chance to run python code in order to make new thread lists and relay them back to LLDB. The python code must use the public SB API.

I am not seeing anything in
the OperatingSystemPython.[cpp|h] (even the update thread list function)
that uses SB* stuff.

Exactly, see above.

To confirm my understanding, it seems that "regular"
python code (like the test cases that drive the debugger) which uses SB*
ends up locking the "public" locks whereas "callback" code that I imagine
gets invoked via OSPython ends up locking the "private" locks -- is this
really true?

Only when the private state thread calls into the python code which then wants to use the public API to lookup variables, evaluate expressions, etc in order to make the new thread list.

Are these two use-cases really running on different (main vs.
non-main) threads?

When the private state thread is doing stuff, the process run lock is taken. Any public API access that requires a stopped process won't work when being done from the private state thread since the process is implementing some sort of run.

If so, once the locks are acquired, don't both of these use-cases end up
using the *same* shared Process data? Is there any coupling between the
data that the locks are protecting and the locks themselves? Based on the
Process::GetRunLock() it seems like an arbitrary lock is chosen to protect
all of Process' private data.

No, the read/write locks don't protect the data, they just protect access to the process to make sure it stays stopped while clients access things, or make sure clients that want to read, stay locked out while process control is going on. See below for more detail.

If the public/public run locks *are* coupled to the public/private states
of the process respectively, why is it that the both write locks need to
be kept for long periods of time? I can understand the approach of having
an internal (private) "run" write lock being held while the process is
running, but I'm having trouble conceptualizing why the public run lock
needs to be held for similar durations. It seems any synchronization we do
for the public state should be very short-held "critical section" type
locks while functions query the last-known process state, or just return
an error because the process is running. Any SB function that results in a
public lock being held for longer than the duration of that function seems
like a bug in the SB code -- does that make sense or am I way off?

Since we do async process control, when a user says "step over", this might result in the process being stopped and started 100 times. Before doing any process control, we take the public process run lock by acquiring the write lock. This lets anyone currently in the process with the public run lock that was acquired for reading get out. Once the public run lock is taken, we keep it until the process run control action ("step over") is complete. We must lock down access to the process for the entire duration because it wouldn't be a good time for someone to come in and ask for the stack frames of a thread in between run 56 and 57. The stack frames would be meaningless from a public perspective. Internally though, for any blessed sections of code (like OperatingSystemPython) which we know will need access to the public API from the private state thread, we allow them to not be locked out by using the private run lock.
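On the reader side, the pattern looks roughly like this (a hedged sketch; GetRunLock(), ReadTryLock(), and ReadUnlock() follow the naming used in the SB layer and lldb's ReadWriteLock, but the exact API may differ):

    // Illustrative only: how a public query keeps the process stopped.
    bool QueryExample (Process &process)
    {
        if (process.GetRunLock ().ReadTryLock ())
        {
            // The process is stopped and stays stopped while the read
            // lock is held, so a multi-step query sees a consistent
            // thread list.
            // ... do the actual work here ...
            process.GetRunLock ().ReadUnlock ();
            return true;
        }
        // A writer (run control) holds the lock: "error: process is running".
        return false;
    }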

Another question that arises is why does python code need to run on the
same main internal thread?

Any thread other than the private state thread would use the public lock. Any code run from the private state thread itself, since it controls the process, can be guaranteed that the process is and will stay stopped while accessing the data. So only code run from the private state thread is special for anything that calls out to code that needs to use the public SB API.

I would see SB* callers as users of the Process
class and as such they should only be allowed to touch the "public" state
and never run on the internal thread. I would imagine that LLDB would
spawn new threads for callbacks instead of reusing internal monitoring
threads.

If we did this, we would end up locking ourselves out from being able to use the code. We don't want people getting in and being able to do stuff on the process in between stops when we are implementing a "step over". If we make new threads for callbacks, then we would either be locked out from accessing the process, or we would need to let clients access the process in between intermediate steps of the current process run control, which is not what anyone wants. (If a client says "process.Continue()" immediately followed by code that gets stack frames for all threads, we don't want stack frames from the middle of a single step.)

Is it not possible to have both "driving" code and "callback"
code go through exactly the same code path the SB* layer as well as the
Process class?

You mean using the same locks? I am confused by your question. All code uses the same public API through the SB* layer. The SB layer locks you out when it is dangerous to do things while the process is running. Many of the issues we ran into stemmed from Xcode accessing the SB* API from multiple threads, and all of the locks (the target lock and the process read/write (stopped/running) locks) are needed.

My thinking is that both of those use cases should be
treated the same and interact with only the "public" Process state, while
an internal thread does the lower level Process work of launching things
and responding to events from the process, etc.

Let me know what you think after reading the above explanation. The locks are needed and necessary and we haven't been able to come up with another way to control access to the things we need to do from the various threads. We currently have two clients:
- public clients
- private process state thread clients

public clients get locked out, as they should, when the process is running (continue, or stepping) so we don't get useless answers.

Do you really want to evaluate the value for "x" when we are in the middle of a single step? No, you want to get an error stating "error: process is running".
Do you really want stack frames for all threads in a process when the process is in the middle of a single step? No, you want to get an error stating "error: process is running".

The private clients though are limited to OperatingSystemPython right now, but could be expanded to breakpoint actions in the near future. A breakpoint might say it wants to have python run when the breakpoint gets hit and it would be great to not have to do this on another thread just so we can see that "x != 12", or "rax != 123".

Another thing we have to watch out for is that there are limitations on things that can be done on the private state thread. For example you wouldn't want to try to evaluate an expression (run the target) while you are currently running the target for another expression. There is special code for this for one special case (calling "mmap" to allocate memory by manually calling a function without using the expression parser), but those don't use any of the special locking we have implemented.

Another part that complicates reviewing this code is the external locking
-- wouldn't it be way better to encapsulate all the locking/unlocking
inside the Process class? I like code that keeps the lock code close to
the data they protect; what's the benefit of having SB* API classes
acquire and maintain internal Process locks?

You might need to do multiple things, like get a backtrace for all threads. Say you ask the process for the thread at index 0, then backtrace it. Then another thread resumes the process and it stops somewhere else. Now you ask for the thread at index 1 and you get a different thread because the thread list has changed. Not a great experience. This is why we externally lock.

I agree completely about avoiding Boost-bloat; the concern I had there was
that there's no tests for any of the R/W locking support classes.. I
concede that on further studies of the Boost APIs, they won't really solve
the double-locking or double-unlocking problem you're trying to solve --
although I'd prefer to see this whole Process locking thing redesigned
with internal rather than external locking. I think this will help nail
down exactly what cases are causing the double-locking thing to deadlock.

This works great for single-access functions, but falls down for complex actions (like backtracing all threads).

We are open to any ideas you have for robust locking. We have tossed around a few ideas for the process control, but haven't come up with one that works better than what we have. Having a multi-threaded API sure does complicate things and is something other debuggers don't have to worry about.

Thanks for trying to get a Linux machine up and running; we definitely
appreciate it!!

I have never tried getting a build working in Parallels though, so you
might have a better time with Bootcamp. The segfault during linking
definitely sounds like an out-of-memory situation. As for Ubuntu, 12.04 is
a little out-dated now; I recommend using 12.10 as that has GCC 4.7 which
is compatible with both configure and cmake-based builds.

Nice! I will install that and give it a go!

If you do need
to use 12.04 and you're stuck on GCC 4.6, I recommend using cmake instead
of configure as that system is smart enough to pass the right C++11 flag
(-std=c++0x) to older versions of GCC as opposed to ./configure
--enable-cxx11 which blindly passes in (-std=c++11). Also, cmake+ninja
builds LLDB much faster (by about 1min) than the configure based system.

Thanks for the info. I will let you know how 12.10 works out.

As Ashok mentioned, if you'd like to use clang, llvm.org/apt is the place
to get bleeding edge packages for Ubuntu and Debian. Personally, I use
both GCC and Clang to keep things sane across the board. Both buildbots
seem to build the project right now.

If 12.10 fails, I will try clang.

Let me know what you think of the new explanations above and if you have any ideas on controlling the API in a different better way.

Greg

Hi Greg,

Thanks again for the detailed explanation of this dilly of a pickle. I
thought about this problem a little more.. It would be really nice to
resolve this soon, since the different locking behaviour on MacOSX vs
other platforms is something I think we all want to excise as quickly as
possible.

My thoughts are inline below:

No, a python class gets instantiated when a process starts up and it gets
a chance to run python code in order to make new thread lists and
relay them back to LLDB. The python code must use the public SB API.

Oh, that's somewhat surprising. Where is this class? I'm curious about
what it does that's done in Python.. Aren't you worried about the overhead
the Global Interpreter Lock might add to normal operations like step-over?
I have no data to back that up, but putting python stuff on the critical
path seems like it can't be good for performance. Also, not that it's
important to us, but does this also mean that building a working LLDB
without Python support is no longer an option?

To confirm my understanding, it seems that "regular"
python code (like the test cases that drive the debugger) which uses SB*
ends up locking the "public" locks whereas "callback" code that I
imagine
gets invoked via OSPython ends up locking the "private" locks -- is this
really true?

Only when the private state thread calls into the python code which then
wants to use the public API to lookup variables, evaluate expressions,
etc in order to make the new thread list.

Are these two use-cases really running on different (main vs.
non-main) threads?

When the private state thread is doing stuff, the process run lock is
taken. Any public API access that requires a stopped process won't work
when being done from the private state thread since the process is
implementing some sort of run.

Right-O. This makes sense.

If so, once the locks are acquired, don't both of these use-cases end up
using the *same* shared Process data? Is there any coupling between the
data that the locks are protecting and the locks themselves? Based on
the
Process::GetRunLock() it seems like an arbitrary lock is chosen to
protect
all of Process' private data.

No, the read/write locks don't protect the data, they just protect access
to the process to make sure it stays stopped while clients access things,
or make sure clients that want to read, stay locked out while process
control is going on. See below for more detail.

OK, here's where I think things start going sideways. It might be possible
to accomplish the synchronization that's needed between the public clients
and internal callers with two R/W locks, but I'm having trouble seeing how
they are to be used in tandem. I'll attempt to explain (what I think is) a
simpler approach below.

If the public/public run locks *are* coupled to the public/private
states
of the process respectively, why is it that the both write locks need to
be kept for long periods of time? I can understand the approach of
having
an internal (private) "run" write lock being held while the process is
running, but I'm having trouble conceptualizing why the public run lock
needs to be held for similar durations. It seems any synchronization we
do
for the public state should be very short-held "critical section" type
locks while functions query the last-known process state, or just return
an error because the process is running. Any SB function that results
in a
public lock being held for longer than the duration of that function
seems
like a bug in the SB code -- does that make sense or am I way off?

Since we do async process control, when a user says "step over", this
might result in the process being stopped and started 100 times. Before
doing any process control, we take the public process run lock by
acquiring the write lock. This lets anyone currently in the process
with the public run lock that was acquired for reading get out. Once
the public run lock is taken, we keep it until the process run control
action ("step over") is complete. We must lock down access to the process
for the entire duration because it wouldn't be a good time for someone to
come in and ask for the stack frames of a thread in between run 56 and
57. The stack frames would be meaningless from a public perspective.
Internally though, for any blessed sections of code (like
OperatingSystemPython) which we know will need access to the public API
from the private state thread, we allow them to not be locked out by
using the private run lock.

Hmm, OK. So, in the two R/W lock approach, there's places where both locks
need to be write-acquired though, right? I'm not sure I can think of all
the other interactions between the locks though. It might be equivalent to
the thing I propose below -- can you confirm if this is true or not?

If we make new threads for callbacks, then we would either be locked out
from accessing the process, or we would need to let clients access the
process in between intermediate steps of the current process run
control, which is not what anyone wants.

Agreed that those two things are bad, but I think a third option exists:

What we need is akin to a condition variable (that knows how to relinquish
a lock during wait) so that we don't "lock ourselves out" and a shared
lock to protect the "public" interfaces that do anything with the process'
state, with some extra smartness about "long running" operations that
require nobody mess around with the process state.

Here's what I'm thinking Process needs to have in order to solve this, in
a little more concrete terms:
1. One bool (m_state_in_transition) -- true when we have a long-running
operation in progress
2. One Condition Variable bool (m_callback_in_progress) -- this remains
true when a python thread that has SB access is launched
3. One Mutex (m_lock)* -- to protect every function inside process that
reads/writes state

The rules of the game are:

- Every function in Process needs to acquire the m_lock before any
reads/writes happen. This is somewhat shitty because multiple reads would
be serialized, but more on this later*
- Any (non-const) function in Process (that modifies state like Resume(),
Step(), etc) needs to error out if m_state_in_transition == true. These
are forbidden in callbacks.
- Any const function in Process (that reads state but does not change it)
needs to acquire m_lock
- Any time an internal Process function needs to invoke some code that has
access to SB APIs, the procedure is:
-- set m_state_in_transition AND m_callback_in_progress to true
-- spawn a new thread that's going to use the public APIs
-- wait for m_callback_in_progress to turn false WITH m_lock,
relinquishing the lock so others can enter Process internals, BUT
m_state_in_transition prevents any badness.
- When any "long-running" function exits, m_state_in_transition is set to
false.

Aside: As an implementation detail, the first and the last steps can
probably be combined in a single scoped object.

In concert with the above approach, any "callback" (or, 'blessed' I think
you call it) thread that has access to SB but may run during one of the
Process internals needs to set m_callback_in_progress to false when done;
thereby joining itself with the Process internal function that invoked it.

* Based on the existing design, it seems you want multiple reads to happen
concurrently (for Xcode performance reasons). To allow multiple readers to
access Process at once, upgrade the m_lock from a run-of-the-mill Mutex to
a R/W lock. The problem with this is that we need the condition variable
to play nice (i.e. unlock the R/W lock during wait().) Pthread doesn't
support this with the rw_lock type as-is, but I'm sure it can be done if
we implement ReadWriteLock the "hard" way with lower-level primitives. See
http://stackoverflow.com/questions/2699993/using-pthread-condition-variable-with-rwlock for inspiration.
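For what it's worth, a hedged sketch of that lower-level approach: a reader/writer lock built from one mutex and one condition variable, so the same mutex could also back a pthread_cond_wait() elsewhere (illustrative only; no writer priority or error handling):

    #include <pthread.h>

    class CondRWLock
    {
    public:
        CondRWLock () : m_readers (0), m_writer (false)
        {
            ::pthread_mutex_init (&m_mutex, NULL);
            ::pthread_cond_init (&m_cond, NULL);
        }

        ~CondRWLock ()
        {
            ::pthread_cond_destroy (&m_cond);
            ::pthread_mutex_destroy (&m_mutex);
        }

        void ReadLock ()
        {
            ::pthread_mutex_lock (&m_mutex);
            while (m_writer)                    // wait out any writer
                ::pthread_cond_wait (&m_cond, &m_mutex);
            ++m_readers;
            ::pthread_mutex_unlock (&m_mutex);
        }

        void ReadUnlock ()
        {
            ::pthread_mutex_lock (&m_mutex);
            if (--m_readers == 0)               // last reader wakes waiters
                ::pthread_cond_broadcast (&m_cond);
            ::pthread_mutex_unlock (&m_mutex);
        }

        void WriteLock ()
        {
            ::pthread_mutex_lock (&m_mutex);
            while (m_writer || m_readers > 0)   // wait for exclusive access
                ::pthread_cond_wait (&m_cond, &m_mutex);
            m_writer = true;
            ::pthread_mutex_unlock (&m_mutex);
        }

        void WriteUnlock ()
        {
            ::pthread_mutex_lock (&m_mutex);
            m_writer = false;
            ::pthread_cond_broadcast (&m_cond); // wake readers and writers
            ::pthread_mutex_unlock (&m_mutex);
        }

    private:
        pthread_mutex_t m_mutex;
        pthread_cond_t  m_cond;
        int             m_readers;
        bool            m_writer;
    };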

The natural benefit of the above approach is that all the
locking/unlocking magic happen inside Process, and SB has to have 0
knowledge about Process locks.

Is it not possible to have both "driving" code and "callback"
code go through exactly the same code path the SB* layer as well as the
Process class?

You mean using the same locks? I am confused by your question. All code
uses the same public API through the SB* layer. The SB layer locks you
out when it is dangerous to do things while the process is running. Many
of the issues we ran into stemmed from Xcode accessing the SB* API from
multiple threads and all of the locks (the target lock and the process
read/write (stopped/running) locks) are needed.

Sorry I was not too clear -- basically I'm worried about the complexity of
having two discrete R/W locks here, and separate behaviour based on
internal/external threads. Since processes have one state, it "feels" like
there should be one lock.

Let me know what you think after reading the above explanation. The locks
are needed and necessary and we haven't been able to come up with another
way to control access to the things we need to do from the various
threads. We currently have two clients:
- public clients
- private process state thread clients

public clients get locked out, as they should, when the process is
running (continue, or stepping) so we don't get useless answers.

Do you really want to evaluate the value for "x" when we are in the
middle of a single step? No, you want to get an error stating "error:
process is running".

Right, expression evaluation would be one of those "non-const" cases that
should error out if Process m_state_in_transition is true.

Do you really want stack frames for all threads in a process when the
process is in the middle of a single step? No, you want to get an error
stating "error: process is running".

Right-O.

The private clients though are limited to OperatingSystemPython right
now, but could be expanded to breakpoint actions in the near future. A
breakpoint might say it wants to have python run when the breakpoint gets
hit and it would be great to not have to do this on another thread just
so we can see that "x != 12", or "rax != 123".

I'm not sure another thread for the callback will add a lot of overhead
compared to the overhead of the Python GIL. But I'm open here -- maybe
there is a way to do the condition variable thing (or equivalent) without
spawning another thread. We'd need a mutex-unlock-and-invoke-function sort
of functionality, but the general principle would be the same; when
non-Process code is running, all locks should be released, but the Process
object should be in a state whereby it cannot be changed (I.e. certain
functions are forbidden).

Another thing we have to watch out for is that there are limitations on
things that can be done on the private state thread. For example you
wouldn't want to try to evaluate an expression (run the target) while you
are currently running the target for another expression. There is special
code for this for one special case (calling "mmap" to allocate memory by
manually calling a function without using the expression parser), but
those don't use any of the special locking we have implemented.

Since evaluating an expression can modify the process (for example
'expression -- i++') it should be treated as non-const unless some IR
interpreter can guarantee otherwise (in a future optimization utopia). It
seems safe to set the process's m_state_in_transition to true during
evaluation of any expression, thereby preventing anything else from
modifying or reading from Process. Of course, internally, the
expression evaluation would need to have its own UncheckedResume()
functionality that resumes the inferior without checking
m_state_in_transition like the API Resume() would.

Another part that complicates reviewing this code is the external
locking
-- wouldn't it be way better to encapsulate all the locking/unlocking
inside the Process class? I like code that keeps the lock code close to
the data they protect; what's the benefit of having SB* API classes
acquire and maintain internal Process locks?

You might need to do multiple things, like get a backtrace for all
threads. Say you ask the process for the thread at index 0, then
backtrace it. Then another thread resumes the process and it stops
somewhere else. Now you ask for the thread at index 1 and you get a
different thread because the thread list has changed. Not a great
experience. This is why we externally lock.

Agreed; it would be a terrible bug.

I'd prefer to see this whole Process locking thing redesigned
with internal rather than external locking. I think this will help nail
down exactly what cases are causing the double-locking thing to
deadlock.

This works great for single-access functions, but falls down for complex
actions (like backtracing all threads).

I don't think external locking is the only way to accomplish this. What's
wrong with a Process::BacktraceAllThreads() (or even
Process::BacktraceSomeThreads(ThreadIDs)) function that internally
acquires the run lock and does its business without fear of somebody
continuing the process in the meantime? In BacktraceAllThreads(), if the
process is in transition, that function should just fail. If the process
is stopped, any other non-const calls (to Resume() or
expression-evaluation) should block until BacktraceAllThreads() returns,
or error out. Glancing at the code that calls GetRunLock() in the SB
layer, it seems like it all belongs in the Process class. I understand the
hesitation to add more stuff to Process, since the class is already
getting kinda large, but I think we should worry about splitting it up
into more more bite-sized components after this deadlock business is
sorted out.

We are open to any ideas you have for robust locking. We have tossed
around a few ideas for the process control, but haven't come up with one
that works better than what we have. Having a multi-threaded API sure
does complicate things and is something other debuggers don't have to
worry about.

Definitely does complicate things just a smidge ;) I suppose it's too late
to require that Xcode uses lldb in a more disciplined single-threaded
mode? ;) j/k

Cheers,
Dan

I'm heading out now so I didn't read this in detail, I'll do so tomorrow, but I just want to clarify this one point.

It is not true that processes have one state. They have two: a public state and a private state. When you do a user-level "step-over", for instance, you are actually single-stepping the process multiple times to get through the range (plus stepping in, running back out again, calling functions to figure out how to step through trampolines, etc...). So for many cases where to the outside world the process looks like it has just "been running", it has in fact stopped and started many times.

It is important that these private state changes not leak out to people who are waiting on state changed events at the Process level. The whole logic of the execution control depends on this distinction being carefully maintained.

That's why the notion of a "private run lock" and a "public run lock" seemed like a reasonable solution, since they mirror this basic construct within lldb's execution engine.

Jim

Just a quick note on this.
It is still feasible to build LLDB without Python. You just won’t get the feature of OS plugins (which are not mandatory - there is one included as an example in examples/python)
It is much like Python data formatters - you build without Python, you don’t get them, but the rest of LLDB still works fine.

Enrico Granata

Ah, sorry on re-read, I realize I should have written "inferior" instead
of "process" in that paragraph. The private/public state thing makes
perfect sense, and I don't think would need to be changed. Private state
should not leak out, and might vary many times during the "non-const" /
"long-running" operation we're talking about, SB users would only see
public state before and after one of these long operations.

TL;DR: I'm not sure that a second R/W lock is the right synchronization
primitive to use for the internal lock. I think a condition variable in
tandem with the first lock is what we are looking for.

Cheerio,
Dan

Hi Greg,

Thanks again for the detailed explanation of this dilly of a pickle. I
thought about this problem a little more.. It would be really nice to
resolve this soon, since the different locking behaviour on MacOSX vs
other platforms is something I think we all want to excise as quickly as
possible.

Agreed.

My thoughts are inline below:

No, a python class gets instantiated when a process starts up and it gets
a chance to run python code in order to make new thread lists and
relay them back to LLDB. The python code must use the public SB API.

Oh, that's somewhat surprising. Where is this class?

Clients must manually enable this when they want it via a setting:

(lldb) settings set target.process.python-os-plugin-path /path/to/module.py

Then from this module it will load the class named "OperatingSystemPlugIn", and the process will instantiate that class, with one instance per process, which gets used in the OperatingSystemPython plug-in.

I'm curious about
what it does that's done in Python.. Aren't you worried about the overhead
the Global Interpreter Lock might add to normal operations like step-over?

This is used mainly for bare board debugging where you have a kernel that has threads that aren't all on core, and it is also used for kernel panic triage. The type of places this is used is not in normal user space debugging, and if it ever is, a native plug-in should be used to get the performance that is needed.

So no, I am not worried about any overhead.

The kernels around here change from build to build and from arch to arch so the flexibility of having a python plug-in that can be distributed with each new build allows the kernel team to have a very flexible solution.

I have no data to back that up, but putting python stuff on the critical
path seems like it can't be good for performance.

It isn't in the critical path for the clients that use it. The correct way for clients where it is in the critical path would be to write a native plug-in.

Also, not that it's
important to us, but does this also mean that building a working LLDB
without Python support is no longer an option?

Again, go native if you really care. If you have and can use python and need to be able to change your OS plug-in because you are in the process of developing a new OS, then use the python way so you can see all of the threads you care about.

To confirm my understanding, it seems that "regular"
python code (like the test cases that drive the debugger) which uses SB*
ends up locking the "public" locks whereas "callback" code that I
imagine
gets invoked via OSPython ends up locking the "private" locks -- is this
really true?

Only when the private state thread calls into the python code which then
wants to use the public API to lookup variables, evaluate expressions,
etc in order to make the new thread list.

Are these two use-cases really running on different (main vs.
non-main) threads?

When the private state thread is doing stuff, the process run lock is
taken. Any public API access that requires a stopped process won't work
when being done from the private state thread since the process is
implementing some sort of run.

Right-O. This makes sense.

If so, once the locks are acquired, don't both of these use-cases end up
using the *same* shared Process data? Is there any coupling between the
data that the locks are protecting and the locks themselves? Based on
the
Process::GetRunLock() it seems like an arbitrary lock is chosen to
protect
all of Process' private data.

No, the read/write locks don't protect the data, they just protect access
to the process to make sure it stays stopped while clients access things,
or make sure clients that want to read, stay locked out while process
control is going on. See below for more detail.

OK, here's where I think things start going sideways. It might be possible
to accomplish the synchronization that's needed between the public clients
and internal callers with two R/W locks, but I'm having trouble seeing how
they are to be used in tandem. I'll attempt to explain (what I think is) a
simpler approach below.

If the public/public run locks *are* coupled to the public/private
states
of the process respectively, why is it that the both write locks need to
be kept for long periods of time? I can understand the approach of
having
an internal (private) "run" write lock being held while the process is
running, but I'm having trouble conceptualizing why the public run lock
needs to be held for similar durations. It seems any synchronization we
do
for the public state should be very short-held "critical section" type
locks while functions query the last-known process state, or just return
an error because the process is running. Any SB function that results
in a
public lock being held for longer than the duration of that function
seems
like a bug in the SB code -- does that make sense or am I way off?

Since we do async process control, when a user says "step over", this
might result in the process being stopped and started 100 times. Before
doing any process control, we take the public process run lock by
acquiring the write lock. This lets anyone currently in the process
with the public run lock that was acquired for reading get out. Once
the public run lock is taken, we keep it until the process run control
action ("step over") is complete. We must lock down access to the process
for the entire duration because it wouldn't be a good time for someone to
come in and ask for the stack frames of a thread in between run 56 and
57. The stack frames would be meaningless from a public perspective.
Internally though, for any blessed sections of code (like
OperatingSystemPython) which we know will need access to the public API
from the private state thread, we allow them to not be locked out by
using the private run lock.

Hmm, OK. So, in the two R/W lock approach, there's places where both locks
need to be write-acquired though, right? I'm not sure I can think of all
the other interactions between the locks though. It might be equivalent to
the thing I propose below -- can you confirm if this is true or not?

It should always be the case that the public process run lock has its write lock taken first; then the private process run lock's write lock is taken and released as many times as needed while the private state thread controls the process; then the private write lock is released, and finally the public write lock.

If we make new threads for callbacks, then we would either be locked out
from accessing the process, or we would need to let clients access the
process in between intermediate steps of the current process run
control, which is not what anyone wants.

Agreed that those two things are bad, but I think a third option exists:

What we need is akin to a condition variable (that knows how to relinquish
a lock during wait) so that we don't "lock ourselves out" and a shared
lock to protect the "public" interfaces that do anything with the process'
state, with some extra smartness about "long running" operations that
require nobody mess around with the process state.

Here's what I'm thinking Process needs to have in order to solve this, in
a little more concrete terms:
1. One bool (m_state_in_transition) -- true when we have a long-running
operation in progress
2. One Condition Variable bool (m_callback_in_progress) -- this remains
true when a python thread that has SB access is launched
3. One Mutex (m_lock)* -- to protect every function inside process that
reads/writes state

The rules of the game are:

- Every function in Process needs to acquire the m_lock before any
reads/writes happen. This is somewhat shitty because multiple reads would
be serialized, but more on this later*

Yes, this is why we use read/write locks right now. The "read" clients (that want the process to stay stopped) can all access the process as they want; the sole "write" client (that wants to resume the process) can keep any "read" clients out.

- Any (non-const) function in Process (that modifies state like Resume(),
Step(), etc) needs to error out if m_state_in_transition == true. These
are forbidden in callbacks.

"const" really doesn't mean much with a multi-threaded async process API. So I really don't see how this would work. If you call process->Resume() it will cause some data to get scooted over to the private state thread and all the work is done from there.

- Any const function in Process (that reads state but does not change it)
needs to acquire m_lock

Again, see above "const" comment...

- Any time an internal Process function needs to invoke some code that has
access to SB APIs, the procedure is:
-- set m_state_in_transition AND m_callback_in_progress to true
-- spawn a new thread that's going to use the public APIs
-- wait for m_callback_in_progress to turn false WITH m_lock held,
relinquishing the lock during the wait so others can enter Process
internals, BUT m_state_in_transition prevents any badness.
- When any "long-running" function exits, m_state_in_transition is set to
false.

Yikes, I reall

Aside: As an implementation detail, the first and the last steps can
probably be combined in a single scoped object.

In concert with the above approach, any "callback" (or 'blessed', I think you call it) thread that has access to SB but may run during one of the Process internals needs to set m_callback_in_progress to false when done, thereby joining itself with the Process internal function that invoked it.

* Based on the existing design, it seems you want multiple reads to happen concurrently (for Xcode performance reasons). To allow multiple readers to access Process at once, upgrade m_lock from a run-of-the-mill Mutex to a R/W lock. The problem with this is that we need the condition variable to play nice (i.e. unlock the R/W lock during wait()). Pthreads doesn't support this with the rw_lock type as-is, but I'm sure it can be done if we implement ReadWriteLock the "hard" way with lower-level primitives. See http://stackoverflow.com/questions/2699993/using-pthread-condition-variable-with-rwlock for inspiration.
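To make the proposal concrete, here's a minimal sketch of the scheme (modern C++ for brevity; ProcessSketch and everything in it are the hypothetical members from the list above, not existing LLDB code):

#include <condition_variable>
#include <mutex>
#include <thread>

class ProcessSketch
{
public:
    // Internal code that must hand control over to SB-using callback code.
    void RunBlessedCallback ()
    {
        std::unique_lock<std::mutex> guard(m_lock);
        m_state_in_transition = true;
        m_callback_in_progress = true;
        std::thread callback([this] {
            // ... the callback is free to use the public API here ...
            std::lock_guard<std::mutex> done(m_lock);
            m_callback_in_progress = false;
            m_cond.notify_all();
        });
        // wait() releases m_lock so the callback (and readers) can get in,
        // while m_state_in_transition keeps mutators erroring out.
        m_cond.wait(guard, [this] { return !m_callback_in_progress; });
        m_state_in_transition = false;
        callback.join();
    }

    // A "non-const" public operation: forbidden while in transition.
    bool Resume ()
    {
        std::lock_guard<std::mutex> guard(m_lock);
        if (m_state_in_transition)
            return false;                 // "error: process is running"
        // ... actually resume the inferior ...
        return true;
    }

private:
    std::mutex m_lock;                    // 3: guards all state reads/writes
    std::condition_variable m_cond;       // pairs with the bools below
    bool m_state_in_transition = false;   // 1: long-running operation active
    bool m_callback_in_progress = false;  // 2: a blessed SB thread is live
};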

Yeah, I explored this option when we first did the read/write lock and as soon as you have more than one primitive object (mutex, condition, read/write lock, etc), you are asking for even more trouble.

The natural benefit of the above approach is that all the locking/unlocking magic happens inside Process, and SB has to have 0 knowledge about Process locks.

So how would you solve the problem where you want to play with items in the process and keep the process from going anywhere? Like

SBProcess process = ...
SBThread thread;
uint32_t num_threads = process.GetNumThreads();
for (uint32_t i=0; i<num_threads; ++i)
{
  thread = process.GetThreadAtIndex(i);
}

In the above code, if we let the process do all the locking, couldn't we end up having the process change out from underneath us as we iterate?

Another example:

SBFrame frame;
uint32_t num_frames = thread.GetNumFrames();
for (...)
{
    frame = thread.GetFrameAtIndex(i);
}

Is it not possible to have both "driving" code and "callback" code go through exactly the same code path in the SB* layer as well as the Process class?

You mean using the same locks? I am confused by your question. All code uses the same public API through the SB* layer. The SB layer locks you out when it is dangerous to do things while the process is running. Many of the issues we ran into stemmed from Xcode accessing the SB* API from multiple threads, and all of the locks (the target lock and the process read/write (stopped/running) locks) are needed.

Sorry I was not too clear -- basically I'm worried about the complexity of having two discrete R/W locks here, and separate behaviour based on internal/external threads.

As opposed to taking a condition variable, locking it, and doing two or three other things? I am not clear how the new scheme simplifies things yet.

Since processes have one state, it "feels" like
there should be one lock.

Let me know what you think after reading the above explanation. The locks
are needed and necessary and we haven't been able to come up with another
way to control access to the things we need to do from the various
threads. We currently have two clients:
- public clients
- private process state thread clients

public clients get locked out, as they should, when the process is
running (continue, or stepping) so we don't get useless answers.

Do you really want to evaluate the value for "x" when we are in the middle of a single step? No, you want to get an error stating "error: process is running".

Right, expression evaluation would be one of those "non-const" cases that should error out if Process' m_in_transition is true.

I really don't understand the new scheme you are suggesting. What does m_in_transition mean? In transition between what and what?

Do you really want stack frames for all threads in a process when the process is in the middle of a single step? No, you want to get an error stating "error: process is running".

Right-O.

The private clients though are limited to OperatingSystemPython right
now, but could be expanded to breakpoint actions in the near future. A
breakpoint might say it wants to have python run when the breakpoint gets
hit and it would be great to not have to do this on another thread just
so we can see that "x != 12", or "rax != 123".

I'm not sure another thread for the callback will add a lot of overhead compared to the overhead of the Python GIL. But I'm open here -- maybe there is a way to do the condition variable thing (or equivalent) without spawning another thread. We'd need a mutex-unlock-and-invoke-function sort of functionality, but the general principle would be the same: when non-Process code is running, all locks should be released, but the Process object should be in a state whereby it cannot be changed (i.e. certain functions are forbidden).

Another thing we have to watch out for is that there are limitations on
things that can be done on the private state thread. For example you
wouldn't want to try to evaluate an expression (run the target) while you
are currently running the target for another expression. There is special
code for this for one special case (calling "mmap" to allocate memory by
manually calling a function without using the expression parser), but
those don't use any of the special locking we have implemented.

Since evaluating an expression can modify the process (for example 'expression -- i++'), it should be treated as non-const unless some IR interpreter can guarantee otherwise (in a future optimization utopia). It seems safe to set the process' m_in_transition to true during evaluation of any expression, thereby preventing anything else from modifying or reading from Process. Of course, internally, the expression evaluation would need to have its own UncheckedResume() functionality that resumes the inferior without checking m_in_transition like the API Resume() would.
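In terms of that sketch, expression evaluation might look like this (again hypothetical; Error is a stand-in result type and UncheckedResume() an invented internal helper):

// Expression evaluation treated as "non-const": it holds
// m_state_in_transition for its whole duration and drives the inferior
// through an internal resume that bypasses the public gate.
Error ProcessSketch::EvaluateExpression (const char *expr)
{
    {
        std::lock_guard<std::mutex> guard(m_lock);
        if (m_state_in_transition)
            return Error("process is running");
        m_state_in_transition = true;   // lock out other mutators
    }
    UncheckedResume();                  // internal resume; skips the check
    // ... run the expression, wait for the resulting stop ...
    std::lock_guard<std::mutex> guard(m_lock);
    m_state_in_transition = false;
    return Error();                     // success
}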

Another part that complicates reviewing this code is the external locking -- wouldn't it be way better to encapsulate all the locking/unlocking inside the Process class? I like code that keeps the lock code close to the data it protects; what's the benefit of having SB* API classes acquire and maintain internal Process locks?

You might need to do multiple things, like getting a backtrace for all threads. Do you want to ask the process for the thread at index 0, then backtrace it? Then another thread resumes the process and it stops somewhere else. Now you ask for the thread at index 1 and you get a different thread because the thread list has changed. Not a great experience. This is why we externally lock.

Agreed; it would be a terrible bug.

I'd prefer to see this whole Process locking thing redesigned
with internal rather than external locking. I think this will help nail
down exactly what cases are causing the double-locking thing to
deadlock.

This works great for single access functions, but falls down for complex actions (like backtracing all threads).

I don't think external locking is the only way to accomplish this. What's wrong with a Process::BacktraceAllThreads() (or even Process::BacktraceSomeThreads(ThreadIDs)) function that internally acquires the run lock and does its business without fear of somebody continuing the process in the meantime? In BacktraceAllThreads(), if the process is in transition, that function should just fail. If the process is stopped, any other non-const calls (to Resume() or expression evaluation) should block until BacktraceAllThreads() returns, or error out. Glancing at the code that calls GetRunLock() in the SB layer, it seems like it all belongs in the Process class. I understand the hesitation to add more stuff to Process, since the class is already getting kinda large, but I think we should worry about splitting it up into more bite-sized components after this deadlock business is sorted out.
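For example (a sketch built on the single-lock scheme from earlier; ThreadBacktrace, ThreadSP, m_thread_list, and Backtrace() are all invented for illustration):

#include <vector>

// A complex action with internal locking: the lock is held across the
// whole iteration, so nobody can resume the process mid-backtrace.
bool ProcessSketch::BacktraceAllThreads (std::vector<ThreadBacktrace> &backtraces)
{
    std::lock_guard<std::mutex> guard(m_lock);
    if (m_state_in_transition)
        return false;                   // "error: process is running"
    for (const ThreadSP &thread_sp : m_thread_list)
        backtraces.push_back (Backtrace (*thread_sp));
    return true;
}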

We are open to any ideas you have for robust locking. We have tossed
around a few ideas for the process control, but haven't come up with one
that works better than what we have. Having a multi-threaded API sure
does complicate things and is something other debuggers don't have to
worry about.

Definitely does complicate things just a smidge :wink: I suppose it's too late
to require that Xcode uses lldb in a more disciplined single-threaded
mode? :wink: j/k

Feel free to make a patch with your suggested changes and send it out and we can try it out on our stuff over here and maybe this will help us evaluate it better.

You might then try to compile your new LLDB.framework on MacOSX and then run Xcode with it:

DYLD_FRAMEWORK_PATH=/path/to/build/Debug /Applications/Xcode.app/Contents/MacOS/Xcode

This will run Xcode with your new LLDB and allow you to test it out to make sure it all works.

Thanks for the clarifications about the Python code. Glad it's not on the
critical path and that LLDB can still be built without python support.

Yeah, I explored this option when we first did the read/write lock and as soon as you have more than one primitive object (mutex, condition, read/write lock, etc), you are asking for even more trouble.

I think condition variable + lock are meant to work together.

The natural benefit of the above approach is that all the locking/unlocking magic happens inside Process, and SB has to have 0 knowledge about Process locks.

So how would you solve the problem where you want to play with items in
the process and keep the process from going anywhere? Like

SBProcess process = ...
SBThread thread;
uint32_t num_threads = process.GetNumThreads();
for (uint32_t i=0; i<num_threads; ++i)
{
thread = process.GetThreadAtIndex(i);
}

In the above code, if we let the process do all the locking, couldn't we end up having the process change out from underneath us as we iterate?

Another example:

SBFrame frame;
uint32_t num_frames = thread.GetNumFrames();
for (...)
{
   frame = thread.GetFrameAtIndex(i);
}

What's the context the above code is running in? I see two possibilities:

If there's a thread that is running the above, as well as another thread that calls, say, SBProcess.Resume(), but neither of these threads is invoked from within LLDB, I don't think there's any implicit locking that can prevent the process from changing. If implicit locks are needed, the best thing to do there, IMHO, is to invalidate the stale thread objects (when the Process resumes) and have them error out when used. I'm not sure how two r/w locks (or external locking) can help that use case, as any locks are held in the SB layer. How does the code above signal that it's done with the process locks? If this use-case needs to be handled in a non-surprising fashion, I think the only thing to do is to lift lock acquisition one step above the SB layer and put the locks in the user's hands via an explicit SBProcess.Lock/Unlock() or some such mechanism. Implicit locking as part of the SB functions doesn't seem like it would be deadlock-friendly.
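Something along these lines, for instance (an entirely hypothetical API, RAII-style so a forgotten Unlock() can't leak the lock):

// Hypothetical user-facing scoped lock: the client explicitly pins the
// process stopped for the duration of a multi-call sequence.
{
    SBProcess::ScopedStopLock pin (process);  // fails if the process is running
    if (pin.IsValid())
    {
        uint32_t num_threads = process.GetNumThreads();
        for (uint32_t i = 0; i < num_threads; ++i)
        {
            SBThread thread = process.GetThreadAtIndex(i);
            // ... backtrace the thread, read memory, etc. ...
        }
    }
}   // destructor releases the lock; Resume() can proceed again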

On the other hand, if the loops above are running in a 'callback' context, that is, invoked from within LLDB as part of some event notification mechanism, I foresee LLDB spawning a thread with the above code right before a condition_variable::wait(), thereby 'promising' to keep the process unchanged in the meantime. The same promise has to be kept by the code above; so reading threads is fine, but evaluating expressions (anything more complicated than reading some memory) or calling process.Step/Resume() should cause errors. Anything the code above wants to do to modify process state would have to be queued somewhere and processed after control is passed back to the debugger, as sketched below.
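The queuing half could be as simple as this (a sketch; nothing like it exists today as far as I know, and it would need its own mutex if callbacks could run concurrently):

#include <functional>
#include <vector>

// Mutations requested from callback context get recorded rather than
// executed inline; the debugger drains them after the callback joins.
class DeferredProcessActions
{
public:
    void Defer (std::function<void()> action)
    {
        m_actions.push_back (std::move (action));
    }

    // Called on the debugger's own thread once control is handed back.
    void Drain ()
    {
        for (auto &action : m_actions)
            action();
        m_actions.clear();
    }

private:
    std::vector<std::function<void()>> m_actions;
};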

Is it not possible to have both "driving" code and "callback" code go through exactly the same code path in the SB* layer as well as the Process class?

You mean using the same locks? I am confused by your question. All code uses the same public API through the SB* layer. The SB layer locks you out when it is dangerous to do things while the process is running. Many of the issues we ran into stemmed from Xcode accessing the SB* API from multiple threads, and all of the locks (the target lock and the process read/write (stopped/running) locks) are needed.

Sorry I was not too clear -- basically I'm worried about the complexity of having two discrete R/W locks here, and separate behaviour based on internal/external threads.

As opposed to taking a condition variable, locking it, and doing two or three other things? I am not clear how the new scheme simplifies things yet.

I think a condition variable + a lock is simpler than two r/w locks. I may
be missing something though; I'll try to create a patch that implements
what I'm envisioning here and I'll let you know how it goes. It's possible
that what I want is not possible :slight_smile:

Since processes have one state, it "feels" like
there should be one lock.

Let me know what you think after reading the above explanation. The locks are needed and necessary and we haven't been able to come up with another way to control access to the things we need to do from the various threads. We currently have two clients:
- public clients
- private process state thread clients

public clients get locked out, as they should, when the process is
running (continue, or stepping) so we don't get useless answers.

Do you really want to evaluate the value for "x" when we are in the middle of a single step? No, you want to get an error stating "error: process is running".

Right, expression evaluation would be one of those "non-const" cases that should error out if Process' m_in_transition is true.

I really don't understand the new scheme you are suggesting. What does
m_in_transition mean? In transition between what and what?

The name is not ideal; it could use more work. Basically, the condition variable/bool is just a flag that gets set when certain functions (those that modify the inferior process state) become forbidden. I see it being set to true when callback code is executed (i.e. through OperatingSystemPython). In such contexts, any code that wants to modify the inferior state should not be allowed, because the inferior is already in the process of being modified by whatever called into OperatingSystemPython. Previously I called these functions 'const', but I see now that is not correct. What I really mean is read-only with respect to public and private state.

Feel free to make a patch with your suggested changes and send it out and
we can try it out on our stuff over here and maybe this will help us
evaluate it better.

You might then try to compile your new LLDB.framework on MacOSX and then
run Xcode with it:

DYLD_FRAMEWORK_PATH=/path/to/build/Debug /Applications/Xcode.app/Contents/MacOS/Xcode

This will run Xcode with your new LLDB and allow you to test it out to
make sure it all works.

Thanks! Are there any simple unit tests, though, that use LLDB and don't depend on Xcode? Maybe I can help write some. What kind of callback-registration mechanisms should I be looking at? I'm only familiar with SBBreakpoint.SetCallback(), which I imagine is the same thing as is exercised by the TestStopHookMultipleThreads.py suite?

Does anything exist yet to register functions to be called on process
state change?

I'll try to implement what I proposed (maybe with the simplification of
the private run lock to a simple mutex) and let you know if it fails
miserably or is something you should look at.

Anyways, I'm probably not going to get to this before next week, so any
hints or thoughts you can provide will be quite helpful!

Cheers,
Dan