Race condition or wrong API usage?

Hi all!

I'm having some weird problems with asynchronous API usage. I will address it in this email as if it were a bug, but it may be wrong API usage on my part.

The main problem is that the run lock stays write-locked when the process is stopped (and I've received a state-changed event).
What happens is:

1. Create a target (a simple 'hello world'-like program) and set a breakpoint.
2. Set async mode.
3. Set up an SBListener that will listen (on the SBDebugger) for lldb.SBProcess.eBroadcastBitStateChanged, from event class lldb.SBProcess.GetBroadcasterClassName().
4. Call SBTarget::LaunchSimple(None, None, os.getcwd()).
5. Wait for an event.
6. The process is now in a stopped state (checked using GetStateFromEvent() == eStateStopped).
7. Try to get the stopped thread. There isn't one (the only thread has eStopReasonInvalid).
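The steps above can be sketched roughly like this (a minimal reproduction sketch, not the attached test itself; the `a.out` binary name, the `main` breakpoint location, and the 5-second timeout are illustrative assumptions):

```python
import os
import lldb  # available when run under lldb's bundled Python

dbg = lldb.SBDebugger.Create()
dbg.SetAsync(True)

target = dbg.CreateTarget("a.out")          # the hello-world binary (assumed name)
bp = target.BreakpointCreateByName("main")  # breakpoint location is illustrative

# Listen on the debugger for process state-changed events, by event class.
listener = lldb.SBListener("test-listener")
listener.StartListeningForEventClass(
    dbg,
    lldb.SBProcess.GetBroadcasterClassName(),
    lldb.SBProcess.eBroadcastBitStateChanged)

process = target.LaunchSimple(None, None, os.getcwd())

event = lldb.SBEvent()
if listener.WaitForEvent(5, event):  # 5-second timeout
    state = lldb.SBProcess.GetStateFromEvent(event)
    if state == lldb.eStateStopped:
        # Symptom described above: the process reports stopped, but the
        # only thread has eStopReasonInvalid, so no stopped thread is found.
        for thread in process:
            print(thread.GetStopReason())
```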

Why does this happen?

SBTarget::Launch() will start the process (the run_lock gets created write-locked, and the Process' public state will be eStateRunning) and wait for it to stop (when it stops, the Event's DoOnRemoval will change the Process' public state to eStateStopped and call WriteUnlock()).
After the process stops on entry, we want to continue until we hit our breakpoint. SBTarget::Launch calls Process::Resume, which will call WriteTryLock (succeeding) and then call into Process::PrivateResume().

The lock will not be unlocked again.
The Process' private state will toggle between the stopped and running states several times before stopping at the breakpoint. When it stops at the breakpoint, it will set the Process' public state to eStateStopped. But the public state was already eStateStopped, so we won't call WriteUnlock at Process.cpp:1310.

Any clues on where to look? I've started debugging this, but maybe Greg will know best where to look and how to fix it.

Attached is a test that shows the problem (main.cpp is from python_api/process).

Thanks,

  Filipe

main.cpp (1.02 KB)

Makefile (77 Bytes)

TestProcessRWLock.py (3.92 KB)

Hi again,

It seems that, if I use the debugger's listener (like Driver.cpp does), it works:

If I change the test to do this:

    listener = self.dbg.GetListener()

instead of:

    listener = lldb.SBListener("test-listener")

Everything works (after we take into account the additional events related to the program's STDOUT). But shouldn't my first version work too? I set up the SBListener before doing anything that would use it, and the events should be propagated to every listener that is hanging off the Process broadcaster. Or am I missing something?
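For reference, the working variant looks roughly like this (a sketch, assuming `dbg` and `target` are already set up and `LaunchSimple` is used as before; `EventIsProcessEvent` is used to skip the STDOUT-related events mentioned above):

```python
import os
import lldb  # available when run under lldb's bundled Python

# Use the debugger's own listener instead of a fresh SBListener.
listener = dbg.GetListener()

process = target.LaunchSimple(None, None, os.getcwd())

event = lldb.SBEvent()
while listener.WaitForEvent(5, event):  # 5-second timeout, illustrative
    # The debugger's listener also receives STDOUT/STDERR events,
    # so only look at process state-changed events.
    if not lldb.SBProcess.EventIsProcessEvent(event):
        continue
    state = lldb.SBProcess.GetStateFromEvent(event)
    if state == lldb.eStateStopped:
        break  # stopped at the breakpoint, with a valid stop reason
```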

Thanks,

  Filipe

We are taking a look at this, and we will let you know.

In general there should only be one listener getting process stopped events, because when the event is consumed, it does some work (DoOnRemoval, for instance). I think we enforce this, but if we don't and there are two consumers of the events, that might cause the issue you are seeing.

We will look into this and get back to you.

So the way that processes and listeners work is that you can either explicitly provide a listener to the Launch call when you launch the process, or if you don't (because process launching really doesn't work if nobody is there to consume events) the Debugger's listener will get hooked into the process when it gets launched.

In your case, you used LaunchSimple, so the debugger's listener was being used.

Now, when you do launch, two events are generated in the process of the launch: first an "eStateStopped" event, when we stop at the entry point, and second an "eStateRunning" event, when we "resume". The SBTarget::Launch call will consume the "eStateStopped" event on its main listener, so if you had either provided one at launch or were listening to the Debugger's listener, you would only see "eStateRunning". But if you just attach another listener to the process then you're going to get all the broadcast events unfiltered. That's probably not such a useful thing to do...

You also have to watch out if you do that, because the actions associated with stop events (for instance running a breakpoint command) happen the first time the event is pulled off the event queue. If you have two listeners listening to the Process then you can't be sure which one is going to trigger this, which may make your running of the debugger unpredictable. We probably should disallow two "process" debug listeners, but we haven't implemented that yet.

Note, the immediate problem in your case is that you got the eStateStopped from the initial stop state. The target immediately resumed, but you didn't know that 'cause you hadn't fetched the running event. That initial stop event isn't useful, since to treat it properly you would have to know that a "run" will follow it immediately.

As an aside, in other cases where the debugger stops for some ostensibly public reason like a breakpoint hit, but then restarts itself (for instance if it has a breakpoint command that calls "continue") then you will get a stopped event, but the "restarted" bit will be set in the event, so you will know that you have to wait for another "stop" before you can ask questions about the process. I didn't do that in the case of this initial stop, just because it wasn't supposed to make its way out into the world...
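The rule Jim describes can be simulated without lldb at all. The sketch below is a hypothetical pure-Python model (states as strings rather than lldb's enums; in real code they would come from SBProcess.GetStateFromEvent and SBProcess.GetRestartedFromEvent): a "stopped" event only counts as a real stop when its "restarted" bit is clear.

```python
def find_real_stop(events):
    """Return the index of the first genuine stop in a stream of
    (state, restarted) event tuples, or None if there is none.
    A stopped event with the restarted bit set means the process
    resumed itself, so we must keep waiting for another stop."""
    for i, (state, restarted) in enumerate(events):
        if state == "stopped" and not restarted:
            return i
    return None

# Launch resumes after the entry-point stop, so a typical stream is:
stream = [("running", False),   # Launch's resume
          ("stopped", True),    # restarted stop (e.g. a bp command ran "continue")
          ("running", False),
          ("stopped", False)]   # the breakpoint hit we actually care about
print(find_real_stop(stream))   # -> 3
```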

Anyway, if you want to run the debugger asynchronously, the correct thing to do is to either use the debugger's listener or make your own listener, and insert that into the process when you launch it - using the more complex Launch API. I monkeyed around with your script a little bit and using the debugger's listener worked trivially, but using a listener passed to launch didn't work. There's probably some dopey error, but I didn't chase that down, I've got other stuff I have to do right now. But if you can't get this to work we can look at it further.

Note if you use the debugger's listener, it will also deliver STDIO events, etc, so you either need to turn that off (but then you probably want to disable stdio in the launch flags) or you'll need to deal with those events.

And finally, you WILL get an eStateRunning event from Launch before you get the eStateStopped from hitting the breakpoint. So you'll have to change your test to accommodate that.

Jim

Hi Jim and Greg,

Yes, that was the bug. Using the debugger's listener works (I haven't tried using Launch with a listener).
I think we should at least warn about this kind of thing in the SetAsync documentation or on a page about asynchronous usage of lldb (or implement the error Jim talked about, disallowing two process debug listeners), since a user might trip up on it.

When I have the time I will try to check out one of those options.

Thanks,

  Filipe
