"step" threading issues

When my users want to "step into" a function, I call:

     SBThread th= self->m_process.GetSelectedThread();
     th.StepInto();

When it's done, I get notified through my process listener:

     case SBProcess::eBroadcastBitStateChanged:
       int n = self->m_process.GetStateFromEvent (data);

Here n is eStateStopped. However, when I then read m_process.GetSelectedThread().GetStopReason(), it returns eStopReasonNone half the time, but never when I'm debugging. This is a simple threaded app. How should I find out whether it returned from a plan?

In reply to Carlo Kok (29-4-2013 15:40):

I have to add to this that it ONLY happens for Step Into for some reason.

In reply to Carlo Kok (29-4-2013 15:43):

And an addition to that: I get several eStateStopped events in sequence when this happens.

You should only get one stopped event unless you are hitting a breakpoint that continues your target. In that case the eStateStopped event would be a "restarted" event, which you can detect with:

    static bool
    SBProcess::GetRestartedFromEvent (const lldb::SBEvent &event);

This means the program stopped but restarted automatically. You should never see two eStateStopped events in a row; if you do, please try to reproduce it on a Mac target and file a bug.

For step into, the stop reason should always be eStopReasonPlanComplete.
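
A minimal sketch of that check, written without LLDB headers so that it compiles on its own. Event, State, IsRealStop, and CountRealStops are hypothetical stand-ins, not LLDB types; in a real listener the two fields would come from SBProcess::GetStateFromEvent and SBProcess::GetRestartedFromEvent.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for the values a real listener would read with
// SBProcess::GetStateFromEvent and SBProcess::GetRestartedFromEvent.
enum class State { Running, Stopped, Exited };

struct Event {
    State state;
    bool restarted; // true if the process auto-continued after this stop
};

// A stop only counts as a "real" stop for the UI when the process
// did not restart automatically.
bool IsRealStop(const Event &event) {
    return event.state == State::Stopped && !event.restarted;
}

// Count the stops a UI should act on (for example by reading the stop reason).
int CountRealStops(const std::vector<Event> &events) {
    int count = 0;
    for (const Event &event : events) {
        if (IsRealStop(event))
            ++count;
    }
    return count;
}
```

Under this model, a run of restarted stops followed by one real stop yields exactly one point at which it makes sense to read GetStopReason().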

In reply to Greg Clayton (29-4-2013 18:23):

Indeed, that's my problem. I get several of those with reason "stop" for StepInto. I'm up to date with last week's trunk update on Windows, but I'll try to compile lldb on OS X to see if I can reproduce it there.

In reply to Carlo Kok (29-4-2013 18:41):

I get this from the log:
http://pastebin.com/msnqdi6P

At line 1656 it resumes the process, yet it still broadcasts a "stop", which makes no sense to me, and I cannot find any way this could happen. I do know it doesn't happen if I step through it slowly.

In reply to Carlo Kok (29-4-2013 22:15):

I've narrowed it down to this (line 1622):

ThreadList::ShouldReportStop 3 threads
Thread::ShouldReportStop() tid = 0x1a03: returning vote for complete stack's back plan
...
ThreadList::ShouldReportStop returning yes

ShouldReportStop returns "yes" because m_completed_plan_stack.size() > 0, in which case it returns:

     return m_completed_plan_stack.back()->ShouldReportStop (event_ptr);

Now the last thing in that stack is a ThreadPlanCallFunction:

     Vote
     ThreadPlanCallFunction::ShouldReportStop(Event *event_ptr)
     {
         if (m_takedown_done || IsPlanComplete())
             return eVoteYes; // << goes here
         else
             return ThreadPlan::ShouldReportStop(event_ptr);
     }

Which is the call to g_lookup_implementation_no_stret_function_code.

Does anyone have an idea what else I can check to solve this step into issue?

I've not been able to reproduce this on OS X or Linux.

In reply to Carlo Kok (2-5-2013 12:03):

If I change:

     if (m_completed_plan_stack.size() > 0)

to:

     if (m_completed_plan_stack.size() > 0 && m_plan_stack.size() == 0)

in Thread::ShouldReportStop, it works perfectly.

In reply to Carlo Kok (2-5-2013 12:03):

Yes, but that's because this branch will then never get called (m_plan_stack.size() is never 0; there's always a base plan).

So this isn't a correct fix.

Jim
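
To illustrate why the extra condition simply disables the branch, here is a small model of the two stacks. ThreadModel and its members are hypothetical names that only mirror the description above (a plan stack that always holds a base plan); this is not LLDB's Thread class.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model, not LLDB's Thread class: the plan stack always
// contains a base plan, so it is never empty.
struct ThreadModel {
    std::vector<std::string> plan_stack{"base"};
    std::vector<std::string> completed_plan_stack;

    void PushPlan(const std::string &name) { plan_stack.push_back(name); }

    // Move the top plan to the completed stack; the base plan is never popped.
    void CompleteTopPlan() {
        if (plan_stack.size() > 1) {
            completed_plan_stack.push_back(plan_stack.back());
            plan_stack.pop_back();
        }
    }

    // Original condition quoted in the thread.
    bool OriginalBranchTaken() const { return !completed_plan_stack.empty(); }

    // Proposed condition: never true, because plan_stack is never empty.
    bool ProposedBranchTaken() const {
        return !completed_plan_stack.empty() && plan_stack.empty();
    }
};
```

In this model the proposed condition stays false no matter how many plans complete, so the change "works" only because the branch is never taken.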

In reply to jingham@apple.com (2-5-2013 18:47):

I figured it wouldn't be that simple. However, it cannot be right that it (a) resumes the process and (b) returns "yes, let the public API know we stopped" at the same time.

I disagree. You need that for instance to implement "process handle SOMESIG --stop false --print true". You are auto-continuing, yet you want to tell the event-loop runner that this happened so that it can notify about it however is appropriate. For instance in the case of the lldb driver we listen to this event and print some bit to the console. But a GUI might want to do this in some different way, so I don't want to just dump something to stdout and hope somebody notices...

Also we send an event if a breakpoint condition or command is hit but continues the process so that a UI would know to update hit counts in its breakpoint display.

In these cases the stopped event always says it restarted (you can query this with the Process::ProcessEventData::GetRestartedFromEvent API). You just have to make sure you check that any time you get a stopped event.

I have to fix the ThreadPlan.h docs to be more clear about how ShouldReportStop works, however (and I should change its name to be a little more explicit.) ShouldReportStop only gets called if the process is going to auto-continue after the stop. That makes sense, I can't see why you would want to have the process really stop and NOT tell the agent running the event loop about it. But it isn't clear from the name. Whoever did ThreadPlanCallFunction probably didn't realize this, since it shouldn't be returning true from ShouldReportStop. After all, if some thread plan ran a function and decided on the basis of the results of that function call to auto-continue, then there's no reason to tell the outside world about that. I'd have to think a little more carefully to be 100% sure that there aren't any cases where this would be useful, but I can't think of any right now.

OTOH, it looks like the Linux port of LLDB is for some reason not resilient to these "auto-continue" events. That puzzles me, since this should all be handled in generic execution control logic, and this sort of thing causes no problems on OS X.

Jim
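
The rule described above (a real stop is always reported, and the ShouldReportStop vote only matters when the process is about to auto-continue) can be sketched as follows. Vote, StopDecision, and Decide are hypothetical names for illustration, and treating "no opinion" as silent is an assumption of this sketch; it is not LLDB's internal code.

```cpp
#include <cassert>

// Hypothetical vote type for this sketch (not LLDB's Vote enum).
enum class Vote { No, NoOpinion, Yes };

struct StopDecision {
    bool broadcast_event; // does the event-loop runner hear about this stop?
    bool restarted;       // did the process resume on its own afterwards?
};

// should_stop: the plans decided the process really stays stopped.
// report_vote: only consulted when the process is about to auto-continue.
StopDecision Decide(bool should_stop, Vote report_vote) {
    if (should_stop)
        return {true, false};  // real stop: always reported
    if (report_vote == Vote::Yes)
        return {true, true};   // auto-continue, but tell the listener
    return {false, true};      // auto-continue silently (assumption)
}
```

A listener that receives an event with restarted set would then treat it as a notification only (for example, to update hit counts), not as a point where the target is stopped.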

In reply to jingham@apple.com (2-5-2013 19:47):

Ah, I was unaware of that call; it does seem to fix (at least part of) it.

I "solved" the issue with GetRestartedFromEvent, but then it crashed in the slim multi-read/single-write lock code, which on Windows uses an internal Windows API. It appears that it did a write unlock twice without a lock in between (Windows then crashes on the next lock operation) in Process::SetPublicState (StateType new_state). This might cause issues on other OSes too, since I doubt pthread guarantees that this works.

Mac OS X seems not to care about this, but it is still not something we should allow to happen. I'm currently going through and cleaning up all the cases where this happens in the current testsuite. If my brain doesn't fail me, I should be done early next week.

Jim
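
One way to make an unbalanced unlock visible instead of silently tolerated is to route unlocks through an owning wrapper. The sketch below uses a plain std::mutex with std::unique_lock rather than the read/write lock discussed above, and SecondUnlockIsRejected is a hypothetical helper name; it only illustrates the idea and is not how LLDB handles it.

```cpp
#include <cassert>
#include <mutex>
#include <system_error>

// Returns true if the second, unbalanced unlock was detected.
bool SecondUnlockIsRejected(std::mutex &m) {
    std::unique_lock<std::mutex> lock(m); // acquires the mutex
    lock.unlock();                        // balanced unlock
    try {
        lock.unlock();                    // unbalanced: the lock is not owned
    } catch (const std::system_error &) {
        return true;                      // std::unique_lock reports the misuse
    }
    return false;
}
```

Because std::unique_lock tracks whether it owns the mutex, the second unlock raises std::system_error rather than reaching the underlying OS primitive.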