Process::SyncIOHandler(), everyone please try out the following patch

We had some race conditions in the past where you would type:

(lldb) continue

And you might get a "(lldb) " prompt that came out before and intermingled with the process STDIO. To fix this we sync with the process IOHandler using: Process::SyncIOHandler(2000) in a two places: "process continue" and all "thread step ..." commands. The problem is Process::SyncIOHandler() does the following:

bool
Process::SyncIOHandler (uint64_t timeout_msec)
{
    bool timed_out = false;

    // don't sync (potentially context switch) in case where there is no process IO
    if (m_process_input_reader)
    {
        TimeValue timeout = TimeValue::Now();
        timeout.OffsetWithMicroSeconds(timeout_msec*1000);
        m_iohandler_sync.WaitForValueEqualTo(true, &timeout, &timed_out);
        m_iohandler_sync.SetValue(false, eBroadcastNever);
    }
    return !timed_out;
}

And then in a background thread that handles the internal process events, we modify m_iohandler_sync:

void
Process::HandlePrivateEvent()
{
    ...
    if (StateIsRunningState (new_state))
    {
        if (!GetTarget().GetDebugger().IsForwardingEvents() && new_state != eStateLaunching)
            PushProcessIOHandler ();
        m_iohandler_sync.SetValue(true, eBroadcastAlways);
    }
    else if (StateIsStoppedState(new_state, false))
    {
        m_iohandler_sync.SetValue(false, eBroadcastNever);
    ...

    }
    ...
}

The problem is that if the background thread is able to handle the running state followed by a stopped state, it might set the value to true and then back to false again before the main thread calls Process::SyncIOHandler() and then the main thread waits for 2 seconds.

The solution seems simple: remove the m_iohandler_sync.SetValue(false, eBroadcastNever); from the StateIsStoppedState else clause in Process::HandlePrivateEvent() since the Process::SyncIOHandler() always sets it to false. Can everyone try and apply the following patch and run your test suite and also use LLDB for a while?

% svn diff
Index: source/Target/Process.cpp

Test run looks equivalent on FreeBSD - I had two failures on my desktop:

FAIL: LLDB (suite) :: TestEvents.py (FreeBSD feynman 10.1-STABLE
FreeBSD 10.1-STABLE #28 r280427+86df2de(stable-10): Thu Mar 26
16:07:47 EDT 2015
emaste@feynman:/tank/emaste/obj/tank/emaste/src/git-stable-10/sys/GENERIC
amd64 amd64)
FAIL: LLDB (suite) :: TestSendSignal.py (FreeBSD feynman 10.1-STABLE
FreeBSD 10.1-STABLE #28 r280427+86df2de(stable-10): Thu Mar 26
16:07:47 EDT 2015
emaste@feynman:/tank/emaste/obj/tank/emaste/src/git-stable-10/sys/GENERIC
amd64 amd64)

The first fails intermittently for me under load, while the second has
been failing for a while.

The change seems to make the lldb-prompt-at-the-wrong-time problem
worse (or at least, no better) during interactive single stepping
though. For example:

% bin/lldb /bin/ls
(lldb) target create "/bin/ls"
Current executable set to '/bin/ls' (x86_64).
(lldb) b main
Breakpoint 1: where = ls`main + 33 at ls.c:163, address = 0x00000000004023f1
(lldb) run
Process 58244 launching
Process 58244 launched: '/bin/ls' (x86_64)
(lldb) Process 58244 stopped
* thread #1: tid = 103132, 0x00000000004023f1 ls`main(argc=1,
argv=0x00007fffffffe598) + 33 at ls.c:163, stop reason = breakpoint
1.1
    frame #0: 0x00000000004023f1 ls`main(argc=1,
argv=0x00007fffffffe598) + 33 at ls.c:163
   160 #ifdef COLORLS
   161 char termcapbuf[1024]; /* termcap definition buffer */
   162 char tcapbuf[512]; /* capability buffer */
-> 163 char *bp = tcapbuf;
   164 #endif
   165
   166 (void)setlocale(LC_ALL, "");
step
Process 58244 stopped
* thread #1: tid = 103132, 0x00000000004023fa ls`main(argc=1,
argv=0x00007fffffffe598) + 42 at ls.c:166, stop reason = step in
    frame #0: 0x00000000004023fa ls`main(argc=1,
argv=0x00007fffffffe598) + 42 at ls.c:166
   163 char *bp = tcapbuf;
   164 #endif
   165
-> 166 (void)setlocale(LC_ALL, "");
   167
   168 /* Terminal defaults to -Cq, non-terminal defaults to -1. */
   169 if (isatty(STDOUT_FILENO)) {
(lldb) step
Process 58244 stopped
* thread #1: tid = 103132, 0x0000000800dab011
libc.so.7`setlocale(category=0, locale=0x0000000000406373) + 33 at
setlocale.c:113, stop reason = step in
    frame #0: 0x0000000800dab011 libc.so.7`setlocale(category=0,
locale=0x0000000000406373) + 33 at setlocale.c:113
   110 return (NULL);
   111 }
   112
-> 113 if (locale == NULL)
   114 return (category != LC_ALL ?
   115 current_categories[category] : currentlocale());
   116
(lldb) step
(lldb) Process 58244 stopped
* thread #1: tid = 103132, 0x0000000800dab01a
libc.so.7`setlocale(category=0, locale=0x0000000000406373) + 42 at
setlocale.c:121, stop reason = step in
    frame #0: 0x0000000800dab01a libc.so.7`setlocale(category=0,
locale=0x0000000000406373) + 42 at setlocale.c:121
   118 * Default to the current locale for everything.
   119 */
   120 for (i = 1; i < _LC_LAST; ++i)
-> 121 (void)strcpy(new_categories[i], current_categories[i]);
   122
   123 /*
   124 * Now go fill up new_categories from the locale argument

So do you still have problems with the prompt coming out at the wrong time or getting interspersed on FreeBSD with top of tree? It is just made worse by this patch?

Try this patch:

iosync.patch (7.92 KB)

Hi Greg,

This latest patch appears to have ANSI escape codes in it. Any chance you can re-upload?

By the way, the original one-liner patch doesn’t seem to fix the double prompt for me, and it also doesn’t fix the case of printing a stack trace after stopping at a breakpoint through a .lldbinit file. In other words, this is an interactive session:

d:\src\llvmbuild\ninja\bin>lldb
(lldb) command source -s 1 ‘d:\src\llvmbuild\ninja\bin..lldbinit’
(lldb) file d:\testexe\simple_step.exe
Current executable set to ‘d:\testexe\simple_step.exe’ (i686).
(lldb) break set -f simple_step.cpp -l 11
Breakpoint 1: where = simple_step.exe`main + 54 at simple_step.cpp:11, address = 0x00416096
(lldb) run
Process 11400 launching
(lldb) Process 11400 launched: ‘d:\testexe\simple_step.exe’ (i686)
(lldb) Process 11400 stopped

  • thread #1: tid = 0x376c, 0x00cd6096 simple_step.exemain(argc=1, argv=0x01149028) + 54 at simple_step.cpp:11, stop reason = breakpoint 1.1 frame #0: 0x00cd6096 simple_step.exemain(argc=1, argv=0x01149028) + 54 at simple_step.cpp:11
    8
    9 int main(int argc, char **argv) {
    10 int fib7 = fib(7);
    → 11 printf(“The value of fib(7) is %u\n”, fib7);
    12 return 0;
    13 }
    (lldb)

If I make a .lldbinit file with the exact same set of commands and start lldb, I get this output.

d:\src\llvmbuild\ninja\bin>lldb
(lldb) command source -s 1 ‘d:\src\llvmbuild\ninja\bin..lldbinit’
Process 5288 launching
(lldb) (lldb)

The one-liner patch doesn’t change anything for me, so it seems it doesn’t fix everything? I haven’t tried the second patch though because of the ANSI escape codes.

Try this one.

iosync.patch (7.21 KB)

Same thing. Double prompt and no source listing when the breakpoint is hit. If there’s any way that I can help diagnose let me know (logs, etc)

We need to come up with a bullet proof way of making this just work. Right now we have the sync hack and that is obviously working for reasons no one understands because it introduces 2 second delays which masks the true issue that needs to be fixed.

Do you still see overlapping (lldb) prompts even with no fixes? Or do things work correctly for you with top of tree?

Greg

Behavior is identical with and without the patch - double prompt, no source listing. It’s possible the patch changes the nature of the race in such a way that i might see different results after many runs, but I couldn’t see anything after ~10 runs

Yes, I get the (lldb) prompt coming out at the wrong time (in relation
to the source listing or target I/O) on top of tree. It's at best
unchanged with the (one-liner) patch -- because it seems to vary with
the overall system load it's hard to say for certain. But I get
interspersed prompts on top of tree with or without the patch.

The change that's been a reliable workaround for me is a usleep(100)
at the end of Process:PrivateResume(). I'll see if that's still the
case and also try the other patch you posted tomorrow.

Hi all,

I am testing the changes on linux. I don't this helps much, but this
is what I have found out:

no patch:
The prompt seems to be coming out at the right place always.

one-liner:
The prompt comes out at the right place, but this does not prove
anything, as the baseline was ok as well. The test suite result is
unchanged.

second patch:
The prompt comes out at the right place, but this does not prove
anything, as the baseline was ok as well. The test suite result is
unchanged.

Interestingly, I do remember the prompt coming out at the wrong place
in the past, but I can't seem to reproduce it now...

cheers,
pl

I've just noticed I'm consistently getting double prompts after
detaching from a process during remote debugging. The one-line patch
doesn't help that, but the second patch seems to make the problem go
away.

Hope that helps,
pl

Yes this m_iohandler_sync is essentially implementing a "sleep(2)" as the overall fix. It is coded in a much more complex way, but that is the way it works out on MacOSX at least.

I will take look at this later today to come up with a real fix to hopefully fix this once and for all.

The issue seems to be we have 3 threads:
1 - main thread running command interpreter waiting for user input
2 - debugger event handling thread that handles all events (run/stop/stdio from process)
3 - process private event handling thread

I believe that the issue we are running into is we get a "stopped" on thread 2 and we grab stdio and also display the thread stop info (thread list and where we stopped) asynchronously while thread 1 is in the process of displaying the "(lldb) " prompt again. We really need any async text to be supplied to the IOHandler stack for display so that it can correctly coordinate the output. I believe right now thread 2 just prints stuff when it wants to and then asks the IOHandler to refresh the top IOHandler after it does the display, but I need to look at this again to make sure this is what is happening.

Greg

Would a log file help? Maybe this way you’d be able to see what order things are actually happening in. If so, let me know which log channels you want me to turn on and I can upload something.

The change that's been a reliable workaround for me is a usleep(100)
at the end of Process:PrivateResume(). I'll see if that's still the
case and also try the other patch you posted tomorrow.

I tried just now, and did not observe overlapping prompts with any of:

1-liner change + usleep(100) in PrivateResume
the larger change
larger change + usleep(100)

(But it seems the problem was made worse by system load, and so I'm
not completely confident that I've recreated the same conditions under
which I reproduced it on top of tree yesterday.)

I think just looking at what is happening with the event thread and finding out how to make sure that any async text is displayed through the command interpreter (IOHandler stack) and we should easily be able to control it. The command interpreter is what is displaying the user interface so it is logical that is should control the output of async text if it already isn't.

The sleep() and usleep() solutions help us avoid the "continue" that immediately stops, but when the stop takes a bit longer we can easily run into issues.

We can probably reproduce a lot of these issues by stepping over usleep() statements:

int main(...)
{
    for (int i=1000; i<2000000; i+= 1000)
        usleep(i);
}

This will vary how quickly we get the run/stop events and should start to show problems.

OK, another more complete patch to try out...

io.patch (15.8 KB)

Problem persists with this version on FreeBSD, unfortunately:

...
   165
   166 (void)setlocale(LC_ALL, "");
(lldb) step
(lldb) Process 77191 stopped
* thread #1: tid = 100853, 0x00000000004023fa ls`main(argc=1,
argv=0x00007fffffffe588) + 42 at ls.c:166, stop reason = step in
    frame #0: 0x00000000004023fa ls`main(argc=1,
argv=0x00007fffffffe588) + 42 at ls.c:166
   163 char *bp = tcapbuf;
   164 #endif
   165
-> 166 (void)setlocale(LC_ALL, "");
   167
   168 /* Terminal defaults to -Cq, non-terminal defaults to -1. */
   169 if (isatty(STDOUT_FILENO)) {
step
...

Same results for me on Windows. My repro case is creating the following .lldbinit file:

file d:\testexe\simple_step.exe
break set -f simple_step.cpp -l 12
run
bt

and then running lldb with no arguments.

The situation is even worse on Linux, i'm afraid. Whereas I was
getting no wrong prompts before (at least when debugging locally), now
the prompt appears in the wrong place in about 10% of the cases.

pl