gdb-remote lldb prompt race condition

Hi folks

I have been seeing another issue with the display of the lldb prompt. This time it's when I do "target create elf-file", then "gdb-remote port-number". After the "gdb-remote" command I see the fact that my process is stopped, e.g.

Process 1 stopped

on the screen. But no (lldb) prompt.

Some investigation revealed that what's *probably* happening is the main thread after processing the "gdb-remote" returns to it's IOHandler, which then prints (lldb). However, the inferior's state changes seem to delivered to stdout via a different thread, basically one which sits in Debugger::DefaultEventHandler. This subsequent output then, I think, overwrites the previous (lldb) prompt.

Now in my (and presumably other people's) situation, this issue is compounded by the speed of the TCP/IP connection to the gdbserver stub, the "poll the hardware" nature of my stub, and the fact the hardware is actually simulated - yes over a TCP/IP socket.

FWIW, I resolved this by a horrible (POSIX only) hack, of sleeping for 300ms at the bottom of the CommandInterpreter::HandleCommand function.

@@ -9,6 +9,8 @@

  #include "lldb/lldb-python.h"

+#include <poll.h> // MG for prompt bugs

The event should get delivered to the Debugger thread that is waiting for events and it should coordinate with the top io handler when printing it. If someone is grabbing the event manually by hijacking events, then we need to fix that code to send the event on to the unhijacked main event loop.

Hi Greg,

If that is the case, then the main thread (i.e. the one running Debugger::ExecuteIOHandler) needs to be blocked until the event arrives.

Why?

Because with the existing design once the user has entered their "gdb-remote" command, and completed the connect call, the main thread goes back to the IOHandlerEditline::Run() loop, sees that IsActive() is true, prints the next prompt, etc.. When I debugged this I didn't see any call to Deactivate or SetIsDone, which would have made IsActive return false. (And then the async DefaultEventHandler awakes and it's output "Process 1 stopped" splats over the prompt).

If the code is changed so that the edit line IOHandler's IsActive returns false, while an asynchronous event is happening, then I think that the main thread would spin, since the reader_sp->Run() function below:

void
Debugger::ExecuteIOHanders()
{
     while (1)
     {
         IOHandlerSP reader_sp(m_input_reader_stack.Top());
         if (!reader_sp)
             break;

         reader_sp->Activate();
         reader_sp->Run();
         reader_sp->Deactivate();

would immediately return. That's why I'm thinking the main thread probably should block until the last issued command has completed.

Out of interest, I did research your "If someone is grabbing the event manually by hijacking events" point. But when stopped state is detected (i.e. the reply to ?) in GDBRemote and Broadcaster::PrivateBroadcastEvent is called, there is no hijacking_listener. Indeed the execution path is that as indicated by the -->

     if (hijacking_listener)
     {
         if (unique && hijacking_listener->PeekAtNextEventForBroadcasterWithType (this, event_type))
             return;
         hijacking_listener->AddEvent (event_sp);
     }
     else
     {
         collection::iterator pos, end = m_listeners.end();
         // Iterate through all listener/mask pairs
         for (pos = m_listeners.begin(); pos != end; ++pos)
         {
             // If the listener's mask matches any bits that we just set, then
             // put the new event on its event queue.
             if (event_type & pos->second)
             {
                 if (unique && pos->first->PeekAtNextEventForBroadcasterWithType (this, event_type))
                     continue;
----> pos->first->AddEvent (event_sp);

So my contention is that in the case of gdb-connect the initial stop state should either be delivered/printed sychronously in the Process::ConnectRemote (i.e. in the mainthread context) or that the main thread should block until either the event arrives, or for some other reason the command last issued by the user is deemed to be "complete".

thanks
Matt

Greg Clayton wrote:

I’m wondering whether whether we need some brave soul, to redesign the current lldb IO handling mechanisms.

I’m not sure about the “brave soul” part, but my team may look to take this on. We’re getting hammered by it in several different contexts (i.e. the idea that races here seem to require several one-off fixes to get the handling right). When we get a few cycles on it we may try to hash out something coherent and propose it here. (If somebody beats us to it, have at it).

I’m blocked by I think a related issue here, so I may get to this sooner than later.

-Todd

Hi Greg,

If that is the case, then the main thread (i.e. the one running Debugger::ExecuteIOHandler) needs to be blocked until the event arrives.

The command line command "process connect" needs to synchronize better. It currently isn't and this is causing the problem. All we know is that we are executing a "gdb-remote 1234" command which turns into "process connect connect://localhost:1234" and then the command execution returns immediately and then the io handler gets the next command as it should. We should add code to commands that we know might needs some syncing to make sure they don't violate things.

Why?

Because with the existing design once the user has entered their "gdb-remote" command, and completed the connect call, the main thread goes back to the IOHandlerEditline::Run() loop, sees that IsActive() is true, prints the next prompt, etc.. When I debugged this I didn't see any call to Deactivate or SetIsDone, which would have made IsActive return false. (And then the async DefaultEventHandler awakes and it's output "Process 1 stopped" splats over the prompt).

No other IOHandler gets pushed, so the command interpreter is active and it will just get the next command as soon as it can. Most commands do something, then complete and then we are ready for more. There are a few that needs some extra syncing because they cause a launch/attach/connect and we usually get a response right away. There are others that might be immediate and might be async "thread step-XXX" for example. The step might complete and it might not. There are others that are purely async "process continue".

To complicate this, if we attach to a process, then there is no process IO handler to push and the command line will always be active.

If the code is changed so that the edit line IOHandler's IsActive returns false, while an asynchronous event is happening, then I think that the main thread would spin, since the reader_sp->Run() function below:

void
Debugger::ExecuteIOHanders()
{
   while (1)
   {
       IOHandlerSP reader_sp(m_input_reader_stack.Top());
       if (!reader_sp)
           break;

       reader_sp->Activate();
       reader_sp->Run();
       reader_sp->Deactivate();

would immediately return. That's why I'm thinking the main thread probably should block until the last issued command has completed.

So this is the job of each command. Most commands complete right way.

It sounds like we might want to introduce a formal synchronization to the IOHandler and each command would need to change it from the default "no sync required".

We would need two things:
- no sync required (default)
- sync with timeout for commands that usually complete quickly, but if they don't we need to get back to commands

Out of interest, I did research your "If someone is grabbing the event manually by hijacking events" point. But when stopped state is detected (i.e. the reply to ?) in GDBRemote and Broadcaster::PrivateBroadcastEvent is called, there is no hijacking_listener. Indeed the execution path is that as indicated by the -->

   if (hijacking_listener)
   {
       if (unique && hijacking_listener->PeekAtNextEventForBroadcasterWithType (this, event_type))
           return;
       hijacking_listener->AddEvent (event_sp);
   }
   else
   {
       collection::iterator pos, end = m_listeners.end();
       // Iterate through all listener/mask pairs
       for (pos = m_listeners.begin(); pos != end; ++pos)
       {
           // If the listener's mask matches any bits that we just set, then
           // put the new event on its event queue.
           if (event_type & pos->second)
           {
               if (unique && pos->first->PeekAtNextEventForBroadcasterWithType (this, event_type))
                   continue;
----> pos->first->AddEvent (event_sp);

So my contention is that in the case of gdb-connect the initial stop state should either be delivered/printed sychronously in the Process::ConnectRemote (i.e. in the mainthread context) or that the main thread should block until either the event arrives, or for some other reason the command last issued by the user is deemed to be "complete".

Agreed. "process connect" should be doing better synchronization.

We also need a better way to deliver async output to the IOHandler stack. Right the only place that tries to handle this is in the Debugger class where it handles process events. This is where the tricky code of popping the process IOHandler lives. What should happen is we should have a function like:

Debugger::AsyncIO (...)

and any async output (from the process' stdout/stderr, or an eStateStopped event with thread state and frame and reason for stop) should be delivered through this.

The top IOHandler would then get this delivered to it in a thread safe way and it could then hide itself (for the command interpreter remove the "(lldb) " prompt and any command you are typing, display the output, then refresh the prompt and command, and continue. I believe this will solve all of our issues (using Debugger::AyncIO(...)).

Greg

Yes, unfortunately, the prompt issue rears it's head in several places. That's why it seems clear that at least some extension to the current design is required. I'm also wondering whether Shawn's SyncIOHandler can be applied to this one. Regarding "spare cycles", I'm about to launch into a big investigation for kalimba on lldb - since we have some variants where bytes (minimum addressable units) aren't 8-bits, so I can do little more than the odd piece of analysis, and linux patch testing... :frowning:

Matt

Todd Fiala wrote:

Ok,

Thanks for this, Greg. I'm not in a position to look at this right now, but wanted to at least point out some of my observations. Hopefully someone can pick this up sometime soon.

thanks
Matt

Greg Clayton wrote:

FWIW I am planning on having Shawn Best look at doing this when he gets back.

-Todd

That would be good - thanks for the update, Todd!

Matt

Todd Fiala wrote: