Remote Non-Stop Mode

Hi everybody,

We have implemented remote support for non-stop mode in a local Hexagon repo of LLDB and begun to upstream the work, see http://reviews.llvm.org/D9656. Non-Stop mode in a remote stub lets you examine a single stopped thread while others execute normally. This isn't currently a feature of llgs, but is of gdbserver and can be tested on x86.

The remote protocol currently needs to be extended to support asynchronous notifications so it can interact with a non-stop stub. To solve this we've switched to multi-threaded mode from the abstract communications class by implementing AppendBytesToCache() in GDBRemoteCommunication. Allowing us to spawn a read thread that continually reads packets into a packet queue. Packets are then accessed by popping a packet from the queue in a thread safe manner.

Since this is a significant change to the remote protocol implementation which won't just affect non-stop mode I'd like some feedback to check if people have any comments about it.

The patches are ready to go, and will be put up for review soon.

Thanks,
    Ewan

Hello Ewan,

this is exciting, I have been missing non-stop mode support in LLDB
for a while now. I can probably help with implementing the LLGS side
of things once you get your code through. Could you cc me in on the
patches so I can follow this more closely?

PS: The thing that has annoyed me most about gdb non-stop mode is that
it's value is basically locked once you start debugging. I often
realize I want to use this mode once I have started the process, or
want to switch to non-stop mode partway through the session, so I
would really like it mode could be switched at will. Do you have any
idea how difficult it would be to implement something like this?

cheers,
pl

Hi,

We have been seeing a crash on windows when connecting to an android target using lldb-server.
I am not sure if this affects other platforms too.
I think this was introduced with http://reviews.llvm.org/D9056.

I tracked the crash back to the workings of ModuleCache::GetAndPut().

The crash seems to be due to a file descriptor being released twice, once by the original "File lock_file" and again by the "LockFile lock" who share the same file descriptor.

The file descriptor sharing happens because of this line:
ModuleCache.cpp @ 164
LockFile lock (lock_file.GetDescriptor ());

Both destructors attempt to release effectively the same file descriptor. I was able to fix the crash by duplicating the file handle in the lock file constructor using _dup(). (patch attached)
I wasn't sure if this was the right fix however. Has anyone else seen this?
Should "File lock_file" perhaps transfer its file descriptor completely rather then share it?

Thanks,
Aidan

fd_crash_fix.patch (527 Bytes)

Hi Pavel,

Glad to hear there’s demand for it, I’ll cc you into the patches.

Switching to non-stop during a session I don’t think would be too hard once we’ve got it working. As all the packoet handling will be there, and we’d just need t add a way to send a QNonStop packet when switching.
It might however uncover some bugs in the stepping logic which could be tricky to track down.

Cheers,
Ewan

Switching to non-stop during a session I don't think would be too hard once we've got it working. As all the packoet handling will be there, and we'd just need t add a way to send a QNonStop packet when switching.
It might however uncover some bugs in the stepping logic which could be tricky to track down.

Doing this is a great project, but getting this to work reliably will be quite a bit of work. I bet there will be lots of bugs that will not be at all tricky to track down, just tricky to fix...

Jim

We also need to think about behavior when switching states.

For example, if we're in all-stop mode with all threads stopped, then switch
to non-stop mode and do a continue, which threads to we run? All of them?
One of them (which one?)?

Zachary, do you see this on windows at all? Tip for us results in crashes when releasing file descriptors without the below fix.

Colin

Hi Colin,

could you give more context about crash - what build configuration do you use (debug, release,…) and which OS?
I’m running this code on Windows 7 and haven’t noticed any failures.

Windows 7 64, Debug, latest vs13.

I don’t experience this crash but looking at the proposed fix it certainly seems like a plausible explanation. Seems to me like something should transfer ownership though rather than duplicating the handle

Windows file lock is tied to a file handle - using handle duplication effectively applies locking on duplicated handle, not on original, so your original handle will be blocked on IO trying to read/write.
Maybe we can just eliminate CloseHandle call (https://github.com/llvm-mirror/lldb/blob/master/source/Host/windows/LockFileWindows.cpp#L50) if file will be eventually closed by LockFileWindows caller.

Why does someone other than the LockFile object care about holding a raw handle? Shouldn’t it just hold a reference/pointer to the LockFile?

I’m somewhat concerned about following scenario:

File file_obj(…);
LockFile lock (file_obj.GetDescriptor ());

My understanding, that on Windows if LockFile duplicates original handle (i.e. file.GetDescriptor ()) only a duplicated handle will be allowed to call IO functions on this file within locked region, meanwhile file_obj will be blocked on IO.
If LockFile just holds a reference to original handle then file_obj can proceed on IO operations in exclusive mode.

In order to make it clear that LockFile references File we can make LockFile to take File& as constructor parameter instead of a raw handle.

I feel like it should be the other way around. If LockFile is taking a reference then it is implicitly saying that someone else owns the underlying File handle. But who else cares about the contents of a lock file other than the LockFile object itself?

Why not just change the LockFile constructor to not take a file descriptor but instead take the path? This is even better because now in LockFileWindows you don’t even need to use an fd you can just store a HANDLE directly, and it gives you more flexibility with how you create the file because you can use windows-specific options such as FILE_FLAG_DELETE_ON_CLOSE (if you want that, anyway).

If you’re concerned about the case where the Open operation fails and you want to handle that error, then don’t let the constructor take anything, but instead you could make a LockFile::Open() method or a static LockFile &&LockFile::Create() method. (by returning r-value reference here you could make LockFile object moveable but non-copyable, which makes sense for something like a lock file.

I feel like it should be the other way around. If LockFile is taking a
reference then it is implicitly saying that someone else owns the
underlying File handle. But who else cares about the contents of a lock
file other than the LockFile object itself?

It's not the case in current LockFile usage scenario but I can imagine

following use case - somebody wants to synchronize IO operations on file's
content:

   - With multiple readers and awaiting writers
   - Single writer and awaiting readers

After you locked a file with LockFile you can proceed on IO operations and
be confident that these operations are synchronized - e.g., instead of
zero-size lock file we could have used module's file as synchronization
point and to lock a module file first before pouring data into it..