Refactoring in LLDB Windows Plugin

If you’re not working in LLDB’s Windows process plugin, you probably don’t care about this message.

FYI: I’m doing some refactoring (actually unfactoring) in the Windows process plugin. It’ll take a series patches over a few days next week. If you’re planning on working in this area, please let me know so we can coordinate.

Details:

Last year, I factored the classes like ProcessWindows into pairs of classes with names like ProcessWindows and ProcessWindowsLive so that I could introduce classes like ProcessWindowsMiniDump that shared common code. Now that the Windows-specific minidump plugin has been superseded by the cross-platform minidump plugin, this factoring is no longer necessary. Since the factoring creates extra layers that make the code harder to understand and maintain, I’d like to undo the split.

My plan is to do this in three NFC patches:

Patch 1. Roll the functionality from the common classes into the -Live classes. This will eliminate everything under Plugins/Process/Windows/Common and leave the functional code in Process/Plugins/Windows/Live.

Patch 2. Rename the -Live classes to shorter, more tractable names.

Patch 3. Move the code up from the Live subdirectory so that it once again lives in Plugins/Process/Windows.

Patches 2 and 3 could be done in a single step, but I think the history will be easier to follow if I keep them separate.

If you have any concerns about this plan, please let me know.

Adrian.

One thing that we have talked about is to move the ProcessWindows stuff into lldb-server (it has a NativeProcess and NativeThread class you would need to subclass instead of making ProcessWindows and ThreadWindows classes) instead of making a native plug-in that is only useful on the current system. Then remote windows debugging would be possible. It would end up using thee ProcessGDBRemote.cpp process plug-in then. Then the ProcessWindows plugin directory would not be needed. Any thoughts on this?

Greg

I have no objection to moving in that direction (enabling remote debugging). Zach just pointed out that we’d first have to port lldb-server to Windows. Offhand, I don’t know how much of an effort that would be.

In the near-term, I’m focused on postmortem debugging, so this wouldn’t be a priority for me for a while.

The “unfactoring” I mentioned in this email is essentially done, though I’m planning to move some files to eliminate an unnecessary subdirectory from the source paths.

What would it take to make it so that local and remote process plugins use the same exact interface? I mean in theory they’re doing the same thing, just on a different machine. If they shared an identical interface then you could hook the lldb-server up to it and it would work either locally or remotely.

What was the original motivation for having the api design of remote and local process plugins diverge?

The plan was always do remote so we are always using one thing. We started off thinking we wanted to have a native plug-in and a remote GDB server, but when we found we didn't have serious performance issues we went the lldb-server/debugserver route for everything on our end. lldb-server uses NativeProcess and NativeThread as base classes that must be subclassed and we would make a ProcessNative plug-in that used the native compiled version of these (like lldb-server does), but then we have two code paths to test: native and remote. So we either have to run twice the number of tests, one local and one remote, so we can make sure native and remote work correctly, or we just go one route. What would tend to happen is we would 99.9% of people would do local debugging only and all bugs submitted would mostly be bugs with the native implementation and remote would suffer and become neglected. GDB had two different code paths for these so remote really did suffer and we evolved to use remote only on all our systems. Another nice reason for this is you can save the GDB remote packet log traffic when you do encounter a bug and see exactly what happened when a bug happened.

So due to history, we started thinking we would need both native and remote plug-ins, but we migrated over time to just one solution for simpler testing, ensuring remote debugging is rock solid since it is always used for local debugging, and for the convenience of being able to completely lop all traffic to/from the process with the GDB remote logs.

Greg

For what it's worth, we've been using lldb with ds2 to do remote
debugging on Windows (x86) and Windows Phone (arm) and the lldb side
of things works well with remote Windows targets. Besides porting
lldb-server to Windows there shouldn't be any extra effort on the lldb
side to do what Greg is talking about.

I’m curious, can you share how did you deal with the fact that the gdb-remote protocol is very signal-centric? E.g. every stop-reply ($T) has to have a specific signal associated with it, and some signals have special meaning in lldb. Do you just fake the signal numbers when you need to? Or is the situation not as bad as I imagine?

cheers,
pl

Yeah, we have to fake the numbers, we have some sort of dummy mapping
that converts Windows events to the UNIX equivalent that makes the
most sense. It's kind of dumb, but that's the only thing that LLDB or
gdb support.

First half of the conversion is here:
https://github.com/facebook/ds2/blob/master/Sources/Target/Windows/Thread.cpp#L110
and the second half is here:
https://github.com/facebook/ds2/blob/master/Sources/GDBRemote/Structures.cpp#L357

In LLDB you can always return "05" as the signal number (which is SIGTRAP), but then fill in the other key/value types in the stop reply packet. GDB solely relies on the signal numbers, LLDB doesn't.

See the section named "Stop reply packet extensions" in the gdb protocol doc we have:

svn cat http://llvm.org/svn/llvm-project/lldb/trunk/docs/lldb-gdb-remote.txt

In a stop reply packet, you can specify a "reason" and a "description":

//
// "reason" enum The enumeration must be one of:
// "trace" the program stopped after a single instruction
// was executed on a core. Usually done when single
// stepping past a breakpoint
// "breakpoint" a breakpoint set using a 'z' packet was hit.
// "trap" stopped due to user interruption
// "signal" stopped due to an actual unix signal, not
// just the debugger using a unix signal to keep
// the GDB remote client happy.
// "watchpoint". Should be used in conjunction with
// the "watch"/"rwatch"/"awatch" key value pairs.
// "exception" an exception stop reason. Use with
// the "description" key/value pair to describe the
// exceptional event the user should see as the stop
// reason.
// "description" ascii-hex An ASCII hex string that contains a more descriptive
// reason that the thread stopped. This is only needed
// if none of the key/value pairs are enough to
// describe why something stopped.

Feel free to extend the "reason" enum if needed, you can always describe it correctly in the "description" field like "Windows EXC_UNDEFINED_DWORD access exception" and that will get displayed as is in the debugger.

Greg Clayton