Process monitoring Host/Plugin interactions

I’m going to tackle implementing process monitoring for Windows next, and I’ve started going over how this works for other platforms.

There’s a bit of a confusing interaction between Host, Target, and the individual plugins, and I’m wondering if it might make more sense to put the monitoring code somewhere other than Host.

From what I can tell, launching a process for debugging goes something like this:

  1. Call ProcessPlugin::DoLaunch
  2. some system calls to actually launch the process
  3. Call Host::StartMonitoringChildProcess with callback set to somewhere in your plugin.
  4. The monitoring thread (in Host) periodically calls back into ProcessPlugin

For starters, this seems to be broken in the case where the ProcessLaunchInfo has pre-set an m_monitor_callback because the plugins end up just ignoring this and using their own callback.

Ultimately, how a process is monitored is going to depend on the plugin, so shouldn’t StartMonitoringChildProcess also be part of the plugin? In what situation would you ever want to monitor a child process that is not under the control of some process plugin?

Any process you launch that you aren't going to debug. Like launching a shell command or just say running "debugserver" in order for _it_ to be able to debug your application along with ProcessGDBRemote. ProcessGDBRemote is very interested if the "debugserver" process dies because it crashes, so it can kill the debug session and say "debugserver exited with status 12". Also when running a shell command you need to be able to run the shell command and find out when the shell command exits.

So the monitoring of the process definitely will stay in the host layer. The host layer can and should be used to launch any process (though some plug-ins might not be respecting that and they might be doing it themselves. The idea is we write the process launching code once in the host layer, then lldb-gdbserver, lldb-platform, and lldb can all use that one copy of the code to correctly launch processes and monitor (for signals) and reap them (get exit status and know that any stdout/stderr can now be forwarded onto whomever needs it.

Greg

So right now I have ProcessWindows::DoLaunch() set up to just call Host::LaunchProcess and pass the ProcessLaunchInfo along. This uses the default default monitoring algorithm that’s built into the host layer, which basically just waits for the process to exist and nothing else. This doesn’t work in the case of a debugged process, because there’s no initial stop so Process::Launch (in Target/Process.cpp) doesn’t ever get the notification that the process launched successfully.

Is the solution then to just keep the default monitoring algorithm in Host as is, but in ProcessWindows::DoLaunch(), specify a different callback before passing it through to Host::LaunchProcess? This is roughly what ProcessLinux does, but I wonder what happens if ProcessWindows::DoLaunch() has received a ProcessLaunchInfo that has a user-specified monitoring callback.

So right now I have ProcessWindows::DoLaunch() set up to just call Host::LaunchProcess and pass the ProcessLaunchInfo along. This uses the default default monitoring algorithm that's built into the host layer, which basically just waits for the process to exist and nothing else. This doesn't work in the case of a debugged process, because there's no initial stop so Process::Launch (in Target/Process.cpp) doesn't ever get the notification that the process launched successfully.

Is the solution then to just keep the default monitoring algorithm in Host as is, but in ProcessWindows::DoLaunch(), specify a different callback before passing it through to Host::LaunchProcess?

Yes, this is what you must do since you will want to call Process::SetExitStatus(...).

This is roughly what ProcessLinux does, but I wonder what happens if ProcessWindows::DoLaunch() has received a ProcessLaunchInfo that has a user-specified monitoring callback.

You would need to chain it an call theirs from yours.

What is the monitor_signals variable supposed to represent? Windows doesn’t have signals, so maybe I can just ignore this variable. Looking through the code, it seems like even MacOSX doesn’t use this, and just always sets it to false, and the only place this is ever set to true is in Linux and FreeBSD. Maybe Todd knows? (I don’t have his email new address, but maybe you can +cc him on this email if you don’t know the answer).

I kind of feel like the code would be clearer if ProcessMonitor were an abstract interface in Host, then DefaultWindowsProcessMonitor could do what it needs to do, DefaultPosixProcessMonitor could do what it needs to do, ProcessLaunchInfo could just have a pointer to a ProcessMonitor instead of both a callback and a baton, it would be easier to chain callbacks like this because the class could store a list of callbacks instead of just 1, and this monitor_signals flag could be a member variable of LinuxProcessMonitor so that other platforms wouldn’t have to worry about dubiously ignoring it. Anyway, I’ve had enough refactoring for a while, just thinking out loud.

What is the monitor_signals variable supposed to represent? Windows doesn't have signals, so maybe I can just ignore this variable.

You can ignore this for windows. The typical way to monitor a child process is to call waitpid() which will tell you about signals and exit status. Is there no way to communicate anything between a process via the handle other than the process exiting?

Looking through the code, it seems like even MacOSX doesn't use this, and just always sets it to false, and the only place this is ever set to true is in Linux and FreeBSD. Maybe Todd knows? (I don't have his email new address, but maybe you can +cc him on this email if you don't know the answer).

We are just exposing a feature that people launching subprocesses really might need. People can use signals to communicate between processes, so our host layer needs to include this. Another approach would be to return a launch error if anyone requests monitoring signals on windows.

I kind of feel like the code would be clearer if ProcessMonitor were an abstract interface in Host

We don't use ProcessMonitor on MacOSX.

then DefaultWindowsProcessMonitor could do what it needs to do, DefaultPosixProcessMonitor could do what it needs to do, ProcessLaunchInfo could just have a pointer to a ProcessMonitor instead of both a callback and a baton, it would be easier to chain callbacks like this because the class could store a list of callbacks instead of just 1, and this monitor_signals flag could be a member variable of LinuxProcessMonitor so that other platforms wouldn't have to worry about dubiously ignoring it. Anyway, I've had enough refactoring for a while, just thinking out loud.

I would prefer to avoid this and just have you ignore or return an error if this flag is set. Reasons being, ProcessMonitor is only used by linux and doesn't really need to be in LLDB anymore. We really should be switching over to using lldb-gdbserver (it would be great if windows did this) so we get remote debugging for all platforms.

Greg

What is the monitor_signals variable supposed to represent? Windows doesn’t have signals, so maybe I can just ignore this variable.

You can ignore this for windows. The typical way to monitor a child process is to call waitpid() which will tell you about signals and exit status. Is there no way to communicate anything between a process via the handle other than the process exiting?

The handle is the only interface to the process. So basically if it’s an operation that can be performed, or a query that can be answered, it happens through a handle. What are some use cases for setting monitor_signals to true? For example, in Windows (at least when a process is being debugged), we can detect things like thread creation, dll (e.g. shared object) load and unload, exceptions (divide by zero, segfault, etc). Would this kind of thing fall under the category of what a user might expect to monitor when monitor_signals is true?

If there’s a way to map this monitor_signals flag to something that makes sense on Windows, then I could try to find the common ground and change the name of the variable accordingly. Otherwise I can just ignore it.

Looking through the code, it seems like even MacOSX doesn’t use this, and just always sets it to false, and the only place this is ever set to true is in Linux and FreeBSD. Maybe Todd knows? (I don’t have his email new address, but maybe you can +cc him on this email if you don’t know the answer).

We are just exposing a feature that people launching subprocesses really might need. People can use signals to communicate between processes, so our host layer needs to include this. Another approach would be to return a launch error if anyone requests monitoring signals on windows.

I kind of feel like the code would be clearer if ProcessMonitor were an abstract interface in Host

We don’t use ProcessMonitor on MacOSX.

then DefaultWindowsProcessMonitor could do what it needs to do, DefaultPosixProcessMonitor could do what it needs to do, ProcessLaunchInfo could just have a pointer to a ProcessMonitor instead of both a callback and a baton, it would be easier to chain callbacks like this because the class could store a list of callbacks instead of just 1, and this monitor_signals flag could be a member variable of LinuxProcessMonitor so that other platforms wouldn’t have to worry about dubiously ignoring it. Anyway, I’ve had enough refactoring for a while, just thinking out loud.

I would prefer to avoid this and just have you ignore or return an error if this flag is set. Reasons being, ProcessMonitor is only used by linux and doesn’t really need to be in LLDB anymore. We really should be switching over to using lldb-gdbserver (it would be great if windows did this) so we get remote debugging for all platforms.

I imagine that switching switching to lldb-gdbserver on Windows will happen in the medium to long term, but local debugging and core dump debugging is 90% of what we need, and also significantly less effort. Once there’s more people than just me working on it though things should happen more quickly.

Regarding MacOSX and process monitor it does seem to monitor processes (in the sense that it calls waitpid to get status updates), it just doesn’t use a class called ProcessMonitor in the plugin the same way the Linux stuff does. All the platforms have need of the same high level functionality, so it seems like it would be good to unify all this and make the interfaces consistent, as it opens up more possibilities for code reuse. Even if it’s transitioning to lldb-gdbserver, it just means lldb-gdbserver will need to monitor processes.

In any case, I do plan to just ignore the monitor_signals flag for now, and maybe log a warning or something

>
> What is the monitor_signals variable supposed to represent? Windows doesn't have signals, so maybe I can just ignore this variable.

You can ignore this for windows. The typical way to monitor a child process is to call waitpid() which will tell you about signals and exit status. Is there no way to communicate anything between a process via the handle other than the process exiting?
The handle is the only interface to the process. So basically if it's an operation that can be performed, or a query that can be answered, it happens through a handle. What are some use cases for setting monitor_signals to true? For example, in Windows (at least when a process is being debugged), we can detect things like thread creation, dll (e.g. shared object) load and unload, exceptions (divide by zero, segfault, etc). Would this kind of thing fall under the category of what a user might expect to monitor when monitor_signals is true?

No, this would be something along the lines let me launch a subprocess and and the subprocess would send a SIGUSR1 or SIGUSR2 up to the parent process when it wants some more data. The parent process would wait for the signals and then send more data to the child process. Or you have a parent process that launches the "compile 100 source files" subprocess and wait for 100 SIGCHLD (child process has exited) signals to be sent back to the parent process. Again, this is unix behavior and we don't use it for debugging, but if we are abstracting the OS through a host layer we better support it, or we might end up with someone just calling posix_spawn directly and using waitpid() and monitoring the process with their own thread if we don't. Then things don't run on windows and our host abstraction falls down.

If there's a way to map this monitor_signals flag to something that makes sense on Windows, then I could try to find the common ground and change the name of the variable accordingly. Otherwise I can just ignore it.

Whatever we want to support, we can add/change the callback if needed. So I would look into IPC on windows and see if any the process handle gets used for any of that, and see if we can easily abstract that into something.

> Looking through the code, it seems like even MacOSX doesn't use this, and just always sets it to false, and the only place this is ever set to true is in Linux and FreeBSD. Maybe Todd knows? (I don't have his email new address, but maybe you can +cc him on this email if you don't know the answer).

We are just exposing a feature that people launching subprocesses really might need. People can use signals to communicate between processes, so our host layer needs to include this. Another approach would be to return a launch error if anyone requests monitoring signals on windows.

>
> I kind of feel like the code would be clearer if ProcessMonitor were an abstract interface in Host

We don't use ProcessMonitor on MacOSX.

> then DefaultWindowsProcessMonitor could do what it needs to do, DefaultPosixProcessMonitor could do what it needs to do, ProcessLaunchInfo could just have a pointer to a ProcessMonitor instead of both a callback and a baton, it would be easier to chain callbacks like this because the class could store a list of callbacks instead of just 1, and this monitor_signals flag could be a member variable of LinuxProcessMonitor so that other platforms wouldn't have to worry about dubiously ignoring it. Anyway, I've had enough refactoring for a while, just thinking out loud.

I would prefer to avoid this and just have you ignore or return an error if this flag is set. Reasons being, ProcessMonitor is only used by linux and doesn't really need to be in LLDB anymore. We really should be switching over to using lldb-gdbserver (it would be great if windows did this) so we get remote debugging for all platforms.

I imagine that switching switching to lldb-gdbserver on Windows will happen in the medium to long term, but local debugging and core dump debugging is 90% of what we need, and also significantly less effort. Once there's more people than just me working on it though things should happen more quickly.

Regarding MacOSX and process monitor it does seem to monitor processes (in the sense that it calls waitpid to get status updates), it just doesn't use a class called ProcessMonitor in the plugin the same way the Linux stuff does. All the platforms have need of the same high level functionality, so it seems like it would be good to unify all this and make the interfaces consistent, as it opens up more possibilities for code reuse. Even if it's transitioning to lldb-gdbserver, it just means lldb-gdbserver will need to monitor processes.

In any case, I do plan to just ignore the monitor_signals flag for now, and maybe log a warning or something

Sounds good.

It uses waited only to get exist status. For other signals, they are converted into mach exception and send to the lldb process directly.

– Jean-Daniel

I keep coming back to this interface solution. The issue I’m having is that I need completely different monitoring algorithms for when the process is launched as a debug target versus something like a shell command or utility process where I only want to know when it exits. In both cases it will spawn a thread and run some code in the thread, but the code that runs in the thread will be different.

On the other hand, Process launching and monitoring should be initiated by Host, since it is common functionality, and as you said previously process plugins should be using Host::LaunchProcess but may not be respecting that.

So the issue is: I need my process plugin to be able call Host::LaunchProcess in such a way that Host::LaunchProcess knows to use a different monitoring algorithm.

But then things get tricky. The algorithm it needs to use in the case of a debug target does not really belong in Host, because it will allow me to detect events like dll loads / unloads, thread creation, child process spawning, exceptions, and other things. So this code belongs in the process plugin. The cleanest way I can come up with to really handle this is to let Host implement this interface in such a way that it only monitors for exit status, and let my plugin implement it in such a way that it monitors for all the other stuff as well.

If you’re opposed to this, I can “make it work” it’s going to involve implementing ProcessWindows::DoLaunch without the help of Host::LaunchProcess(), which is what I was trying to avoid.

I think the high level notion of a process status monitor (note this what I’m calling a process status monitor is much more narrowly defined than what is encompassed by the ProcessMonitor class in Linux and FreeBSD) applies to all platform, regardless of whether we’re doing local debugging, lldb-gdbserver, or even no debugging at all (e.g. running a shell command), so I think it’s generally useful.

I can do this in such a way that (for now anyway) only Windows actually makes use of this interface and other platforms’ logic remains untouched, with the idea of using it in the future (after llgs is more cemented, for example).

What are your thoughts? If you need to see some code to make things concrete, I can upload a patch.

I’m still open to other solutions, but ultimately goal is just to find the right abstraction to get the highest amount of code reuse so that all platforms can benefit from each others’ improvements without having to write a bunch of platform specific code themselves.

I keep coming back to this interface solution. The issue I'm having is that I need completely different monitoring algorithms for when the process is launched as a debug target versus something like a shell command or utility process where I only want to know when it exits. In both cases it will spawn a thread and run some code in the thread, but the code that runs in the thread will be different.

On the other hand, Process launching and monitoring should be initiated by Host, since it is common functionality, and as you said previously process plugins should be using Host::LaunchProcess but may not be respecting that.

So the issue is: I need my process plugin to be able call Host::LaunchProcess in such a way that Host::LaunchProcess knows to use a different monitoring algorithm.

But then things get tricky. The algorithm it needs to use in the case of a debug target does not really belong in Host, because it will allow me to detect events like dll loads / unloads, thread creation, child process spawning, exceptions, and other things. So this code belongs in the process plugin. The cleanest way I can come up with to really handle this is to let Host implement this interface in such a way that it only monitors for exit status, and let my plugin implement it in such a way that it monitors for all the other stuff as well.

If you're opposed to this, I can "make it work" it's going to involve implementing ProcessWindows::DoLaunch without the help of Host::LaunchProcess(), which is what I was trying to avoid.

We could pass the launch flags (which would contain eLaunchFlagDebug and could be used to "do the right thing") into the process monitor. Would that solve the issue?

I think the high level notion of a process status monitor (note this what I'm calling a process status monitor is much more narrowly defined than what is encompassed by the ProcessMonitor class in Linux and FreeBSD) applies to all platform, regardless of whether we're doing local debugging, lldb-gdbserver, or even no debugging at all (e.g. running a shell command), so I think it's generally useful.

I do think there is usually a difference between the "process status monitor" and the "I have a special connection to a process because I am debugging it and need all exceptions and other very low level stuff.". So I am not sure it is a great idea to try and merge the two.

I can do this in such a way that (for now anyway) only Windows actually makes use of this interface and other platforms' logic remains untouched, with the idea of using it in the future (after llgs is more cemented, for example).

What are your thoughts? If you need to see some code to make things concrete, I can upload a patch.

Think about the difference between monitoring the process and debugging a process. They are quite different for MacOSX. I believe they are as well on linux (waitpid() is not used as the main way to figure out if a process has stopped due to a breakpoint for example, and windows seems to be quite different).

A simpler approach might be to allow clients to specify if a process should be monitored when launching with a new eLaunchFlagDontMonitor. Only windows would set this from its ProcessWindows::DoLaunch() and then it can run the more complex thread that tracks a process for debugging. On Linux and MacOSX, it is quite ok to monitor a process with waitpid(), and also be debugging it with ptrace() or being attached to a task port with MacOSX.

Greg

So this email originally went through while I was still trying to come up with a working solution. So before I had arrived at the solution I came up with in the patches I uploaded.

Anyway, some rambling ahead.

Ultimately I ended up not going through Host::LaunchProcess, so perhaps most of the questions I asked are moot now. In any case, the biggest problem I had when trying to make ProcessWindows::DoLaunch use Host::LaunchProcess is that the interface wasn’t really flexible enough. It seems that based on the distinction you describe between a process status monitor and a debug monitor, Windows doesn’t ever need to “monitor” a process for anything. At least not when debugging.

Passing it through as a flag like eLaunchFlagDontMonitor as you suggested probably would work, but it seems like at that point we’re sinking platform specific flags into a generic interface. If eLaunchFlagDontMonitor is set if and only if we’re launching a process for debugging on Windows, then imagine if it were called eLaunchFlagWindowsProcess instead.

Part of me thinks that launching processes is just complicated, and maybe we shouldn’t try to mash it into one generic function (e.g. Host::LaunchProcess). That was the driving idea behind the ProcessLauncher abstraction. It allows launching strategies that are plugin specific, or platform specific, to be implemented in plugin-specific or platform-specific ways, or even in multiple different ways.

Of course at some level generic code needs to be able to have one function that just says “hey, launch this process”, but that function could simply look at the flags in the ProcessLaunchInfo and create the correct type of ProcessLauncher based on the flags that are specified.

A concrete example of how this is useful is because right now, it’s unclear what would happen in ProcessWindows if someone passed a monitor callback / baton pair, because I just ignore it. Really that just means that the idea of the monitor callback and baton isn’t actually generic enough, and maybe would be better somewhere else. But if you search through the code, there’s only a few very well-defined places where it’s actually set. The most obvious example is running a shell command. So here, you would just have ShellProcessLauncher that does something different than RegularProcessLauncher, or whatever. You could remove the callback and baton from ProcessLaunchInfo entirely.

Anyway, these are all just high level ideas. FWIW, one of the reasons I’m going out of my way and make great effort to create these abstractions in a way that they’re as reusable and generic as possible, is because it would be great if everything I’m doing now just fits into llgs in the future without a tremendous porting effort. Not having posix compatibility in Windows breaks alot of the assumptions that LLDB originally made, so hopefully incrementally tackling these assumptions over time provides some benefit in the long run.

How about this solution: on windows in Host::LaunchProcess() if eLaunchFlagDebug is set, then return an error if the monitor callback is set. If it is set, don't monitor the process with the usual method that would be used (again only on windows).

There is a lot that goes into launching a process for the arguments, environment, working directory, re-direct stdin/out/err and others, so it would be a shame to duplicate that code especially since the Host::LaunchProcess is so close.

So:
1 - no new launch flag
2 - on windows watch for eLaunchFlagDebug and don't monitor the process if it is set
3 - modify ProcessWindows::DoLaunch() to use Host::LaunchProcess() and then monitor it in the different way for debugging.

I don't see the need for a separate implementation unless I am missing something.

Greg

In the patch I posted, there actually isn’t any duplicated code. All the Windows specific stuff about launching a process is in Host/windows/ProcessLauncherWindows. It has all the stdio redirection and everything else. In Host::LaunchProcess there’s an #ifdef that on Windows makes a ProcessLauncherWindows and uses it. In ProcessWindows::DoLaunch I create a DebugProcessLauncher, which has all the knowledge about launching processes on different threads, debug monitoring, etc, and at some point in it also makes a ProcessLauncherWindows and re-uses it.

So the platform specific part of launching processes on Windows is already re-used between the Host::LaunchProcess and ProcessWindows::DoLaunch codepaths. It’s just that Host::LaunchProcess and ProcessWindows::DoLaunch reuse the same thing, rather than ProcessWindows::DoLaunch directly reusing Host::LaunchProcess.

Another tricky part is that Host::LaunchProcess returns a HostThread and users of the API expect to be able to join on this thread to wait for the process to exit, and that might not work because you might have the same monitor thread debugging multiple processes.

What about having Host::LaunchProcess accept a ProcessLauncher as an argument. If you don’t provide one, it will do what it currently does, if you do provide one it will use that. That’s kind of similar to the status monitor patch you lgtm’ed, but maybe we don’t actually need that, and this woudl be a better change that accomplishes the same thing. This way ProcessWindows::DoLaunch could just pass in the DebugProcessLauncher. Functionally, this is equivalent to what is already happening, but it makes Host::LaunchProcess a little more flexible.

I still kind of don’t like the callback and baton in the ProcessLaunchInfo. Seems like if it doesn’t make sense for a platform, it’s not truly generic and shouldn’t be in generic code. That’s what spurred the comment about different implementations. 95% of process launching doesn’t use it. AFAICT gdb-server and running shell commands are the only thing that uses it. Everything else just uses Process::SetProcessExitStatus. So it seems to make the code a little bit healthier if we sink that logic into the places that need it and other people don’t have to worry about what to do if it’s set since they always expect it to be unset. Their codepaths wouldnt’ even have to make the decision because there would be no callback and baton members on ProcessLaunchInfo.

Not actually proposing that I should do that, just seems like something that could be worked towards incrementally over the long term.

What about having Host::LaunchProcess accept a ProcessLauncher as an argument. If you don't provide one, it will do what it currently does, if you do provide one it will use that. That's kind of similar to the status monitor patch you lgtm'ed, but maybe we don't actually need that, and this woudl be a better change that accomplishes the same thing. This way ProcessWindows::DoLaunch could just pass in the DebugProcessLauncher. Functionally, this is equivalent to what is already happening, but it makes Host::LaunchProcess a little more flexible.

That would be fine.

I still kind of don't like the callback and baton in the ProcessLaunchInfo. Seems like if it doesn't make sense for a platform, it's not truly generic and shouldn't be in generic code. That's what spurred the comment about different implementations. 95% of process launching doesn't use it. AFAICT gdb-server and running shell commands are the only thing that uses it. Everything else just uses Process::SetProcessExitStatus.
So it seems to make the code a little bit healthier if we sink that logic into the places that need it and other people don't have to worry about what to do if it's set since they always expect it to be unset. Their codepaths wouldnt' even have to make the decision because there would be no callback and baton members on ProcessLaunchInfo.

As you know LLDB is very POSIX oriented because it has strong root in unix. I don't consider debug process launching as the default in the host layer, so the fact that only two places right now are launching subprocesses doesn't make the argument everyone should be calling SetProcessExitStatus() by default. There are many simulators here at Apple and out in the world and the Platform::DebugProcess() will need to launch those simulators as subprocesses and be able to feed binaries into them. The process launching on the host needs to reflect the abilities of the underlying host layer, including the ability to monitor signals. I don't care if this is done using callbacks, classes, or any other form, but it needs to stay in the design so that clients on the host can do what they expect to be able to do with a unix child process. If this means we need to extend of change the callback, I am all for it, but it needs to be a superset of all systems we support.

Not actually proposing that I should do that, just seems like something that could be worked towards incrementally over the long term.

That would be fine long term. Not for now, but yes long term.

Sounds like we’re on the same page about everything. It’s hard to talk in the abstract because ideas are just ideas, ultimately you can’t judge something that you haven’t seen. In any case, the idea that I have I think is actually more flexible than what we’re doing currently, in part because Windows needs the additional flexibility, but should still handle all current use cases and be at least somewhat futureproof (including being reusable in llgs). Ultimately I want Windows to fit into and integrate with LLDB in a way that is natural and clean, and not have little hacks and workarounds all over the place. :slight_smile: