Linux/Posix spawning, signal masking

Hi all,

In source/Host/common/Host.cpp, there is a posix process spawning method called LaunchProcessPosixSpawn. On my Linux system, it is used to start the target process in lldb-gdbserver. It looks like FreeBSD uses it as well. Most of the Platform launchers appear to funnel to it as well (via Host::LaunchProcess ()).

There is a section of code in LaunchProcessPosixSpawn that masks all signals for the child process that is started up:

::posix_spawnattr_setsigmask(&attr, &all_signals)

When that is set, it seems to be preventing the child from receiving everything except the non-blockable signals. This has the effect of (at the very least) blocking SIGINT (i.e. ^C from the keyboard) and SIGTERM (standard unadorned “kill pid”). This appears to be the cause of a few bugs as I try to get lldb-gdbserver working on Linux. For example:

  1. hitting ^C on an lldb-gdbserver that spawned a debuggee target process would kill the lldb-gdbserver, but not the debuggee target, which continues to run.

  2. sending a SIGTERM (kill {debuggee pid}) while it is getting debugged and running is ignored.

  3. sending a SIGTERM to the debuggee after issue #1 (where lldb-gdbserver is no longer running) is ignored.

  4. killing lldb-gdbserver from an attached lldb that shuts down and kills the remote does indeed kill lldb-gebserver, but does not kill the target process.

(Of course sending a ‘kill -9 {debuggee pid}’ works fine to kill the target process).

I’ve modified this method locally to not mask any signals:

::posix_spawnattr_setsigmask (&attr, &no_signals)

This seems to address all the problems I had above for lldb-gdbserver as signals propagate properly, and the target responds correctly to signals sent when the parent dies, etc.

I’ve also run all the existing tests (on my system, that amounts to 275 tests that really run), and those are all passing.

However, given that so much code flows through here (or at least appears to) that is not directly related to the lldb-gdbserver — i.e. local linux/FreeBSD debugging — I’m highly skeptical that this is the right fix. I did try using local debugging with lldb with my change in place, and that seemed to work fine. But I’m thinking that perhaps the signal blocking was intended and that behavior is needed in some cases that (perhaps) are not covered by tests that run on Linux.

Any thoughts on why all the signals were getting masked on process spawning? Does that change look okay as is?

Thanks!

Sincerely,
Todd Fiala

It is indeed the right fix. Host.mm has a mac specific version which does this:

    sigset_t no_signals;
    sigset_t all_signals;
    sigemptyset (&no_signals);
    sigfillset (&all_signals);
    ::posix_spawnattr_setsigmask(&attr, &no_signals);
    ::posix_spawnattr_setsigdefault(&attr, &all_signals);

Compared to the incorrect version you noticed in Host.cpp:

    sigset_t no_signals;
    sigset_t all_signals;
    sigemptyset (&no_signals);
    sigfillset (&all_signals);
    ::posix_spawnattr_setsigmask(&attr, &all_signals);
    ::posix_spawnattr_setsigdefault(&attr, &no_signals);

(notice all_signals and no_signals are reversed in the last two function calls in Host.cpp.

You might compare the "LaunchProcessPosixSpawn" in Host.mm and try and just copy it to Host.cpp and see if it works. I can't find anything darwin specific in the code after a quick glance and we have used the Host.mm LaunchProcessPosixSpawn() for a few years now, so it is definitely tested...

So the fix would seem to be:
1 - Remove LaunchProcessPosixSpawn() from Host.mm
2 - Copy it over into Host.cpp
3 - Make sure LaunchProcessPosixSpawn is compiled for Apple builds as well in Host.cpp

It is indeed the right fix.

Ok - I’ll take care of that part. Thanks!

So the fix will be:
1 - Remove LaunchProcessPosixSpawn() from Host.mm
2 - Copy it over into Host.cpp
3 - Make sure LaunchProcessPosixSpawn is compiled for Apple builds as well in Host.cpp

Right?

Correct.

You might need to add "#if defined(__APPLE__)" around the calls to __pthread_chdir and __pthread_fchdir. Those might be darwin specific.

There is no way to set your current working directory when using posix_spawn, so to work around this, we change directory in a thread specific way by calling __pthread_chdir(working_dir) and reverting it using __pthread_fchdir(-1), so as to not hose over the process that has the lldb shared library loaded by changing the global working directory.

So in order to eventually fix being able to obey the launch flag eLaunchFlagDebug and also to be able to set the working directory, non darwin builds might need to change over to using fork()/exec() since after the fork the child process can set its working directory and also call ptrace() to start being debuggable and then calling exec(). When ptrace() has been called to allow the process to be debugged, I believe it stops right at the entry point upon exec. Again, this would be for non-darwin builds since Apple has a posix_spawn flag of POSIX_SPAWN_START_SUSPENDED that can be set to enforce stopping at the entry point and also has the thread specific way to change directory.

Greg

So I just about have this done (the original 3 steps, not switching to fork()/exec()).

One thing I hit was an area I noticed yesterday: on Linux (x86_64, Ubuntu 12.04), I’m finding that I need to not take defaults for all the signal handlers in lldb-gdbserver; otherwise, the target process never starts operating. Technically the target process is exiting immediately with an exit_state of 127 with signal = 0.

For the moment, I have that #ifdef’d in the Apple/Linux/FreeBSD LaunchProcessPosixSpawn method so that only Linux doesn’t reset the child signal handlers to defaults. I’m thinking the FreeBSD/Apple BSD lineage might behave more similarly here as my first stake in the sand for a guess, but that would be changing behavior for FreeBSD. (Ed - any thoughts here? I could just as well flip the #if defined(linux) to #if !defined(APPLE).) Ultimately the root cause of this is not clear to me yet.

For the directory handling, I’m having that handled via __pthread* methods on Apple and continuing to do what it did before on everything else. I’ll look into the fork()/exec() option as a follow up after I work through some of the more obviously broken bits of lldb-gdbserver.

See below for the current state of the patch. I’m about to run tests on Ubuntu and try a build and tests on my MBP.

-Todd

posix_spawn_2014-01-22_01.diff (17.2 KB)

Looks good except the "// No more thread specific current working directory" comment when about to call __pthread_chdir() should be changed to:

  // Set the working directory on this thread only

Thanks. It tested fine on Ubuntu, moving over to OS X.

(And I’ll adjust the comment).

This just went in. I had to make several adjustments for the OSX build as I had code that had to straddle Host.mm and Host.cpp. I updated the fix to include the Haswell change (that method moved from Host.mm to Host.cpp).

I used the configure-based OSX build process, which I can get to build but not run tests yet. Let me know if you hit any OSX test breaks due to this.