Running tests on OS X 10.10 vs "Killed: 9"

Hi,

building ‘check-all’ on any of my machines running OS X 10.10 usually fails because a few tests fail due to some processes being killed by the kernel (there’s always “Killed: 9” somewhere in lit’s error output). Everything’s fine on 10.9.

How do folks deal with this? Don’t use 10.10 for building llvm? Is there some tweakable to tell the kernel “please don’t kill my processes”?

Here’s an example from just now:

FAIL: LLVM :: MC/X86/shuffle-comments.s (18589 of 28139)
******************** TEST ‘LLVM :: MC/X86/shuffle-comments.s’ FAILED ********************
Script:

I’ve run into this before; there is a relatively simple fix that Chris Bieneman shared. CC’ing him because I forget the magical incantation. It had something to do with setting -m 1 on the taskgated command line in some init file IIRC.

– Sean Silva

Btw, for the sake of having this in writing somewhere: one other thing I've
run into (I was generating a huge number of lit tests for an artificial
reason; it wasn't during normal development) is that once you go past ~32K
tests (2^15), the way lit currently queues up tests will cause it to hang
completely on Mac. Essentially, the problem is that it has a for loop
like:

for t in tests:
    push_onto_threadsafe_queue(t)
... kick off worker threads that consume from the queue ...

Unfortunately, push_onto_threadsafe_queue ups an OS semaphore, and on Mac
a semaphore's value has an upper limit of ~32K. Once you hit that, the
push_onto_threadsafe_queue operation blocks, and because no worker threads
have been started yet to drain the queue, lit hangs completely.

Someday we will need to e.g. spawn off a feeder thread to do the
push_onto_threadsafe_queue loop.
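
To make the feeder-thread idea concrete, here is a minimal sketch in Python. It is not lit's actual code: run_tests, run_one, num_workers and max_queued are hypothetical names chosen for illustration, and the standard library's queue.Queue stands in for whatever queue lit really uses. The point is only that the workers are started before the queue is filled, so a blocking push stalls the feeder thread rather than the whole program.

```python
import queue
import threading


def run_tests(tests, run_one, num_workers=4, max_queued=64):
    """Run run_one(t) for every t in tests; return the results (unordered)."""
    # Bounded queue: put() blocks when max_queued items are already waiting.
    task_queue = queue.Queue(maxsize=max_queued)
    sentinel = object()  # tells a worker that no more work is coming
    results = []
    results_lock = threading.Lock()

    def feeder():
        for t in tests:
            task_queue.put(t)  # may block, but only this feeder thread
        for _ in range(num_workers):
            task_queue.put(sentinel)  # one stop marker per worker

    def worker():
        while True:
            t = task_queue.get()
            if t is sentinel:
                break
            r = run_one(t)
            with results_lock:
                results.append(r)

    # Start the consumers first, then feed them from a separate thread.
    workers = [threading.Thread(target=worker) for _ in range(num_workers)]
    for w in workers:
        w.start()
    feeder_thread = threading.Thread(target=feeder)
    feeder_thread.start()

    feeder_thread.join()
    for w in workers:
        w.join()
    return results
```

With this shape the number of queued items never exceeds max_queued, so the total number of tests no longer matters for whatever limit the underlying queue has.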

Filipe, did you say that you had actually run into the 32K limit at some
point during regular development? (what combination of LLVM repos were
checked out and being tested simultaneously?)

-- Sean Silva

Maybe clearer:

for t in tests:
    push_onto_threadsafe_queue(t)
...
kick off worker threads that consume from the queue
...

I.e., it pushes all tests onto the queue before it starts kicking off jobs.

-- Sean Silva

If you have:
llvm, clang, libcxx, libcxxabi, compiler-rt, lld
Then you get around 31K tests. IIRC, those problems were arising before Chris Bieneman fixed compiler-rt tests to not test iOS arches on a Mac.

Right now, even enabling compiler-rt for iOS will still generate ~31K tests. But it’s very close to the limit.

Sorry for the late reply on this. 10.10 and later hit a threading bug on OS X that causes a number of tests (usually equal to the number of hardware threads on your system) to fail. The workaround for this is to modify the taskgated launch daemon so that it only uses one thread. You can do so by editing the launchd plist at /System/Library/LaunchDaemons/com.apple.taskgated.plist to resemble this:

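<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>POSIXSpawnType</key>
    <string>Interactive</string>
    <key>EnableTransactions</key>
    <true/>
    <key>Label</key>
    <string>com.apple.taskgated</string>
    <key>MachServices</key>
    <dict>
        <key>com.apple.taskgated</key>
        <dict>
            <key>TaskSpecialPort</key>
            <integer>9</integer>
        </dict>
    </dict>
    <key>ProgramArguments</key>
    <array>
        <string>/usr/libexec/taskgated</string>
        <string>-s</string>
        <string>-m</string>
        <string>1</string>
    </array>
</dict>
</plist>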

After editing the file, you’ll need to run:

sudo launchctl unload /System/Library/LaunchDaemons/com.apple.taskgated.plist && sudo launchctl load /System/Library/LaunchDaemons/com.apple.taskgated.plist

That should restart the daemon and all will work.
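
If you want to double-check the edit before reloading the daemon, a small Python sketch along these lines can confirm that ProgramArguments really contains "-m 1". The helper name taskgated_m_value and the trimmed-down SAMPLE plist are made up for illustration; only plistlib.loads from the standard library is assumed.

```python
import plistlib


def taskgated_m_value(plist_bytes):
    """Return the argument that follows -m in ProgramArguments, or None."""
    args = plistlib.loads(plist_bytes).get("ProgramArguments", [])
    # Walk adjacent pairs (flag, value) and pick the value after "-m".
    for flag, value in zip(args, args[1:]):
        if flag == "-m":
            return value
    return None


# Hypothetical, trimmed-down plist used only to exercise the helper.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>com.apple.taskgated</string>
    <key>ProgramArguments</key>
    <array>
        <string>/usr/libexec/taskgated</string>
        <string>-s</string>
        <string>-m</string>
        <string>1</string>
    </array>
</dict>
</plist>
"""

print(taskgated_m_value(SAMPLE))
```

To check the real file, read /System/Library/LaunchDaemons/com.apple.taskgated.plist in binary mode and pass its contents to the same function.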

-Chris

Thanks, that seems to help!