Moving test runner timeout logic into Python

Hi all,

Over the last two days, I’ve hit some inconsistencies across platforms surrounding signal handling and the operation of the timeout/gtimeout executable mechanism that we use to handle timeouts of tests. The net result is I still see tests sometimes hang up the test running process, even though my changes in the last couple days seem to have reduced the frequency somewhat.

I’d like to address that once and for all with something that is less likely to differ across platforms. I have a relatively simple way to do that within the parallel test runner directly. I’m planning on prototyping that now, but before I dive too far into that, I wanted to expose the idea in case somebody had any major concerns with not using timeout/gtimeout on the systems that had it.

I expect it to be a relatively small change when I get it up for review.

The nice thing about going straight-python on it is we should get the same behavior everywhere, and not depend on signal handling to do it.

Thoughts?

Can you offer a hint about how you plan to implement this? When you say we should get the same behavior everywhere, I assume this means Windows too, which currently does not support running with a timeout at all (because timeout / gtimeout aren’t present).

Yep - the approach (for now) is likely to look like:

p = subprocess.Popen(…)  # exact call differs between Windows/Non-Windows

done_event = threading.Event()  # some kind of semaphore/event

# Spin up thread 1, running this code:

    # Thread 1 - grab output, do communicate() call
    p.communicate()

    # Signal we finished - the process ended.
    done_event.set()

# …back to the thread that called subprocess.Popen()

# Wait up to the timeout value for the inferior dotest.py process to complete.
# Event.wait() returns False if the timeout expired before the event was set.
timed_out = not done_event.wait(timeout_in_seconds)

# If timed_out is True, the timeout occurred,
# and thus the process did not finish on time.
if timed_out:
    # Kill the inferior dotest
    p.kill()  # or p.terminate()

    # This will cause the other thread to fall through now, but we know it timed out.
    # Could get fancier here and do a nice kill, then a less blockable kill. But make
    # the process die one way or another.

# Do the other post-process activity here…

^= that’s rough pseudo-code. I need to look up a few details. But that’s more or less what I was thinking. Looked like all of that was available on Windows. We can also have it only optionally time out.
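For reference, the pseudo-code above can be turned into a small runnable sketch along these lines. The helper name run_with_timeout, its return tuple, and the choice to merge stderr into stdout are assumptions made for illustration, not something taken from the test runner itself.

```python
import subprocess
import sys
import threading


def run_with_timeout(command, timeout_in_seconds):
    """Run command and return (returncode, output, timed_out).

    Hypothetical helper: the name and return shape are illustrative only.
    """
    p = subprocess.Popen(
        command, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    done_event = threading.Event()
    result = {}

    def collect_output():
        # Thread 1: drain the pipe until the process exits, then signal.
        result["output"] = p.communicate()[0]
        done_event.set()

    worker = threading.Thread(target=collect_output)
    worker.start()

    # Event.wait() returns False when the timeout expires before set().
    timed_out = not done_event.wait(timeout_in_seconds)
    if timed_out:
        # Killing the process lets communicate() return in the worker thread.
        p.kill()

    # In both cases the worker finishes once the process has exited.
    worker.join()
    return p.returncode, result.get("output", b""), timed_out
```

Calling run_with_timeout([sys.executable, "-c", "import time; time.sleep(60)"], 1.0) should come back after roughly one second with timed_out set to True, and none of it relies on signal handling in the runner itself.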

Something like that is what I had in mind.

A nice bit here, also, is for those places where we are using timeout (Linux, OS X, etc.) we get to trade off and use a thread where we were using a whole different process. (i.e. the timeout wrapper process goes away).

No obvious reason I see why that wouldn’t work. You probably want to wrap the “thread 1” code in a try: … except: pass because p.terminate probably will cause an exception on the other thread.
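A rough sketch of that guard, combined with the "nice kill, then a less blockable kill" idea from the pseudo-code, might look like the following. The names make_worker and stop_process and the grace_seconds parameter are hypothetical. Whether communicate() really raises after terminate() is an assumption here, so the sketch records any exception instead of silently passing.

```python
import subprocess
import threading


def make_worker(p, exited_event, result):
    """Build the "thread 1" worker for process p (hypothetical helper)."""
    def worker():
        try:
            result["output"] = p.communicate()[0]
        except Exception as e:
            # Assumption: if killing the process makes communicate() raise,
            # keep the error for later inspection rather than dropping it.
            result["error"] = e
        finally:
            # Always signal, so the waiting thread never blocks forever.
            exited_event.set()
    return threading.Thread(target=worker)


def stop_process(p, exited_event, grace_seconds=5.0):
    """Ask p to exit, then force it if it is still running."""
    p.terminate()  # SIGTERM on POSIX, TerminateProcess() on Windows
    if not exited_event.wait(grace_seconds):
        p.kill()  # SIGKILL on POSIX; same as terminate() on Windows
```

On Windows both calls end up doing the same thing, so the escalation only matters on POSIX systems, where a test process could block or ignore SIGTERM.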

Yeah good idea.

Anyways, that’s what I’m going after.

On the Windows front, is there any reason other than lack of timeout/gtimeout why you wouldn’t want timeouts? I’m trying to figure out if there is any reason I would want to work this in as an optional thing. (Making it not optional would be slightly less complicated, but either way isn’t particularly a big deal.)

-Todd

We definitely want timeouts. I was planning to implement timeout / gtimeout in C++, check it in, and build it as part of the build process. But this would be better for obvious reasons.

Cool. Win win :)