Bug ID 25019
Summary race in process return code collection in python test runner
Component All Bugs
I have noticed that roughly 1 in 5 times, one of the process control child process return code tests fails. The test receives a return code of 0 from a process that definitely returned 10 (verified in multiple ways). This looks like it may be similar to what was posted here: [http://stackoverflow.com/questions/31539749/python-subprocess-read-returncode-is-sometimes-different-from-returned-code](http://stackoverflow.com/questions/31539749/python-subprocess-read-returncode-is-sometimes-different-from-returned-code)

The root of it appears to be that subprocess.Popen.communicate() indirectly sets the return code, and wait() does as well, and the mechanism by which this happens is not thread safe. Since our code (intentionally) calls these from two different threads of execution, this is going to be problematic.

I had initially torn out the communicate() call and replaced it with an asyncore loop, into which we can easily work our own poll() check to manage the lifetime of the process. There's a simple state flow that needs to happen here. I yanked that out because asyncore.file_dispatcher is not available on Windows. I think we may need a simple stdout/stderr pipe pump that essentially does what asyncore did, in pure Python, so that we can make sure poll() happens in a thread-safe manner.
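As a rough illustration of the pipe pump idea described above, here is a minimal sketch, not the test runner's actual code. The names `ProcessPump`, `output()` and `demo()` are hypothetical. It assumes that plain blocking reads on reader threads are acceptable (they work on Windows pipes as well as POSIX ones) and that every poll()/wait() call goes through one lock, so the return code is only ever collected by one thread at a time.

```python
import subprocess
import sys
import threading


class ProcessPump(object):
    """Hypothetical helper: drain stdout/stderr on threads, reap under a lock."""

    def __init__(self, args):
        self._proc = subprocess.Popen(
            args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        # Every poll()/wait() on the Popen object is serialized by this lock.
        self._reap_lock = threading.Lock()
        self._chunks = {"stdout": [], "stderr": []}
        self._threads = []
        for name, pipe in (("stdout", self._proc.stdout),
                           ("stderr", self._proc.stderr)):
            thread = threading.Thread(target=self._pump, args=(pipe, name))
            thread.daemon = True
            thread.start()
            self._threads.append(thread)

    def _pump(self, pipe, name):
        # Blocking reads until EOF. This thread never touches the return code.
        for chunk in iter(lambda: pipe.read(4096), b""):
            self._chunks[name].append(chunk)
        pipe.close()

    def poll(self):
        # Non-blocking status check; returns None while the child is running.
        with self._reap_lock:
            return self._proc.poll()

    def wait(self):
        # Let the pumps hit EOF first so no output is lost, then reap.
        for thread in self._threads:
            thread.join()
        with self._reap_lock:
            return self._proc.wait()

    def output(self, name):
        # Only complete once wait() has returned.
        return b"".join(self._chunks[name])


def demo():
    # The child exits with 10, mirroring the failing test in this report.
    pump = ProcessPump([sys.executable, "-c", "import sys; sys.exit(10)"])
    return pump.wait()
```

Calling `demo()` should return the child's exit code. One trade-off in this sketch: wait() holds the lock while it blocks, so a concurrent poll() from another thread will stall until the child exits. A loop that calls poll() with a short sleep would avoid that, at the cost of some latency.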