[Bug 25019] New: race in process return code collection in python test runner

Bug ID 25019
Summary race in process return code collection in python test runner
Product lldb
Version unspecified
Hardware PC
OS All
Status NEW
Severity normal
Priority P
Component All Bugs
Assignee lldb-dev@lists.llvm.org
Reporter todd.fiala@gmail.com
CC llvm-bugs@lists.llvm.org
Classification Unclassified

I have noticed that roughly 1 run in 5, one of the process-control tests that
checks a child process's return code fails. It receives a return code of 0
from a process that definitely returns 10 (verified in multiple ways).

This looks like it may be similar to what was posted here:

The root cause appears to be that subprocess.Popen.communicate() indirectly
sets the return code, as does wait(), and the mechanism by which this happens
is not thread-safe.
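A minimal sketch of one way to sidestep this kind of race, assuming Python 3: let exactly one thread ever collect the return code, and have every other thread read the published value instead of calling wait() or poll() itself. `ReapedProcess` is a hypothetical wrapper, not part of the test runner. It does not capture output, so it would still need some non-communicate() way of draining the pipes.

```python
import subprocess
import sys
import threading


class ReapedProcess:
    """Hypothetical wrapper: exactly one thread ever reaps the child.

    A dedicated reaper thread calls Popen.wait() once and publishes the
    return code. Callers only read that published value, so no two
    threads ever race to collect the exit status.
    """

    def __init__(self, args):
        self._proc = subprocess.Popen(args)
        self._done = threading.Event()
        self._returncode = None
        self._reaper = threading.Thread(target=self._reap, daemon=True)
        self._reaper.start()

    def _reap(self):
        # The only place that calls wait()/poll() on the Popen object.
        self._returncode = self._proc.wait()
        self._done.set()  # publish only after the code is stored

    def poll(self):
        """Non-blocking: None while the child runs, else its return code."""
        return self._returncode if self._done.is_set() else None

    def wait(self, timeout=None):
        """Block until the reaper has published the return code."""
        if not self._done.wait(timeout):
            raise subprocess.TimeoutExpired(self._proc.args, timeout)
        return self._returncode


if __name__ == "__main__":
    proc = ReapedProcess([sys.executable, "-c", "import sys; sys.exit(10)"])
    print(proc.wait())  # the child exits with 10
```

Any number of threads can call `wait()` or `poll()` on the wrapper concurrently; they all block on the same event and see the same value, rather than each trying to collect the exit status.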

Since these calls intentionally run on two different threads, this is a
problem.

I had initially torn out the communicate() call and replaced it with an
asyncore loop, into which we could easily work our own poll() check to manage
the lifetime of the process.  There's a simple state flow that needs to happen

I yanked that out because asyncore.file_dispatcher is not available on
Windows.

I think we may need a simple pure-Python stdout/stderr pipe pump that does
essentially what asyncore did, so that we can ensure poll() is called in a
thread-safe manner.
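A rough sketch of such a pump, assuming Python 3 and plain threads rather than asyncore: one reader thread per pipe only moves bytes, and only the calling thread calls poll(), so the return code has a single owner. `run_with_pump`, `_pump`, and `poll_interval` are hypothetical names. Blocking pipe reads on worker threads behave the same way on Windows and POSIX, which is the portability gap that asyncore.file_dispatcher left.

```python
import subprocess
import sys
import threading
import time


def _pump(pipe, chunks):
    """Hypothetical helper: drain one pipe into a list until EOF."""
    # iter(callable, sentinel) keeps reading until read() returns b"" (EOF).
    for chunk in iter(lambda: pipe.read(4096), b""):
        chunks.append(chunk)
    pipe.close()


def run_with_pump(args, poll_interval=0.05):
    """Hypothetical communicate() replacement built from portable pieces.

    Reader threads only move bytes; the calling thread is the only one
    that calls poll(), so it alone collects the return code.
    """
    proc = subprocess.Popen(args, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    out, err = [], []
    pumps = [threading.Thread(target=_pump, args=(proc.stdout, out)),
             threading.Thread(target=_pump, args=(proc.stderr, err))]
    for t in pumps:
        t.start()
    # Our own poll() loop: the natural place for lifetime checks such as
    # timeouts. Only this thread ever touches the return code.
    while proc.poll() is None:
        time.sleep(poll_interval)
    for t in pumps:
        t.join()  # returns once each pipe reaches EOF
    return proc.returncode, b"".join(out), b"".join(err)


if __name__ == "__main__":
    code, out, err = run_with_pump(
        [sys.executable, "-c", "import sys; print('hi'); sys.exit(10)"])
    print(code, out.strip(), err)
```

Because the pumps keep both pipes drained, a child that writes a lot of output cannot block on a full pipe while the poll() loop waits for it to exit. This is the same deadlock that communicate() exists to avoid.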

Todd Fiala changed bug 25019