Have you been able to isolate the test which causes the deadlock? You should be able to get close to figuring it out by seeing which processes are hung and maybe which tests aren’t finished…
Honestly, lit should just create temp files and get out of the business of polling and reading from subprocess pipes.
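A minimal sketch of the temp-file idea, not lit's actual code: the helper name run_with_temp_files and the noisy child command are made up for illustration. Because the child writes to regular files rather than pipes, no pipe buffer can fill, and the parent only reads once the child has exited.

```python
import subprocess
import sys
import tempfile


def run_with_temp_files(argv):
    """Run argv with stdout/stderr sent to temp files; return (rc, out, err).

    Hypothetical helper: the child never blocks on a full pipe because
    it is writing to regular files, not pipes.
    """
    with tempfile.TemporaryFile() as out_f, tempfile.TemporaryFile() as err_f:
        rc = subprocess.call(argv, stdout=out_f, stderr=err_f)
        # The child has exited; rewind and read what it wrote.
        out_f.seek(0)
        err_f.seek(0)
        return rc, out_f.read(), err_f.read()


# Stand-in for a noisy tool: far more stderr than a pipe buffer holds.
noisy = [sys.executable, "-c",
         "import sys; sys.stderr.write('x' * 1000000); sys.stdout.write('done')"]
rc, out, err = run_with_temp_files(noisy)
```

The cost is some extra disk I/O per test, in exchange for not having to reason about which pipe might be full.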
I think you have more or less diagnosed the problem, with one caveat: communicate will not block just because the pipes of the process it is called on are full. It is more likely that some other process is blocked writing to a full pipe, and the process under communication is in turn waiting on that blocked process. Consider this pipeline:
llc -debug -mtriple=x86_64-linux < %s | FileCheck %s
In this case, llc will dump lots of text to stderr, which is piped to lit. That buffer will fill and writes will block. lit will ‘communicate’ with FileCheck, and no progress will be made.
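Here is a rough sketch of one way to run a two-stage pipeline like that without the hang. It does not use llc or FileCheck; producer_cmd and consumer_cmd are hypothetical stand-ins, and run_pipeline is a made-up helper. The producer's stderr goes to a temp file, so it cannot block on a pipe that nobody is reading while the parent communicates with the consumer.

```python
import subprocess
import sys
import tempfile

# Stand-in for the noisy first stage: lots of stderr, a little stdout.
producer_cmd = [sys.executable, "-c",
                "import sys; sys.stderr.write('d' * 1000000); "
                "sys.stdout.write('result\\n')"]
# Stand-in for the second stage: reads stdin, writes it back upper-cased.
consumer_cmd = [sys.executable, "-c",
                "import sys; sys.stdout.write(sys.stdin.read().upper())"]


def run_pipeline(producer_cmd, consumer_cmd):
    """Run producer | consumer; return (consumer_out, consumer_err, producer_err)."""
    with tempfile.TemporaryFile() as producer_err:
        producer = subprocess.Popen(producer_cmd,
                                    stdout=subprocess.PIPE,
                                    stderr=producer_err)
        consumer = subprocess.Popen(consumer_cmd,
                                    stdin=producer.stdout,
                                    stdout=subprocess.PIPE,
                                    stderr=subprocess.PIPE)
        # Drop the parent's copy of the pipe so the consumer sees EOF
        # as soon as the producer exits.
        producer.stdout.close()
        out, err = consumer.communicate()
        producer.wait()
        producer_err.seek(0)
        return out, err, producer_err.read()


out, err, perr = run_pipeline(producer_cmd, consumer_cmd)
```

If the producer's stderr were stderr=subprocess.PIPE instead and nothing drained it, the producer would stall once the buffer filled, and communicate() on the consumer would wait forever for input that never arrives.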
The Python 2.7.2 docs specifically say that calling wait() when stdout or stderr is PIPE can deadlock, and they recommend communicate() over .stdin.write, .stdout.read, and .stderr.read for the same reason. This happens when the OS pipe buffer fills up, so everything below is correct in terms of root cause. I see uses of all of those in this script, but I am not well versed in Python and can't offer any suggestion other than the one mentioned on the linked page: use communicate() for everything.
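For a single child, the difference looks roughly like this. It is a sketch with a made-up child command, not code from the script in question. communicate() reads both pipes until EOF and then reaps the process, so neither buffer can fill up unread.

```python
import subprocess
import sys

# Hypothetical child that writes more to each stream than a typical
# OS pipe buffer can hold.
child = [sys.executable, "-c",
         "import sys; sys.stdout.write('o' * 200000); "
         "sys.stderr.write('e' * 200000)"]

proc = subprocess.Popen(child, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

# Calling proc.wait() at this point could hang: the child would block
# writing to a full pipe while the parent waits for it to exit.
out, err = proc.communicate()
```

Note that communicate() only protects the process it is called on; other processes in the same pipeline still need their pipes drained or redirected.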