Problem with dotest_channels.py

Hi Todd, lately I’ve been seeing this sporadically when running the test suite.

[TestNamespaceLookup.py FAILED]
Command invoked: C:\Python27_LLDB\x86\python_d.exe D:\src\llvm\tools\lldb\test\dotest.pyc -q --arch=i686 --executable D:/src/llvmbuild/ninja/bin/lldb.exe -s D:/src/llvmbuild/ninja/lldb-test-traces -u CXXFLAGS -u CFLAGS --enable-crash-dialog -C d:\src\llvmbuild\ninja_release\bin\clang.exe --results-port 55886 --inferior -p TestNamespaceLookup.py D:\src\llvm\tools\lldb\packages\Python\lldbsuite\test --event-add-entries worker_index=10:int
416 out of 416 test suites processed - TestAddDsymCommand.py error: uncaptured python exception, closing channel <lldbsuite.test.dotest_channels.UnpicklingForwardingReaderChannel connected 127.0.0.1:56008 at 0x2bdd578> (<class ‘socket.error’>:[Errno 10054] An existing connection was forcibly closed by the remote host [C:\Python27_LLDB\x86\lib\asyncore.py|read|83] [C:\Python27_LLDB\x86\lib\asyncore.py|handle_read_event|449] [D:\src\llvm\tools\lldb\packages\Python\lldbsuite\test\dotest_channels.py|handle_read|133] [C:\Python27_LLDB\x86\lib\asyncore.py|recv|387])

It seems to happen randomly and not always on the same test. Sometimes it doesn’t happen at all. I wonder if this could be related to some of the work that’s been going on recently. Are you seeing this? Any idea how to diagnose?

Hi Zachary!

If nothing else, maybe we can print out a more useful exception backtrace. What kind of exception, what line, and what was the message? That might help give us a better idea of what’s causing it.

Yeah that’s a messed up exception scenario that is hard to read. I’ll figure something out when I repro it. One side is closing before the other is expecting it, but likely in a way we need to expect.

I think it is ugly-ified because it is coming from some kind of worker thread within async-core.

I will get something in to help it today. The first bit may be just catching the exception as you mentioned.

Hey Zachary,

I just put in:
r255581

which should hopefully:
(1) catch the exception you see there,
(2) handle it gracefully in the common and to-be-expected case of the test inferior going down hard, and
(3) print out an error if anything else unexpected is happening here.

Let me know if you get any more info with it. Thanks!

-Todd

With Todd’s change, I was getting a Ninja crash. Zach and I replaced the returns Todd added with raises, in order to propagate the exception up the stack, and that avoids the Ninja crash, so I’ll check that in in a moment.

In the mean time, here’s the error message we got out of it.

155 out of 416 test suites processed - TestBacktraceAll.py
INFO: received socket error when reading data from test inferior:
[Errno 10054] An existing connection was forcibly closed by the remote host
error: uncaptured python exception, closing channel <lldbsuite.test.dotest_channels.UnpicklingForwardingReaderChannel connected 127.
0.0.1:58961 at 0x2bb8878> (<class ‘socket.error’>:[Errno 10054] An existing connection was forcibly closed by the remote host [D:\Python_for_lldb\x86\lib\asyncore.py|read|83] [D:\Python_for_lldb\x86\lib\asyncore.py|handle_read_event|449] [D:\src\llvm\llvm\tools\lldb\packages\Python\lldbsuite\test\dotest_channels.py|handle_read|137] [D:\Python_for_lldb\x86\lib\asyncore.py|recv|387])
175 out of 416 test suites processed - TestNoSuchArch.py

I wonder if you need a flush somewhere before you invoke the cleanup func? Would that do it? It looks like the sending side of the connection is closing before the receiving side has received all its data.

Yeah I’ll have a look at what it’s doing.

I wouldn’t expect a return to crash there, just not receive the data. I’m guessing other parts of asyncore code might be doing invalid things with the socket at that point. We do need to be able to handle this case, though, on timeouts that kill the sending/inferior side.