Problems running the LLDB lit tests on Windows

I’m trying to figure out what’s happening with the LLDB lit tests on Windows. I’m not sure how to proceed with debugging this.

I execute this command:

ninja check-lldb

And several things happen very rapidly:

  1. On the console, I get one warning that says:

D:/src/llvm/mono/llvm-project/llvm\utils\lit\lit\discovery.py:121: ResourceWarning: unclosed file <_io.BufferedReader name=3>
key = (ts, path_in_suite)

  2. Then I get several dozen messages of this form:

D:/src/llvm/mono/llvm-project/llvm\utils\lit\lit\TestRunner.py:727: ResourceWarning: unclosed file <_io.BufferedReader name=6>
res = _executeShCmd(cmd.rhs, shenv, results, timeoutHelper)

  3. I get more than 200 dialog boxes that are essentially assertion failures in the CRT implementation of close. The line cited in the dialog is:

_VALIDATE_CLEAR_OSSERR_RETURN((fh >= 0 && (unsigned)fh < (unsigned)_nhandle), EBADF, -1);

where fh is the file descriptor passed to close. Typically fh has a value like 452, which is outside the valid range (0 through _nhandle - 1) because _nhandle is 64.
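The `name=3` and `name=6` in the warnings quoted above suggest the leaked objects are `BufferedReader`s wrapped around raw file descriptors (the kind of object `subprocess` creates for pipes), not files opened by path. Below is a minimal sketch, assuming CPython, that reproduces the same warning shape outside lit. It wraps the read end of an `os.pipe()` in a buffered reader and drops it without closing. The function name `leaked_reader_warnings` is hypothetical and is not part of lit.

```python
import gc
import os
import warnings


def leaked_reader_warnings(close_it):
    """Wrap a pipe's read end in a BufferedReader, drop it, and return the
    text of any ResourceWarning emitted when the reader is finalized."""
    with warnings.catch_warnings(record=True) as caught:
        # ResourceWarning is ignored by default in release builds; show it here.
        warnings.simplefilter("always", ResourceWarning)
        read_fd, write_fd = os.pipe()
        os.close(write_fd)
        # Opening from an fd gives a repr like <_io.BufferedReader name=N>.
        reader = open(read_fd, "rb")
        if close_it:
            reader.close()
        # Dropping the last reference finalizes the reader; if it was never
        # closed, CPython emits "unclosed file <_io.BufferedReader name=N>".
        del reader
        gc.collect()
    return [str(w.message) for w in caught
            if issubclass(w.category, ResourceWarning)]


if __name__ == "__main__":
    print(leaked_reader_warnings(close_it=False))
    print(leaked_reader_warnings(close_it=True))
```

Called with `close_it=False` it should return one message of the form `unclosed file <_io.BufferedReader name=N>`, and with `close_it=True` it should return none. On Python 3.6 and later, running under `python -X tracemalloc=25` makes each such warning also report where the leaked object was allocated. That may help map the warnings back to the lit code that owns the pipe.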

Starting from item 3, I tried to walk up the stack to see what's going on, but it's just the generic workings of the Python virtual machine. The close call happens because some .py code is calling close, but the Python-level code is hard to see from the native debugger. It doesn't actually seem to be test code.

So I looked at the command line of one of those asserting processes to see whether I could figure out which tests are exhibiting the problem.

"C:\python_35\python_d.exe" "-R" "-c" "from multiprocessing.spawn import spawn_main; spawn_main(pipe_handle=992, parent_pid=32640)" "--multiprocessing-fork"

The pipe_handle value does not correspond to the value being passed to close, and parent_pid always refers to the parent lit process.

There always seem to be 32 Python processes in this state. If I kill one, another is immediately spawned to replace it (creating a new assertion failure dialog). I’m guessing that if I continued, there would be one for each test, and that somewhere there’s a limit of 32 processes at a time.

So this sounds like a lit bug, but other lit tests (as in ninja check-llvm) run just fine, so it seems to have something to do with how we invoke lit for LLDB. According to the build.ninja file, the command being executed is:

cd /D D:\src\llvm\build\mono\tools\lldb\lit && C:\python_35\python_d.exe D:/src/llvm/build/mono/./bin/llvm-lit.py -sv --param lldb_site_config=D:/src/llvm/build/mono/tools/lldb/lit/lit.site.cfg --param lldb_unit_site_config=D:/src/llvm/build/mono/tools/lldb/lit/Unit/lit.site.cfg D:/src/llvm/build/mono/tools/lldb/lit

The LLDB-specific things in the command are lit configs, of which I've been blissfully ignorant. Should I head down that rabbit hole? Could this be a problem with my environment?

See my comment in https://reviews.llvm.org/D45333 .

r330275 changed how lldb's lit tests are set up. It causes CMake errors with the Visual Studio generator; I wouldn't be surprised if what you're seeing with ninja is the same issue.

Short version: the cmake code that sets up the lit config in lldb is different from the cmake code that sets up the lit config in clang. This is causing the VS generator errors, and might be causing your problems with ninja.

If I run the llvm lit tests with the debug build of Python, I get the same kind of errors, so I think this is a bug in lit that we haven’t seen because people have been using it with non-debug Python. I’m investigating that angle.
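Here is a minimal sketch, assuming CPython, of why a non-debug Python could hide this. Closing a descriptor that is already closed is reported as an ordinary `OSError` with `errno.EBADF`, which calling code can catch and ignore. The debug CRT used by `python_d.exe` can instead stop at an assertion dialog like the one described above. The helper names are hypothetical. `sys.gettotalrefcount` is only present in debug (Py_REF_DEBUG) builds, which makes it a common way to tell whether you are running under `python_d`.

```python
import errno
import os
import sys


def is_debug_python():
    """True when running under a debug build of CPython (e.g. python_d.exe)."""
    # sys.gettotalrefcount is only compiled into debug (Py_REF_DEBUG) builds.
    return hasattr(sys, "gettotalrefcount")


def double_close_errno():
    """Close the same descriptor twice and return the errno of the second close."""
    read_fd, write_fd = os.pipe()
    os.close(write_fd)
    os.close(read_fd)
    try:
        os.close(read_fd)  # read_fd is no longer a valid descriptor here
    except OSError as exc:
        return exc.errno
    return None


if __name__ == "__main__":
    print("debug build:", is_debug_python())
    # Caution: under python_d.exe on Windows this second close may trigger
    # a debug-CRT assertion dialog rather than raising OSError.
    print("second close errno:", errno.errorcode.get(double_close_errno()))
```

On a release build, the second close should report `EBADF`. That is the kind of failure that can pass unnoticed when lit runs under non-debug Python.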