Python object lifetimes affect the reliability of tests

I've tracked down a source of flakiness in tests on Windows to Python object lifetimes and the SB interface, and I'm wondering how best to handle it.

Consider this portion of a test from TestTargetAPI:

    def find_functions(self, exe_name):
        """Exercise SBTarget.FindFunctions() API."""
        exe = os.path.join(os.getcwd(), exe_name)

        # Create a target by the debugger.
        target = self.dbg.CreateTarget(exe)
        self.assertTrue(target, VALID_TARGET)
        list = target.FindFunctions('c', lldb.eFunctionNameTypeAuto)
        self.assertTrue(list.GetSize() == 1)

        for sc in list:
            self.assertTrue(sc.GetModule().GetFileSpec().GetFilename() == exe_name)
            self.assertTrue(sc.GetSymbol().GetName() == 'c')

The local variables go out of scope when the function exits, but the SB (C++) objects they represent aren't (always) immediately destroyed. At least some of these objects keep references to the executable module in the shared module list, so when the test framework cleans up and calls `SBDebugger::DeleteTarget`, the module isn't orphaned, so LLDB maintains an open handle to the executable.

The result of the lingering handle is that, when the next test case in the test suite tries to re-build the executable, it fails because the file is not writable. (This is problematic on Windows because the file system works differently in this regard than Unix derivatives.) Every subsequent case in the test suite fails.

I managed to make the test work reliably by rewriting it like this:

    def find_functions(self, exe_name):
        """Exercise SBTarget.FindFunctions() API."""
        exe = os.path.join(os.getcwd(), exe_name)

        # Create a target by the debugger.
        target = self.dbg.CreateTarget(exe)
        self.assertTrue(target, VALID_TARGET)

        try:
            list = target.FindFunctions('c', lldb.eFunctionNameTypeAuto)
            self.assertTrue(list.GetSize() == 1)

            for sc in list:
                try:
                    self.assertTrue(sc.GetModule().GetFileSpec().GetFilename() == exe_name)
                    self.assertTrue(sc.GetSymbol().GetName() == 'c')
                finally:
                    del sc

        finally:
            del list

The finally blocks ensure that the corresponding C++ objects are destroyed, even if the function exits as a result of a Python exception (e.g., if one of the assertion expressions is false and the code throws an exception). Since the objects are destroyed, the reference counts are back to where they should be, and the orphaned module is closed when the target is deleted.

But this is ugly and maintaining it would be error prone. Is there a better way to address this?

In general, it seems bad that our tests aren't well-isolated. I sympathize with the concern that re-starting LLDB for each test case would slow down testing, but I'm also concerned that the state of LLDB for any given test case can depend on what happened in the earlier cases.

Adrian.

I stumbled upon a similar problem when I was looking into why SBDebugger wasn't unloaded upon the app's exit.
The problem was in Python global objects like lldb.debugger and lldb.target sitting around.
So my guess is to try to call ScriptInterpreterPython::Clear within the test's tearDown call - e.g., expose the Clear method as part of SBCommandInterpreter and call it via SBDebugger::GetCommandInterpreter

That wouldn't work in this case because it causes a failure from one test to the next. So a single test suite has 5 tests, and the second one fails because the first one didn't clean up correctly. You couldn't call ScriptInterpreterPython::Clear here, because then you'd have to initialize it again, and while it might work, it seems scary and like something which is untested and we recommend you don't do.

What about calling `gc.collect()` in the tearDown() method?


We should drill into why they're not cleaning up right away. If it's out
of our control (i.e. some kind of lazy clean up), then maybe we can put in
some meta-magic that generates the deletes like you added.

In general, it seems bad that our tests aren't well-isolated.

Each .py file is getting run as a separate process unless the tests are run
in a way that disables the concurrent test runner. Methods within a given
.py file are getting run in the same process. So we're really talking
about test methods within a file, not test methods across files.

I sympathize with the concern that re-starting LLDB for each test case
would slow down testing, but I'm also concerned that the state of LLDB for
any given test case can depend on what happened in the earlier cases.

One thing I plan to do mid-term is:
* announce all the tests that can run in a python file
* ensure we report on tests *not* run because of a crash (i.e. get counts
for true unknowns, not just info on tests that pass/fail/error).
* have an "announce-only" pass so we can dynamically collect up all the
tests without necessarily running them (for things like test runners that
can list all the tests available and want to show progress).

This would conceivably support the underpinnings to say:
* collect all the tests (announce-only)
* add a way to fire up a single test method (we sorta have that already),
and only do one test method per process.

I think there is additional coordination that would have to happen to
handle inferior binaries built for a test and make sure they exist where
(and when) they need to, with the right arch settings and whatnot.

But, that would definitely get full isolation per test method.

That wouldn't work in this case because it causes a failure from one test
to the next. So a single test suite has 5 tests, and the second one fails
because the first one didn't clean up correctly. You couldn't call
ScriptInterpreterPython::Clear here, because then you'd have to initialize
it again, and while it might work, it seems scary and like something which
is untested and we recommend you don't do.

What about calling `gc.collect()` in the tearDown() method?

If it's a laziness thing, that seems like it might do it. I would think we
could stick that in the base test class and get it everywhere. Is that
something you can try, Adrian?
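For reference, a minimal sketch of what that suggestion might look like (the base-class name here is hypothetical, not from the LLDB test suite; `gc.collect()` returns the number of unreachable objects it found):

```python
import gc
import unittest


class SBTestBase(unittest.TestCase):
    """Hypothetical base class: force a full garbage collection after
    every test method, so lazily-freed SB wrapper objects would be
    destroyed before the next test rebuilds the inferior executable."""

    def tearDown(self):
        # gc.collect() returns the number of unreachable objects found.
        self.collected = gc.collect()


class ExampleTest(SBTestBase):
    def test_nothing(self):
        pass


# Run the example so tearDown (and hence the collection) executes.
tc = ExampleTest("test_nothing")
result = unittest.TestResult()
tc.run(result)
```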


That seemed promising, but it doesn't seem to work, so maybe I don't
understand the problem as well as I thought I did.

The gc.collect() works only if I set the variables to None in the finally blocks, which is no better than just del'ing the objects in the finally blocks.


Creating a target with:

  target = self.dbg.CreateTarget(exe)

will give you an SBTarget object that has a strong reference to the target, but the debugger still has a copy in its target list, so the SBTarget isn't designed to delete the object when the target variable goes out of scope. If you want the target to be deleted, you actually have to call through to the debugger with:

    bool
    SBDebugger::DeleteTarget (lldb::SBTarget &target);

So the right way to clean up the target is:

    self.dbg.DeleteTarget(target)

Even though there might be code within LLDB that still has a valid shared pointer to the lldb_private::Target, this calls lldb_private::Target::Destroy(), which clears out most instance variables (the module list, the process, any plug-ins, etc.).

SBTarget objects have strong references so that they _can_ keep the object alive if needed in case someone else destroys the target on another thread, but they don't control the lifetime of the target.

Other objects have weak references to the objects: SBProcess, SBThread, SBFrame. If the objects are actually destroyed already, the weak pointer won't be able to get a valid shared pointer to the underlying object
and any SB API calls on these objects will return error, none, zero, etc...
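The weak-reference behavior described here can be illustrated with an analogy in pure Python using the `weakref` module (an analogy only; the actual SB classes implement this in C++):

```python
import weakref


class Target(object):
    """Stand-in for the underlying lldb_private object."""
    pass


t = Target()
r = weakref.ref(t)    # like SBProcess/SBThread holding a weak reference

# While the object exists, the weak reference resolves to it.
alive = r() is t

# Destroying the object (like SBDebugger::DeleteTarget) invalidates the
# weak reference; resolving it now yields None, which is why SB API calls
# on such objects return error / none / zero instead of crashing.
del t
gone = r() is None
```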


So you should be able to fix this by deleting the target with "self.dbg.DeleteTarget(target)"

We could change all tests over to always store any targets they create in the test object itself:

self.target = self.dbg.CreateTarget(exe)

Then the test suite could check for the existence of "self.target" and, if it exists, call "self.dbg.DeleteTarget(self.target)" automatically to avoid such issues?
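A sketch of that proposal; `FakeTarget` and `FakeDebugger` are hypothetical stand-ins for `lldb.SBTarget` and `lldb.SBDebugger`, so the shape of the pattern is visible without LLDB itself:

```python
import unittest


class FakeTarget(object):
    """Stand-in for lldb.SBTarget."""
    def __init__(self):
        self.deleted = False

    def IsValid(self):
        return not self.deleted


class FakeDebugger(object):
    """Stand-in for lldb.SBDebugger."""
    def CreateTarget(self, exe):
        return FakeTarget()

    def DeleteTarget(self, target):
        target.deleted = True


class TargetTestBase(unittest.TestCase):
    """Sketch: tests store any target they create on self.target, and
    tearDown automatically hands it back to the debugger for deletion."""

    def setUp(self):
        self.dbg = FakeDebugger()
        self.target = None

    def tearDown(self):
        if self.target is not None and self.target.IsValid():
            self.dbg.DeleteTarget(self.target)


created = []  # records targets so cleanup can be observed from outside


class ExampleTest(TargetTestBase):
    def test_create(self):
        self.target = self.dbg.CreateTarget("a.out")
        created.append(self.target)
        self.assertTrue(self.target.IsValid())


result = unittest.TestResult()
ExampleTest("test_create").run(result)
```

(This only guarantees that the DeleteTarget call happens; it cannot help if some other wrapper still pins the module.)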

We actually already do the self.dbg.DeleteTarget(target) call, and that's the line that's failing. The reason it's failing is that the 'sc' reference is still alive, which is holding an mmap, which causes a mandatory file lock on Windows.

The diagnostics went pretty deep into Python internals, but I think we might have figured it out. I don't know if this is a bug in Python, but I think we'd probably need to ask Guido to be sure :)

As far as we can tell, what happens is that on the exceptional codepath (e.g. the assert fails), you walk back up the stack until you get to the except handler. This exception handler is in TestCase.run(). After it handles the exception it goes and runs teardown. However, for some reason, Python is still holding a strong reference to the traceback, even though we're completely out of the finally block. What this means is that if you call sys.exc_info() *even after you've exited the finally block*, it still returns info about the previous exception that's not even being handled anymore. I would have expected this to be gone since there's no exception in-flight anymore. So basically, Python is still holding a reference to the active exception, the exception holds the stack frame, the stack frame holds the test method, the test method has locals, one of which is a SymbolList, a member of which is a SymbolContext, which has the file locked.

Our best guess is that if you have something like this:

    def foo():
        try:
            # Do stuff
            ...
        except Exception, e:
            pass

        # Do more stuff
        ...

then if the exceptional path is executed, both e and sys.exc_info() are alive while "do more stuff" is happening. We've found two ways to fix this:

  1. Change to this:

    def foo():
        try:
            # Do stuff
            ...
        except Exception, e:
            pass
        del e
        sys.exc_clear()

        # Do more stuff
        ...

  2. Put the try / except inside a function. When the function returns, sys.exc_info() is cleared.

I like 2 better, but we're still testing some more to make sure this really fixes it 100% of the time.
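A sketch of option 2, in Python 3 syntax (Python 3 removed sys.exc_clear() and clears the handled exception when the except block exits anyway, but wrapping the handler in a function makes the guarantee explicit in either version):

```python
import sys


def guarded_checks():
    # Option 2: the try/except lives inside its own function, so the
    # handled exception (and the frames its traceback pins) cannot
    # outlive this call.
    try:
        raise AssertionError("simulated failing assertion")
    except AssertionError:
        handled = sys.exc_info()[0]  # the handler still sees the exception
    return handled


before = sys.exc_info()
handled_type = guarded_checks()
after = sys.exc_info()  # unchanged: nothing leaked out of the function
```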

To add more evidence for this, here's a small repro:

    import sys

    print "sys.exc_info() = ", "Empty" if sys.exc_info() == (None, None, None) else "Valid"
    try:
        raise Exception
    except Exception, e:
        print "sys.exc_info() = ", "Empty" if sys.exc_info() == (None, None, None) else "Valid"
        pass

    print "sys.exc_info() = ", "Empty" if sys.exc_info() == (None, None, None) else "Valid"
    print "e = ", "Bound" if 'e' in vars() else "Unbound"
    pass

For me this prints

    sys.exc_info() = Empty
    sys.exc_info() = Valid
    sys.exc_info() = Valid
    e = Bound

Couldn't we just change DeleteTarget to make sure everything is unmapped?

It's not that simple. A test method could be holding onto arbitrary resources which could in theory prevent cleanup. Tests can register their own cleanup handlers, for example, and the teardown will call back into the cleanup handler. So one of those handlers could be making the assumption that it can clean something up even though it can't, because the test method is still holding onto some resources.

I want to find out if this is a bug in my Python implementation, because it seems quite strange to me that sys.exc_info(), which is documented to return information about the exception currently being processed, still returns something after it is finished being processed.

One possible approach is to never run a built executable directly, but to copy it to some other place (e.g. my-exe.exe → my-exe-run-13.exe) and only run the copied executable. Then rebuilding the original won't fail.

It doesn't appear to be a bug in my implementation. So it seems like everyone has been experiencing this problem, but on Windows it just made a big difference because of the mandatory file locks. From the Python documentation:

This function returns a tuple of three values that give information about the exception that is currently being handled. The information returned is specific both to the current thread and to the current stack frame. If the current stack frame is not handling an exception, the information is taken from the calling stack frame, or its caller, and so on until a stack frame is found that is handling an exception. Here, “handling an exception” is defined as “executing or having executed an except clause.” For any stack frame, only information about the most recently handled exception is accessible.

So within a given stack frame, if an exception has been handled before you, that exception is still accessible, and every frame and every variable reachable from its traceback is still strongly referenced.

So it sounds like our solution of moving this exception handler into a function or calling sys.exc_clear() is the right fix.

Thanks everyone.

To close the loop, we've addressed this problem with this patch: http://reviews.llvm.org/rL250467