Too many ModuleSP references

Hello all,

I've been debugging the newly added TestStepOverBreakpoint.py, which
has been failing on windows, for a very prosaic reason: after the test
completes, we are unable to run "make clean" because lldb still holds
the file open.

After some debugging, I've found that this happens because the test
case stores the SBBreakpoint object in a member variable of the python
test case class. The breakpoint class ends up transitively holding a
reference to the main executable module, which prevents the module
from being garbage-collected when the target is destroyed.

For reference, the full ownership chain is something like:
StepOverBreakpointTestCase(python) => SBBreakpoint => Breakpoint =>
BreakpointLocation => LLVMUserExpression => IRExecutionUnit =>
SymbolContext => Module.

To get the test working, we need to break this chain somewhere. A
couple of places I see are:
- BreakpointLocation: Remove the compiled expression reference when
the target is destroyed (AFAICS, it is used as a cache to avoid
recomputing the expression every time. It can be theoretically
recomputed if needed, but that shouldn't be necessary as the target is
destroyed anyway)

- SBBreakpoint: make SBBreakpoint hold a weak_ptr to the Breakpoint
object. When the target is destroyed, the SBBreakpoint object becomes
invalid (One doesn't cannot do anything useful with the breakpoint
once the target has been deleted anyway).

- StepOverBreakpointTestCase: Have the test not store the breakpoints
in the test case object. Basically, declare that this is not a bug,
and it's the users responsibility to clean up necessary objects.

Any thoughts on what is the appropriate solution here?

cheers,
pavel

No comments on this specifically, but +1 to reducing shared_ptr usage in general. We use way too many, and often it feels like shared_ptr is used as a way to avoid having to think about ownership, which leads to more problems than it solves imo

Thanks for digging into this.

Maybe a better option would be to have IRExecutionUnit hold a SymbolContextScope, rather than a SymbolContext. We started out with SymbolContext's because they were convenient but they hold too many things alive so when we run into SymbolContext's we've been converting them to the SymbolContextScope's when possible. I can't see any need for the IRExecutionUnit to hold the module alive, so long as it handles failure to get the SymbolContext gracefully, which shouldn't be hard.

Jim

No comments on this specifically, but +1 to reducing shared_ptr usage in general. We use way too many, and often it feels like shared_ptr is used as a way to avoid having to think about ownership, which leads to more problems than it solves imo

We do a pretty good job of using shared pointers where they are needed, so I don’t agree with the above statement. We have strong memory models in LLDB and certain things need to keep certain things alive. LLVM is quite the wild west with regard to memory management, so I don’t want to base any changes on it. We just need to use std::weak_ptr when needed. There should be no two items that contain strong references to each other.

In general think about what should keep things alive. If I have a SBModule variable, I should expect that it won’t let my module go away. If I have a SBTarget, I would expect it to keep the target around as long as I have a reference. Things that belong to any of these items, like breakpoints, watchpoints, compile units, functions, shouldn’t keep the module or target around.

Hello all,

I’ve been debugging the newly added TestStepOverBreakpoint.py, which
has been failing on windows, for a very prosaic reason: after the test
completes, we are unable to run “make clean” because lldb still holds
the file open.

After some debugging, I’ve found that this happens because the test
case stores the SBBreakpoint object in a member variable of the python
test case class. The breakpoint class ends up transitively holding a
reference to the main executable module, which prevents the module
from being garbage-collected when the target is destroyed.

For reference, the full ownership chain is something like:
StepOverBreakpointTestCase(python) => SBBreakpoint => Breakpoint =>
BreakpointLocation => LLVMUserExpression => IRExecutionUnit =>
SymbolContext => Module.

To get the test working, we need to break this chain somewhere. A
couple of places I see are:

  • BreakpointLocation: Remove the compiled expression reference when
    the target is destroyed (AFAICS, it is used as a cache to avoid
    recomputing the expression every time. It can be theoretically
    recomputed if needed, but that shouldn’t be necessary as the target is
    destroyed anyway)

We should absolutely be clearing these useless expressions when the process is killed. Compiling expressions take 300-500 ms at the very least and we do cache the breakpoint expressions in each location, which is good for the lifetime of the process, but these locations should go away when the process goes away or execs.

  • SBBreakpoint: make SBBreakpoint hold a weak_ptr to the Breakpoint
    object. When the target is destroyed, the SBBreakpoint object becomes
    invalid (One doesn’t cannot do anything useful with the breakpoint
    once the target has been deleted anyway).

This is also valid. If the breakpoint is still around, you will be able to use it as the weak_ptr will make a shared pointer with a valid value, and if it isn’t it won’t be able to do anything anyway.

  • StepOverBreakpointTestCase: Have the test not store the breakpoints
    in the test case object. Basically, declare that this is not a bug,
    and it’s the users responsibility to clean up necessary objects.

It would be nice to avoid this.

Any thoughts on what is the appropriate solution here?

So I vote for:
1 - remove all breakpoint expressions when the target stops and the locations becomes unresolved.
2 - SBBreakpoint, SBBreakpointLocation, SBWatchpoint and can switch over to using std::weak_ptr

The other fix is that might be worth it is to make a SymbolContextRef, kind of like we have the ExecutionContext (with shared pointers) and ExecutionContextRef (with weak pointers). We would then have things like IRExecutionUnit switch over to use a SymbolContextRef. We should also look for other places where people are storing SymbolContext as member variables and possibly switch them over to use SymbolContextRef.

class SymbolContextRef
{
lldb::TargetWP target_wp; ///< The Target for a given query
lldb::ModuleWP module_wp; ///< The Module for a given query
CompileUnit *comp_unit; ///< The CompileUnit for a given query
Function *function; ///< The Function for a given query
Block *block; ///< The Block for a given query
LineEntry line_entry; ///< The LineEntry for a given query
Symbol *symbol; ///< The Symbol for a given query
Variable *variable; ///< The global variable matching the given query
};

Then a SymbolContext would add a constructor that takes a SymbolContextRef:

SymbolContextRef sym_ctx_ref = …;

Anytime you want to use a SymbolContextRef, you first must turn it into a SymbolContext:

SymbolContext sym_ctx(sym_ctx_ref);

If the module is gone, then we need to NULL out comp_unit, function, block, symbol and variable (since they are owned by the Module) and clear the line_entry. If the module is still around, then these things can stay as is and the local SymbolContext now has a strong reference to the module and or target.

IRExecutionUnit currently uses the symbol context for name lookups, so I am not even sure if it is a good idea for it to store one permanently, but if it we use SymbolContextRef, then I don’t care as much.

So in general think about who owns what and what should keep things alive. In this case it sounds like making SBBreakpoint, SBBreakpointLocation, and SBWatchpoint use std::weak_ptr would solve this part of this issue, but I still don’t like a IRExecutionUnit being able to keep a module around, so we should probably make a SymbolContextRef. I am guessing if you switch SBBreakpoint over to use std::weak_ptr all the expressions will go away on their own.

Greg Clayton

Thanks for digging into this.

Maybe a better option would be to have IRExecutionUnit hold a SymbolContextScope, rather than a SymbolContext. We started out with SymbolContext's because they were convenient but they hold too many things alive so when we run into SymbolContext's we've been converting them to the SymbolContextScope's when possible. I can't see any need for the IRExecutionUnit to hold the module alive, so long as it handles failure to get the SymbolContext gracefully, which shouldn't be hard.

SymbolContextScope doesn't hold ownership as it is just a pointer. This pointer would go bad if the module goes away and it would crash when you tried to use it. See my other email for another approach with SymbolContextRef.

Greg

Agree with all of this, but I think most of the time you only need 1 thing keeping an object alive (i.e. unique_ptr), and from there you can just hand out references. In the case where an object might die while a reference is outstanding, then a lot of times I think the design can be adjusted so that doesn’t happen, but when it’s still necessary, you can store a shared_ptr and vend out weak_ptrs. You don’t need everything in the system keeping everything else alive.

I’m speaking from a purely general standpoint, I haven’t done a deep dive into LLDB to see where this does / doesn’t apply, but in my experience shared_ptrs are one of the most overused things in all of C++. So when I see a lot of them, I just default to “I bet these can be reduced significantly”

Thanks for digging into this.

Maybe a better option would be to have IRExecutionUnit hold a SymbolContextScope, rather than a SymbolContext. We started out with SymbolContext's because they were convenient but they hold too many things alive so when we run into SymbolContext's we've been converting them to the SymbolContextScope's when possible. I can't see any need for the IRExecutionUnit to hold the module alive, so long as it handles failure to get the SymbolContext gracefully, which shouldn't be hard.

SymbolContextScope doesn't hold ownership as it is just a pointer. This pointer would go bad if the module goes away and it would crash when you tried to use it. See my other email for another approach with SymbolContextRef.

Brain, brain, what is brain?

Yeah, I was conflating ExecutionContextRef & SymbolContext<WORDS THAT ARE NOT REF>. But still, having a SymbolContext equivalent to ExecutionContextRef seems the correct solution to me, even though we don't have the pre-baked class for it.

Jim

I don't agree with this. I think trying to force folks using the API from Python to manually clear all stored objects would be really obnoxious. If anything, we should figure out how to make this accidental failure into an intended failure so we can make sure we don't end up requiring this kind of micro-management.

Jim

- StepOverBreakpointTestCase: Have the test not store the breakpoints
in the test case object. Basically, declare that this is not a bug,
and it's the users responsibility to clean up necessary objects.

It would be nice to avoid this.

I don't agree with this. I think trying to force folks using the API from Python to manually clear all stored objects would be really obnoxious. If anything, we should figure out how to make this accidental failure into an intended failure so we can make sure we don't end up requiring this kind of micro-management.

It's possible you meant "it would be nice to avoid it's being the user's responsibility to clean up necessary objects", in which case sorry for mis-reading but happy we agree...

Jim

Thank you for the feedback. I'll look into how to get the expressions
cleared when the breakpoint locations get unresolved. I can also
switch the breakpoints to a weak_ptr if you think that is worthwhile,
but one of the two fixes is enough for me at the moment.

regards,
pavel

Yep, that is what I meant. No one should have to worry about cleansing any variables. It should just work. So we need to pick the lldb::SB objects that have strong reference very carefully.

I think the same goes for SBValue, whenever I've looked at 'locals' the executable gets locked until my SBValue refs get gcd.