Ownership of Stmts and Exprs, and destroying them

Hi,

As I understand the ownership, any given Expr is owned by the Expr it's an argument for, which is owned by the Stmt it makes up, which is owned by the function declaration, and so on, up to the translation unit. Except for some rare cases, like VLA sizes being owned by the VariableLengthArrayType, and function default argument expressions being owned by their FunctionDecl.

Types are owned by ASTContext and live until it is destroyed. Declarations are owned by their DeclStmt or similar.

Expressions that are currently being built - e.g. the argument parameters of ActOnBinOp - are essentially unowned. As such, if the Sema function returns early, without building some object to wrap them, they leak. In addition, they leak when the parser returns early and abandons the ExprTys it was carrying around. This is seen as acceptable, because the program will soon end anyway if there was an error.

Still, not everyone seems to think so. Doug goes out of his way to not leak these objects in the Sema routines, and asked me not to leak them, either. Chris thought it great that Doug wasn't leaking, too.

To destroy a Stmt or its derived classes, you don't use delete, but the Destroy member. This may be overridden in subclasses, and the base version first recursively calls Destroy for every child statement, and then commits suicide (delete this). Calling delete yourself is *not* sufficient.

Is this correct? Because I want to clean this up. I want to make the Sema routines (and later the parser, too) not leak. So I need to know exactly how this works.

Sebastian

Hi,

As I understand the ownership, any given Expr is owned by the Expr it's
an argument for, which is owned by the Stmt it makes up, which is owned
by the function declaration, and so on, up to the translation unit.
Except for some rare cases, like VLA sizes being owned by the
VariableLengthArrayType, and function default argument expressions being
owned by their FunctionDecl.

Yep.

Expressions that are currently being built - e.g. the argument
parameters of ActOnBinOp - are essentially unowned.

They are owned, but the ownership is transfered to the ActOn method that gets them. It is up to that method to deallocate them or transfer ownership to the return value.

As such, if the Sema
function returns early, without building some object to wrap them, they
leak. In addition, they leak when the parser returns early and abandons
the ExprTys it was carrying around. This is seen as acceptable, because
the program will soon end anyway if there was an error.

This is not acceptable, Clang can be used by other clients that are long lived, and leaking memory is a really bad thing.

Still, not everyone seems to think so. Doug goes out of his way to not
leak these objects in the Sema routines, and asked me not to leak them,
either. Chris thought it great that Doug wasn't leaking, too.

I don't know that anyone thinks it good that we leak memory :). The history here is that much of Sema was built before there *was* a clear ownership model. This means that there is a lot of "bad" code in it that doesn't obey the newly defined ownership model. Also, the parser has a couple of places where it leaks as well. If the parser does error recovery and ends up obviating a parsed expr/statement/etc, it should call the Actions::DeleteExpr/DeleteStmt methods to free them. This happens in some cases, but not all of them.

To destroy a Stmt or its derived classes, you don't use delete, but the
Destroy member. This may be overridden in subclasses, and the base
version first recursively calls Destroy for every child statement, and
then commits suicide (delete this). Calling delete yourself is *not*
sufficient.

I think that statements (exprs are statements) are freed with delete currently:

void Sema::DeleteExpr(ExprTy *E) {
   delete static_cast<Expr*>(E);
}
void Sema::DeleteStmt(StmtTy *S) {
   delete static_cast<Stmt*>(S);
}

We're slowly migrating decls to use a Create/Destroy interface, in order to let them be pool allocated or use some other allocation mechanism. It would be good to also move statements this way eventually, but decls should be finished first.

Is this correct? Because I want to clean this up. I want to make the
Sema routines (and later the parser, too) not leak. So I need to know
exactly how this works.

This would be great. Probably the first place to start is to use audit the parser for places where early exits cause parsed stmts to be leaked. Once that is done, start going through sema. The pattern we want to use in Sema for statements is using OwningPtr to hold incoming arguments. For example, ActOnBinOp should look like:

                                     ExprTy *LHS, ExprTy *RHS) {
   BinaryOperator::Opcode Opc = ConvertTokenKindToBinaryOpcode(Kind);
   OwningPtr<Expr> lhs = (Expr *)LHS, rhs = (Expr*)RHS;

   ...
   // Build a built-in binary operation.
   return CreateBuiltinBinOp(TokLoc, Opc, lhs.take(), rhs.take());

For routines that have more incoming values or are variadic, the pattern in ActOnCallExpr should be followed.

Incidentally, I'd like for C++ specific code to be moved to one of the C++ Sema files where it makes sense. For example, the Sema::ActOnBinOp contains a bunch of code for binary operator overloading. I'd prefer the code to look like this:

     if (getLangOptions().CPlusPlus &&
       (lhs->getType()->isRecordType() || lhs->getType()- >isEnumeralType() ||
        rhs->getType()->isRecordType() || rhs->getType()- >isEnumeralType())) {
       if (HandleCXXBinOpOverloading(...))
         return result;
     }

     // Non overloading code.

-Chris

Chris Lattner wrote:

I don't know that anyone thinks it good that we leak memory :). The history here is that much of Sema was built before there *was* a clear ownership model. This means that there is a lot of "bad" code in it that doesn't obey the newly defined ownership model. Also, the parser has a couple of places where it leaks as well. If the parser does error recovery and ends up obviating a parsed expr/statement/etc, it should call the Actions::DeleteExpr/DeleteStmt methods to free them. This happens in some cases, but not all of them.

OK. I thought there had to be such methods, but didn't go looking.

To destroy a Stmt or its derived classes, you don't use delete, but the
Destroy member. This may be overridden in subclasses, and the base
version first recursively calls Destroy for every child statement, and
then commits suicide (delete this). Calling delete yourself is *not*
sufficient.

I think that statements (exprs are statements) are freed with delete currently:

void Sema::DeleteExpr(ExprTy *E) {
  delete static_cast<Expr*>(E);
}
void Sema::DeleteStmt(StmtTy *S) {
  delete static_cast<Stmt*>(S);
}

We're slowly migrating decls to use a Create/Destroy interface, in order to let them be pool allocated or use some other allocation mechanism. It would be good to also move statements this way eventually, but decls should be finished first.

Look at what happens if you just delete a CallExpr:

  ~CallExpr() {
    delete SubExprs;
  }

Doesn't delete the actual expressions, just the array holding them.
Expr has no explicit destructor.

  virtual ~Stmt() {}

So if you delete a CallExpr, its children are never freed and leak. The reason is found in Stmt::Destroy:

void Stmt::DestroyChildren(ASTContext& C) {
  for (child_iterator I = child_begin(), E = child_end(); I !=E; ++I)
    if (Stmt* Child = *I) Child->Destroy(C);
}

void Stmt::Destroy(ASTContext& C) {
  DestroyChildren(C);
  // FIXME: Eventually all Stmts should be allocated with the allocator
  // in ASTContext, just like with Decls.
  // this->~Stmt();
  delete this;
}

In other words, I believe that Sema::DeleteExpr/Stmt, just like holding exprs with OwningPtrs, is insufficient. They need to Destroy, not delete.

This would be great. Probably the first place to start is to use audit the parser for places where early exits cause parsed stmts to be leaked.

OK.

Once that is done, start going through sema. The pattern we want to use in Sema for statements is using OwningPtr to hold incoming arguments. For example, ActOnBinOp should look like:

                                    ExprTy *LHS, ExprTy *RHS) {
  BinaryOperator::Opcode Opc = ConvertTokenKindToBinaryOpcode(Kind);
  OwningPtr<Expr> lhs = (Expr *)LHS, rhs = (Expr*)RHS;

As I said, I don't think this is sufficient. I've written a TempOwner that acts like OwningPtr that calls Destroy, though.

For routines that have more incoming values or are variadic, the pattern in ActOnCallExpr should be followed.

That pattern has the disadvantage that it sometimes requires either that the Expr is modifiable after creation, or weird restructuring of the method. Take, for example, ActOnCXXNew: if I want to pass everything to the constructor, I need to do three overload resolutions before creating the CXXNewExpr. The alternative is creating the Expr with fewer parameters and adding the rest later - but that means I have to add setters to CXXNewExpr, something I'm not a fan of.

Sebastian

This is too late. We want the arguments to the ActOnBinOp to be some
kind of smart pointer that provides transfer-of-ownership semantics
(basically, a C++'0x move_ptr) and knows how to delete expressions
properly (via Action::DestroyExpr). Similarly for ExprResult and
friends. That way, the type system enforces our ownership model
(transfer into Sema for the parameters of ActOn*, transfer out of Sema
on return), and we're safe from early exits.

  - Doug

If it helps, here's a C++03 emulation of the C++0X std::unique_ptr (what Doug refers to as move_ptr):

http://home.twcny.rr.com/hinnant/cpp_extensions/unique_ptr_03.html

You can set its "deleter" to call Action::DestroyExpr. It might look something like:

                           unique_ptr<ExprTy, CallActionDestroyExpr> lhs,
                           unique_ptr<ExprTy, CallActionDestroyExpr> rhs) {
  BinaryOperator::Opcode Opc = ConvertTokenKindToBinaryOpcode(Kind);
  ...

CallActionDestroyExpr will need to be a function pointer or functor that accepts a ExprTy*. It can be stateful (hold a this for a member function pointer). If stateful, that state can be supplied in the unique_ptr constructor, and will be transferred from copy to copy along with pointer ownership.

If we need a unique_ptr that isn't dependent on boost, I can supply that (it would require parts of <type_traits> and something to replace boost::compressed_pair). I haven't tested the link above in several years. I know it used to work but you know how software decays. The unique_ptr<T[N]> specialization (which you don't need anyway) is no longer part of the working draft.

For a quick unique_ptr tutorial, think of this as like auto_ptr except that you can supply a second templated de-allocator, just like containers and their allocators. It has release() and get() just like auto_ptr. It has unique ownership semantics just like auto_ptr.

-Howard

Howard Hinnant wrote:

If we need a unique_ptr that isn't dependent on boost, I can supply that (it would require parts of <type_traits> and something to replace boost::compressed_pair). I haven't tested the link above in several years. I know it used to work but you know how software decays. The unique_ptr<T[N]> specialization (which you don't need anyway) is no longer part of the working draft.
  

That would be exactly what I need. Right now I'm doing the ownership transfer manually, and that's really not the right way. If you could de-boost your unique_ptr, I'd be really grateful.

However, using this consistently would mean a pretty big API change. I'd start out using it in the parser only.

Sebastian

Chris Lattner wrote:

Is this correct? Because I want to clean this up. I want to make the
Sema routines (and later the parser, too) not leak. So I need to know
exactly how this works.

This would be great. Probably the first place to start is to use audit the parser for places where early exits cause parsed stmts to be leaked.

OK, I've audited the parser. The result is attached. No regressions, and a clean valgrind run on Parse/statements.c. (Not leak-free. Just invalid-access-free.)

Not all places are fixed. In particular, array bounds and default arguments hidden inside a Declarator might be leaked in some places. Also, the additional guard is not pretty - ideally, ActionResult itself should be replaced by the smart pointer. This is more of a patch than a thorough solution.

Still, it's a start.

Sebastian

parser-leak.patch (35.8 KB)

Is this correct? Because I want to clean this up. I want to make the
Sema routines (and later the parser, too) not leak. So I need to know
exactly how this works.

This would be great. Probably the first place to start is to use
audit the parser for places where early exits cause parsed stmts to be
leaked. Once that is done, start going through sema. The pattern we
want to use in Sema for statements is using OwningPtr to hold incoming
arguments. For example, ActOnBinOp should look like:

                                   ExprTy *LHS, ExprTy *RHS) {
BinaryOperator::Opcode Opc = ConvertTokenKindToBinaryOpcode(Kind);
OwningPtr<Expr> lhs = (Expr *)LHS, rhs = (Expr*)RHS;

This is too late. We want the arguments to the ActOnBinOp to be some
kind of smart pointer that provides transfer-of-ownership semantics
(basically, a C++'0x move_ptr) and knows how to delete expressions
properly (via Action::DestroyExpr).

Sema doesn't need to call DestroyExpr: it knows these are Expr*'s. The Parser needs to call Action::DestroyExpr because these are just void*'s to it.

Similarly for ExprResult and
friends. That way, the type system enforces our ownership model
(transfer into Sema for the parameters of ActOn*, transfer out of Sema
on return), and we're safe from early exits.

ExprResult is just a pair of success + pointer. The actions in Sema are only called in the success case.

-Chris

This doesn't work. The problem is that the C++ type system is insufficiently powerful (!!) to express what we need here. The parser needs to be independent of any specific actions implementation and AST form. This is why it needs to call callbacks on Actions to delete expressions.

Sema, on the other hand, implements these actions but does know the specific concrete types. It isn't acceptable to encode "Expr" into the interface used by the parser, that would be a severe layering violation.

-Chris

Cool!

the Parser.cpp part is independent and can be committed separately (and looks fine)

+ class AstGuard {

AST is a contraction, so it should be capitalized. The name "Guard" is somewhat strange for this. How about "ASTOwner" or "ASTHolder" or something like that?

+ public:
+ enum AstType {
+ Stmt,
+ Expr
+ };

Would it make sense to make an ExprAstGuard and StmtAstGuard class (or a template parameterized on ExprResult/StmtResult)? In other words, make this determination statically instead of dynamically?

+ // RAII guard for freeing SmallVectors of Stmts and Exprs on early exit
+ // in the parser.
+ class AstArrayGuard {

instead of having a pointer to a smallvector, how about just making this contain the temporary smallvector?

Since these things affect the rest of the patch, I'm going to wait to review the rest until we converge on the API.

-Chris

Chris Lattner wrote:

Chris Lattner wrote:

Is this correct? Because I want to clean this up. I want to make the
Sema routines (and later the parser, too) not leak. So I need to know
exactly how this works.

This would be great. Probably the first place to start is to use audit the parser for places where early exits cause parsed stmts to be leaked.

OK, I've audited the parser. The result is attached. No regressions, and a clean valgrind run on Parse/statements.c. (Not leak-free. Just invalid-access-free.)

Not all places are fixed. In particular, array bounds and default arguments hidden inside a Declarator might be leaked in some places. Also, the additional guard is not pretty - ideally, ActionResult itself should be replaced by the smart pointer. This is more of a patch than a thorough solution.

Cool!

the Parser.cpp part is independent and can be committed separately (and looks fine)

+ class AstGuard {

AST is a contraction, so it should be capitalized.

I don't particularly like making even abbreviations all-uppercase in CamelCase names, but OK.

  The name "Guard" is somewhat strange for this. How about "ASTOwner" or "ASTHolder" or something like that?

I use Guard for most things that guard against early exit from a frame. Or sentry, which is a synonym. I didn't use a name like Holder because the ExprResult/ExprTy/Stmt equivalents aren't actually part of the class - the class is an external addon to guard against early exit, and it needs to be told to watch a variable that you already have.

+ public:
+ enum AstType {
+ Stmt,
+ Expr
+ };

Would it make sense to make an ExprAstGuard and StmtAstGuard class

Yes.

(or a template parameterized on ExprResult/StmtResult)?

Feels wrong.

In other words, make this determination statically instead of dynamically?

Yes, it is. I went for the dynamic way because I didn't have any immediately obvious idea of what to template on (and specializing individual members is a hassle, if at all possible), and I didn't want to duplicate the code.
But I think I should just make it

template <void (Action::*Destroyer)(void*)> class ASTGuard;
typedef ASTGuard<&Action::DestroyExpr> ExprGuard;
typedef ASTGuard<&Action::DestroyStmt> StmtGuard;

Then the destructor is

~ASTGuard() { (Actions.*Destroyer)(Node); }

with Node being a new name for AstElement. If the compiler is smart, this should be equivalent to a direct function call.

+ // RAII guard for freeing SmallVectors of Stmts and Exprs on early exit
+ // in the parser.
+ class AstArrayGuard {

instead of having a pointer to a smallvector, how about just making this contain the temporary smallvector?

I suppose that's possible in the actual cases where I use it. I didn't before, because I wasn't sure where I would put it. And of course, this means making it a template on the smallvector size.

Sebastian

The name "Guard" is somewhat strange for this. How about "ASTOwner" or "ASTHolder" or something like that?

I use Guard for most things that guard against early exit from a frame. Or sentry, which is a synonym. I didn't use a name like Holder because the ExprResult/ExprTy/Stmt equivalents aren't actually part of the class - the class is an external addon to guard against early exit, and it needs to be told to watch a variable that you already have.

Ok!

+ public:
+ enum AstType {
+ Stmt,
+ Expr
+ };

Would it make sense to make an ExprAstGuard and StmtAstGuard class

Yes.

(or a template parameterized on ExprResult/StmtResult)?

Feels wrong.

Ok.

In other words, make this determination statically instead of dynamically?

Yes, it is. I went for the dynamic way because I didn't have any immediately obvious idea of what to template on (and specializing individual members is a hassle, if at all possible), and I didn't want to duplicate the code.
But I think I should just make it

template <void (Action::*Destroyer)(void*)> class ASTGuard;
typedef ASTGuard<&Action::DestroyExpr> ExprGuard;
typedef ASTGuard<&Action::DestroyStmt> StmtGuard;

Then the destructor is

~ASTGuard() { (Actions.*Destroyer)(Node); }

with Node being a new name for AstElement. If the compiler is smart, this should be equivalent to a direct function call.

Yeah, I'd be very happy with that.

+ // RAII guard for freeing SmallVectors of Stmts and Exprs on early exit
+ // in the parser.
+ class AstArrayGuard {

instead of having a pointer to a smallvector, how about just making this contain the temporary smallvector?

I suppose that's possible in the actual cases where I use it. I didn't before, because I wasn't sure where I would put it. And of course, this means making it a template on the smallvector size.

In practice, the size of a small vector is pretty arbitrary. Would it be fine to just fix it to 16 or something like that?

-Chris

Chris Lattner wrote:

In practice, the size of a small vector is pretty arbitrary. Would it be fine to just fix it to 16 or something like that?

I'm sure you have more experience there than me.

Sebastian

Ok, I guess I'm asking if there are cases in the parser that will use this class, that aren't currently using a size of ~16. Have you seen any cases like that?

-Chris

Is this correct? Because I want to clean this up. I want to make the
Sema routines (and later the parser, too) not leak. So I need to know
exactly how this works.

This would be great. Probably the first place to start is to use
audit the parser for places where early exits cause parsed stmts to be
leaked. Once that is done, start going through sema. The pattern we
want to use in Sema for statements is using OwningPtr to hold incoming
arguments. For example, ActOnBinOp should look like:

                                  ExprTy *LHS, ExprTy *RHS) {
BinaryOperator::Opcode Opc = ConvertTokenKindToBinaryOpcode(Kind);
OwningPtr<Expr> lhs = (Expr *)LHS, rhs = (Expr*)RHS;

This is too late. We want the arguments to the ActOnBinOp to be some
kind of smart pointer that provides transfer-of-ownership semantics
(basically, a C++'0x move_ptr) and knows how to delete expressions
properly (via Action::DestroyExpr).

Sema doesn't need to call DestroyExpr: it knows these are Expr*'s. The
Parser needs to call Action::DestroyExpr because these are just void*'s to
it.

Then Sema is free to take() from the smart pointer it is given into a
data structure that actually knows about Expr*'s. We do that anyway,
since we need to get the type right. The point is for the Parser-Sema
interface to clearly express when ownership is transferred: when the
Parser passes an ExprTy* it got into Sema, it's transferring ownership
to Sema; similarly, when Sema is returning an ExprResult, it is
transferring ownership of that pointer back to the Parser.

Similarly for ExprResult and
friends. That way, the type system enforces our ownership model
(transfer into Sema for the parameters of ActOn*, transfer out of Sema
on return), and we're safe from early exits.

ExprResult is just a pair of success + pointer. The actions in Sema are
only called in the success case.

Of course. But ExprResult also (implicitly!) carries with it the
ownership of the pointer, and that's not at all clear now.

  - Doug

Yes. There are three or four instances of the size 4 (e.g. parsing extended
ASM decls), and one or two of the size 8 (can't remember where).

Sebastian

Sebastian Redl wrote:

OK, revised version of the patch. AstGuard is now ASTGuard<Destroyer>, with typedef ExprGuard and StmtGuard.

AstArrayGuard now holds and owns (in fact, derives from) the SmallVector and was renamed to ASTVector, with template aliases (derived classes) ExprVector<unsigned N> and StmtVector<unsigned N>.
This has greatly simplified its use, as long as people remember that you have to explicitly give up ownership with take() instead of using &ar[0] to pass the content to the Sema.

Sebastian

parser-leak.patch (36 KB)

Ok, I agree with you. However, making the contract more clear seems conceptually separate from making sema or the parser not leak. Are you saying that both problems could be solved in the same way?

-Chris

Ok, it would be fine to just go with 8 or 12 then. It's all pretty arbitrary :slight_smile:

The major key is for SmallVectors to live on the stack (where space is relatively free). We don't want them to end up on the heap (i.e., 'new' a SmallVector).

-Chris