C++ Constructors & Destructors in the AST

Anders, Doug, Daniel and I discussed the representation of temporaries, constructors and destructors in the AST.

The basic goal is to make it explicit which temporaries exist, to allow the C++ front-end (and only the front-end) to reason about the lifetimes of temporaries (e.g. temps destroyed at the end of statements), to allow the C++ front-end to do copy constructor elision optimizations, and to let downstream clients (codegen, static analyzer, etc.) know where destructors are run.

We decided to make a new form of VarDecl, named CXXTempVarDecl. These are always implicitly generated by the compiler; they never have an explicit representation in the source code. These CXXTempVarDecls are inserted into the DeclContext of the function, but they have no DeclStmt or other defining point in the code.

We introduce two new expressions "CXXConstruct" and "CXXDestroy". CXXConstruct becomes the only way to reference a constructor decl, and DeclRefExpr will assert if you try to create it with a constructor. CXXConstruct is a variadic node similar to a call, which takes a decl to initialize, a decl for the constructor to call (e.g. copy ctor, default ctor, converting ctor, etc) and an optional variadic list of arguments to pass into the constructor. CXXDestroy takes a decl that is destroyed at that point - it basically represents running the destructor on the decl.

Some examples:

This:
{ T x;
...
}

Turns into:

(VarDecl 'x' Type=T,
    Init = CXXConstruct("x", "T::T()"))
...
(CXXDestroy "x")

If you have a constructor with direct initialization, we'd get:

{ T x(4);
...
}

(VarDecl 'x' Type=T,
    Init = CXXConstruct("x", "T::T(int)", 4))
...
(CXXDestroy "x")

If you have copy initialization and the copy constructor has not been elided, then we get a temporary:

{ T x = 4;
...
}

(VarDecl 'x' Type=T,
    Init = CXXConstruct("x", "T::T(const T&)",
                 CXXConstruct("somecxxtempdecl", "T::T(int)", 4)))
(CXXDestroy "somecxxtempdecl")
...
(CXXDestroy "x")

This illustrates the need for an explicit CXXDestroy: statement local temporaries need to have specific controlled lifetimes that we don't want clients to have to reason about.

Clients that want to walk the AST, such as codegen, need to maintain a cleanup stack. This is already required for VLAs, try blocks, @synchronized, etc. This stack would get an entry for a "constructible" decl when a CXXConstruct is codegen'd. For example, if an exception is thrown after a VarDecl for a constructible type is seen but before it is constructed, it should not be destroyed. Between these two points, the VarDecl wouldn't exist on the cleanup stack.

The idea is that CXXDestroy would explicitly remove these decls from the cleanup stack, which is why we need an explicit marker that sema can place to say where things are known to be constructed and when they are known to be destroyed. If there is no CXXDestroy for a decl, then it is live until the containing scope is done. This is useful for the common case when a decl is live to the end of its scope, as when extending the lifetime of a temporary with const& or in the case of normal variable decls.
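
The cleanup-stack bookkeeping described above can be sketched roughly as follows. This is only a toy model, not Clang's codegen: CleanupStack, onConstruct, onDestroy, onScopeExit and walkCopyInitExample are hypothetical names made up for illustration, and decls are modelled as plain strings. In the walk, "x" is deliberately left without a CXXDestroy so that it is cleaned up at scope exit.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Toy model of a client-side cleanup stack (all names are hypothetical).
class CleanupStack {
  std::vector<std::string> Live;      // decls constructed, not yet destroyed
  std::vector<std::string> Destroyed; // order in which destructors would run

public:
  // A CXXConstruct was visited: from now on the decl needs cleanup.
  void onConstruct(const std::string &Decl) { Live.push_back(Decl); }

  // An explicit CXXDestroy was visited: run the dtor and drop the entry.
  void onDestroy(const std::string &Decl) {
    for (auto It = Live.begin(); It != Live.end(); ++It) {
      if (*It == Decl) {
        Live.erase(It);
        Destroyed.push_back(Decl);
        return;
      }
    }
    assert(false && "CXXDestroy for a decl that was never constructed");
  }

  // Scope exit (or unwinding): destroy whatever is still live, newest first.
  void onScopeExit() {
    while (!Live.empty()) {
      Destroyed.push_back(Live.back());
      Live.pop_back();
    }
  }

  const std::vector<std::string> &destroyed() const { return Destroyed; }
};

// Walks the "T x = 4;" example: the temp is constructed, x is constructed,
// the temp is explicitly destroyed, and x is destroyed at scope exit.
std::vector<std::string> walkCopyInitExample() {
  CleanupStack Stack;
  Stack.onConstruct("somecxxtempdecl");
  Stack.onConstruct("x");
  Stack.onDestroy("somecxxtempdecl");
  Stack.onScopeExit();
  return Stack.destroyed();
}
```

A decl that is declared but never reaches onConstruct is simply never on the stack, which matches the exception case mentioned above.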

Still to discuss after the basic pieces are done:
1. Global variable initialization: where do CXXTempVarDecls go? Just cram them into the containing DeclContext? Should sema explicitly generate the "translation unit constructor" function or not?
2. Conditional liveness of temporaries: how do we represent the condition to destroy a temp?

-Chris

If you have copy initialization and the copy constructor has not
been elided, then we get a temporary:

{ T x = 4;
...
}

(VarDecl 'x' Type=T,
   Init = CXXConstruct("x", "T::T(const T&)",
                CXXConstruct("somecxxtempdecl", "T::T(int)", 4)))
(CXXDestroy "somecxxtempdecl")
...
(CXXDestroy "x")

I think that it's good to always have copy ctors represented in the AST, and then have a "canBeElided" bit on either the temp decl or the CXXConstruct call.

Here's another example that was on the board but that wasn't written down:

{ const T& x = 4;
...
}

(VarDecl 'x' Type=const T&,
   Init = CXXConstruct("somecxxtempdecl", "T::T(int)", 4))
...
(CXXDestroy "somecxxtempdecl")
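
For reference, the source-level behaviour that the const T& example has to model can be observed with a small program. This is just a sketch: the struct T with a logging constructor and destructor, the Events vector and runLifetimeExtension are assumptions made for illustration.

```cpp
#include <string>
#include <vector>

// Records construction/destruction events so the order can be inspected.
static std::vector<std::string> Events;

struct T {
  T(int) { Events.push_back("T(int)"); }
  ~T() { Events.push_back("~T"); }
};

std::vector<std::string> runLifetimeExtension() {
  Events.clear();
  {
    const T &x = 4;           // the temporary T(4) is bound to x
    (void)x;
    Events.push_back("body"); // the temporary is still alive here
  }                           // the temporary is destroyed at end of scope
  return Events;
}
```

The recorded order is T(int), body, ~T: the temporary lives until the end of the enclosing scope, which is where the CXXDestroy sits in the dump above.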

Still to discuss after the basic pieces are done:
1. Global variable initialization: where do CXXTempVarDecls go? Just
cram them into the containing DeclContext? Should sema explicitly
generate the "translation unit constructor" function or not?

Another option would be to not insert the CXXTempVarDecls in a DeclContext, since we know that they must belong to a single CXXConstructDecl. Doug pointed out that it might be good to be able to iterate over temp decls though.

2. Conditional liveness of temporaries: how do we represent the
condition to destroy a temp?

3. How should we represent compound literal expressions and init list expressions? For example:

{ T xs = { 1, 2 }; }

or even

(T[2]){1, 2 };

Both these have copy ctors that can be elided, but what's the VarDecl they go to?

Anders

So where do CXXTemporaryObjectExpr and CXXExprWithCleanup fit into this
scheme?

Especially CXXExprWithCleanup has me stumped. How is it going to be
used? Where is it inserted into the AST? What does it mean?

Sebastian

I don't know either. Anders can you outline where you're going?

-Chris

On 26 Apr 2009, at 04:35, Sebastian Redl wrote:

So where do CXXTemporaryObjectExpr and CXXExprWithCleanup fit into this
scheme?

Hi Sebastian!

Sorry for not replying earlier. Here's the current design:

CXXConstructExpr represents a (possibly implicit) call to a constructor. For example the declaration

T t;

will have its initializer set to a CXXConstructExpr. (Assuming T is a class type with a non-trivial constructor of course)

CXXTemporaryObjectExpr represents a temporary - it inherits from CXXConstructExpr since it creates a temporary expr.

T();

will be represented by a CXXTemporaryObjectExpr.

Especially CXXExprWithCleanup has me stumped. How is it going to be
used? Where is it inserted into the AST? What does it mean?

A CXXExprWithCleanup represents a full expression that creates temporaries that need to have their destructors called.

For example the example above,

T();

would look something like

(CXXExprWithCleanup
   (CXXTemporaryObjectExpr("temp", "T::T"))
   ("temp"))

In the first design that we came up with, we would have explicit CXXDestroyExprs that would be inserted after statements, but that won't work with things like the for loop condition expr. I plan to remove the CXXDestroyExpr node.
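
To make the for loop case concrete, here is a small sketch of what the language requires: a temporary created in the loop condition is destroyed at the end of that full expression, on every evaluation, before the body runs. The Temp struct, the Log vector and runLoop are hypothetical names used only for this illustration.

```cpp
#include <string>
#include <vector>

static std::vector<std::string> Log;

struct Temp {
  Temp() { Log.push_back("ctor"); }
  ~Temp() { Log.push_back("dtor"); }
  bool lessThan(int I, int N) const { return I < N; }
};

std::vector<std::string> runLoop() {
  Log.clear();
  // The condition is a full expression of its own: each evaluation
  // creates a Temp and destroys it before the body (or the exit) runs.
  for (int I = 0; Temp().lessThan(I, 2); ++I)
    Log.push_back("body");
  return Log;
}
```

The log comes out as ctor, dtor, body, ctor, dtor, body, ctor, dtor. There is no statement boundary after which a destroy node could be placed for the condition, which is the problem described above.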

Does this sound OK? Maybe we should rename CXXExprWithCleanup to CXXExprWithTemporaries? I'll add some documentation shortly.

Anders

Hi Anders,

I thought the idea was to make CXXDestroyExpr be an expression, and use a comma expression (or its moral equivalent) where needed. This allows temporaries to occur anywhere an expression does (including the initializer for a global, etc).

-Chris

Anders Carlsson wrote:

Does this sound OK? Maybe we should rename CXXExprWithCleanup to
CXXExprWithTemporaries? I'll add some documentation shortly.

I like the design, and I like WithTemporaries better.

So basically, we give Sema a SmallVector of temporaries that we've
created. All statement actions check this buffer, and if it's not empty,
a CXXExprWithTemporaries is created and wrapped around the current
statement, and gets all the temporaries added. Then the buffer is cleared.
Is this right?

Still doesn't solve the issue of conditional creation, though.

Sebastian

On 26 Apr 2009, at 12:42, Chris Lattner wrote:

Hi Anders,

I thought the idea was to make CXXDestroyExpr be an expression, and use a comma expression (or its moral equivalent) where needed. This allows temporaries to occur anywhere an expression does (including the initializer for a global, etc).

Would a comma expression really work? Isn't the type of a comma expression the type of its second argument? How would you represent

if (T().f()) { ... }

in that case? We could add a separate expr node but then we'd end up with something very similar to CXXExprWithTemporaries :)
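
The point about the comma operator's type can be checked directly. The snippet below is only an illustration: the struct T with a member f() and the function valueOfComma are assumptions, and the literals stand in for whatever a destroy expression would yield.

```cpp
#include <type_traits>

struct T {
  bool f() const { return true; }
};

// The built-in comma operator takes both its type and its value from the
// right-hand operand, so "cond, destroy" would not have cond's type.
static_assert(std::is_same<decltype((T().f(), 0)), int>::value,
              "comma expression has the type of its second operand");

int valueOfComma() {
  // The bool result of f() is evaluated and then discarded; 7 is returned.
  return (T().f(), 7);
}
```

So in if (T().f()) the value of the condition would be lost if the destroy were simply appended with a comma.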

Anders

On 26 Apr 2009, at 13:11, Sebastian Redl wrote:

Does this sound OK? Maybe we should rename CXXExprWithCleanup to
CXXExprWithTemporaries? I'll add some documentation shortly.

I like the design, and I like WithTemporaries better.

Cool, I'll change that!

So basically, we give Sema a SmallVector of temporaries that we've
created. All statement actions check this buffer, and if it's not empty,
a CXXExprWithTemporaries is created and wrapped around the current
statement, and gets all the temporaries added. Then the buffer is cleared.
Is this right?

Sort of - You will have to do this check before you call the statement action, because
statements can take more than one expression (like for statements).

I'm working on a design that involves something like

   /// ActOnFinishFullExpr - Called whenever a full expression has been parsed.
   /// (C++ [intro.execution]p12).
   virtual OwningExprResult ActOnFinishFullExpr(ExprArg Expr) {
     return move(Expr);
   }

where a client can return a new expr when a full expr is being created.

We also need to come up with a solid design where it's next to impossible to forget to
have this callback invoked. I'm toying with the idea of adding a new FullExprArg type,
where the only way to create a FullExprArg from an ExprArg is to go through this callback.

Anders

We could define a new expression that evaluates to the value of its LHS (unlike a comma) but is defined to evaluate the LHS before the RHS (like a comma).

That said, if you guys are all happy with a different approach, go for it :). I thought that the destroyexpr thing was a simple way to model this, that's all.

-Chris

On 26 Apr 2009, at 13:41, Anders Carlsson wrote:

Cool, I’ll change that!

So basically, we give Sema a SmallVector of temporaries that we’ve
created. All statement actions check this buffer, and if it’s not empty,
a CXXExprWithTemporaries is created and wrapped around the current
statement, and gets all the temporaries added. Then the buffer is cleared.
Is this right?

We also need to come up with a solid design where it’s next to
impossible to forget to have this callback invoked. I’m toying with
the idea of adding a new FullExprArg type, where the only way to
create a FullExprArg from an ExprArg is to go through this callback.

Here’s what I’ve come up with. A FullExprArg right now is a very simple wrapper around ExprArg:

   class FullExprArg {
     ExprArg &Arg;
     friend class Action;
     FullExprArg(ExprArg& a) : Arg(a) {}

   public:
     FullExprArg& operator=(void *raw) {
       Arg.operator=(raw);
       return *this;
     }

     void* release() { return Arg.release(); }
     void* get() { return Arg.get(); }

     template<typename T>
     T* takeAs() {
       return static_cast<T*>(Arg.take());
     }
   };

Its constructor is private, and the Action class is a friend, so the only way to make one is to call

FullExprArg Action::FullExpr(ExprArg&);

which will make sure to call ActOnFinishFullExpr.
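
The private-constructor-plus-friend idea can be tried out in isolation. The following is a minimal sketch in present-day C++, not the actual Clang classes: ExprHandle (a std::unique_ptr) stands in for ExprArg, FullExprHandle for FullExprArg, and the Expr struct, WrappingAction and demo are invented for the example. Only the names Action, FullExpr and ActOnFinishFullExpr are taken from the discussion.

```cpp
#include <memory>
#include <string>
#include <utility>

struct Expr { std::string Text; };
using ExprHandle = std::unique_ptr<Expr>; // stand-in for ExprArg

class Action;

class FullExprHandle {                    // stand-in for FullExprArg
  ExprHandle E;
  friend class Action;                    // only Action may construct one
  explicit FullExprHandle(ExprHandle H) : E(std::move(H)) {}

public:
  FullExprHandle(FullExprHandle &&) = default;
  Expr *get() const { return E.get(); }
};

class Action {
public:
  virtual ~Action() = default;

  // Hook a client can override to wrap the finished full expression.
  virtual ExprHandle ActOnFinishFullExpr(ExprHandle H) { return H; }

  // The single entry point: the hook always runs before wrapping.
  FullExprHandle FullExpr(ExprHandle H) {
    return FullExprHandle(ActOnFinishFullExpr(std::move(H)));
  }
};

// Example client that marks every full expression it sees.
class WrappingAction : public Action {
public:
  ExprHandle ActOnFinishFullExpr(ExprHandle H) override {
    H->Text = "WithTemporaries(" + H->Text + ")";
    return H;
  }
};

std::string demo() {
  WrappingAction A;
  FullExprHandle F = A.FullExpr(ExprHandle(new Expr{"T()"}));
  return F.get()->Text;
}
```

Because friendship is not inherited, WrappingAction cannot construct a FullExprHandle itself; it has to go through Action::FullExpr, so the hook cannot be skipped.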

I’m not entirely happy with this solution since it sidesteps all the move security we have in the smart pointer classes. Sebastian, do you have a better idea?

Anders

Here's what I've come up with. A FullExprArg right now is a very
simple wrapper around ExprArg:

<snip>

I'm not entirely happy with this solution since it sidesteps all the
move security we have in the smart pointer classes. Sebastian, do you
have a better idea?

Hi Anders,

I'll try to look at this in detail soon, but Tuesdays are always very busy
for me.

Sebastian

Here's what I've come up with. A FullExprArg right now is a very
simple wrapper around ExprArg:

Actually, you've made it a wrapper around ExprArg&. Why?

I'm not entirely happy with this solution since it sidesteps all the
move security we have in the smart pointer classes. Sebastian, do you
have a better idea?

Without understanding your motive for wrapping a reference, I cannot say.
But the move security should be preserved if you wrap an object instead.

Sebastian

On 3 May 2009, at 10:07, Sebastian Redl wrote:

Here's what I've come up with. A FullExprArg right now is a very
simple wrapper around ExprArg:

Actually, you've made it a wrapper around ExprArg&. Why?

Because that was the simplest thing to do :)

The main reason was that I want to forbid construction of a FullExprArg everywhere except from within the Action class (not its subclasses). Maybe there's a simpler and more clever way to do it.

Anders

Here's a patch that makes FullExprArg wrap an object instead.

One idea I have is to make FullExprArg just be a smart pointer style wrapper around FullExpr, so we don't have to duplicate the ExprArg API.

Comments, thoughts?

Thanks,
Anders

textmate stdin DgA6VZ.txt (6.91 KB)

This is one ugly const_cast…

+  FullExprArg(const FullExprArg& Other)
+    : Expr(move(const_cast<FullExprArg&>(Other).Expr)) {}

but, since this will be so much nicer with rvalue references (eventually), I think we can cope with the ugliness for now, since the alternative is a lot of code duplication. Please put in a FIXME noting that this could/should be tightened up at some point.
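
For comparison, this is roughly what the same wrapper could look like once rvalue references are available. It is a sketch under assumptions, not the patch under review: FullExprHandle, the Expr struct and moveDemo are hypothetical, and std::unique_ptr stands in for the owning expression pointer.

```cpp
#include <memory>
#include <type_traits>
#include <utility>

struct Expr { int Id; };

class FullExprHandle { // hypothetical stand-in for FullExprArg
  std::unique_ptr<Expr> E;

public:
  explicit FullExprHandle(std::unique_ptr<Expr> P) : E(std::move(P)) {}

  // Move constructor: takes ownership without any const_cast.
  FullExprHandle(FullExprHandle &&Other) noexcept : E(std::move(Other.E)) {}

  // Copying is disabled outright instead of being emulated as a move.
  FullExprHandle(const FullExprHandle &) = delete;

  Expr *get() const { return E.get(); }
};

static_assert(!std::is_copy_constructible<FullExprHandle>::value,
              "the wrapper must not be copyable");
static_assert(std::is_move_constructible<FullExprHandle>::value,
              "the wrapper must be movable");

int moveDemo() {
  FullExprHandle A(std::unique_ptr<Expr>(new Expr{42}));
  FullExprHandle B(std::move(A)); // A gives up ownership to B
  return (A.get() == nullptr && B.get() != nullptr) ? B.get()->Id : -1;
}
```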

- Doug

If FullExprArg is itself movable/not copyable, wouldn't accepting Other by value (and adding an unimplemented copy ctor) eliminate the need for the const_cast?

I've only needed const_casts with C++03 move emulation in cases where VC++ is too daft to apply the RVO.

- Gordon