Parser Stmt/Expr Owning Pointer

Hi,

In my quest to have parser and sema use smart pointers to ensure that all nodes are always freed, I have now replaced ASTGuard with ASTOwner. ASTGuard was a class that simply sat by and watched expressions, deleting them if necessary, but it was an add-on, which led to very ugly code:

ExprResult Res = Parse...();
ExprGuard Guard(Actions);

The new ASTOwner is a proper move-emulated smart pointer, and also holds the validity information of ActionResult. Thus, the code becomes

ExprOwner Res(Actions, Parse...());

Passing Actions to every pointer is an unfortunate necessity.

The move emulation of this pointer is specialized to support backwards compatibility. The ASTMove class, core of the move emulation, cannot only be implicitly converted to a new ASTOwner (which will take ownership), but also to an ActionResult or a void pointer, which also have ownership. This allows the use of move() in many contexts, such as returning an ASTOwner where an ActionResult is expected, or passing a void* to the Action by calling move(). This is a significant advantage in making the use of the smart pointer intuitive: move() is used whenever you give up ownership, no matter what receives it.

The change is simple but very far-reaching, because I've excised every single ExprResult/StmtResult that isn't part of a function signature. Thus, I'll await comments before committing it. It also influences the way everyone should write code in the parser.

I'd also like to see if the use of the smart pointer, which is a word larger than ActionResult and two words larger than a raw pointer, negatively affects performance. Do you people have a hint, or perhaps even a complete framework, for tracking parser performance?

In the next step, I'll make the Parser's internal signatures use the smart pointer, which should reduce the number of times Actions is passed greatly, but increases the number of times move() is called greatly, too. After that, we'll want to use smart pointers in the Action interface, and finally use smart pointers inside Sema.

Sebastian

parser-leak.patch (85.6 KB)

Hi,

In my quest to have parser and sema use smart pointers to ensure that all nodes are always freed,

Yay!

I have now replaced ASTGuard with ASTOwner. ASTGuard was a class that simply sat by and watched expressions, deleting them if necessary, but it was an add-on, which led to very ugly code:

ExprResult Res = Parse...();
ExprGuard Guard(Actions);

The new ASTOwner is a proper move-emulated smart pointer, and also holds the validity information of ActionResult. Thus, the code becomes

ExprOwner Res(Actions, Parse...());

Passing Actions to every pointer is an unfortunate necessity.

Ok.

The move emulation of this pointer is specialized to support backwards compatibility. The ASTMove class, core of the move emulation, cannot only be implicitly converted to a new ASTOwner (which will take ownership), but also to an ActionResult or a void pointer, which also have ownership. This allows the use of move() in many contexts, such as returning an ASTOwner where an ActionResult is expected, or passing a void* to the Action by calling move(). This is a significant advantage in making the use of the smart pointer intuitive: move() is used whenever you give up ownership, no matter what receives it.

In terms of API, can we have a better name than .move() ? From the API perspective, seeing:
+ return Idx.move();
is really strange.

How about .take(), as in "take ownership"?

Is there any good way to make the type checker catch cases where ownership is not transfered? Using "return Idx;" accidentally would be a very subtle mistake would would end up with a dangling pointer (if I understand). Maybe if the *default* was to take ownership it would lead to accidental null-dereference bugs or instead of dangling pointer bugs?

The change is simple but very far-reaching, because I've excised every single ExprResult/StmtResult that isn't part of a function signature. Thus, I'll await comments before committing it. It also influences the way everyone should write code in the parser.

I really like the idea, but lets talk about the meta issues above. I haven't had a chance to review the patch in detail yet.

I'd also like to see if the use of the smart pointer, which is a word larger than ActionResult and two words larger than a raw pointer, negatively affects performance. Do you people have a hint, or perhaps even a complete framework, for tracking parser performance?

Are you on a mac? If so, "time clang INPUTS/Cocoa.m" with a release-asserts build is the benchmark of choice right now :slight_smile:

In the next step, I'll make the Parser's internal signatures use the smart pointer, which should reduce the number of times Actions is passed greatly, but increases the number of times move() is called greatly, too. After that, we'll want to use smart pointers in the Action interface, and finally use smart pointers inside Sema.

Very cool. Thanks you for working on this Sebastian!

-Chris

Chris Lattner wrote:

The move emulation of this pointer is specialized to support backwards compatibility. The ASTMove class, core of the move emulation, cannot only be implicitly converted to a new ASTOwner (which will take ownership), but also to an ActionResult or a void pointer, which also have ownership. This allows the use of move() in many contexts, such as returning an ASTOwner where an ActionResult is expected, or passing a void* to the Action by calling move(). This is a significant advantage in making the use of the smart pointer intuitive: move() is used whenever you give up ownership, no matter what receives it.

In terms of API, can we have a better name than .move() ? From the API perspective, seeing:
+ return Idx.move();
is really strange.

How about .take(), as in "take ownership"?

I prefer move() over take() for two reasons:
1) It is more consistent with the name std::move(), which is what moving will inevitably be called in C++0x.
2) take() is loaded with the semantics of llvm::OwningPtr; although the function in question has equivalent semantics in general, there are some subtle differences.

I personally don't find move() strange at all, but then I've been preoccupied with move semantics and their emulation since I learned of the concept two years ago. (Note: with proper C++0x moving, there'll be no need to call move() anymore. However, due to the reference binding rules of C++03, the only way to achieve this is to allow the highly unsafe practice of moving from const references.)

Is there any good way to make the type checker catch cases where ownership is not transfered? Using "return Idx;" accidentally would be a very subtle mistake would would end up with a dangling pointer (if I understand).

You'd just get a compile error. ASTOwner is neither copyable nor implicitly convertible to a pointer. You move() from it, or get() its pointer, but otherwise there's no way to get the pointer out.

  Maybe if the *default* was to take ownership it would lead to accidental null-dereference bugs or instead of dangling pointer bugs?

There is no default. But there aren't bugs either. Just a slightly annoying requirement to type .move() a lot. You cannot really go wrong, unless you explicitly move from a pointer and try to use it afterwards.

I'd also like to see if the use of the smart pointer, which is a word larger than ActionResult and two words larger than a raw pointer, negatively affects performance. Do you people have a hint, or perhaps even a complete framework, for tracking parser performance?

Are you on a mac? If so, "time clang INPUTS/Cocoa.m" with a release-asserts build is the benchmark of choice right now :slight_smile:

Nope, no Mac anywhere in the vicinity (i.e. in my collection, or belonging to anyone I know).

On a side note, I've written the smart pointers that I ultimately want to use, which are going to be used in the parser and action interfaces. I'll get to change a lot of code again, but I don't think that's a big problem. Anyway, the conversion there will have to be done more gradually, because I can't change Sema and Parser in one big block like I've done the internal parser stuff so far.

Sebastian

Chris Lattner wrote:

The move emulation of this pointer is specialized to support backwards compatibility. The ASTMove class, core of the move emulation, cannot only be implicitly converted to a new ASTOwner (which will take ownership), but also to an ActionResult or a void pointer, which also have ownership. This allows the use of move() in many contexts, such as returning an ASTOwner where an ActionResult is expected, or passing a void* to the Action by calling move(). This is a significant advantage in making the use of the smart pointer intuitive: move() is used whenever you give up ownership, no matter what receives it.

In terms of API, can we have a better name than .move() ? From the API perspective, seeing:
+ return Idx.move();
is really strange.

How about .take(), as in "take ownership"?

I prefer move() over take() for two reasons:
1) It is more consistent with the name std::move(), which is what moving will inevitably be called in C++0x.
2) take() is loaded with the semantics of llvm::OwningPtr; although the function in question has equivalent semantics in general, there are some subtle differences.

The similarity with owningptr was why I liked it :), what subtle differences do you see?

I personally don't find move() strange at all, but then I've been preoccupied with move semantics and their emulation since I learned of the concept two years ago. (Note: with proper C++0x moving, there'll be no need to call move() anymore. However, due to the reference binding rules of C++03, the only way to achieve this is to allow the highly unsafe practice of moving from const references.)

Yep yep, rvalue refs are very nice. Unfortunately we probably can't actually *use* them in llvm/clang until they are widely deployed in vendor compilers. :frowning: It sucks to have to write portable code.

What do other people think about this name?

Is there any good way to make the type checker catch cases where ownership is not transfered? Using "return Idx;" accidentally would be a very subtle mistake would would end up with a dangling pointer (if I understand).

You'd just get a compile error. ASTOwner is neither copyable nor implicitly convertible to a pointer. You move() from it, or get() its pointer, but otherwise there's no way to get the pointer out.

Aha! That is very very cool. I really like that, ok :slight_smile:

Maybe if the *default* was to take ownership it would lead to accidental null-dereference bugs or instead of dangling pointer bugs?

There is no default. But there aren't bugs either. Just a slightly annoying requirement to type .move() a lot. You cannot really go wrong, unless you explicitly move from a pointer and try to use it afterwards.

Gotcha, excellent!

I'd also like to see if the use of the smart pointer, which is a word larger than ActionResult and two words larger than a raw pointer, negatively affects performance. Do you people have a hint, or perhaps even a complete framework, for tracking parser performance?

Are you on a mac? If so, "time clang INPUTS/Cocoa.m" with a release-asserts build is the benchmark of choice right now :slight_smile:

Nope, no Mac anywhere in the vicinity (i.e. in my collection, or belonging to anyone I know).

Ok, just give a head's up when you commit and someone else can test before/after your change. If C++ support were farther along, you could just #include <iostream> or something, which is a somewhat decent metric.

On a side note, I've written the smart pointers that I ultimately want to use, which are going to be used in the parser and action interfaces. I'll get to change a lot of code again, but I don't think that's a big problem. Anyway, the conversion there will have to be done more gradually, because I can't change Sema and Parser in one big block like I've done the internal parser stuff so far.

I'm all for doing incremental changes where possible, thanks for splitting this part out.

+#include "AstGuard.h"

Is "AstGuard.h" going to be renamed at some point?

Overall, I really like the patch. This is great work, please feel free to commit it and if others think "move" should be renamed, it can be done as a follow-on.

-Chris

It shouldn't be surprising that I prefer move(). In fact I'd prefer a namespace scope move, but I have no real problem with the member move. The name take() reminds me more of release(), and on inspection of OwningPtr.h appears to have the same semantics. release() says that you're giving up ownership, and somebody better take over. move() says: treat this object as a temporary and do with it what you want. If you do nothing, a move() still shouldn't leak:

{
unique_ptr<int> p(new int);
move(p);
} // the memory is not leaked

{
unique_ptr<int> p(new int);
p.release();
} // the memory is leaked

release/take returns a raw pointer. Somebody has to catch it. The C++0X move(unique_ptr<T>) returns a unique_ptr<T>&&. The C++03 emulation I'm working on returns a unique_ptr<T>. The resource is never exposed. You can not pass it to somebody and have them forget to own it (like you can a raw pointer).

Sebastian's ASTOwner::move() returns a ASTMove. I'm not yet positive what it will do with:

{
ASTOwner<&Action::DeleteStmt> p(actions);
p.move();
} // leaked?

And has a disadvantage that if passed to a templated function, the wrong type gets deduced.

template <class T> void foo(T);

foo(p.move()); // T deduced as ASTMove instead of as ASTOwner

Fwiw I'm pretty close to a C++03 unique_ptr that is passing all my tests so far. It needs a <type_traits> (which I also have). It doesn't have the "moves from const" bug that my previous emulation attempt had.

I just ran it through this demo (using g++-4.0 and g++-4.2):

// Move mini-tutorial with unique_ptr.
//
// If the unique_ptr has a name and you want
// to transfer ownership to a new scope,
// then you have to use std::move().
// If the unique_ptr doesn't have a name
// (because it is a temporary), then
// you don't need to wrap it in a
// std::move() to move it. It will move
// anyway.
// If you accidently forget a move where it is
// needed (such as from an lvalue), then your
// code will not compile.

#include <memory>
#include <cstdio>

class Action
{
public:
     void DeleteStmt(void* p) {std::printf("DeleteStmt operating on %p\n", p);}
     void DeleteExpr(void* p) {std::printf("DeleteExpr operating on %p\n", p);}
};

template <void (Action::*Destroyer)(void*)>
class ASTDeleter {
   Action &Actions;

   // Reference member prevents copy assignment.

public:
   explicit ASTDeleter(Action &actions) : Actions(actions) {}

   void operator()(void* Node) {
       (Actions.*Destroyer)(Node); // no null check needed
   }
};

typedef std::unique_ptr<void, ASTDeleter<&Action::DeleteStmt> > StmtGuard;
typedef std::unique_ptr<void, ASTDeleter<&Action::DeleteExpr> > ExprGuard;

ExprGuard
ExprGuardSource(Action& Actions)
{
     std::printf("ExprGuardSource: begin\n");
     // Exactly one "new"
     ExprGuard ex((void*)1, ExprGuard::deleter_type(Actions));
     std::printf("ExprGuardSource: end\n");
     return move(ex); // explicit move needed for C++03 emulation
}

void
ExprGuardSink(ExprGuard p1)
{
     std::printf("ExprGuardSink: begin\n");
     {
     ExprGuard p2 = std::move(p1); // won't compile without move
     } // DeleteExpr called here, exactly once
     std::printf("ExprGuardSink: end\n");
}

int main()
{
     std::printf("main: begin\n");
     void* Body = 0;
     Action Actions;
     std::printf("main: constructing BodyGuard1\n");
     {
     StmtGuard BodyGuard1(Body, StmtGuard::deleter_type(Actions));
     }
     std::printf("main: BodyGuard1 is destructed\n");
     std::printf("main: calling ExprGuardSource\n");
     {
     std::printf("main: constructing BodyGuard2\n");
     ExprGuard BodyGuard2 = ExprGuardSource(Actions);
     std::printf("main: calling ExprGuardSink\n");
     ExprGuardSink(std::move(BodyGuard2)); // won't compile without move
     }
     std::printf("main: BodyGuard2 is destructed\n");
     std::printf("main: end\n");
}

and the output is:

main: begin
main: constructing BodyGuard1
main: BodyGuard1 is destructed
main: calling ExprGuardSource
main: constructing BodyGuard2
ExprGuardSource: begin
ExprGuardSource: end
main: calling ExprGuardSink
ExprGuardSink: begin
DeleteExpr operating on 0x1
ExprGuardSink: end
main: BodyGuard2 is destructed
main: end

Like Sebastian's ASTOwner, it needs an explicit move on return statements, unless you are returning an rvalue. This move would best be removed when C++03 compilers no longer need to be supported. However in C++0X the extraneous move will inhibit RVO, but otherwise will not impact the logic.

-Howard

Chris Lattner wrote:

Chris Lattner wrote:

The move emulation of this pointer is specialized to support backwards compatibility. The ASTMove class, core of the move emulation, cannot only be implicitly converted to a new ASTOwner (which will take ownership), but also to an ActionResult or a void pointer, which also have ownership. This allows the use of move() in many contexts, such as returning an ASTOwner where an ActionResult is expected, or passing a void* to the Action by calling move(). This is a significant advantage in making the use of the smart pointer intuitive: move() is used whenever you give up ownership, no matter what receives it.

In terms of API, can we have a better name than .move() ? From the API perspective, seeing:
+ return Idx.move();
is really strange.

How about .take(), as in "take ownership"?

I prefer move() over take() for two reasons:
1) It is more consistent with the name std::move(), which is what moving will inevitably be called in C++0x.
2) take() is loaded with the semantics of llvm::OwningPtr; although the function in question has equivalent semantics in general, there are some subtle differences.

The similarity with owningptr was why I liked it :), what subtle differences do you see?

The main difference is what Howard pointed out: move() doesn't immediately give up ownership, but only when someone subsequently takes it.

I personally don't find move() strange at all, but then I've been preoccupied with move semantics and their emulation since I learned of the concept two years ago. (Note: with proper C++0x moving, there'll be no need to call move() anymore. However, due to the reference binding rules of C++03, the only way to achieve this is to allow the highly unsafe practice of moving from const references.)

Yep yep, rvalue refs are very nice. Unfortunately we probably can't actually *use* them in llvm/clang until they are widely deployed in vendor compilers. :frowning: It sucks to have to write portable code.

Fortunately, rvalue references are the first thing any compiler implements from C++0x, apparently. GCC 4.3+ supports them, CodeWarrior supports them, Visual Studio 2010 supports them. They will definitely be the first thing that can be used portably.

Ok, just give a head's up when you commit and someone else can test before/after your change. If C++ support were farther along, you could just #include <iostream> or something, which is a somewhat decent metric.

I'm not so sure about that. I think the parser time might well be drowned out by the template instantiation time. <iostream> instantiates a lot of templates: the iostream hierarchy (3 classes, plus members), the streambuf hierarchy (2 classes), locales (lots of facet templates), and who knows what else. We'd need the extern template extension, anyway.

I'm on a Linux system. Surely there must be some huge system headers out there. But in my search, I can't find anything. A combination of gtk.h, ncurses.h and WINE's windows.h merely yields 2.6 MB.

+#include "AstGuard.h"

Is "AstGuard.h" going to be renamed at some point?

Yes. In fact, it's probably going to be removed, because the smart pointers will move out of Parser-private into Parser-interface.

Sebastian

Howard Hinnant wrote:

I personally don't find move() strange at all, but then I've been
preoccupied with move semantics and their emulation since I learned
of the concept two years ago. (Note: with proper C++0x moving,
there'll be no need to call move() anymore. However, due to the
reference binding rules of C++03, the only way to achieve this is to
allow the highly unsafe practice of moving from const references.)
      

Yep yep, rvalue refs are very nice. Unfortunately we probably can't
actually *use* them in llvm/clang until they are widely deployed in
vendor compilers. :frowning: It sucks to have to write portable code.

What do other people think about this name?
    
It shouldn't be surprising that I prefer move(). In fact I'd prefer a namespace scope move, but I have no real problem with the member move.

I'm looking forward to seeing your tricks to allow a namespace scope move bind to temporaries but not const lvalues. I don't know how to do it, so my move() is a member.

Sebastian's ASTOwner::move() returns a ASTMove. I'm not yet positive what it will do with:

{
ASTOwner<&Action::DeleteStmt> p(actions);
p.move();
} // leaked?
  

Nope. ASTMove emulates an rvalue reference. It's patterned after Boost.Thread's move helper.

And has a disadvantage that if passed to a templated function, the wrong type gets deduced.
  

True. I find the likelihood of this in the Clang source low.

Fwiw I'm pretty close to a C++03 unique_ptr that is passing all my tests so far. It needs a <type_traits> (which I also have). It doesn't have the "moves from const" bug that my previous emulation attempt had.
  

I'd love to see it.

template <void (Action::*Destroyer)(void*)>
class ASTDeleter {
   Action &Actions;

   // Reference member prevents copy assignment.

public:
   explicit ASTDeleter(Action &actions) : Actions(actions) {}

   void operator()(void* Node) {
       (Actions.*Destroyer)(Node); // no null check needed
   }
};
  

Is there a specific reason the constructor can't be implicit? Then you could leave out the long typedef here:

     ExprGuard ex((void*)1, ExprGuard::deleter_type(Actions));
  

and just write ExprGuard ex(ptr, Actions).
Also, can I trick your unique_ptr to give up its raw pointer after it has been moved from? For convenience in the transition phase, that would be invaluable. In my current implementation you can do:

void takesOwnershipButHasNotBeenConvertedToBeSmart(ExprTy*);

ExprOwner ptr(getExprFromSomewhere());
takesOwnershipButHasNotBeenConvertedToBeSmart(ptr.move());

The important part is that it's always move(), even after converting the function to take an ExprOwner.

Sebastian

I'm looking forward to seeing your tricks to allow a namespace scope move bind to temporaries but not const lvalues. I don't know how to do it, so my move() is a member.

The design is pretty much taken from auto_ptr. It is a painful but effective means of installing move semantics into a class.

And has a disadvantage that if passed to a templated function, the wrong type gets deduced.

True. I find the likelihood of this in the Clang source low.

Agreed.

Fwiw I'm pretty close to a C++03 unique_ptr that is passing all my tests so far. It needs a <type_traits> (which I also have). It doesn't have the "moves from const" bug that my previous emulation attempt had.

I'd love to see it.

I'll try to get it in soon. I'm not yet positive what "in" means. Does the clang project want unique_ptr? How should it handle pre-std library components (what namespace)? Does clang want it prepped for rvalue-ref support via #ifdef's, or just have the emulation?

template <void (Action::*Destroyer)(void*)>
class ASTDeleter {
  Action &Actions;

  // Reference member prevents copy assignment.

public:
  explicit ASTDeleter(Action &actions) : Actions(actions) {}

  void operator()(void* Node) {
      (Actions.*Destroyer)(Node); // no null check needed
  }
};

Is there a specific reason the constructor can't be implicit? Then you could leave out the long typedef here:

    ExprGuard ex((void*)1, ExprGuard::deleter_type(Actions));

and just write ExprGuard ex(ptr, Actions).

No reason at all. I was just "winging" an ASTDeleter by imitating your code without really understanding it at all.

Also, can I trick your unique_ptr to give up its raw pointer after it has been moved from? For convenience in the transition phase, that would be invaluable. In my current implementation you can do:

void takesOwnershipButHasNotBeenConvertedToBeSmart(ExprTy*);

ExprOwner ptr(getExprFromSomewhere());
takesOwnershipButHasNotBeenConvertedToBeSmart(ptr.move());

The important part is that it's always move(), even after converting the function to take an ExprOwner.

I'm not positive I understand your question 100%, but I'll try a shotgun answer. If I don't hit the target, just ask again. :slight_smile:

unique_ptr is basically designed to be API compatible with auto_ptr as much as possible. Like auto_ptr it has members:

T* get() const; // return pointer without giving up ownership
T* release(); // return pointer and give up ownership. postcondition is get() == 0.

After a move, the source.get() == 0. You can also source.release() and it will also return 0.

If I'm reading your code example correctly, I think this is as simple as:

    ExprOwner ptr(getExprFromSomewhere());
    takesOwnershipButHasNotBeenConvertedToBeSmart(ptr.release());

However, after takesOwnershipButHasNotBeenConvertedToBeSmart is converted to take a unique_ptr:

     void takesOwnership(unique_ptr<T>);

then the client code could change in one of several ways, but would have to change (else compile-time error):

    takesOwnership(unique_ptr<T>(ptr.release()));

or more simply:

    takesOwnership(move(ptr));

or if logic allows, just:

    takesOwnership(getExprFromSomewhere()); // assumes getExprFromSomewhere() returns a unique_ptr

So the syntax is different for releasing ownership to a raw pointer, and transferring ownership to another unique_ptr. And the compiler will tell you when and where you need to change when takesOwnershipButHasNotBeenConvertedToBeSmart gets updated. You won't create run time errors. I believe having different syntax is safer. release() is by nature more dangerous than move(). If you're using release(), there is a potential for leaking - you are transferring ownership to a raw pointer. If you're using move(), there is no potential for leaking. After the transition period, release() should be used relatively rarely. It will be helpful to be able to search for these places and not get drowned out by all the "moves".

If it helps, here is the current C++0X synopsis:

template <class T, class D = default_delete<T>>
class unique_ptr
{
public:
     typedef implementation-defined pointer;
     typedef T element_type;
     typedef D deleter_type;

     // constructors
     unique_ptr();
     explicit unique_ptr(pointer p);
     unique_ptr(pointer p, implementation-defined d);
     unique_ptr(uniquePtr&& u);
     unique_ptr(nullptr_t);
     template <class U, class E> unique_ptr(unique_ptr<U, E>&& u);

     // destructor
     ∼unique_ptr();

     // assignment
     unique_ptr& operator=(unique_ptr&& u);
     template <class U, class E> unique_ptr& operator=(unique_ptr<U,

&& u);

     unique_ptr& operator=(unspecified-pointer-type);

     // observers
     typename add_lvalue_reference<T>::type operator*() const;
     pointer operator->() const;
     pointer get() const;
     deleter_type& get_deleter();
     const deleter_type& get_deleter() const;
     explicit operator bool() const;

     // modifiers
     pointer release();
     void reset(pointer p = pointer());
     void swap(unique_ptr&& u);

     // disable copy from lvalue
     unique_ptr(const unique_ptr&) = delete;
     template <class U, class E> unique_ptr(const unique_ptr<U, E>&) = delete;
     unique_ptr& operator=(const unique_ptr&) = delete;
     template <class U, class E> unique_ptr& operator=(const unique_ptr<U, E>&) = delete;
};

Ref: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2008/n2798.pdf 20.8.12 [unique.ptr].

The emulation I have neglects construction from nullptr_t, but does support construction and assignment from 0. The member swap is likely soon to be changed to take unique_ptr& instead of unique_ptr&& (a minor change). It supports custom pointer types, T == void, incomplete T except for when the deleter is called, stateful deleters, and deleters which are lvalue reference types. The ASTDeleter deleter example I gave is something I would call a stateful deleter. T == void is good for the kinds of things we want to do in clang: keep specific types out of the interface. Use of incomplete (but non-void) T is also helpful in this department (pimpl pattern).

-Howard

Howard Hinnant wrote:

Also, can I trick your unique_ptr to give up its raw pointer after it has been moved from? For convenience in the transition phase, that would be invaluable. In my current implementation you can do:

void takesOwnershipButHasNotBeenConvertedToBeSmart(ExprTy*);

ExprOwner ptr(getExprFromSomewhere());
takesOwnershipButHasNotBeenConvertedToBeSmart(ptr.move());

The important part is that it's always move(), even after converting the function to take an ExprOwner.
    
I'm not positive I understand your question 100%, but I'll try a shotgun answer. If I don't hit the target, just ask again. :slight_smile:
  

In terms of C++0x rvalue references, I've basically implemented an implicit conversion from "unique_ptr<void, ASTDeleter>&&" to void* that is equivalent to release(). Obviously, this is not possible in C++0x, but since my rvalue reference is a special object, it's very easy.

The advantage of this is that I can pass my pointer's mover to a function that is updated to take my pointer as easily as I can pass it to a function that still takes a raw pointer, with no change in the caller's syntax. Since the update is the business of the callee, I don't see this as unsafe, I see it as convenient.
We're not redefining ownership here, we're just trying to enforce the existing rules through the type system. So the caller moves the pointer to the callee, no matter how the callee decides to handle it further.

I'll try to get it in soon. I'm not yet positive what "in" means.

For starters, you could send me a link. :slight_smile: I'm more interested in the technique, though. Using unique_ptr directly is difficult because we need to store the success flag alongside the pointer (and if I wrap unique_ptr, I need to write all the emulation stuff again anyway).

The design is pretty much taken from auto_ptr. It is a painful but effective means of installing move semantics into a class.

But auto_ptr doesn't work with temporaries, does it? I've never used it, really. I'm quite wary of something that has such an incredibly broken MS implementation.

Hm, a small example shows that it does work. Tricky business, that.

Sebastian

I personally don't find move() strange at all, but then I've been
preoccupied with move semantics and their emulation since I learned
of the concept two years ago. (Note: with proper C++0x moving,
there'll be no need to call move() anymore. However, due to the
reference binding rules of C++03, the only way to achieve this is to
allow the highly unsafe practice of moving from const references.)

Yep yep, rvalue refs are very nice. Unfortunately we probably can't
actually *use* them in llvm/clang until they are widely deployed in
vendor compilers. :frowning: It sucks to have to write portable code.

What do other people think about this name?

I'm all for using "move". It's the right name for the semantics of this smart pointer.

Overall, I really like the patch. This is great work, please feel
free to commit it and if others think "move" should be renamed, it can
be done as a follow-on.

Yes, this is great work. Thanks, Sebastian!

  - Doug

In terms of C++0x rvalue references, I've basically implemented an implicit conversion from "unique_ptr<void, ASTDeleter>&&" to void* that is equivalent to release(). Obviously, this is not possible in C++0x, but since my rvalue reference is a special object, it's very easy.

The advantage of this is that I can pass my pointer's mover to a function that is updated to take my pointer as easily as I can pass it to a function that still takes a raw pointer, with no change in the caller's syntax. Since the update is the business of the callee, I don't see this as unsafe, I see it as convenient.
We're not redefining ownership here, we're just trying to enforce the existing rules through the type system. So the caller moves the pointer to the callee, no matter how the callee decides to handle it further.

People have been warning against implicit conversions from smart pointers to raw pointers for a very long time...

http://www.aristeia.com/Papers/C++ReportColumns/jun96.pdf

The bottom line is simple: don’t provide implicit conversion operators to dumb point-
ers unless there is a compelling reason to do so.

Scott Meyers, 1996.

I realize you're only converting implicitly from an ASTMove (result of member move). However this is at least conceptually the same as implicitly converting from an rvalue smart pointer.

I'll try to get it in soon. I'm not yet positive what "in" means.

For starters, you could send me a link. :slight_smile: I'm more interested in the technique, though. Using unique_ptr directly is difficult because we need to store the success flag alongside the pointer (and if I wrap unique_ptr, I need to write all the emulation stuff again anyway).

Info sent offline.

-Howard

Howard Hinnant wrote:

The similarity with owningptr was why I liked it :), what subtle differences do you see?

The main difference is what Howard pointed out: move() doesn't immediately give up ownership, but only when someone subsequently takes it.

Ok!

I personally don't find move() strange at all, but then I've been preoccupied with move semantics and their emulation since I learned of the concept two years ago. (Note: with proper C++0x moving, there'll be no need to call move() anymore. However, due to the reference binding rules of C++03, the only way to achieve this is to allow the highly unsafe practice of moving from const references.)

Yep yep, rvalue refs are very nice. Unfortunately we probably can't actually *use* them in llvm/clang until they are widely deployed in vendor compilers. :frowning: It sucks to have to write portable code.

Fortunately, rvalue references are the first thing any compiler implements from C++0x, apparently. GCC 4.3+ supports them, CodeWarrior supports them, Visual Studio 2010 supports them. They will definitely be the first thing that can be used portably.

Excellent, they are one of my favorite features of '0x.

Ok, just give a head's up when you commit and someone else can test before/after your change. If C++ support were farther along, you could just #include <iostream> or something, which is a somewhat decent metric.

I'm not so sure about that. I think the parser time might well be drowned out by the template instantiation time. <iostream> instantiates a lot of templates: the iostream hierarchy (3 classes, plus members), the streambuf hierarchy (2 classes), locales (lots of facet templates), and who knows what else. We'd need the extern template extension, anyway.

Fair enough. C headers don't have this problem.

I'm on a Linux system. Surely there must be some huge system headers out there. But in my search, I can't find anything. A combination of gtk.h, ncurses.h and WINE's windows.h merely yields 2.6 MB.

Gosh you guys have anemic headers :). Here's a dump of carbon.h on a leopard system with -E -P, it should work for you as a C parser test:

http://nondot.org/sabre/carbon.i.gz
and here's cocoa.h, an objc testcase:
http://nondot.org/sabre/Cocoa.mi.gz

+#include "AstGuard.h"

Is "AstGuard.h" going to be renamed at some point?

Yes. In fact, it's probably going to be removed, because the smart pointers will move out of Parser-private into Parser-interface.

Ok!

Thanks again Sebastian!

-Chris

Chris Lattner wrote:

widely deployed in vendor compilers. :frowning: It sucks to have to write portable code.

Fortunately, rvalue references are the first thing any compiler implements from C++0x, apparently. GCC 4.3+ supports them, CodeWarrior supports them, Visual Studio 2010 supports them. They will definitely be the first thing that can be used portably.

Excellent, they are one of my favorite features of '0x.

Yes, mine too. So deceptively simple, so far-reaching in consequence.

I'm on a Linux system. Surely there must be some huge system headers out there. But in my search, I can't find anything. A combination of gtk.h, ncurses.h and WINE's windows.h merely yields 2.6 MB.

Gosh you guys have anemic headers :). Here's a dump of carbon.h on a leopard system with -E -P, it should work for you as a C parser test:

http://nondot.org/sabre/carbon.i.gz
and here's cocoa.h, an objc testcase:
http://nondot.org/sabre/Cocoa.mi.gz

Thanks, but I get several errors like this:

carbon.i:965:33: error: typedef redefinition with different types ('typeof(sizeof(int))' vs '__darwin_size_t')
typedef __typeof__(sizeof(int)) size_t;
                                ^
carbon.i:157:25: note: previous definition is here
typedef __darwin_size_t size_t;
                        ^

Both with default arch (-triple=x86_64-linux-gnu) and forcing 32-bit (-triple=i686-linux-gnu).

Sebastian

Try building with -triple=i686-apple-darwin9. This is the problem with stripping out macro expansions :frowning:

-Chris

I haven't traced this down exactly, so I'm not sure there's a problem. But in:

   template <void (Action::*Destroyer)(void*)>
   class ASTMove {
     ASTOwner<Destroyer> &Moved;

   public:
     explicit ASTMove(ASTOwner<Destroyer> &moved) : Moved(moved) {}
     ...

we store a reference to the ASTOwner. And I also see client code that looks like:

     ExprOwner Idx(Actions, ParseAssignmentExpression());
     if (Idx.isInvalid()) {
       SkipUntil(tok::r_square);
       return Idx.move();
     }

Are we in danger of returning a reference to the local variable Idx?

-Howard

Howard Hinnant wrote:

I haven't traced this down exactly, so I'm not sure there's a problem. But in:

   template <void (Action::*Destroyer)(void*)>
   class ASTMove {
     ASTOwner<Destroyer> &Moved;

   public:
     explicit ASTMove(ASTOwner<Destroyer> &moved) : Moved(moved) {}
     ...

we store a reference to the ASTOwner. And I also see client code that looks like:

     ExprOwner Idx(Actions, ParseAssignmentExpression());
     if (Idx.isInvalid()) {
       SkipUntil(tok::r_square);
       return Idx.move();
     }

Are we in danger of returning a reference to the local variable Idx?
  

No. The return type is an ExprResult, which is initialized with the ASTMove. As long as no one gets the idea that using an ASTMove as a return type is a good thing (it *isn't*), there's no problem.

Sebastian

Here's one way you can have the compiler enforce that good rule:

   /// RAII owning pointer for StmtTys and ExprTys. Simple move emulation.
   template <void (Action::*Destroyer)(void*)>
   class ASTOwner {

     Action &Actions;
     void *Node;
     bool Invalid;

     void destroy() {
       if (Node)
         (Actions.*Destroyer)(Node);
     }

     // Move emulation
     ASTOwner(ASTOwner&); // DO NOT IMPLEMENT
     ASTOwner& operator=(ASTOwner&); // DO NOT IMPLEMENT

     struct ASTMove
     {
         Action &Actions;
         void *Node;
         bool Invalid;

         ASTMove(Action &actions, void *p, bool Inv)
             : Actions(actions), Node(p), Invalid(Inv) {}
     };

   public:
     operator ASTMove() {return ASTMove(Actions, take(), Invalid);}
     ASTOwner(ASTMove r)
       : Actions(r.Actions), Node(r.Node), Invalid(r.Invalid) {}
     ASTOwner& operator=(ASTMove r) {
       reset(r.ptr_);
       Invalid = r.Invalid;
       // Actions = r.Actions;?
       return *this;
     }
     friend ASTOwner move(ASTOwner& p) {return ASTOwner(ASTMove(p));}
     // End move emulation

     explicit ASTOwner(Action &actions, bool invalid = false)
       : Actions(actions), Node(0), Invalid(invalid) {}
     ASTOwner(Action &actions, void *node)
       : Actions(actions), Node(node), Invalid(false) {}

     void reset(void *node = 0) {
       destroy();
       Node = node;
       Invalid = false;
     }

     void *take() {
       void *Temp = Node;
       Node = 0;
       return Temp;
     }
     void *get() const { return Node; }
     bool isInvalid() const { return Invalid; }
     bool isUsable() const { return !Invalid && Node; }

   };

ASTMove is now a private nested class of ASTOwner. No one can even mention it's existence. And ASTOwner is moveable but not copyable. You can return an ASTOwner from a function (with move(expr)), and pass it by value to a sink, as long as the expression is an rvalue or the result of move(expr) (which is nothing but an rvalue ASTOwner).

Disclaimer: I may have some details particular to ASTOwner in the above sketch incorrect. The intent is to show how to add move emulation to C++03 types (it's easier when you don't have to deal with converting constructors).

-Howard

Howard Hinnant wrote:

No. The return type is an ExprResult, which is initialized with the ASTMove. As long as no one gets the idea that using an ASTMove as a return type is a good thing (it *isn't*), there's no problem.
    
Here's one way you can have the compiler enforce that good rule:
  

While good in principle, I've got a very special case here, because of the Parser/Sema separation. To further process the stuff coming from the parser in the sema, I'll need another smart pointer there, one that actually deals in AST's Expr and Stmt, not just void typedefs. This smart pointer must be move-constructible *from ASTOwner*, i.e. have a constructor that takes an ASTMove. ASTMove thus must be accessible to this new smart pointer.

Further, because of the separation, ASTOwner cannot, in any way, know about this smart pointer (in order to make it a friend). This would be a serious layering violation.

So ASTMove will have to remain on namespace scope for the time being. But I'm considering moving it to a sub-namespace, so that it doesn't appear in the clang namespace directly.

Sebastian