New type of smart pointer for LLVM

Hello everyone,

I bring to discussion the necessity/design of a new type of smart pointer. r215176 and r217791 rise the problem, D5443 is devoted to the solution.
r215176 applies several temporary ugly fixes of memory leaks in TGParser.cpp which would be great to be refactored using smart pointers. D5443 demonstrates how the solution with a certain type of smart pointer would look like (see changes in TGParser::ParseDef(), TGParser::InstantiateMulticlassDef() and TGParser::ParseSimpleValue()).

Briefly:
consider a leaky example:
{
T* p = new T;
if (condition1) {
f(p); // takes ownership of p
}
p->SomeMethod();

if (condition2) {
return nullptr; // Leak!
}

g(p); // don’t take ownership of p
return p;
}

The preferred solution would look like:
{
smart_ptr p(new T);
if (condition1) {
f(p.StopOwn()); // takes ownership of p
}
p->SomeMethod();

if (condition2) {
return nullptr; //
}

g(p.Get()); // don’t take ownership of p
return p.StopOwn();
}

Neither unique_ptr nor shared_ptr can be used in the place of smart_ptr as unique_ptr sets the raw pointer to nullptr after release() (StopOwn() in the example above) whereas shared_ptr is unable to release.

Attached is a scratch that illustrates how the minimal API/implementation of a desired smart pointer sufficient for refactoring would look like. Any ideas and suggestions are appreciated.

CondOwnershipPtr.h (1.13 KB)

Anton Yartsev <anton.yartsev@gmail.com> writes:

Hello everyone,

I bring to discussion the necessity/design of a new type of smart pointer.
r215176 and r217791 rise the problem, D5443 is devoted to the solution.
r215176 applies several temporary ugly fixes of memory leaks in TGParser.cpp
which would be great to be refactored using smart pointers. D5443 demonstrates
how the solution with a certain type of smart pointer would look like (see
changes in TGParser::ParseDef(), TGParser::InstantiateMulticlassDef() and
TGParser::ParseSimpleValue()).

Briefly:
consider a leaky example:
{
  T* p = new T;
  if (condition1) {
    f(p); // takes ownership of p
  }
  p->SomeMethod();

  if (condition2) {
    return nullptr; // Leak!
  }

  g(p); // don't take ownership of p
  return p;
}

The preferred solution would look like:
{
  smart_ptr<T> p(new T);
  if (condition1) {
    f(p.StopOwn()); // takes ownership of p

So this takes ownership, but promises not to destroy the pointee in some
way?

  }
  p->SomeMethod();

  if (condition2) {
    return nullptr; //

I guess p is sometimes destroyed here, depending on condition1?

  }

  g(p.Get()); // don't take ownership of p
  return p.StopOwn();
}

What does it mean to stop owning the pointer twice? Doesn't this leak p
in the case where condition1 was false?

Neither unique_ptr nor shared_ptr can be used in the place of smart_ptr as
unique_ptr sets the raw pointer to nullptr after release() (StopOwn() in the
example above) whereas shared_ptr is unable to release.

I don't understand why shared_ptr wouldn't suffice for the example
above. There are two cases in your example where you try to release the
pointer - in one of them it seems you don't actually want to release it,
because you continue using it after, and in the other the scope ends
immediately after the release. The shared_ptr releases when it goes out
of scope, so it seems to be exactly what you want here.

Maybe I'm just misunderstanding the use case here, but I fear that a
type like this one would just serve to paper over problems where the
ownership is very unclear, without really helping much.

Attached is a scratch that illustrates how the minimal
API/implementation of a desired smart pointer sufficient for
refactoring would look like. Any ideas and suggestions are
appreciated.

//===-- CondOwnershipPtr.h - Smart ptr with conditional release -*- C++ -*-===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//===----------------------------------------------------------------------===//

#ifndef LLVM_ADT_CONDOWNERSHIPPTR_H
#define LLVM_ADT_CONDOWNERSHIPPTR_H

namespace llvm {

template<class T>
class CondOwnershipPtr {
  T* Ptr;
  bool Own;

  void Delete() {
    if (Ptr && !Own)
      delete Ptr;
  }

This seems to delete the pointer iff we *don't* own it. I think you have
that backwards...

Anton Yartsev <anton.yartsev@gmail.com> writes:
> Hello everyone,
>
> I bring to discussion the necessity/design of a new type of smart
pointer.
> r215176 and r217791 rise the problem, D5443 is devoted to the solution.
> r215176 applies several temporary ugly fixes of memory leaks in
TGParser.cpp
> which would be great to be refactored using smart pointers. D5443
demonstrates
> how the solution with a certain type of smart pointer would look like
(see
> changes in TGParser::ParseDef(), TGParser::InstantiateMulticlassDef() and
> TGParser::ParseSimpleValue()).
>
> Briefly:
> consider a leaky example:
> {
> T* p = new T;
> if (condition1) {
> f(p); // takes ownership of p
> }
> p->SomeMethod();
>
> if (condition2) {
> return nullptr; // Leak!
> }
>
> g(p); // don't take ownership of p
> return p;
> }
>
> The preferred solution would look like:
> {
> smart_ptr<T> p(new T);
> if (condition1) {
> f(p.StopOwn()); // takes ownership of p

So this takes ownership, but promises not to destroy the pointee in some
way?

> }
> p->SomeMethod();
>
> if (condition2) {
> return nullptr; //

I guess p is sometimes destroyed here, depending on condition1?

> }
>
> g(p.Get()); // don't take ownership of p
> return p.StopOwn();
> }

What does it mean to stop owning the pointer twice? Doesn't this leak p
in the case where condition1 was false?

> Neither unique_ptr nor shared_ptr can be used in the place of smart_ptr
as
> unique_ptr sets the raw pointer to nullptr after release() (StopOwn() in
the
> example above) whereas shared_ptr is unable to release.

I don't understand why shared_ptr wouldn't suffice for the example
above. There are two cases in your example where you try to release the
pointer - in one of them it seems you don't actually want to release it,
because you continue using it after, and in the other the scope ends
immediately after the release. The shared_ptr releases when it goes out
of scope, so it seems to be exactly what you want here.

Maybe I'm just misunderstanding the use case here, but I fear that a
type like this one would just serve to paper over problems where the
ownership is very unclear, without really helping much.

Perhaps this is a case of poor examples, I'm not sure. I certainly share
your concern that this might not be the right path forward, but it's
something that I've seen come up repeatedly when doing unique_ptr
migrations, perhaps moreso in Clang's frontend code.

Essentially there are a variety of data structures that "sometimes" own
their underlying data. Often implemented as a raw pointer and a boolean.
(sometimes I've migrated this to a raw pointer and a unique_ptr - the raw
pointer always points to the thing of interest, the unique_ptr is sometimes
non-null and points to the same thing to keep it alive (or destroy it when
it's no longer needed, more specifically))

I can go & dredge up some examples if we want to discuss the particular
merits & whether each of those cases would be better solved in some other
way, but it seemed pervasive enough in the current codebase that some
incremental improvement could be gained by replacing these cases with a
more specific tool for the job. We might still consider this tool to be
"not ideal".

Having done that in another code base, I can see both merits and problems.

Our smart pointer behaved more or less like it's described above
(explicit acquire/release) and we've seen around 20% performance
improvement over shared_ptr due to savings on acquire/release
semantics. We had three derivative types of pointers
(Shared/Owned/Linked) which would just differ on the default behaviour
of construction / destruction, but all of them could still explicitly
call getClear / getLink / etc.

The downside was that almost every interaction with smart pointers had
to be carefully planned and there was a lot of room for errors,
especially from people that didn't know the context. In the end, the
amount of work that had to be put to make it work was similar than
when dealing with normal pointers, but you had to learn yet another
pointer semantics.

The marginal gain was that pointer interactions were explicit, making
it easier for someone *not* used to C++ pointer semantics to
understand when reading code, not necessarily when writing it. The
marginal lost was getting people that already knew the STL and Boost
smart pointer semantics to get confused.

Having done that, I still rather use normal pointers and have valgrind
/ sanitizers tell me when I screwed up.

My tuppence.

cheers,
--renato

Yes. Yes. The above example just illustrate the typical common patterns. By the way, this particular example with several insignificant replaces is much similar to the logic of TGParser::InstantiateMulticlassDef() (, memory is allocated at line 2406, at line 2457 Records.addDef(CurRec) takes ownership of CurRec). More examples starting from line 2020, method TGParser::ParseDef(). I agree the solution is not ideal, but it looks much safer to me then just adding and switching extra boolean variables indicating whether the memory should be freed or not and inserting missing, sometimes conditional, 'delete’s. With the suggested solution you should only notice and call StopOwn() in places, where ownership is transferred. The ownership logic is quite simple: “the wrapped memory should be freed until otherwise stated”. To be more clear I also intend to design the wrapper for local scope usage only preventing it from copiying/moving. Right, thanks!

> I can go & dredge up some examples if we want to discuss the particular
> merits & whether each of those cases would be better solved in some other
> way, but it seemed pervasive enough in the current codebase that some
> incremental improvement could be gained by replacing these cases with a
more
> specific tool for the job. We might still consider this tool to be "not
> ideal".

Having done that in another code base, I can see both merits and problems.

Our smart pointer behaved more or less like it's described above
(explicit acquire/release)

I'm not quite sure I follow you (or that you're following me). The goal
isn't about being explicit about acquire/release (indeed I wouldn't mind
more implicit acquisition from an always-owning pointer (unique_ptr) to a
sometimes-owning pointer (whatever we end up calling it), though release is
always going to be explicit I suspect).

and we've seen around 20% performance
improvement over shared_ptr due to savings on acquire/release
semantics.

The existing use cases I'm interested in aren't using shared_ptr and
wouldn't be ideally migrated to it (in some cases the existing ownership
might be on the stack (so it can't be shared) or several layers up the call
stack through which raw pointers have been passed (eg: build something,
pass it through a few APIs, underlying thing 'wants' to take ownership, but
it will be constructed/destroyed before this API returns - so we hack
around it either by having a "OwnsThing" flag in the underlying thing, or
having a "takeThing" we hope we call before the underlying thing dies and
destroys the thing)).

We're dealing with types that have raw pointer + bool indicating ownership
members or worse, types which take ownership but we lie to about giving
them ownership (so some API takes a non-owning T* (or T&), passes it as
owning to some other thing, then is sure to take ownership back from that
thing and call 'release()" on the unique_ptr to ensure ownership was never
really taken).

We had three derivative types of pointers
(Shared/Owned/Linked) which would just differ on the default behaviour
of construction / destruction, but all of them could still explicitly
call getClear / getLink / etc.

The downside was that almost every interaction with smart pointers had
to be carefully planned and there was a lot of room for errors,

My hope is that having a single construct for conditional ownership will be
less confusing than the existing ad-hoc solutions in many places. It's
possible that the right answer is to remove the conditional ownership in
these APIs entirely, but I'm not sure that's possible/practical/desired (it
might be - which is why I've been hesitant to write this abstraction myself
just yet - sort of letting it stew in my head a bit to see what feels
right).

especially from people that didn't know the context. In the end, the
amount of work that had to be put to make it work was similar than
when dealing with normal pointers,

I rather hope that's not the case - these conditionally owning raw pointers
are pretty subtle, easy to miss a delete and leak, easy to have an early
return and fail to take back and release ownership from something that
wasn't really owning in the first place, etc.

but you had to learn yet another
pointer semantics.

The marginal gain was that pointer interactions were explicit, making
it easier for someone *not* used to C++ pointer semantics to
understand when reading code, not necessarily when writing it. The
marginal lost was getting people that already knew the STL and Boost
smart pointer semantics to get confused.

This is another pointer semantic - I'm not suggesting replacing unique_ptr
or shared_ptr - I want to use them as much as possible where they represent
the right semantic. I'm just dealing with a situation that doesn't map
directly to either of those: conditional ownership.

    > I can go & dredge up some examples if we want to discuss the particular
    > merits & whether each of those cases would be better solved in some other
    > way, but it seemed pervasive enough in the current codebase that some
    > incremental improvement could be gained by replacing these cases with a more
    > specific tool for the job. We might still consider this tool to be "not
    > ideal".

    Having done that in another code base, I can see both merits and
    problems.

    Our smart pointer behaved more or less like it's described above
    (explicit acquire/release)

I'm not quite sure I follow you (or that you're following me). The goal
isn't about being explicit about acquire/release (indeed I wouldn't mind
more implicit acquisition from an always-owning pointer (unique_ptr) to
a sometimes-owning pointer (whatever we end up calling it), though
release is always going to be explicit I suspect).

Given the example, the smart_ptr can't deal with ownership at all, it merely destroys the pointee in the face of exceptions or forgetfulness (which, in the example, can't really be tolerated).

    and we've seen around 20% performance
    improvement over shared_ptr due to savings on acquire/release
    semantics.

The existing use cases I'm interested in aren't using shared_ptr and
wouldn't be ideally migrated to it (in some cases the existing ownership
might be on the stack (so it can't be shared)

null_deleter?

or several layers up the
call stack through which raw pointers have been passed (eg: build
something, pass it through a few APIs, underlying thing 'wants' to take
ownership, but it will be constructed/destroyed before this API returns
- so we hack around it either by having a "OwnsThing" flag in the
underlying thing, or having a "takeThing" we hope we call before the
underlying thing dies and destroys the thing)).

Obviously combing the ownership flag and the pointer into some kind of "smart type" would be preferable, since they must be dealt with together.

We're dealing with types that have raw pointer + bool indicating
ownership members or worse, types which take ownership but we lie to
about giving them ownership (so some API takes a non-owning T* (or T&),
passes it as owning to some other thing, then is sure to take ownership
back from that thing and call 'release()" on the unique_ptr to ensure
ownership was never really taken).

Confusing. null_deleter, again?

    We had three derivative types of pointers
    (Shared/Owned/Linked) which would just differ on the default behaviour
    of construction / destruction, but all of them could still explicitly
    call getClear / getLink / etc.

    The downside was that almost every interaction with smart pointers had
    to be carefully planned and there was a lot of room for errors,

My hope is that having a single construct for conditional ownership will
be less confusing than the existing ad-hoc solutions in many places.
It's possible that the right answer is to remove the conditional
ownership in these APIs entirely, but I'm not sure that's
possible/practical/desired (it might be - which is why I've been
hesitant to write this abstraction myself just yet - sort of letting it
stew in my head a bit to see what feels right).

If the code really does look like the example, hesitation is probably wise.

    especially from people that didn't know the context. In the end, the
    amount of work that had to be put to make it work was similar than
    when dealing with normal pointers,

I rather hope that's not the case - these conditionally owning raw
pointers are pretty subtle, easy to miss a delete and leak, easy to have
an early return and fail to take back and release ownership from
something that wasn't really owning in the first place, etc.

    but you had to learn yet another
    pointer semantics.

    The marginal gain was that pointer interactions were explicit, making
    it easier for someone *not* used to C++ pointer semantics to
    understand when reading code, not necessarily when writing it. The
    marginal lost was getting people that already knew the STL and Boost
    smart pointer semantics to get confused.

This is another pointer semantic - I'm not suggesting replacing
unique_ptr or shared_ptr - I want to use them as much as possible where
they represent the right semantic. I'm just dealing with a situation
that doesn't map directly to either of those: conditional ownership.

I think I may have mentioned null_deleter.

    Having done that, I still rather use normal pointers and have valgrind
    / sanitizers tell me when I screwed up.

    My tuppence.

consider a leaky example:
{
   T* p = new T;
   if (condition1) {
     f(p); // takes ownership of p
   }
   p->SomeMethod();

   if (condition2) {
     return nullptr; // Leak!
   }

   g(p); // don't take ownership of p
   return p;
}

Let's see what we can do:

{
   auto p = llvm::make_unique<T>();
   if (condition1) {
     f(p); // takes ownership of p
   }
   p->SomeMethod();

   if (condition2) {
     return nullptr; // no leak;
   }

   g(p.get()); // don't take ownership of p
   return p; // not p.get()!
}

f(llvm::unique_ptr<T>& p)
{
    if(!condition2) // I guess
       p = llvm::make_unique<T>(p, null_deleter()); [1]
}

Clearly the code is a mess as f(llvm::unique_ptr<T>&) needs to know condition2, but smart_ptr as described isn't helping anything.

[1] std::unique_ptr doesn't have a type erased deleter, but perhaps a unique_ptr with a type erased deleter is actually the right tool for the job.

Ben

    > I can go & dredge up some examples if we want to discuss the
particular
    > merits & whether each of those cases would be better solved in some
other
    > way, but it seemed pervasive enough in the current codebase that
some
    > incremental improvement could be gained by replacing these cases
with a more
    > specific tool for the job. We might still consider this tool to be
"not
    > ideal".

    Having done that in another code base, I can see both merits and
    problems.

    Our smart pointer behaved more or less like it's described above
    (explicit acquire/release)

I'm not quite sure I follow you (or that you're following me). The goal
isn't about being explicit about acquire/release (indeed I wouldn't mind
more implicit acquisition from an always-owning pointer (unique_ptr) to
a sometimes-owning pointer (whatever we end up calling it), though
release is always going to be explicit I suspect).

Given the example, the smart_ptr can't deal with ownership at all, it
merely destroys the pointee in the face of exceptions or forgetfulness
(which, in the example, can't really be tolerated).

     and we've seen around 20% performance

    improvement over shared_ptr due to savings on acquire/release
    semantics.

The existing use cases I'm interested in aren't using shared_ptr and
wouldn't be ideally migrated to it (in some cases the existing ownership
might be on the stack (so it can't be shared)

null_deleter?

or several layers up the

call stack through which raw pointers have been passed (eg: build
something, pass it through a few APIs, underlying thing 'wants' to take
ownership, but it will be constructed/destroyed before this API returns
- so we hack around it either by having a "OwnsThing" flag in the
underlying thing, or having a "takeThing" we hope we call before the
underlying thing dies and destroys the thing)).

Obviously combing the ownership flag and the pointer into some kind of
"smart type" would be preferable, since they must be dealt with together.

We're dealing with types that have raw pointer + bool indicating

ownership members or worse, types which take ownership but we lie to
about giving them ownership (so some API takes a non-owning T* (or T&),
passes it as owning to some other thing, then is sure to take ownership
back from that thing and call 'release()" on the unique_ptr to ensure
ownership was never really taken).

Confusing. null_deleter, again?

     We had three derivative types of pointers

    (Shared/Owned/Linked) which would just differ on the default behaviour
    of construction / destruction, but all of them could still explicitly
    call getClear / getLink / etc.

    The downside was that almost every interaction with smart pointers had
    to be carefully planned and there was a lot of room for errors,

My hope is that having a single construct for conditional ownership will
be less confusing than the existing ad-hoc solutions in many places.
It's possible that the right answer is to remove the conditional
ownership in these APIs entirely, but I'm not sure that's
possible/practical/desired (it might be - which is why I've been
hesitant to write this abstraction myself just yet - sort of letting it
stew in my head a bit to see what feels right).

If the code really does look like the example, hesitation is probably wise.

     especially from people that didn't know the context. In the end, the

    amount of work that had to be put to make it work was similar than
    when dealing with normal pointers,

I rather hope that's not the case - these conditionally owning raw
pointers are pretty subtle, easy to miss a delete and leak, easy to have
an early return and fail to take back and release ownership from
something that wasn't really owning in the first place, etc.

    but you had to learn yet another
    pointer semantics.

    The marginal gain was that pointer interactions were explicit, making
    it easier for someone *not* used to C++ pointer semantics to
    understand when reading code, not necessarily when writing it. The
    marginal lost was getting people that already knew the STL and Boost
    smart pointer semantics to get confused.

This is another pointer semantic - I'm not suggesting replacing
unique_ptr or shared_ptr - I want to use them as much as possible where
they represent the right semantic. I'm just dealing with a situation
that doesn't map directly to either of those: conditional ownership.

I think I may have mentioned null_deleter.

     Having done that, I still rather use normal pointers and have valgrind

    / sanitizers tell me when I screwed up.

    My tuppence.

consider a leaky example:
{
  T* p = new T;
  if (condition1) {
    f(p); // takes ownership of p
  }
  p->SomeMethod();

  if (condition2) {
    return nullptr; // Leak!
  }

  g(p); // don't take ownership of p
  return p;
}

Yeah, imho this is a somewhat uninteresting example - the issue is just
that ownership is passed elsewhere, then after that point a non-owning
pointer is required. This is statically known and can be coded statically
quite simply:

  unique_ptr ownT = ...;
  /* stuff */
  T *t = ownT.get();
  sink(std::move(ownT));
  return t;

I don't think this particular use case really merits designing a new
(fairly powerful) abstraction. But it's not necessarily easy to provide a
nice example of the issue that's sufficiently motivating.

The use case I have in mind (of which I've seen several in LLVM and Clang)
looks something more like this:

  struct foo {
    unique_ptr<T> u;
    foo(unique_ptr<T> u) : u(std::move(u)) {}
    int stuff();
    unique_ptr<T> take() { return std::move(u); }
  };

  int f1(T& t) {
    // I want to do "stuff" to 't' but I don't own it...

    // lying through my teeth
    foo f(std::unique_ptr<T>(&t));

    // I really hope 'stuff' doesn't throw, because then we'll end up with
a double delete
    int i = f.stuff();
    f.take().release(); // not really leaking because I lied in the first
place
    return i;
  }

That's one case - check the revision numbers mentioned in the first message
on this thread for another example that doesn't involve a member, just some
rather convoluted ownership semantics (some codepaths there's a default
object to point at, other codepaths there's a newly allocated object) it
looks something like this:

  T *t = nullptr;
  bool Owning = false;
  if (x) {
    t = new T;
    Owning = true;
  } else
    t = &default;
  func(*t);
  if (Owning)
    delete t;

This comes up not infrequently. The code in func doesn't care if it owns,
it just wants to do something. Now obviously in the code I just wrote it
could be easily refactored without much duplication:

  if (x)
    func(*make_unique<T>());
  else
    func(default);

But it's not always that simple. Maybe it can/should be made that simple
and we conclude that we don't want/need conditional ownership, but that's
what this thread is hopefully here to help decide.

Let's see what we can do:

{
  auto p = llvm::make_unique<T>();
  if (condition1) {
    f(p); // takes ownership of p
  }
  p->SomeMethod();

  if (condition2) {
    return nullptr; // no leak;
  }

  g(p.get()); // don't take ownership of p
  return p; // not p.get()!
}

f(llvm::unique_ptr<T>& p)
{
   if(!condition2) // I guess
      p = llvm::make_unique<T>(p, null_deleter()); [1]
}

Clearly the code is a mess as f(llvm::unique_ptr<T>&) needs to know
condition2, but smart_ptr as described isn't helping anything.

[1] std::unique_ptr doesn't have a type erased deleter, but perhaps a
unique_ptr with a type erased deleter is actually the right tool for the
job.

Yes, the lack of type erasure is the issue - and then the lack of
compatibility with existing unique_ptrs (I'd obviously like to be able to
make a conditional ownership unique_ptr from an unconditional ownership
unique_ptr&&, for example, and to be able to take unconditional ownership
if it is owned, etc)

The use case I have in mind (of which I've seen several in LLVM and
Clang) looks something more like this:

   struct foo {
     unique_ptr<T> u;
     foo(unique_ptr<T> u) : u(std::move(u)) {}
     int stuff();
     unique_ptr<T> take() { return std::move(u); }
   };

   int f1(T& t) {
     // I want to do "stuff" to 't' but I don't own it...

     // lying through my teeth
     foo f(std::unique_ptr<T>(&t));

     // I really hope 'stuff' doesn't throw, because then we'll end up
with a double delete
     int i = f.stuff();
     f.take().release(); // not really leaking because I lied in the
first place
     return i;
   }

Right, I see...

That's one case - check the revision numbers mentioned in the first
message on this thread for another example that doesn't involve a
member, just some rather convoluted ownership semantics (some codepaths
there's a default object to point at, other codepaths there's a newly
allocated object) it looks something like this:

   T *t = nullptr;
   bool Owning = false;
   if (x) {
     t = new T;
     Owning = true;
   } else
     t = &default;
   func(*t);
   if (Owning)
     delete t;

This comes up not infrequently. The code in func doesn't care if it
owns, it just wants to do something. Now obviously in the code I just
wrote it could be easily refactored without much duplication:

OK, I've seen this pattern in code I really wish I didn't own. I'm especially reminded of a case where the ownership of memory in this example is replaced with ownership of a lock wrapped in a type-erased accessor that can lock at construction or during access further down the stack. I feel your pain.

   if (x)
     func(*make_unique<T>());
   else
     func(default);

But it's not always that simple. Maybe it can/should be made that simple
and we conclude that we don't want/need conditional ownership, but
that's what this thread is hopefully here to help decide.

    Let's see what we can do:

    {
       auto p = llvm::make_unique<T>();
       if (condition1) {
         f(p); // takes ownership of p
       }
       p->SomeMethod();

       if (condition2) {
         return nullptr; // no leak;
       }

       g(p.get()); // don't take ownership of p
       return p; // not p.get()!
    }

    f(llvm::unique_ptr<T>& p)
    {
        if(!condition2) // I guess
           p = llvm::make_unique<T>(p, null_deleter()); [1]
    }

    Clearly the code is a mess as f(llvm::unique_ptr<T>&) needs to know
    condition2, but smart_ptr as described isn't helping anything.

    [1] std::unique_ptr doesn't have a type erased deleter, but perhaps
    a unique_ptr with a type erased deleter is actually the right tool
    for the job..

Yes, the lack of type erasure is the issue - and then the lack of
compatibility with existing unique_ptrs (I'd obviously like to be able
to make a conditional ownership unique_ptr from an unconditional
ownership unique_ptr&&, for example, and to be able to take
unconditional ownership if it is owned, etc)

Your examples are enlightening, thanks.

I can see how a paper_over_the_cracks_ptr<> is tempting during a transitioning period, but I think that's all it can really be, a stop gap to a better world where the logic and ownership are separated.

Much refactoring is afoot.

Ben

Ping!

Suggested is a wrapper over a raw pointer that is intended for freeing wrapped memory at the end of wrappers lifetime if ownership of a raw pointer was not taken away during the lifetime of the wrapper.
The main difference from unique_ptr is an ability to access the wrapped pointer after the ownership is transferred.
To make the ownership clearer the wrapper is designed for local-scope usage only.

The main goal of the wrapper is to eliminate leaks like those in TGParser.cpp -r215176 <http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/TableGen/TGParser.cpp?r1=215176&r2=215175&pathrev=215176&diff_format=f&gt;\. With the wrapper the fixes applied at r215176 <http://llvm.org/viewvc/llvm-project/llvm/trunk/lib/TableGen/TGParser.cpp?r1=215176&r2=215175&pathrev=215176&diff_format=f&gt; could be refactored in the more easy and safe way.

Attached is a proposed interface/implementation.
Any ideas, suggestions? Is it OK to move forward with the solution?

CondOwnershipPtr.h (1.47 KB)

Ping!

Suggested is a wrapper over a raw pointer that is intended for freeing
wrapped memory at the end of wrappers lifetime if ownership of a raw
pointer was not taken away during the lifetime of the wrapper.
The main difference from unique_ptr is an ability to access the wrapped
pointer after the ownership is transferred.
To make the ownership clearer the wrapper is designed for local-scope
usage only.

I'm still concerned this isn't the right direction, and that we instead
want a more formal "maybe owning pointer". The specific use case you've
designed for here, in my experience, doesn't seem to come up often enough
to justify a custom ADT - but the more general tool of sometimes-owning
smart pointer seems common enough (& problematic enough) to warrant a
generic data structure (and such a structure would also be appliable to the
TGParser use case where this came up).

I'd love to hear some more opinions, but maybe we're not going to get
them...

I strongly agree that the example here isn't compelling.

I think it is a very fundamental design problem when there is a need for a
pointer value after it has been deallocated...

I even question whether we need a "maybe owning" smart pointer, or whether
this is just an indication that the underlying data structure is *wrong*.
The idea of "maybe" and "owning" going to gether, in any sense, seems flat
out broken to me from a design perspective, and so I'm not enthused about
providing abstractions that make it easier to build systems with unclear
ownership semantics.

David, could you, please, clarify the concept of the “maybe owning pointer”? Not deallocated but stored to the long-living storage. I agree, the problem is in design, the suggested wrapper is an intermediate solution, it was just designed to replace the existing ugly fixes.

I even question whether we need a "maybe owning" smart pointer, or whether this is just an indication that the underlying data structure is *wrong*. The idea of "maybe" and "owning" going to gether, in any sense, seems flat out broken to me from a design perspective,

I absolutely agree with this.

and so I'm not enthused about providing abstractions that make it easier to build systems with unclear ownership semantics.

We have a number of APIs that use this model, without abstractions. To
the extent that providing the abstraction encourages the model, it's
dangerous. But if it helps to clarify the code that's already there and
facilitates a transition to better models, then it might be worthwhile
anyway.

I think the idea is something like the compromise I suggested [1] as an
alternative to misusing `std::unique_ptr<>`. But it's even better to
design away "maybe-owning pointers" entirely.

[1]: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20140811/230727.html

+1

--renato

Ping!

Suggested is a wrapper over a raw pointer that is intended for freeing
wrapped memory at the end of wrappers lifetime if ownership of a raw
pointer was not taken away during the lifetime of the wrapper.
The main difference from unique_ptr is an ability to access the wrapped
pointer after the ownership is transferred.
To make the ownership clearer the wrapper is designed for local-scope
usage only.

I'm still concerned this isn't the right direction, and that we instead
want a more formal "maybe owning pointer". The specific use case you've
designed for here, in my experience, doesn't seem to come up often enough
to justify a custom ADT - but the more general tool of sometimes-owning
smart pointer seems common enough (& problematic enough) to warrant a
generic data structure (and such a structure would also be appliable to the
TGParser use case where this came up).

I'd love to hear some more opinions, but maybe we're not going to get
them...

I strongly agree that the example here isn't compelling.

I think it is a very fundamental design problem when there is a need for a
pointer value after it has been deallocated...

The use case here wasn't so much needing the pointer value after its been
deallocated, but after ownership has been passed.

Essentially:

  auto p = make_unique<T>(...);
  addThing(std::move(p));
  /* do more things with *p here... */

so you end up writing code like this:

  auto pOwner = make_unique...
  auto p = pOwner.get();
  addThing(std::move(pOwner)
  /* do things with *p here... */

Or you have the opposite:

  std::unique_ptr<T> owner;
  T *t;
  if (x)
    t = &defaultT;
  else {
    owner = make_unique...
    t = owner.get();
  }
  /* stuff with t here... */

I even question whether we need a "maybe owning" smart pointer, or whether
this is just an indication that the underlying data structure is *wrong*.
The idea of "maybe" and "owning" going to gether, in any sense, seems flat
out broken to me from a design perspective, and so I'm not enthused about
providing abstractions that make it easier to build systems with unclear
ownership semantics.

Yep - this is the big open question for me & the dialog I really want to
have about this (& why I've put off writing a maybe owning smart pointer so
far). As I transform things to unique_ptr I tend to punt on any of the
'hard' ones, and this is one of them - so the ownership semantics
non-unique_ptr-ified these days are becoming more and more these leftover
cases.

The clearest example of such an API is something like Module... Duncan
linked to a thread on that issue.

But I see a lot of it in Clang, simply grepping for "Owns" comesup with the
following boolean members:

CodeGenAction::OwnsVMContext
ASTReader::OwnsDeserializationListener
Diagnostic::OwnsDiagClient
Preprocessor::OwnsHeaderSearch
TokenLexer::OwnsTokens
Action::OwnsInputs (this ones trickier - it's a boolean that indicates
whether all the elements of a vector<T*> are owned or unowned)
ASTUnit::OwnsRemappedFileBuffers
VerifyDiagnosticConsumer::OwnsPrimaryClient
TextDiagnosticPrinter::OwnsOutputStream
FixItRewriter::OwnsClient
Tooling::OwnsAction

Some in LLVM:

circular_raw_ostream::OwnsStream
Arg::OwnsValues (another tricky one with a bool flag and a vector of raw
pointers, if I recall correctly)

And a couple that I changed {T*, bool} to {T*, unique_ptr<T>}:

LogDiagnosticPrinter::StreamOwner
ASTUnit::ComputedPreamble::Owner

>
> I even question whether we need a "maybe owning" smart pointer, or
whether this is just an indication that the underlying data structure is
*wrong*. The idea of "maybe" and "owning" going to gether, in any sense,
seems flat out broken to me from a design perspective,

I absolutely agree with this.

I'm uncertain about this, but willing to believe it's a bad idea. I've been
looking rather narrowly at code when I do unique_ptr cleanup and sometimes
don't have the context (or the time/inclination to build the context) to
consider larger design changes that might remove the need for this semantic.

> and so I'm not enthused about providing abstractions that make it easier
to build systems with unclear ownership semantics.

We have a number of APIs that use this model, without abstractions. To
the extent that providing the abstraction encourages the model, it's
dangerous. But if it helps to clarify the code that's already there and
facilitates a transition to better models, then it might be worthwhile
anyway.

Agreed. If we come to the conclusion that this idiom is wrong, I might
still be happy to migrate these existing APIs (listed 10-15 I could find
with a quick search) under the notion that this device is still a sign that
something cunning is going on and we'd rather remove the cunning if/when
possible.

Alternatively we decide this is just totally bad ownership design, push
back against it & come up with some relatively regular refactoring I can
make when I come across these - and I'll just apply that and remove the
semantic that way, up-front, rather than migrating to a stop-gap.

- David

Thanks for the feedback!

Ping!

Suggested is a wrapper over a raw pointer that is intended for freeing
wrapped memory at the end of wrappers lifetime if ownership of a raw
pointer was not taken away during the lifetime of the wrapper.
The main difference from unique_ptr is an ability to access the wrapped
pointer after the ownership is transferred.
To make the ownership clearer the wrapper is designed for local-scope
usage only.

I'm still concerned this isn't the right direction, and that we instead
want a more formal "maybe owning pointer". The specific use case you've
designed for here, in my experience, doesn't seem to come up often enough
to justify a custom ADT - but the more general tool of sometimes-owning
smart pointer seems common enough (& problematic enough) to warrant a
generic data structure (and such a structure would also be appliable to the
TGParser use case where this came up).

  David, could you, please, clarify the concept of the "maybe owning
pointer"?

See my reply to Chandler with a list of classes that hold {T*, bool}
members where the bool indicates whether the T* needs to be deleted or not.
My original goal here was to provide an API to make those more robust (more
precisely: to remove the need to call "release()" and allow us to stay in a
clear(er) owning domain).

[+cfe-dev]

This conversation has already been happening on llvm-dev so there’s no good way for me to capture the entire existing discussion (so I’m jumping you in part-way) & the subject line could be more descriptive, but I wanted to add Clang developers since many of the interesting cases of conditional ownership I’ve seen were in Clang.

I know some of you are also on llvm-dev but not active readers, so it might be worth using this as a jumping off point to go & find the full llvm-dev thread, read that and, when replying, add cfe-dev.

If anyone not on llvm-dev wants some more context there’s the email archive here ( http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-September/thread.html#77136 ) and/or I’m happy to provide more context/summary myself.