Use of Smart Pointers in LLVM Projects

There seems to be some uncertainty about the use of smart pointers
(previously OwningPtr, now std::unique_ptr and std::shared_ptr
predominantly) in the LLVM project as a whole, so here's a thread to
discuss/clarify/etc the project preferences/direction with regard to
smart pointer usage.

For some context, see discussions in LLVM r212403 and Clang r213307.

The basic question here seems to be whether smart pointer ownership is
a direction we want to take the LLVM project in general.

I've seen others contribute and have myself contributed many patches
moving towards smart pointer ownership (both in the pre-C++11 days of
OwningPtr, and much moreso in the post-C++11 world with
std::unique_ptr and std::shared_ptr being usable inside containers, as
return values, etc, allowing many more opportunities).

std::unique_ptr's been used in LLD as far back as r153620.
std::unique_ptr appeared in LLVM shortly after the C++11 switch with
Ahmed's work to migrate the project from OwningPtr to std::unique_ptr
(starting with r202609 and ending with r211259). Originally OwningPtr
was added in r45261.
Something in the order of 60 changes across clang and LLVM mention
unique_ptr in their subject and migrate various APIs to use unique_ptr
for ownership. Many of which remove uses of explicit delete or helpers
like DeleteContainerPointers (and removing explicit dtors in many of
those cases).

Are people OK with/prefer the use of owning smart pointers in APIs?
Are there places where you've found them to be too noisy/burdensome
and would rather use raw pointers or some other abstraction? Would you
prefer pre-commit review of such changes to adequately consider
alternatives (such as?)?

Thanks,
- David

Where unique_ptr fits the memory ownership model I’m all for using it. The syntactic overhead is a small price to pay for self-documentation and compile-time guarantees.

What are the objections?

Cheers,
Lang.

Are people OK with/prefer the use of owning smart pointers in APIs?

I think smart pointers are great to use for storage, or as struct members where there's actually a clear need for ownership. Just by virtue of getting rid of destructors containing single delete expressions this is already a win.

In many cases there shouldn't be a need for smart pointers at all, because the object is owned by a central facility's smart pointer so it's safe to just pass around the pointer in ephemeral contexts like stack-based instances. In such context raw pointers and references aren't scary -- they're just fine.

(All the above goes equally for uniquely owned pointers, shared pointers and intrusive refcounted pointers.)

Are there places where you've found them to be too noisy/burdensome
and would rather use raw pointers or some other abstraction?

A clear place that smart pointers *shouldn't* be used are 'create' functions.

There's already a strong convention in the thousands of 'create' functions across the codebase and they have the clear, well-defined behaviour of returning a new pointer.

We all know what to expect from these functions, much like new/delete in C++ and malloc/free. They can be easily adopted by std::shared_ptr or std::unique_ptr as needed, so there's no benefit in enforcing smart pointer usage at creation.

Apart from the complexity and visual clutter, it's pretty annoying to have to decide whether to get(), release() or somehow else acquire the object you just created. The result of using smart pointers as returns isn't pretty -- you end up with awkward code like createFoo().get()->func() or other.reset(createFoo().release()).

Transfer of new pointers from 'create' functions and usage is already trivially clear so we shouldn't change these.

Would you
prefer pre-commit review of such changes to adequately consider
alternatives (such as?)?

The clear-cut two use-cases above are reasonable for post-commit review, certainly in the modules I'm helping to maintain.

The creation functions shouldn't be changed because of the reasons given above, and I think it makes sense to put that in policy rather than making the decision on a per-module basis.

Thanks for evaluating this David!

Are people OK with/prefer the use of owning smart pointers in APIs?

I think smart pointers are great to use for storage, or as struct members
where there's actually a clear need for ownership. Just by virtue of getting
rid of destructors containing single delete expressions this is already a
win.

Great!

In many cases there shouldn't be a need for smart pointers at all, because
the object is owned by a central facility's smart pointer so it's safe to
just pass around the pointer in ephemeral contexts like stack-based
instances. In such context raw pointers and references aren't scary --
they're just fine.

There's no contention there - if the pointer is non-owning, there's no
smart pointer to use. Indeed using a smart pointer for a non-owning
pointer would actually break the code by causing double deletes, etc.

Well, shared_ptr notwithstanding - but, yes, choosing between
unique_ptr + non-owning pointers and shared_ptr can sometimes be
non-obvious, but I'm hopeful we generally agree that if there is a
dominating owner they can be given exclusive ownership through a
unique_ptr and everyone else can use raw pointers.

(All the above goes equally for uniquely owned pointers, shared pointers and
intrusive refcounted pointers.)

Are there places where you've found them to be too noisy/burdensome
and would rather use raw pointers or some other abstraction?

A clear place that smart pointers *shouldn't* be used are 'create'
functions.

There's already a strong convention in the thousands of 'create' functions
across the codebase and they have the clear, well-defined behaviour of
returning a new pointer.

We all know what to expect from these functions, much like new/delete in C++
and malloc/free. They can be easily adopted by std::shared_ptr or
std::unique_ptr as needed,

They'd actually only ever need to return std::unique_ptr.
std::shared_ptrs can be constructed from xvalue std::unique_ptrs
without any overhead:

std::unique_ptr<T> create();
std::shared_ptr<T> x = create();

(make_shared has a small allocation (and destruction) benefit that
might sometimes be useful to exploit, of course, but you don't benefit
from that with raw new either, so unique_ptr is no /worse/ off than a
raw implicitly-owning pointer in that regard)

so there's no benefit in enforcing smart pointer usage at creation.

I believe there is. Like const member functions it helps ensure that
ownership is transferred clearly. It's not hard to write bugs like:

T *t = create();
...
if (error)
  return; // leak

set_thing(t); // transfer ownership here

By making create return a std::unique_ptr, it discourages these kinds
of misuses.

Apart from the complexity and visual clutter, it's pretty annoying to have
to decide whether to get(), release() or somehow else acquire the object you
just created.

That generally shouldn't be complicated unless the consumer has weird
ownership semantics, in which case that's worth highlighting.

The result of using smart pointers as returns isn't pretty --
you end up with awkward code like createFoo().get()->func()

This isn't an expression you should need to write, it'd simply be
"createFoo()->func()".

> other.reset(createFoo().release()).

This is also an expression you shouldn't need to write, it'd simply be
"other = createFoo()". Indeed the alternative here (if createFoo
returns a raw pointer) is "other.reset(createFoo())" which is, imho,
less obvious than simple assignment.

Transfer of new pointers from 'create' functions and usage is already
trivially clear so we shouldn't change these.

The need to use reset rather than assignment and the ease with which
ownership can become uncertain once you leave the create function
don't seem to me to be ideal outcomes.

Ultimately, by moving to a world in which objects are most often
allocated with make_unique (or make_shared) means things like reset,
release, raw new and delete can be given extra scrutiny and memory
management bugs can be better avoided.

For myself, it also just helps me follow/understand code when I can
see ownership explicit in the code rather than having to trace back
through the pointer assignments and function calls.

- David

Hi Alp,

A clear place that smart pointers *shouldn't* be used are 'create'

functions.

There's already a strong convention in the thousands of 'create' functions
across the codebase and they have the clear, well-defined behaviour of
returning a new pointer.

We all know what to expect from these functions, much like new/delete in
C++ and malloc/free. They can be easily adopted by std::shared_ptr or
std::unique_ptr as needed, so there's no benefit in enforcing smart pointer
usage at creation.

I strongly disagree with this: If knowing that new'ed things must be
deleted was enough we would never have to worry about memory leaks. It's a
huge benefit to have the compiler catch our mistakes for us - it's one less
thing to worry about.

Apart from the complexity and visual clutter, it's pretty annoying to have
to decide whether to get(), release() or somehow else acquire the object
you just created. The result of using smart pointers as returns isn't
pretty -- you end up with awkward code like createFoo().get()->func() or
other.reset(createFoo().release()).

I think if you were ever writing createFoo()->func() before you were
leaking memory. And actually you can now safely use the simpler syntax with
std::unique_ptr: createFoo()->func() will work fine, there's no need for
the .get().

Ditto other.reset(createFoo.release()). I think this would just be: other =
createFoo(), (or a = std::move(b);).

There may be occasions where you need to call .get() to pass a pointer as
an argument to a function that isn't taking ownership, but that doesn't
seem odious to me.

Cheers,
Lang.

Hi Alp,

    A clear place that smart pointers *shouldn't* be used are 'create'
    functions.

    There's already a strong convention in the thousands of 'create'
    functions across the codebase and they have the clear,
    well-defined behaviour of returning a new pointer.

    We all know what to expect from these functions, much like
    new/delete in C++ and malloc/free. They can be easily adopted by
    std::shared_ptr or std::unique_ptr as needed, so there's no
    benefit in enforcing smart pointer usage at creation.

I strongly disagree with this: If knowing that new'ed things must be deleted was enough we would never have to worry about memory leaks. It's a huge benefit to have the compiler catch our mistakes for us - it's one less thing to worry about.

I don't remember the last time we had a leak because of a 'create' function. Tricky ownership is going to continue to be tricky either way so I'd tend towards keeping the status quo for these functions.

The bad patterns seem come about mostly when smart pointers are used as return values. We already avoid returning structures by value, so perhaps any policy can be expressed as a continuation of that?

    Apart from the complexity and visual clutter, it's pretty annoying
    to have to decide whether to get(), release() or somehow else
    acquire the object you just created. The result of using smart
    pointers as returns isn't pretty -- you end up with awkward code
    like createFoo().get()->func() or other.reset(createFoo().release()).

Creating an object for single use and having it destroyed implicitly doesn't sound like a great pattern. I almost wonder if it isn't better to write the ownership explicitly in such cases:

   unique_ptr<T> t = createFoo();
   t->func();

A createFoo() statement with an unused pointer new return can be identified by leak detectors, whereas an unused smart pointer return will be created and deleted silently, which isn't that desirable. If you really want to delete an object as soon as it's created that seems to be worth keeping explicit.

I think if you were ever writing createFoo()->func() before you were leaking memory. And actually you can now safely use the simpler syntax with std::unique_ptr: createFoo()->func() will work fine, there's no need for the .get().

Ditto other.reset(createFoo.release()). I think this would just be: other = createFoo(), (or a = std::move(b);).

The nice thing about a traditional 'create' function is that there's one clear way to do that -- by assignment, instead of deciding between two or three. It should be equally safe.

Hi Alp,

    A clear place that smart pointers *shouldn't* be used are 'create'
    functions.

    There's already a strong convention in the thousands of 'create'
    functions across the codebase and they have the clear,
    well-defined behaviour of returning a new pointer.

    We all know what to expect from these functions, much like
    new/delete in C++ and malloc/free. They can be easily adopted by
    std::shared_ptr or std::unique_ptr as needed, so there's no
    benefit in enforcing smart pointer usage at creation.

I strongly disagree with this: If knowing that new'ed things must be
deleted was enough we would never have to worry about memory leaks. It's a
huge benefit to have the compiler catch our mistakes for us - it's one less
thing to worry about.

I don't remember the last time we had a leak because of a 'create' function.
Tricky ownership is going to continue to be tricky either way so I'd tend
towards keeping the status quo for these functions.

I've explained a way to make pretty simple code even simpler. It
doesn't seem like factory functions returning raw pointers is helping
us.

The bad patterns seem come about mostly when smart pointers are used as
return values.

Which bad patterns are you referring to?

We already avoid returning structures by value, so perhaps
any policy can be expressed as a continuation of that?

    Apart from the complexity and visual clutter, it's pretty annoying
    to have to decide whether to get(), release() or somehow else
    acquire the object you just created. The result of using smart
    pointers as returns isn't pretty -- you end up with awkward code
    like createFoo().get()->func() or other.reset(createFoo().release()).

Creating an object for single use and having it destroyed implicitly doesn't
sound like a great pattern. I almost wonder if it isn't better to write the
ownership explicitly in such cases:

  unique_ptr<T> t = createFoo();

This would have to be:

unique_ptr<T> t(createFoo());

if createFoo returns a raw pointer.

  t->func();

A createFoo() statement with an unused pointer new return can be identified
by leak detectors,

Seems like a bit of a stretch to detect unused variables by using a
leak checker.

whereas an unused smart pointer return will be created
and deleted silently, which isn't that desirable. If you really want to
delete an object as soon as it's created that seems to be worth keeping
explicit.

I think if you were ever writing createFoo()->func() before you were
leaking memory. And actually you can now safely use the simpler syntax with
std::unique_ptr: createFoo()->func() will work fine, there's no need for the
.get().

Ditto other.reset(createFoo.release()). I think this would just be: other
= createFoo(), (or a = std::move(b);).

The nice thing about a traditional 'create' function is that there's one
clear way to do that -- by assignment, instead of deciding between two or
three. It should be equally safe.

Actually there isn't and at least one (raw pointer) is casually unsafe:

1) std::unique_ptr<T> foo(createFoo());

2) foo.reset(createFoo());

3) T *foo = createFoo();
  ...
  delete foo;

Just for the initialization - for assignment with raw pointers, that's
where it gets scarier - if you're just using raw assignment, now
you've got two raw pointers pointing to the same object, one of which,
presumably, owns the object.

with unique_ptr, it's more like:

1) std::unique_ptr<T> foo = createFoo();

2) foo = createFoo();

and optionally: auto foo = createFoo();

- David

+1. I liked the patches that added unique_ptrs I've seen so far. WebKit
also heavily uses pointer classes (OwnPtr etc) heavily, and I think it
works well there too.

Branching off in a completely different direction, I want to point out that
there is an abstraction penalty to passing unque_ptr by value.

By it's nature, unique_ptr is not trivially copyable: it has no copy ctor
and it's move ctor and dtor are non-trivial. Therefore, it cannot be
passed in registers, or even directly on the stack on most platforms.
Typically, you end up with a hidden pointer to a temporary unique_ptr on
the caller's stack when you pass or return one.

However, ownership transfers are typically accompanied by new/delete, which
are more expensive altogether than one more pointer chase.

2c

Hi Alp,

I don't remember the last time we had a leak because of a 'create' function.

I'm not sure I follow you here. Are you suggesting that we've never leaked memory returned by a create function? I'll wager we have. I'm also certain that we've double deleted such memory, which is something unique_ptr should also prevent.

The bad patterns seem come about mostly when smart pointers are used as return values. We already avoid returning structures by value, so perhaps any policy can be expressed as a continuation of that?

Is there an official injunction against returning structs by value? Perhaps we should reconsider that in light of move semantics? In any case I think the only reason to avoid returning structs by value is the overhead of the copies, which doesn't apply to unique_ptr. Any policy regarding smart pointers use in return types should be separate from the struct usage policy.

Creating an object for single use and having it destroyed implicitly doesn't sound like a great pattern. I almost wonder if it isn't better to write the ownership explicitly in such cases:

unique_ptr<T> t = createFoo();
t->func();

A createFoo() statement with an unused pointer new return can be identified by leak detectors, whereas an unused smart pointer return will be created and deleted silently, which isn't that desirable. If you really want to delete an object as soon as it's created that seems to be worth keeping explicit.

I only mentioned it since you offered it as an example of something that would be uncomfortable to write, which it turns out not to be. As you say - this is very unlikely to come up at all. The idea of using memory leaks to diagnose dead code sounds less than ideal.

I think if you were ever writing createFoo()->func() before you were leaking memory. And actually you can now safely use the simpler syntax with std::unique_ptr: createFoo()->func() will work fine, there's no need for the .get().

Ditto other.reset(createFoo.release()). I think this would just be: other = createFoo(), (or a = std::move(b);).

The nice thing about a traditional 'create' function is that there's one clear way to do that -- by assignment, instead of deciding between two or three. It should be equally safe.

The only difference between regular pointers and smart pointers is precisely the difference you want: Smart pointers keep the ownership explicit. Consider:

Foo *A = createFoo();
Foo *B = A; // Who owns the object now?

vs

std::unique_ptr<Foo> A = createFoo();
std::unique_ptr<Foo> B = std::move(A); // Clear transfer of ownership.

David mentioned other smart pointer types (e.g. shared_ptr) in his email, and I should be clear: I'm advocating for std::unique_ptr usage where appropriate, as that's what I have experience with. I'm less familiar with other smart pointers, and happy to hear from the experts on those.

Cheers,
Lang.

What about weak_ptr?

If you need the auto-nulling behavior (usually because you've got a
non-owning, but longer-lifetime than the owner), sure. & maybe if
needed we can do something like AssertingVH (essentially a weak_ptr
that asserts on null access in debug builds and it's just a raw
pointer in release builds - a debugging aid for the cases where you
think you can use a raw pointer (shorter lifetime than the owning
pointer) but are investigating possible lifetime bugs) around
weak_ptr.

- David

+1. I think there is a lot of value in using standard library’s smart pointers over custom smart pointers or raw pointers where the callers are expected to understand the object’s lifetime. They communicate the intended semantics to users (leading to less errors), and because they are broadly used and well defined they will be more amenable to tooling both debugging and optimization advances in the future.

Louis

I don't have much to add here besides +1. I think using std::unique_ptr even for create* functions/methods is the right way to go. Reid's point about an abstraction penalty is interesting, but I don't think we do ownership transfers often enough to actually see a performance hit. (Of course, in the non-transferring case we'd just pass the pointer, not a 'const std::unique_ptr &' or anything.)

Jordan

I don't have much to add here besides +1. I think using std::unique_ptr
even for create* functions/methods is the right way to go.

+1 smart pointers here are a win in terms of safety and self-documentation.
I don't see why create* factories should be treated differently.

Eli

Sounds like we've got sufficient amount of momentum here for Dave to
go ahead and recommit. Any last objections?

-eric

No objections here. I’d be glad to see this go in. :slight_smile:

  • Lang.

A +1 from me as well. I really like having the ownership on the type
instead of a comment or naming convention. Hopefully the overhead is
never an issue as most functions do not take part in the memory
management and can take a raw pointer or a reference.

Cheers,
Rafael

Hi All,

I'd like to revisit the MCJIT debugger-registration system, as the existing system has a few flaws, some of which are seriously problematic.

The 20,000 foot overview of the existing scheme (implemented in llvm/lib/ExecutionEngine/RuntimeDyld/GDBRegistrar.cpp and friends), as I understand it, is as follows:

We have two symbols in MCJIT that act as fixed points for the debugger to latch on to:

__jit_debug_register_code is a no-op function that the debugger can set a breakpoint on. MCJIT will call this function to notify the debugger when an object file is loaded.

__jit_debug_descriptor is the head of a C linked list data structure that contains pointers to in-memory object files. The ELF/MachO headers of the in memory object files will have had their vaddrs fixed up by the JIT to point to where each of the linked sections reside in memory.

There are a couple of problems with this system: (1) Modifying object-file headers in-place violates some internal LLVM contracts. In particular, the object files may be backed by read-only memory. This has caused crashes in the JIT that have forced me to revert support for debugger registration on the MachO side (We really want to replace this on the ELF side soon too). (2) The JIT has no way of knowing whether a debugger is attached, which means keeping object files in memory even if they're not being used, just in case there an attached debugger that needs them.

We'd really like to come up with a system that doesn't have these drawbacks. That is, a system where the object files remain unmodified, and the JIT knows if/when a debugger attaches so that it can generate the relevant information on the fly.

It would be great if the debugger experts (and particularly anyone who has experience on both the debugger and the JIT side of things) could weigh in on these issues. In particular:

(1) Is there a reason we bake the vmaddrs into the object file headers, or could they just as easily be passed in a side-table so as to keep the object untouched?

(2) Is there a canonical way for the debugger to communicate to a JIT that it's interested in inspecting the JIT's output? If we're going to use breakpoints (or something like that) to signal to the debugger when objects have been linked, is it reasonable to have an API that the debugger can call in to to request the information it's looking for? If the JIT actually receives a call then it would give us a chance to lazily populate the necessary data structures.

Regards,
Lang.

Thanks for writing up the summary. We'd like to come up with a simple
scheme that addresses the problems we've found in practice with the
current interface, but isn't unnecessarily complex. The current
interface is basically fine except for the points mentioned above
which make using it somewhat of a pain. This is also the chance to
address any other pain points we may have with this interface. For
example, the current scheme is disabled on OS X at the moment in LLDB
since looking for the symbol to set the breakpoint is too slow.
Perhaps we can come up with something better here as well (I guess
this is related to point 2 above). It would be good to get some input
from people more experienced with the debugger side on that point.