TargetRegistry and MC object ownership.

Hi All,

While trying to put together an MC-based assembler the other day, I again encountered MC’s non-obvious memory ownership rules. The most egregious example of these is MCObjectStreamer’s destructor:

MCObjectStreamer::~MCObjectStreamer() {
delete &Assembler->getBackend();
delete &Assembler->getEmitter();
delete &Assembler->getWriter();
delete Assembler;
}

In the depths of a fever from a head-cold, I snapped. I’ve been hacking MC to convert these raw pointers (and worse: references!) to unique_ptrs (apologies to people whose backbends I’ve broken), but I hit a big blocker when I get to Target in “llvm/Support/TargetRegistry.h”.

Target vends MC objects by calling registered ctor functions. E.g.:

MCAsmBackend *createMCAsmBackend(const MCRegisterInfo &MRI,

StringRef TheTriple, StringRef CPU,
const MCTargetOptions &Options) const {
if (!MCAsmBackendCtorFn)
return nullptr;
return MCAsmBackendCtorFn(*this, MRI, Triple(TheTriple), CPU, Options);
}

The callee owns the resulting object so ideally we would return a unique_ptr, but to do this we’d need access to the definition of MCAsmBackend which we can’t get without violating layering. (We need the definition because unique_ptr can notionally call MCAsmBackend’s destructor, though we know it won’t here as the whole point is to hand ownership back to the caller).

There are three approaches I can think of to improve things:

(1) Keep the raw pointers, but document very clearly, recommending the caller stash the result in a unique_ptr immediately.

(2) Have the ctor functions take a unique_ptr* as their first argument and store the result there (thanks to Dave Blaikie for this suggestion). Passing the unique_ptrs by pointer means we don’t need a definition for T. Usage looks like:

std::unique_ptr MAB;
Target.createMCAsmBackend(&MAB, TheTriple, CPU, Options);

(3) Introduce a ‘transient_unique_ptr’ type that never calls T’s destructor, and whose purpose is to pass off ownership to a real unique_ptr:

template
class transient_unique_ptr {
public:
transient_unique_ptr(std::unique_ptr InVal) : Val(InVal.release()) {}
transient_unique_ptr(transient_unique_ptr &&Other) : Val(Other.Val) { Other.Val = nullptr; }
transient_unique_ptr&& operator=(transient_unique_ptr &&Other) { std::swap(Val, Other.Val); return *this; }
~transient_unique_ptr() { assert(!Val && “value should have been handed off”); }
std::unique_ptr take() { auto Tmp = Val; Val = nullptr; return unique_ptr(Val); }
private:
T *Val = nullptr;
};

This type can also operate without needing a T definition. Usage looks like:

auto MAB = Target.createMCAsmBackend(TheTriple, CPU, Options).take();

Option (1) is a minimum. Option (2) and (3) are both compromises to deal with layering rules, but of those I think (2) probably comes closest to adhering to the principle of least surprise.

Who has thoughts? If we can settle on an approach I’d love to press forward with the cleanup.

Cheers,
Lang.

Hi All,

While trying to put together an MC-based assembler the other day, I again encountered MC’s non-obvious memory ownership rules. The most egregious example of these is MCObjectStreamer’s destructor:

MCObjectStreamer::~MCObjectStreamer() {
delete &Assembler->getBackend();
delete &Assembler->getEmitter();
delete &Assembler->getWriter();
delete Assembler;
}

In the depths of a fever from a head-cold, I snapped. I’ve been hacking MC to convert these raw pointers (and worse: references!) to unique_ptrs (apologies to people whose backbends I’ve broken), but I hit a big blocker when I get to Target in “llvm/Support/TargetRegistry.h”.

Target vends MC objects by calling registered ctor functions. E.g.:

MCAsmBackend *createMCAsmBackend(const MCRegisterInfo &MRI,

StringRef TheTriple, StringRef CPU,
const MCTargetOptions &Options) const {
if (!MCAsmBackendCtorFn)
return nullptr;
return MCAsmBackendCtorFn(*this, MRI, Triple(TheTriple), CPU, Options);
}

The callee owns the resulting object so ideally we would return a unique_ptr, but to do this we’d need access to the definition of MCAsmBackend which we can’t get without violating layering. (We need the definition because unique_ptr can notionally call MCAsmBackend’s destructor, though we know it won’t here as the whole point is to hand ownership back to the caller).

I realize your goals are somewhat more immediate, but I wouldn’t mind better understanding the layering here and what might be a good end goal to know how much work it is, where people should spend it if they want to move towards that goal, etc. Any ideas how this should be designed to have good layering?

There are three approaches I can think of to improve things:

(1) Keep the raw pointers, but document very clearly, recommending the caller stash the result in a unique_ptr immediately.

(2) Have the ctor functions take a unique_ptr* as their first argument and store the result there (thanks to Dave Blaikie for this suggestion). Passing the unique_ptrs by pointer means we don’t need a definition for T. Usage looks like:

std::unique_ptr MAB;
Target.createMCAsmBackend(&MAB, TheTriple, CPU, Options);

This could be a reference rather than a pointer, FWIW - and while the callsite might look a bit wonky, no one could readily misuse it and anyone who went looking to fix it would quickly find the comments explaining why, or if they ignored it, would hit the layering issue we know about already.

(3) Introduce a ‘transient_unique_ptr’ type that never calls T’s destructor, and whose purpose is to pass off ownership to a real unique_ptr:

template
class transient_unique_ptr {
public:
transient_unique_ptr(std::unique_ptr InVal) : Val(InVal.release()) {}
transient_unique_ptr(transient_unique_ptr &&Other) : Val(Other.Val) { Other.Val = nullptr; }
transient_unique_ptr&& operator=(transient_unique_ptr &&Other) { std::swap(Val, Other.Val); return *this; }
~transient_unique_ptr() { assert(!Val && “value should have been handed off”); }
std::unique_ptr take() { auto Tmp = Val; Val = nullptr; return unique_ptr(Val); }
private:
T *Val = nullptr;
};

This type can also operate without needing a T definition. Usage looks like:

auto MAB = Target.createMCAsmBackend(TheTriple, CPU, Options).take();

The risk here is that it would be easy to miss the ‘take()’ call (& auto especially makes that risky - if it weren’t for auto we could use unnameable type tricks to make it obvious that someone shouldn’t make a local variable (that wouldn’t help if template argument deduction were used like: func(createMCAsmBackend(…)); ))

Option (1) is a minimum. Option (2) and (3) are both compromises to deal with layering rules, but of those I think (2) probably comes closest to adhering to the principle of least surprise.

Yeah - I think people will find (2) quirky but doesn’t mean building whole new constructs (that might end up with more usage than we’d like) for these kinds of layering problems. It’s where I’d lean.

Lang Hames via llvm-dev <llvm-dev@lists.llvm.org> writes:

Hi All,

While trying to put together an MC-based assembler the other day, I again
encountered MC's non-obvious memory ownership rules. The most egregious
example of these is MCObjectStreamer's destructor:

MCObjectStreamer::~MCObjectStreamer() {
  delete &Assembler->getBackend();
  delete &Assembler->getEmitter();
  delete &Assembler->getWriter();
  delete Assembler;
}

In the depths of a fever from a head-cold, I snapped. I've been hacking MC
to convert these raw pointers (and worse: references!) to unique_ptrs
(apologies to people whose backbends I've broken), but I hit a big blocker
when I get to Target in "llvm/Support/TargetRegistry.h".

Target vends MC objects by calling registered ctor functions. E.g.:

MCAsmBackend *createMCAsmBackend(const MCRegisterInfo &MRI,
                                 StringRef TheTriple, StringRef CPU,
                                 const MCTargetOptions &Options) const {
  if (!MCAsmBackendCtorFn)
    return nullptr;
  return MCAsmBackendCtorFn(*this, MRI, Triple(TheTriple), CPU, Options);
}

The callee owns the resulting object so ideally we would return a
unique_ptr<MCAsmBackend>, but to do this we'd need access to the definition
of MCAsmBackend which we can't get without violating layering. (We need the
definition because unique_ptr<MCAsmBackend> can notionally call
MCAsmBackend's destructor, though we know it won't here as the whole point
is to hand ownership back to the caller).

There are three approaches I can think of to improve things:

(1) Keep the raw pointers, but document *very* clearly, recommending the
caller stash the result in a unique_ptr immediately.

(2) Have the ctor functions take a unique_ptr<T>* as their first argument
and store the result there (thanks to Dave Blaikie for this suggestion).
Passing the unique_ptrs by pointer means we don't need a definition for T.
Usage looks like:

std::unique_ptr<MCAsmBackend> MAB;
Target.createMCAsmBackend(&MAB, TheTriple, CPU, Options);

(3) Introduce a 'transient_unique_ptr<T>' type that never calls T's
destructor, and whose purpose is to pass off ownership to a real unique_ptr:

template <typename T>
class transient_unique_ptr {
public:
  transient_unique_ptr(std::unique_ptr<T> InVal) : Val(InVal.release()) {}
  transient_unique_ptr(transient_unique_ptr<T> &&Other) :
Val(Other.Val) { Other.Val
= nullptr; }
  transient_unique_ptr&& operator=(transient_unique_ptr<T> &&Other) {
std::swap(Val,
Other.Val); return *this; }
  ~transient_unique_ptr() { assert(!Val && "value should have been handed
off"); }
  std::unique_ptr<T> take() { auto Tmp = Val; Val = nullptr; return
unique_ptr<T>(Val); }
private:
  T *Val = nullptr;
};

(4) Move target registration out of Support so that it can include the
required definitions from MC.

Cheers,
Rafael

(4) Move target registration out of Support so that it can include the
required definitions from MC.

Yep – left that one out. Chalk that up to the fever. This seems like an ideal solution if we can do it.

– Lang.

I realize your goals are somewhat more immediate…

There’s a lot of useful cleanup that can be done behind this interface so this isn’t immediately blocking anything, but I want to start the discussion early so it doesn’t become a blocker.

The risk here is that it would be easy to miss the ‘take()’ call (& auto especially makes that risky…

In transient_unique_ptr’s defense that’s really only going to shift your compile-time error a few lines: with no dereference operator and no implicit conversion you’ll find out MAB isn’t a unique_ptr really quickly. :slight_smile:

I still like “fix the layering” the most, but don’t have a good intuition for the impact of that yet. Failing that (2) seems fine to me.

– Lang.

Lang Hames <lhames@gmail.com> writes:

(4) Move target registration out of Support so that it can include the
required definitions from MC.

Yep -- left that one out. Chalk that up to the fever. This seems like an
ideal solution if we can do it.

Yes, that would be my favorite too. I don't know the history of why it
is in Support.

BTW, thanks for changes adding std::unique_ptr.

Cheers,
Rafael