GCC's "Temporaries May Vanish Before You Expect"

Throughout Clang we have a dangerous little helper method that looks like this:

  template<class T> const T *getAs() const {
    if (isa<T>(this))
      return static_cast<const T*>(this);
    return 0;
  }

or sometimes, more simply

  template<class T> const T *getAs() const {
    return dyn_cast<T>(this);
  }

The problem? This method is very dangerous to use with value objects, since you could accidentally do this:

CFGStmt *S = getTemporaryObject().getAs<CFGStmt>();

And indeed we've found two such cases in the last week where we've actually done this in the analyzer (the value object being CFGElement).

I'm half-inclined to remove 'getAs' from CFGElement, but it is useful when you are, say, iterating over a set of known-valid objects. It's the difference between

  I->getAs<CFGStmt>()

and

  dyn_cast<CFGStmt>(&*I)

, the latter of which is slightly more ugly.

The thing that keeps me from removing it is that it IS more convenient and it CAN be made safe. The first way is to use C++11 to forbid the use of this function on rvalues; that looks like this:

  template<class T> const T *getAs() const && = delete;
  template<class T> const T *getAs() const & {
    return dyn_cast<T>(this);
  }

This could be hidden behind a compatibility macro:

  template<class T> const T *getAs() const LLVM_RVALUE_DELETE;
  template<class T> const T *getAs() const LLVM_LVALUE {
    return dyn_cast<T>(this);
  }

The second way is if -fcatch-undefined-behavior could be augmented to handle this case. I have no idea how to do this, though -- perhaps it requires something ASan-ish to poison the temporary memory once its lifetime ends.

Any chance of either of these happening on trunk?
Thanks,
Jordan

Throughout Clang we have a dangerous little helper method that looks like this:

  template<class T> const T *getAs() const {
    if (isa<T>(this))
      return static_cast<const T*>(this);
    return 0;
  }

or sometimes, more simply

  template<class T> const T *getAs() const {
    return dyn_cast<T>(this);
  }

The problem? This method is very dangerous to use with value objects, since you could accidentally do this:

CFGStmt *S = getTemporaryObject().getAs<CFGStmt>();

And indeed we've found two such cases in the last week where we've actually done this in the analyzer (the value object being CFGElement).

I'm half-inclined to remove 'getAs' from CFGElement, but it is useful when you are, say, iterating over a set of known-valid objects. It's the difference between

  I->getAs<CFGStmt>()

and

  dyn_cast<CFGStmt>(&*I)

, the latter of which is slightly more ugly.

The thing that keeps me from removing it is that it IS more convenient and it CAN be made safe. The first way is to use C++11 to forbid the use of this function on rvalues; that looks like this:

  template<class T> const T *getAs() const && = delete;
  template<class T> const T *getAs() const & {
    return dyn_cast<T>(this);
  }

Indeed, my thoughts precisely. [I wonder at what point it'll be worth
moving default development to C++11 and just keeping some bots around
for C++98 back compat testing.]

This could be hidden behind a compatibility macro:

  template<class T> const T *getAs() const LLVM_RVALUE_DELETE;
  template<class T> const T *getAs() const LLVM_LVALUE {
    return dyn_cast<T>(this);
  }

What do you envision those macros being defined to in C++98? Wouldn't
you end up with errors relating to redeclaration of class members if
they expanded to nothing in C++98 builds.

Seems to me, unfortunately, like we'd have to #ifdef the extra
declaration entirely, or put the declaration inside a function-like
macro if possible, etc)

The second way is if -fcatch-undefined-behavior could be augmented to handle this case. I have no idea how to do this, though -- perhaps it requires something ASan-ish to poison the temporary memory once its lifetime ends.

Any chance of either of these happening on trunk?

+Richard Smith for the undefined behavior checking possibilities.

The thing that keeps me from removing it is that it IS more convenient and it CAN be made safe. The first way is to use C++11 to forbid the use of this function on rvalues; that looks like this:

template<class T> const T *getAs() const && = delete;
template<class T> const T *getAs() const & {
   return dyn_cast<T>(this);
}

Indeed, my thoughts precisely. [I wonder at what point it'll be worth
moving default development to C++11 and just keeping some bots around
for C++98 back compat testing.]

This could be hidden behind a compatibility macro:

template<class T> const T *getAs() const LLVM_RVALUE_DELETE;
template<class T> const T *getAs() const LLVM_LVALUE {
   return dyn_cast<T>(this);
}

What do you envision those macros being defined to in C++98? Wouldn't
you end up with errors relating to redeclaration of class members if
they expanded to nothing in C++98 builds.

I guess I should have tried it first. I forgot you can't redeclare class members. :frowning:

Seems to me, unfortunately, like we'd have to #ifdef the extra
declaration entirely, or put the declaration inside a function-like
macro if possible, etc)

Yes, that leaves something ugly like this:

LLVM_LVALUE_ONLY(template<class T> const T *getAs() const) {
}

…and I don't even know how that would handle multiple template arguments. Bleah.

The thing that keeps me from removing it is that it IS more convenient and it CAN be made safe. The first way is to use C++11 to forbid the use of this function on rvalues; that looks like this:

template<class T> const T *getAs() const && = delete;
template<class T> const T *getAs() const & {
   return dyn_cast<T>(this);
}

Indeed, my thoughts precisely. [I wonder at what point it'll be worth
moving default development to C++11 and just keeping some bots around
for C++98 back compat testing.]

This could be hidden behind a compatibility macro:

template<class T> const T *getAs() const LLVM_RVALUE_DELETE;
template<class T> const T *getAs() const LLVM_LVALUE {
   return dyn_cast<T>(this);
}

What do you envision those macros being defined to in C++98? Wouldn't
you end up with errors relating to redeclaration of class members if
they expanded to nothing in C++98 builds.

I guess I should have tried it first. I forgot you can't redeclare class members. :frowning:

Seems to me, unfortunately, like we'd have to #ifdef the extra
declaration entirely, or put the declaration inside a function-like
macro if possible, etc)

Yes, that leaves something ugly like this:

LLVM_LVALUE_ONLY(template<class T> const T *getAs() const) {
}

…and I don't even know how that would handle multiple template arguments. Bleah.

Yeah, given the infrequency of this particular idiom (we only have a
couple of these functions, right?) I think it'd still be worth adding
the extra declaration wrapped in an ifdef (& the LLVM_LVALUE could
still be used for the main declaration/definition) to catch these.

I've got a few extra buildbots to bring up at some point (or to
mix/match with the existing ones) a build-clang(&llvm)-as-C++11 will
be one of them.

The thing that keeps me from removing it is that it IS more convenient and it CAN be made safe. The first way is to use C++11 to forbid the use of this function on rvalues; that looks like this:

template const T *getAs() const && = delete;
template const T *getAs() const & {
return dyn_cast(this);
}

Indeed, my thoughts precisely. [I wonder at what point it’ll be worth moving default development to C++11 and just keeping some bots around for C++98 back compat testing.]

This could be hidden behind a compatibility macro:

template const T *getAs() const LLVM_RVALUE_DELETE;
template const T *getAs() const LLVM_LVALUE {
return dyn_cast(this);
}

What do you envision those macros being defined to in C++98? Wouldn’t
you end up with errors relating to redeclaration of class members if
they expanded to nothing in C++98 builds.

I guess I should have tried it first. I forgot you can’t redeclare class members. :frowning:

#define LLVM_LVALUE
#define LLVM_RVALUE_DELETE volatile LLVM_DELETED_FUNCTION

Still quite ugly, though.

The second way is if -fcatch-undefined-behavior could be augmented to handle this case. I have no idea how to do this, though – perhaps it requires something ASan-ish to poison the temporary memory once its lifetime ends.

Any chance of either of these happening on trunk?

We’d want to wait until the end of the storage duration rather than the end of the lifetime of the object, but yes, we could do that (for instance, we could emit llvm lifetime intrinsics for all local variables, and teach ASan to poison memory based on them).

Well, I’m not entirely sure what you mean by this, but AFAICT the issue is that GCC thinks the storage duration ends sooner than we think it does. In this test case (with a recent Xcode clang), we’re definitely calling the destructor before we print the field.

#include

struct A {
int x;
const A *get() const { return this; }
~A() { std::cout << “bye\n”; }
};

int main() {
const A *a = A().get();
std::cout << a->x << “\n”;
}

The issue is that GCC is more aggressive about reusing storage than we are. We’re not miscomputing the storage duration somewhere, we just don’t bother to model it in the IR at all.

I see. That makes sense. I think what I meant to say was that we don’t have to start modeling the storage duration if we just invalidate the object at the end of its lifetime, which we already do correctly. I’m not as worried about this happening for local variables as I am for temporaries, but of course, yes, the same thing could happen.

Does that mean that:

doSomething(StrRef.str().c_str());

is unsafe?

because I've seen a couple of those around.

  I->getAs<CFGStmt>()

and

  dyn_cast<CFGStmt>(&*I)

, the latter of which is slightly more ugly.

Would it be possible to make dyn_cast a bit smarter so that it will
look into an iterator's member typedefs and automatically do the
trivial &* stuff itself?

--Sean Silva

Does that mean that:

doSomething(StrRef.str().c_str());

is unsafe?

because I've seen a couple of those around.

No, that's fine. Temporaries are destroyed "at the end of a full-expression", which is basically "the outermost expression". So the temporary will stay live until the call to doSomething completes.

I->getAs<CFGStmt>()

and

dyn_cast<CFGStmt>(&*I)

, the latter of which is slightly more ugly.

Would it be possible to make dyn_cast a bit smarter so that it will
look into an iterator's member typedefs and automatically do the
trivial &* stuff itself?

In theory, yes. In practice, I'm not sure if we want simplify_type to be that smart. I haven't thought it through, though.

No, that's fine. Temporaries are destroyed "at the end of a full-expression", which is basically "the outermost expression". So the temporary will stay live until the call to doSomething completes.

Oh, ok, that's what I though. Looking at the OP a bit closer, I see
now that the problem isn't that temporary per se, it is stashing the
pointer to it in S.

Would it be possible to make dyn_cast a bit smarter so that it will
look into an iterator's member typedefs and automatically do the
trivial &* stuff itself?

In theory, yes. In practice, I'm not sure if we want simplify_type to be that smart. I haven't thought it through, though.

The discussion pretty quickly veered into macro/#ifdef territory as a
means to solve the problem, perhaps this would be the least bad
workaround.

--Sean Silva

This is an important (and planned) feature. The goal is to get ASan to
catch the bugs, and then Clang to emit the lifetime markers. We want them
because Nadav is implementing exactly the optimization that GCC benefits
from for Clang+LLVM.

Yes, this is http://code.google.com/p/address-sanitizer/issues/detail?id=83 and we hope to get to it in Oct.

–kcc

However, doSomething should not save char * anywhere.