RFD: ASTNode class

Several source-code analysis projects I've talked to (including one
I'm working on, include-what-you-use) have indicated they would
benefit from some way to travel 'up' the AST tree. For instance, iwyu
wants to say "If a type is used in a pointer context, it can be
forward-declared." I've implemented it by looking at the parent of
the type-node, and seeing if it's a PointerType or equivalent.

I've attached my implementation of ASTNode below, and would like to
work on getting this functionality into clang in the proper location.
A few issues:

1) I'm sure it doesn't fit much into clang style. I'm happy to
rewrite it -- or have others rewrite it -- to be more native-looking.

2) I added some state that travels across AST nodes, that was useful
to me -- notably location (Type doesn't have one, so it inherits from
its parent) and whether a node is in a forward-declare context. I
don't know how much these belong in the final implementation: location
probably, fwd-declare maybe.

3) I don't know exactly where this belongs. Right now, it requires some
machinery to hook into (also attached below). I see two options:

  a) Make it part of RecursiveASTVisitor, so when you're in a Visit*()
  routine, you have access to the ASTNode representation of the
  current node. This has costs in terms of both time and space to
  store this information, though the cost isn't massive.

  b) Make it a separate class (where would this class live?), and make
  everyone who wants to use it write the appropriate machinery.

There may be more options I haven't seen.

One thing ASTNode does is provide a union of all kinds of ast-nodes in
one place. This sounds simliar to what dgregor is doing with libclang
(though with a C interface). I wonder if some of this work can be
combined or shared.

craig

--cut here--

// ASTNode represents a single node of the AST tree. An AST node may be
// a statement, declaration, type, template-name, etc. ASTNode keeps
// track of its parent node, as we do AST traversal, allowing queries
// on the "context" of a node.
//
// We also store some state that's useful for iwyu. For instance,
// we store whether a node is in a 'forward-declarable' context
// (such as a function parameter), meaning all types seen below
// that node are legal to fowrard-declare according to c++.
class ASTNode {
public:
  // In each case, the caller owns the object, and must guarantee it
  // lives for at least as long as the ASTNode object does.
  ASTNode(const clang::Decl* decl, const clang::SourceManager& sm)
      : kind_(kDeclKind), as_decl_(decl),
        parent_(NULL), in_fwd_decl_context_(false), source_manager_(sm) { }
  ASTNode(const clang::Stmt* stmt, const clang::SourceManager& sm)
      : kind_(kStmtKind), as_stmt_(stmt),
        parent_(NULL), in_fwd_decl_context_(false), source_manager_(sm) { }
  ASTNode(const clang::Type* type, const clang::SourceManager& sm)
      : kind_(kTypeKind), as_type_(type),
        parent_(NULL), in_fwd_decl_context_(false), source_manager_(sm) { }
  ASTNode(const clang::TypeLoc* typeloc, const clang::SourceManager& sm)
      : kind_(kTypelocKind), as_typeloc_(typeloc),
        parent_(NULL), in_fwd_decl_context_(false), source_manager_(sm) { }
  ASTNode(const clang::NestedNameSpecifier* nns, const clang::SourceManager& sm)
      : kind_(kNNSKind), as_nns_(nns),
        parent_(NULL), in_fwd_decl_context_(false), source_manager_(sm) { }
  ASTNode(const clang::TemplateName* template_name,
          const clang::SourceManager& sm)
      : kind_(kTemplateNameKind), as_template_name_(template_name),
        parent_(NULL), in_fwd_decl_context_(false), source_manager_(sm) { }
  ASTNode(const clang::TemplateArgument* template_arg,
          const clang::SourceManager& sm)
      : kind_(kTemplateArgumentKind), as_template_arg_(template_arg),
        parent_(NULL), in_fwd_decl_context_(false), source_manager_(sm) { }
  ASTNode(const clang::TemplateArgumentLoc* template_argloc,
          const clang::SourceManager& sm)
      : kind_(kTemplateArgumentLocKind), as_template_argloc_(template_argloc),
        parent_(NULL), in_fwd_decl_context_(false), source_manager_(sm) { }

  // A 'forward-declare' context means some parent of us can be
  // forward-declared, which means we can be too. e.g. in
  // MyClass<Foo>* x, Foo is fwd-declarable because MyClass<Foo> is.
  bool in_forward_declare_context() const { return in_fwd_decl_context_; }
  void set_in_forward_declare_context(bool b) { in_fwd_decl_context_ = b; }

  const ASTNode* parent() const { return parent_; }
  void SetParent(const ASTNode* parent) {
    parent_ = parent;
    if (parent) // We inherit this from parent.
      set_in_forward_declare_context(parent->in_forward_declare_context());
  }

  // The number of nodes above this node in the AST tree.
  int depth() const {
    int depth = 0;
    for (const ASTNode* node = this; node != NULL; node = node->parent_)
      depth++;
    return depth - 1; // don't count "this"
  }

  // If this node knows its location, returns it. If not, and it's
  // likely its location is very close (say, within a few lines) of
  // its parent, ask its parent. Unfortunately, there's nothing which
  // tells us whether a parent's location is very close to its child.
  // We assume that they always are (empirically this is true)
  // *except* for the case the parent is in a macro: then it often
  // happens that the parent belongs at the spelling location, while
  // the child is a macro arg and hence belongs in the instantiation
  // location. Those could be far away, even in different files. For
  // example: '#define NEW_FUNC(cls) void Func(cls* x) {}'. Func is
  // at the spelling loc, but its child Type 'cls' is at the
  // instantiation loc. In that case, or if *no* ancestor of the
  // current node knows its location, returns an invalid SourceLocation.
  clang::SourceLocation GetLocation() const {
    clang::SourceLocation retval;
    if (FillLocationIfKnown(&retval))
      return retval;

    // OK, let's ask a parent node.
    for (const ASTNode* node = parent_; node != NULL; node = node->parent_) {
      if (node->FillLocationIfKnown(&retval))
        break;
    }
    // If the parent node shows the spelling and instantiation
    // locations are in a different file, then we're uncertain of our
    // own location. Return an invalid location.
    if (retval.isValid()) {
      clang::FullSourceLoc full_loc(retval, source_manager_);
      const clang::FileEntry* spelling_file =
          source_manager_.getFileEntryForID(
              source_manager_.getFileID(full_loc.getSpellingLoc()));
      const clang::FileEntry* instantiation_file =
          source_manager_.getFileEntryForID(
              source_manager_.getFileID(full_loc.getInstantiationLoc()));
      if (spelling_file != instantiation_file)
        return clang::SourceLocation();
    }

    return retval;
  }

  // Returns true if this node points to the exact same
  // decl/typeloc/etc as the one you pass in. For Decl/Stmt/Type, the
  // pointer is canonical (every instance of type X has the same
  // clang::Type*). But for most, the value is canonical (each type
  // has the same QualType but not QualType*).
  bool ContentIs(const clang::Decl* target_node) const {
    return GetAs<clang::Decl>() == target_node;
  }
  bool ContentIs(const clang::Stmt* target_node) const {
    return GetAs<clang::Stmt>() == target_node;
  }
  bool ContentIs(const clang::Type* target_node) const {
    return GetAs<clang::Type>() == target_node;
  }
  bool ContentIs(const clang::TypeLoc* target_node) const {
    const clang::TypeLoc* type_loc = GetAs<clang::TypeLoc>();
    if (type_loc == NULL || target_node == NULL)
      return type_loc == target_node;
    return *type_loc == *target_node;
  }
  // We don't define ContentIs() for other kinds of AST nodes
  // (e.g. TemplateName) as it's non-trivial (Clang doesn't define
  // equality comparison functions for them) and we don't need that
  // yet.

  // Returns true if the current node or any ancestor of it contains
  // the exact same thing as ptr. One use of this is to check for
  // recursion.
  template<typename T> bool StackContainsContent(const T* ptr) const {
    for (const ASTNode* node = this; node != NULL; node = node->parent_) {
      if (node->ContentIs(ptr))
        return true;
    }
    return false;
  }

  // Generation 0 == you, 1 == parent, etc.
  template<typename To> const To* GetAncestorAs(int generation) const {
    const ASTNode* target_node = this;
    for (; generation > 0; --generation) {
      if (target_node->parent_ == NULL)
        return NULL;
      target_node = target_node->parent_;
    }
    // DynCast needs a dummy argument of type To* to help its resolution.
    const To* dummy = NULL;
    return target_node->DynCast<To>(dummy);
  }

  // Convenience methods.

  template<typename To> bool AncestorIsA(int generation) const {
    return GetAncestorAs<To>(generation) != NULL;
  }

  // Returns true if this node or any of its ancestors contains a T*.
  template<typename T> bool HasAncestorOfType() const {
    for (const ASTNode* node = this; node != NULL; node = node->parent_) {
      if (node->IsA<T>())
        return true;
    }
    return false;
  }

  template<typename To> const To* GetParentAs() const {
    return GetAncestorAs<To>(1);
  }

  template<typename To> bool ParentIsA() const {
    return AncestorIsA<To>(1);
  }

  template<typename To> const To* GetAs() const {
    return GetAncestorAs<To>(0);
  }

  template<typename To> bool IsA() const {
    return AncestorIsA<To>(0);
  }

private:
  enum NodeKind {
    kUnusedKind, kDeclKind, kStmtKind, kTypeKind, kTypelocKind, kNNSKind,
    kTemplateNameKind, kTemplateArgumentKind, kTemplateArgumentLocKind
  };

  // Returns a casted pointer if this object actually is of the given
  // type (or a subclass of the given type), and NULL otherwise. We
  // have to use overloading on To's kind_, in these helper
  // methods, in order to get llvm's dyn_cast to compile -- it gets
  // upset (at compile time, sadly) if from-type and to-type aren't in
  // the same type hierarchy. So To must be specified both in the
  // template arg and in the method parameter.
  template<typename To> const To* DynCast(const clang::Decl*) const {
    if (kind_ != kDeclKind) return NULL;
    return dyn_cast<To>(as_decl_);
  }
  template<typename To> const To* DynCast(const clang::Stmt*) const {
    if (kind_ != kStmtKind) return NULL;
    return dyn_cast<To>(as_stmt_);
  }
  template<typename To> const To* DynCast(const clang::Type*) const {
    // We also will cast ourselves to a type if we're a typeloc.
    // This simplifies a lot of code lower down that doesn't care
    // to distinguish. For code that *does* care to distinguish,
    // it should check for typelocs first:
    // if (node.IsA<FooTypeLoc>()) ... else if (node.IsA<FooType>()) ...
    if (kind_ == kTypelocKind)
      return dyn_cast<To>(as_typeloc_->getTypePtr());
    if (kind_ != kTypeKind) return NULL;
    return dyn_cast<To>(as_type_);
  }
  template<typename To> const To* DynCast(const clang::TypeLoc*) const {
    if (kind_ != kTypelocKind) return NULL;
    return dyn_cast<To>(as_typeloc_);
  }
  template<typename To> const To* DynCast(
      const clang::NestedNameSpecifier*) const {
    if (kind_ != kNNSKind) return NULL;
    return as_nns_;
  }
  template<typename To> const To* DynCast(const clang::TemplateName*) const {
    if (kind_ != kTemplateNameKind) return NULL;
    return as_template_name_;
  }
  template<typename To> const To* DynCast(
      const clang::TemplateArgument*) const {
    // We also will cast ourselves to a templateargument if we're a
    // templateargumentloc. This simplifies a lot of code lower down
    // that doesn't care to distinguish. For code that *does* care to
    // distinguish, it should check for typelocs first.
    if (kind_ == kTemplateArgumentLocKind)
      return &as_template_argloc_->getArgument();
    if (kind_ != kTemplateArgumentKind) return NULL;
    return as_template_arg_;
  }
  template<typename To> const To* DynCast(
      const clang::TemplateArgumentLoc*) const {
    if (kind_ != kTemplateArgumentLocKind) return NULL;
    return as_template_argloc_;
  }

  // If this node is of a type that knows its location, sets loc and
  // returns true, otherwise returns false and leaves loc unchanged.
  bool FillLocationIfKnown(clang::SourceLocation* loc) const {
    switch (kind_) {
      case kDeclKind:
        *loc = as_decl_->getLocation();
        return true;
      case kStmtKind:
        *loc = as_stmt_->getLocStart();
        return true;
      case kTypeKind:
        return false;
      case kTypelocKind:
        *loc = as_typeloc_->getBeginLoc();
        return true;
      case kNNSKind: return false;
      case kTemplateNameKind: return false;
      case kTemplateArgumentKind: return false;
      case kTemplateArgumentLocKind:
        *loc = as_template_argloc_->getLocation();
        return true;
      default:
        assert(false && "Unexpected kind of ASTNode");
        return false;
    }
  }

  const NodeKind kind_;
  union {
    const clang::Decl* as_decl_;
    const clang::Stmt* as_stmt_;
    const clang::Type* as_type_;
    const clang::TypeLoc* as_typeloc_;
    const clang::NestedNameSpecifier* as_nns_;
    const clang::TemplateName* as_template_name_;
    const clang::TemplateArgument* as_template_arg_;
    const clang::TemplateArgumentLoc* as_template_argloc_;
  };
  const ASTNode* parent_;
  bool in_fwd_decl_context_;
  const clang::SourceManager& source_manager_;
};