RFC: Name lookup overhaul

This patch is a follow-on to my qualified/unqualified name lookup patch. It unifies the name-lookup mechanisms used in various parts of the AST and separates lexical name lookup from qualified name lookup. In particular, the patch:

  * Makes DeclContext the central data structure for storing and looking up declarations within existing declarations, e.g., members of structs/unions/classes, enumerators in C++0x enums, members of C++ namespaces, and (later) members of Objective-C interfaces/implementations. DeclContext uses a lazily-constructed data structure optimized for fast lookup (array for small contexts, hash table for larger contexts).
  * Implements C++ qualified name lookup in terms of lookup into DeclContext
  * Implements C++ unqualified name lookup in terms of qualified+unqualified name lookup (since unqualified lookup is not purely lexical in C++!)
  * Limits the use of the chains of declarations stored in IdentifierInfo to those names declared lexically.
  * Eliminates CXXFieldDecl, collapsing its behavior into FieldDecl. (FieldDecl is now a ScopedDecl).
  * Makes RecordDecl into a DeclContext and eliminates its Members/NumMembers fields (since one can just iterate through the DeclContext to get the fields).

Here's a particularly fun example that illustrates why DeclContext is designed the way it is:

  namespace N {
    void f(); // #1

    void f() { } // #2
  }

  namespace N {
    void f(); // #3
    void f(int); // #4
  }

Each of those declarations of namespace N is represented by a NamespaceDecl, which is a DeclContext, so we have two DeclContexts. The first DeclContext contains declarations for #1 and #2, so if you walk the list decls_begin()/decls_end() you'll see just #1 and #2.

The second DeclContext contains declarations for #3 and #4, so if you walk the list decls_begin()/decls_end() you'll see just #3 and #4. So, walking decls_begin()/decls_end() gives you the syntactic view of the declarations in that context.

If you want all of the declarations in namespace N (regardless of which definition of 'N' they show up in), DeclContext provides access to the "primary" (first) DeclContext for a type and a way to retrieve the next DeclContext in the chain.

The lookup mechanism for DeclContext provides lookup results based on the semantics of name lookup. Lookup looks across all DeclContexts for a namespace. More importantly, the lookup structure is built such that redeclarations are filtered out. So if we look up N::f, we get an OverloadedFunctionDecl containing #3 and #4 (one can follow #3 back to #2 and #1).

Comments greatly appreciated! I plan to commit this some time tomorrow unless I hear howls of protest.

  - Doug

decl-context-overhaul-lazy.patch.gz (30.2 KB)

This patch is a follow-on to my qualified/unqualified name lookup patch. It unifies the name-lookup mechanisms used in various parts of the AST and separates lexical name lookup from qualified name lookup. In particular, the patch:

Yay again :slight_smile:

Comments greatly appreciated! I plan to commit this some time tomorrow unless I hear howls of protest.

First howl: if you have to gzip your patch, it is too big :). In the future, please try to break things up into smaller pieces. This is important for review and also tracking changes over time: identifying which patch caused a correctness or perf regression.

Please *commit* this patch before addressing any follow up below, and address them with subsequent patches.

+++ include/clang/AST/DeclBase.h (working copy)
@@ -257,10 +260,27 @@
    /// DeclKind - This indicates which class this is.
    Decl::Kind DeclKind : 8;

- /// DeclChain - Linked list of declarations that are defined inside this
- /// declaration context.
- ScopedDecl *DeclChain;

+ /// LookupPtr - Pointer to a data structure used to lookup
+ /// declarations within this context. If the context contains fewer
+ /// than seven declarations, the number of declarations is provided
+ /// in the 3 lowest-order bits and the upper bits are treated as a
+ /// pointer to an array of NamedDecl pointers. If the context
+ /// contains seven or more declarations, the upper bits are treated
+ /// as a pointer to a DenseMap<DeclarationName, TwoNamedDecls>.
+ llvm::PointerIntPair<void*, 3> LookupPtr;

Comments greatly appreciated! I plan to commit this some time tomorrow unless I hear howls of protest.

First howl: if you have to gzip your patch, it is too big :). In the future, please try to break things up into smaller pieces. This is important for review and also tracking changes over time: identifying which patch caused a correctness or perf regression.

Please *commit* this patch before addressing any follow up below, and address them with subsequent patches.

Will do.

+++ include/clang/AST/DeclBase.h (working copy)
@@ -257,10 +260,27 @@
  /// DeclKind - This indicates which class this is.
  Decl::Kind DeclKind : 8;

- /// DeclChain - Linked list of declarations that are defined inside this
- /// declaration context.
- ScopedDecl *DeclChain;

+ /// LookupPtr - Pointer to a data structure used to lookup
+ /// declarations within this context. If the context contains fewer
+ /// than seven declarations, the number of declarations is provided
+ /// in the 3 lowest-order bits and the upper bits are treated as a
+ /// pointer to an array of NamedDecl pointers. If the context
+ /// contains seven or more declarations, the upper bits are treated
+ /// as a pointer to a DenseMap<DeclarationName, TwoNamedDecls>.
+ llvm::PointerIntPair<void*, 3> LookupPtr;
+
+ /// Decls - Contains all of the declarations that are defined inside
+ /// this declaration context.
+ std::vector<ScopedDecl*> Decls;

This shrinks ScopedDecl by one pointer but grows DeclContext by ~3 pointers, plus the memory that the vector keeps to hold the ScopedDecl's. This is somewhat concerning because you also make every Record a DeclContext, and make FieldDecls into ScopedDecls. There are 1743 records in carbon.h and 1565 in Cocoa.h (according to "clang -print-stats INPUTS/Cocoa_h.m |& grep struct"). There are also thousands of enum decls that will get larger as well because they are already declcontext's (e.g. 2048 in cocoa).

Going forward, is there any way to reduce this memory hit?

Yes. Decls and LookupPtr should be moved out to a separate structure, because declarations that aren't definitions (e.g., "struct X;") don't need the storage for information that's only available in definitions. Fixing this is tied up with moving the definition of entities out of the main declaration structures (e.g., RecordDecl shouldn't store the record's members; it should have a pointer to a RecordDef, which stores the members).

Doing that now would have made the patch even bigger :slight_smile:

Is a pointer chain through ScopeDecls really that bad?

Having the list of ScopeDecls in order really helps when lazily constructing the lookup structure in DeclContext. We could possibly live without it---by implementing a different algorithm depending on whether the DeclContext's chain has been reversed yet or not---but to keep that ScopedDecl list in order we either need to make every ScopedDecl bigger by a pointer or make ScopedDecl smaller and DeclContext larger.

/// FieldDecl - An instance of this class is created by Sema::ActOnField to
/// represent a member of a struct/union/class.
-class FieldDecl : public NamedDecl {
+class FieldDecl : public ScopedDecl {
+ bool Mutable : 1;

Do you forsee elimination of CXXFieldDecl completely forever? If not, it would be nice to sink the Mutable bit down into the CXXFieldDecl class. This would be a good place to put access control, mutable, and other C++ notions.

Yes, I'm intent on killing off CXXFieldDecl. I was planning to kill off CXXRecordDecl, too, but now I recall the discussion we had about this a while back:

  http://lists.cs.uiuc.edu/pipermail/cfe-dev/2008-October/003199.html

Perhaps we should discuss this sub-issue off-line and come back with a policy for handling C/C++ distinctions in the ASTs.

+ enumerator_iterator enumerator_begin() const {

Any reason to spell out "enumerator" instead of using enum_begin/enum_iterator/enum_end() etc? I like the iterator interface a lot better than the old getEnumConstantList stuff!

"enum" could be "enumerator" or "enumeration". It's not a big deal---it's pretty obvious from the context that we're talking about enumerators here---but I'm not a fan of ambiguous shorthand.

+ class field_iterator {

There is a lot of commonality between field_iterator and enumerator_iterator, maybe they should be genericized?

field_iterator has a bit of logic to skip over declarations that aren't fields. We could certainly have an iterator adaptor that effectively performs a cast<T> on the underlying iterator.

(All this would be so trivial with Boost's iterator adaptors. *sigh*).

+ typedef field_iterator field_const_iterator;

Naughty naughty :slight_smile:

Busted!

+ /// Entity - The entity with which this scope is associated. For
+ /// example, the entity of a class scope is the class itself, the
+ /// entity of a function scope is a function, etc. This field is
+ /// maintained by the Action implementation.
+ void *Entity;

This is orthogonal to this patch, but I wonder if Scope::Entity could be used to simplify your recent "if(int x = ..)" patch. If the entity for the if-scope was the if statement itself, you wouldn't need an "is defined in an if" bit on decl.

We can detect the "if" part pretty easy, by looking up the scope chain to the control-scope parent. It's knowing that we're in the "else" block that required the extra bits... perhaps there's still a better way. I'll think about it.

+++ lib/Analysis/RegionStore.cpp (working copy)
@@ -433,8 +433,13 @@

  llvm::ImmutableList<SVal> StructVal = getBasicVals().getEmptySValList();

- for (int i = RD->getNumMembers() - 1; i >= 0; --i) {
- FieldRegion* FR = MRMgr.getFieldRegion(RD->getMember(i), R);
+ for (DeclContext::reverse_decl_iterator Mem = RD->decls_rbegin();
+ Mem != RD->decls_rend(); ++Mem) {
+ FieldDecl *FD = dyn_cast<FieldDecl>(*Mem);
+ if (!FD)
+ continue;
+

Please suggest to Ted that he upgrade this to use field_iterator after your patch goes in.

Okay.

+++ lib/CodeGen/CodeGenTypes.cpp (working copy)
@@ -392,7 +392,7 @@
  } else if (TD->isUnion()) {
    // Just use the largest element of the union, breaking ties with the
    // highest aligned member.
- if (RD->getNumMembers() != 0) {
+ if (RD->field_begin() != RD->field_end()) {

Do you think a field_empty() method would be worthwhile?

Sure!

+ // Remove the previous declaration from the scope.
+ if (DeclRegionScope->isDeclScope(OrigNS)) {
+ IdResolver.RemoveDecl(OrigNS);
+ DeclRegionScope->RemoveDecl(OrigNS);
+ }

Funky indentation, not tabs this time :slight_smile:

Oops, that's all me. Not Emacs fault :frowning:

+++ lib/Sema/SemaInit.cpp (working copy)

+ int InitializableMembers
+ = std::count_if(structDecl->field_begin(), structDecl->field_end(),
+ std::mem_fun(&FieldDecl::getDeclName));

Couldn't help yourself huh? :slight_smile: Plz make InitializableMembers while you're poking it.

I even had to look up std::mem_fun, since I'm so accustomed to just using boost::mem_fn or boost::bind :slight_smile:

+++ lib/Sema/SemaCXXScopeSpec.cpp (working copy)
+ if (LookupCtx && !LookInParentCtx) {
+ IdIsUndeclared = true;
+ for (DeclContext::lookup_const_result I = LookupCtx->lookup(Context, Name);
+ I.first != I.second; ++I.first) {
+ IdIsUndeclared = false;
+ if (((*I.first)->getIdentifierNamespace() & Decl::IDNS_Tag) &&
+ !isa<EnumDecl>(*I.first))
+ return *I.first;
+ }

Personally, I find all the parens and *'s and ->'s distracting. Please do something like:

whatever_t I, E;
tie(I, E) = LookupCtx->lookup(Context, Name);
for (; Start != End; ++Start) { ..

tie is defined in llvm/ADT/STLExtras.h.

Hrm, okay.

+++ lib/Sema/SemaDecl.cpp (working copy)

     if (!Ovl) {
        // We haven't yet overloaded this function. Take the existing
        // FunctionDecl and put it into an OverloadedFunctionDecl.
        Ovl = OverloadedFunctionDecl::Create(Context,
                                             FD->getDeclContext(),
                                             FD->getDeclName());
- Ovl->addOverload(dyn_cast<FunctionDecl>(*I));
+ Ovl->addOverload(dyn_cast<FunctionDecl>(Prev));

Unrelated to this patch, but please use cast<> here instead of dyn_cast<>.

Okay.

+ // If there is an name binding for the existing FunctionDecl,
+ // remove it.
+ for (IdentifierResolver::iterator I
+ = IdResolver.begin(FD->getDeclName(), FD->getDeclContext(),
+ false/*LookInParentCtx*/);
+ I != IdResolver.end(); ++I) {

This evaluates IdResolver.end() each time through the loop.

Yup, got it.

+ if (ScopedDecl *SD = dyn_cast<ScopedDecl>(D))
+ CurContext->addDecl(Context, SD);
+ else {
+ // Other kinds of declarations don't currently have a context
+ // where they need to be inserted.
+ }

Instead of the empty else, how about:

+ // Add scope decls to their context. Other kinds of declarations
+ // don't have a context where they need to be inserted.
+ if (ScopedDecl *SD = dyn_cast<ScopedDecl>(D))
+ CurContext->addDecl(Context, SD);

Fair enough.

@@ -193,22 +220,77 @@
/// namespace.
Decl *Sema::LookupDecl(DeclarationName Name, unsigned NSI, Scope *S,

Unrelated to this patch, but please substantially improve the doxygen comment for lookupdecl :slight_smile:

Who, me? (looks around for a volunteer; fails)

LookupDecl now looks like this:

+ if (LookupCtx) {
     // Qualified C++
+ } else if (getLangOptions().CPlusPlus &&
+ (NS & (Decl::IDNS_Ordinary | Decl::IDNS_Tag))) {
     // Unqualified C++
+ } else {
     // C & label.
+ }

This is optimizing (both in performance and for readability) for the wrong case. Please use something like:

if (LookupCtx == 0 &&
   (!getLangOptions().CPlusPlus || (NS is label)))) {
// C & label case.
} else if (LookCtx == 0) {
// unqualified c++
} else {
// qualified c++
}

Alternatively, it might be better to make the qualified C++ case come before the unqualified c++ case since it is shorter (helps readability).

Okay.

+ for (; S; S = S->getParent()) {
+ // Check whether the IdResolver has anything in this scope.
+ // FIXME: The isDeclScope check could be expensive. Can we do better?
+ for (; I != IEnd && S->isDeclScope(*I); ++I)

isDeclScope is a smallptrset lookup, so it isn't *that* expensive (not like a linear scan). Better would be better though :wink:

Yup.

+ // If there is an entity associated with this scope, it's a
+ // DeclContext. We might need to perform qualified lookup into
+ // it.
+ DeclContext *Ctx = static_cast<DeclContext *>(S->getEntity());
+ while (Ctx && Ctx->isFunctionOrMethod())
+ Ctx = Ctx->getParent();

What was the decision about this loop? At the very least, the comment needs to be improved.

"Pending"... will address blocks issue in other thread.

+ for (DeclContext::lookup_const_result I = Ctx->lookup(Context, Name);
+ I.first != I.second; ++I.first) {
+ // FIXME: Cache this result in the IdResolver
+ if ((*I.first)->getIdentifierNamespace() & NS)
+ return *I.first;

This isn't as crazy as the loop above, but using tie might be better here, your call. You have extra an space after the return.

Okay.

+++ lib/Sema/SemaOverload.cpp (working copy)
@@ -1955,10 +1955,9 @@
  // (13.3.1.1.1); otherwise, the set of member candidates is
  // empty.
  if (const RecordType *T1Rec = T1->getAsRecordType()) {
- IdentifierResolver::iterator I
- = IdResolver.begin(OpName, cast<CXXRecordType>(T1Rec)->getDecl(),
- /*LookInParentCtx=*/false);
- NamedDecl *MemberOps = (I == IdResolver.end())? 0 : *I;
+ DeclContext::lookup_const_result Lookup
+ = cast<CXXRecordType>(T1Rec)->getDecl()->lookup(Context, OpName);

Do you still need this and the other similar cast<>'s?

Ooh, you're right, I don't. Yay!

+ case Decl::ObjCInterface:
+ // FIXME: Can Objective-C interfaces be forward-declared?
+ return this;

Yes, but with a completely different implementation than in C. FWIW, they look like this:

@class foo;

foo* P; // ok now.

@interface foo ...
@end

That'll probably effect how we handle the declaration/definition split.

+DeclContext::lookup(ASTContext &Context, DeclarationName Name) {
...
+ if (LookupPtr.getPointer() == 0) {
+ for (DeclContext *DCtx = this; DCtx; DCtx = DCtx->getNextContext())
+ for (decl_iterator D = DCtx->decls_begin(); D != DCtx->decls_end(); ++D)

This evaluates decls_end every time through the loop.

decls_end() is insanely cheap once inlined.

+ if (isLookupMap()) {
+ StoredDeclsMap *Map = static_cast<StoredDeclsMap*>(LookupPtr.getPointer());
...
+ NamedDecl **Array = static_cast<NamedDecl**>(LookupPtr.getPointer());

You do this pattern in many places. How about something like:

if (StoredDeclsMap *LookupMap = getLookupMap()) { // private method obviously.

} else {
   NamedDecl **Array = getLookupArray();

}

That way the cast only happens in the inline private method, instead of in several places.

Fine by me.

+ for (unsigned Idx = 0; Idx < Size; ++Idx)

It doesn't really matter, but please use != for consistency.

+ // We always keep declarations of the same name next to each other
+ // in the array, so that it is easy to return multiple results
+ // from lookup(). There will be zero, one, or two declarations of
+ // the same name.
+ unsigned Match;
+ for (Match = 0; Match < Size; ++Match) {
+ if (Array[Match]->getDeclName() == D->getDeclName())
+ break;

indentation on the break and several other places in this function.

Okay.

+ // Put this declaration into the appropriate slot.
+ TwoNamedDecls Val;
+ Val.Decls[0] = 0;
+ Val.Decls[1] = 0;
+ Val.Decls[IndexOfD] = D;

This is a somewhat strange idiom to me (setting then overwriting one element). How about:

+ // Put this declaration into the appropriate slot.
+ TwoNamedDecls Val;
+ Val.Decls[IndexOfD] = D;
+ Val.Decls[!IndexOfD] = 0;

Blech. Using '!' for numerical values is so... hackish, no?

+ Pos = Map->insert(std::make_pair(D->getDeclName(),Val)).first;

'Pos' is dead here, no need to capture the result of the insert.

Oops, it used to be living. Thanks.

+++ lib/AST/Decl.cpp (working copy)

+ return new (Mem) FieldDecl(Decl::Field, DC, L, Id, T, BW, Mutable, PrevDecl);

If CXXFieldDecl is gone for good, you can hardcode 'Decl::Field' into the FieldDecl ctor.

There are other subclasses of FieldDecl in the Objective-C part.

+ // FIXME: This is linear time. And the fact that we're indexing
+ // into the layout by position in the record means that we're
+ // either stuck numbering the fields in the AST or we have to keep
+ // the linear search (yuck and yuck).
+ unsigned i = 0;

I agree. Indexes have their uses though, we should probably talk (with daniel) about this sometime.

Yup.

+++ lib/AST/ASTContext.cpp (working copy)
@@ -539,7 +539,7 @@
  ASTRecordLayout *NewEntry = new ASTRecordLayout();
  Entry = NewEntry;

- NewEntry->InitializeLayout(D->getNumMembers());
+ NewEntry->InitializeLayout(std::distance(D->field_begin(), D->field_end()));

I'm not sure what InitializeLayout is doing with the size, but if it can be changed, it would be nice to eliminate the linear time walk here. Please add a FIXME.

Will do.

Overall, I'm a big fan of this change. The biggest concern I have is memory use, but I think that can be "fixed" by (in a later patch!) threading a singly linked list through ScopeDecls. Thanks for working on this Doug, this is a huge cleanup!

Thanks for the review. I'll think about the memory issue a bit further and see what more we can do.

  - Doug

This shrinks ScopedDecl by one pointer but grows DeclContext by ~3 pointers, plus the memory that the vector keeps to hold the ScopedDecl's. This is somewhat concerning because you also make every Record a DeclContext, and make FieldDecls into ScopedDecls. There are 1743 records in carbon.h and 1565 in Cocoa.h (according to "clang -print-stats INPUTS/Cocoa_h.m |& grep struct"). There are also thousands of enum decls that will get larger as well because they are already declcontext's (e.g. 2048 in cocoa).

Going forward, is there any way to reduce this memory hit?

Yes. Decls and LookupPtr should be moved out to a separate structure, because declarations that aren't definitions (e.g., "struct X;") don't need the storage for information that's only available in definitions. Fixing this is tied up with moving the definition of entities out of the main declaration structures (e.g., RecordDecl shouldn't store the record's members; it should have a pointer to a RecordDef, which stores the members).

Ok

Doing that now would have made the patch even bigger :slight_smile:

Sure, I'm mostly interested in the final point we'll arrive at, I'm all for incremental steps in the right direction :slight_smile:

/// FieldDecl - An instance of this class is created by Sema::ActOnField to
/// represent a member of a struct/union/class.
-class FieldDecl : public NamedDecl {
+class FieldDecl : public ScopedDecl {
+ bool Mutable : 1;

Do you forsee elimination of CXXFieldDecl completely forever? If not, it would be nice to sink the Mutable bit down into the CXXFieldDecl class. This would be a good place to put access control, mutable, and other C++ notions.

Yes, I'm intent on killing off CXXFieldDecl. I was planning to kill off CXXRecordDecl, too, but now I recall the discussion we had about this a while back:

  http://lists.cs.uiuc.edu/pipermail/cfe-dev/2008-October/003199.html

Perhaps we should discuss this sub-issue off-line and come back with a policy for handling C/C++ distinctions in the ASTs.

Doug convinced me that having something like "BitFieldDecl" vs "FieldDecl" is more useful than having "CXXFieldDecl" vs "FieldDecl". :slight_smile:

Both are fine with me, and we both agree that a "BitFieldDecl/CXXFieldDecl" should be an internal implementation detail, the AST clients should just see FieldDecl, and the "give me the number of bits in the bitfield" method should be valid on it, but just return null if the field isn't a bitfield.

+ enumerator_iterator enumerator_begin() const {

Any reason to spell out "enumerator" instead of using enum_begin/enum_iterator/enum_end() etc? I like the iterator interface a lot better than the old getEnumConstantList stuff!

"enum" could be "enumerator" or "enumeration". It's not a big deal---it's pretty obvious from the context that we're talking about enumerators here---but I'm not a fan of ambiguous shorthand.

Shorthand can be ok as long as it is consistent. How about enum for enumerator and enumcst for enumerator constant? Am I missing something?

+ class field_iterator {

There is a lot of commonality between field_iterator and enumerator_iterator, maybe they should be genericized?

field_iterator has a bit of logic to skip over declarations that aren't fields. We could certainly have an iterator adaptor that effectively performs a cast<T> on the underlying iterator.

This can be handled by making the template take a functor that decides whether an entry should be returned or skipped. The default functor (used by enums) would always return "don't skip".

+ typedef field_iterator field_const_iterator;

Naughty naughty :slight_smile:

Busted!

incidentally, this is something else that is useful to genericize on, instead of duplicating the iterator class. That way, field_iterator is just a typedef for a some_iterator<FieldDecl,... and const_iterator is some_iterator<const FieldDecl, ...

+ /// Entity - The entity with which this scope is associated. For
+ /// example, the entity of a class scope is the class itself, the
+ /// entity of a function scope is a function, etc. This field is
+ /// maintained by the Action implementation.
+ void *Entity;

This is orthogonal to this patch, but I wonder if Scope::Entity could be used to simplify your recent "if(int x = ..)" patch. If the entity for the if-scope was the if statement itself, you wouldn't need an "is defined in an if" bit on decl.

We can detect the "if" part pretty easy, by looking up the scope chain to the control-scope parent. It's knowing that we're in the "else" block that required the extra bits... perhaps there's still a better way. I'll think about it.

Ahhh, good point. One somewhat hackish solution would be to only set the entity when parsing the else block instead of setting it when the scope is formed.

+ // Put this declaration into the appropriate slot.
+ TwoNamedDecls Val;
+ Val.Decls[IndexOfD] = D;
+ Val.Decls[!IndexOfD] = 0;

Blech. Using '!' for numerical values is so... hackish, no?

*shrug*, it's a hard call, and I don't have a strong opinion. If you like it the other way, I'm ok with that too.

+++ lib/AST/Decl.cpp (working copy)

+ return new (Mem) FieldDecl(Decl::Field, DC, L, Id, T, BW, Mutable, PrevDecl);

If CXXFieldDecl is gone for good, you can hardcode 'Decl::Field' into the FieldDecl ctor.

There are other subclasses of FieldDecl in the Objective-C part.

Oh right, ok.

Thanks again Doug, this is great work,

-Chris

+ enumerator_iterator enumerator_begin() const {

Any reason to spell out "enumerator" instead of using enum_begin/enum_iterator/enum_end() etc? I like the iterator interface a lot better than the old getEnumConstantList stuff!

"enum" could be "enumerator" or "enumeration". It's not a big deal---it's pretty obvious from the context that we're talking about enumerators here---but I'm not a fan of ambiguous shorthand.

Shorthand can be ok as long as it is consistent. How about enum for enumerator and enumcst for enumerator constant? Am I missing something?

enumerator == enumerator constant, but the latter isn't a standard term. If we were being picky, EnumDecl should be EnumerationDecl (it declares an enumeration type) and EnumConstantDecl should be EnumeratorDecl (it declares an enumerator, e.g., one of the values in the enumeration type). That's why I don't like "enum"... it's shorthand for both of the standard terms.

+ class field_iterator {

There is a lot of commonality between field_iterator and enumerator_iterator, maybe they should be genericized?

field_iterator has a bit of logic to skip over declarations that aren't fields. We could certainly have an iterator adaptor that effectively performs a cast<T> on the underlying iterator.

This can be handled by making the template take a functor that decides whether an entry should be returned or skipped. The default functor (used by enums) would always return "don't skip".

The proper generic factoring is to have an iterator adaptor that does the cast<T> (a "transform" iterator, in Boost parlance) and another that does the skipping based on a function object (a "filter" iterator, in Boost parlance):

  Filter Iterator - 1.37.0
  Transform Iterator - 1.37.0

Right now, we only have one iterator that does the skipping (field_iterator), which also does a dyn_cast rather than a cast, so it's not terribly high priority to factor that out. We could certainly build a simple version of the transform iterator and use that for the casting iterators.

+ typedef field_iterator field_const_iterator;

Naughty naughty :slight_smile:

Busted!

incidentally, this is something else that is useful to genericize on, instead of duplicating the iterator class. That way, field_iterator is just a typedef for a some_iterator<FieldDecl,... and const_iterator is some_iterator<const FieldDecl, ...

This could certainly be a part of the cast<>'ing transform iterator.

+ /// Entity - The entity with which this scope is associated. For
+ /// example, the entity of a class scope is the class itself, the
+ /// entity of a function scope is a function, etc. This field is
+ /// maintained by the Action implementation.
+ void *Entity;

This is orthogonal to this patch, but I wonder if Scope::Entity could be used to simplify your recent "if(int x = ..)" patch. If the entity for the if-scope was the if statement itself, you wouldn't need an "is defined in an if" bit on decl.

We can detect the "if" part pretty easy, by looking up the scope chain to the control-scope parent. It's knowing that we're in the "else" block that required the extra bits... perhaps there's still a better way. I'll think about it.

Ahhh, good point. One somewhat hackish solution would be to only set the entity when parsing the else block instead of setting it when the scope is formed.

It feels so much cleaner to associate the entity with the "if" itself.

+ // Put this declaration into the appropriate slot.
+ TwoNamedDecls Val;
+ Val.Decls[IndexOfD] = D;
+ Val.Decls[!IndexOfD] = 0;

Blech. Using '!' for numerical values is so... hackish, no?

*shrug*, it's a hard call, and I don't have a strong opinion. If you like it the other way, I'm ok with that too.

I went with your solution.

  - Doug

"enum" could be "enumerator" or "enumeration". It's not a big deal---it's pretty obvious from the context that we're talking about enumerators here---but I'm not a fan of ambiguous shorthand.

Shorthand can be ok as long as it is consistent. How about enum for enumerator and enumcst for enumerator constant? Am I missing something?

enumerator == enumerator constant, but the latter isn't a standard term. If we were being picky, EnumDecl should be EnumerationDecl (it declares an enumeration type) and EnumConstantDecl should be EnumeratorDecl (it declares an enumerator, e.g., one of the values in the enumeration type). That's why I don't like "enum"... it's shorthand for both of the standard terms.

Ok, I agree. If you prefer the longer name, I'm completely fine with that.

field_iterator has a bit of logic to skip over declarations that aren't fields. We could certainly have an iterator adaptor that effectively performs a cast<T> on the underlying iterator.

This can be handled by making the template take a functor that decides whether an entry should be returned or skipped. The default functor (used by enums) would always return "don't skip".

The proper generic factoring is to have an iterator adaptor that does the cast<T> (a "transform" iterator, in Boost parlance) and another that does the skipping based on a function object (a "filter" iterator, in Boost parlance):

  Filter Iterator - 1.37.0
  Transform Iterator - 1.37.0

Right now, we only have one iterator that does the skipping (field_iterator), which also does a dyn_cast rather than a cast, so it's not terribly high priority to factor that out. We could certainly build a simple version of the transform iterator and use that for the casting iterators.

Ok. The risk of stacking iterators really deep like that is that many compilers really do suffer from abstraction penalty issues.

+ /// Entity - The entity with which this scope is associated. For
+ /// example, the entity of a class scope is the class itself, the
+ /// entity of a function scope is a function, etc. This field is
+ /// maintained by the Action implementation.
+ void *Entity;

This is orthogonal to this patch, but I wonder if Scope::Entity could be used to simplify your recent "if(int x = ..)" patch. If the entity for the if-scope was the if statement itself, you wouldn't need an "is defined in an if" bit on decl.

We can detect the "if" part pretty easy, by looking up the scope chain to the control-scope parent. It's knowing that we're in the "else" block that required the extra bits... perhaps there's still a better way. I'll think about it.

Ahhh, good point. One somewhat hackish solution would be to only set the entity when parsing the else block instead of setting it when the scope is formed.

It feels so much cleaner to associate the entity with the "if" itself.

Sure, I agree. I'm just saying that the entity would only be associated when the 'else' block is being parsed. When the 'then' part of the if is being parsed, the scope would just have a null entity,

-Chris