Canonical representation of declaration names

Currently in Clang, we have several different wants of representing
the name of a declaration:
  - Most of NamedDecl's descendants use a simple IdentifierInfo*
  - ObjCMethodDecl (which is *not* a NamedDecl) uses a Selector, which
is an optimized representation of Objective-C selectors that is a
masked IdentifierInfo* in common cases (0 or 1 argument selectors) and
a private MultiKeywordSelector* in other cases (>= 2 arguments).
  - C++ constructors, destructors, and conversion functions have a
dummy IdentifierInfo* (with identifiers like "<constructor>" and
"<conversion>"), then override NamedDecl::getName() to provide
human-readable names for these entities.

At Chris's suggestion, the attached patch introduces a new
class---DeclarationName---that serves as a name for a declaration
within the AST. It can represent normal identifiers, C++ special
names, and Objective-C selectors.

Where is DeclarationName Used?

declaration-name.patch (73.6 KB)

I double checked the patch for strict aliasing violations, didn't find any.

Where is DeclarationName Used?

DeclarationName is the type of the name stored in NamedDecl. Most
clients won't see much of a difference here, because an
IdentifierInfo* is implicitly convertible to a DeclarationName, and
those *Decl nodes that can only have normal identifiers for names will
still accept an IdentifierInfo*.

Nice.

*Decl nodes with special names---C++ constructors, destructors, and
conversion functions--- will take in a DeclarationName directly, which
will encode the kind of function and additional information needed to
identify the name (e.g., the type of a conversion function). In
addition, ObjCMethodDecl is now a NamedDecl, and its selector will be
stored in NamedDecl's DeclarationName. Conversion functions between
Selector and DeclarationName make this change mainly an implementation
detail

That is a great cleanup!

Efficiency Concerns

DeclarationName is the size of a single pointer. The lower two bits
are used to determine what kind of name this is. We optimize for three
common cases: an IdentifierInfo*, a 0-argument selector, and a
1-argument selector, and in each case the pointer is just a masked
IdentifierInfo*.

In the less-common case, the pointer is a masked
DeclarationNameExtra*, where the DeclarationNameExpr can either be a
multi-argument Objective-C selector (MultiKeywordSelector) or one of
the C++ special names (constructor, destructor, conversion function).
All of these cases need extra storage anyway.

Makes sense.

The encoded integer in DeclarationName is a unique value which can be
efficiently compared for equality. In the case of Objective-C
selectors and the C++ special functions, the DeclarationNameExtra
pointer is uniqued in a separate table (SelectorTable and
DeclarationNameTable, respectively).

Overall, DeclarationName should be as efficient as an IdentifierInfo*
in both space and time, and as efficient as a Selector for Objective-C
methods. DeclarationName is better than the current hack involving
creating special identifiers for constructors, destructors, and
conversion functions, and will make future work in the area of
conversion functions easier (see below).

Sounds like a win all around,

Known Issues

There's a small layering violation in DeclarationName. DeclarationName
lives in the AST, because that's where it makes sense to talk about
the name of a declaration as a more abstract entity. However,
DeclarationNameExtras and Selector live in Basic, because Selector is
used in the Parse-Sema interaction and Selector's internal
implementation depends on DeclarationNameExtras.

I'm missing something: how is that a layering violation? From your description, it sounds like Sema will depend on Basic. That is ok, but basic can't depend on sema.

We could move DeclarationName into Basic and make the QualType stored
by a C++ constructor/destructor/conversion function into an opaque
type pointer, but that seems wrong somehow. Maybe the real issue is
that Selector doesn't belong in Basic.

I'd be open to moving Selector up the stack, but this will impact ObjC actions callbacks that really do want to pass around selectors (iirc). Another options is to introduce a parser-level abstraction similar to declspec if that were needed. There are some other less appetizing options as well if this is a real problem.

Future Work

Right now, IdentifierResolver still traffics in IdentifierInfo
pointers, and declarations with non-identifier names are never pushed
into a scope. That may change with conversion functions, in which case
the IdentifierResolver will need to work with DeclarationNames. This
should only require a small tweak, so that we have a FETokenInfo field
that is accessible through a DeclarationName. It would, of course, not
be stored in DeclarationName itself but in its IdentifierInfo*
(FETokenInfo is already there) or its DeclarationNameExtras* (we would
need to add it here).

I think Argiris is far more qualified to know the implications of this than I am :slight_smile:

I'll take a look at the patch shortly after lunch, the description makes a lot of sense to me. Thanks a lot for tackling this Doug!

-Chris

Known Issues

There's a small layering violation in DeclarationName. DeclarationName
lives in the AST, because that's where it makes sense to talk about
the name of a declaration as a more abstract entity. However,
DeclarationNameExtras and Selector live in Basic, because Selector is
used in the Parse-Sema interaction and Selector's internal
implementation depends on DeclarationNameExtras.

I'm missing something: how is that a layering violation? From your
description, it sounds like Sema will depend on Basic. That is ok, but
basic can't depend on sema.

The layering violation is that something in Basic declares a friend
that is in Sema, but it's admittedly a very weak form of Basic->Sema
dependency.

We could move DeclarationName into Basic and make the QualType stored
by a C++ constructor/destructor/conversion function into an opaque
type pointer, but that seems wrong somehow. Maybe the real issue is
that Selector doesn't belong in Basic.

I'd be open to moving Selector up the stack, but this will impact ObjC
actions callbacks that really do want to pass around selectors (iirc).
Another options is to introduce a parser-level abstraction similar to
declspec if that were needed. There are some other less appetizing options
as well if this is a real problem.

One option would be to move DeclarationName to Basic and use it to
completely replace Selector. Then everything that can name a
declaration will use a single class type, across all languages.

  - Doug

Known Issues

==========

There’s a small layering violation in DeclarationName. DeclarationName

lives in the AST, because that’s where it makes sense to talk about

the name of a declaration as a more abstract entity. However,

DeclarationNameExtras and Selector live in Basic, because Selector is

used in the Parse-Sema interaction and Selector’s internal

implementation depends on DeclarationNameExtras.

I’m missing something: how is that a layering violation? From your

description, it sounds like Sema will depend on Basic. That is ok, but

basic can’t depend on sema.

The layering violation is that something in Basic declares a friend
that is in Sema, but it’s admittedly a very weak form of Basic->Sema
dependency.

Ahhh ok. If it is just a friend, I wouldn’t worry about it.

One option would be to move DeclarationName to Basic and use it to
completely replace Selector. Then everything that can name a
declaration will use a single class type, across all languages.

Is that the right thing to do independently of the layering issue? If so, then it’s interesting.

Jumping to the patch, it definitely qualifies as “beautiful code”. I like it a lot and you did an outstanding job on the interfaces. Here are some random minor comments:

+/// DeclarationNameExtra - Common base of the MultiKeywordSelector and
+/// CXXSpecialName classes, both of which are private classes that can
+/// be stored by the AST’s DeclarationName class.
+class DeclarationNameExtra {

This and DeclarationName is very clever. Can I con you into describing some of the high level design decisions behind this in docs/InternalsManual.html?

+class DeclarationName {

  • /// getExtra - Get the “extra” information associated with this
  • /// multi-argument selector or C++ special name.
  • DeclarationNameExtra *getExtra() const {
  • return reinterpret_cast<DeclarationNameExtra *>(Ptr & ~PtrMask);
  • }

Should assert that it is the right kind? I guess it doesn’t matter too much since it is private.

+class DeclarationNameTable {

Please add a doxygen comment.

+std::string NamedDecl::getName() const {

  • default:
  • break;

What is the default case here? This seems like it should be an assert(0), no?

-Chris

> One option would be to move DeclarationName to Basic and use it to
> completely replace Selector. Then everything that can name a
> declaration will use a single class type, across all languages.

Is that the right thing to do independently of the layering issue? If so,
then it's interesting.

I know I'd feel more comfortable if we had just one notion of names
for the various declarations in Clang. On the other hand, Selectors
are only used in the (relatively small) subset of Clang that deals
with Objective-C, so it would be somewhat unfortunate to use
DeclarationNames there.

Actually, I think I know what we should do: move all of the logic of
Selector into DeclarationName. Then, make Selector a subclass of
DeclarationName that can only be constructed from Objective-C
selectors. So we'll still have a single, unified notion of a
"declaration name" in Clang, and the Objective-C parts of the code can
still traffic in Selectors where it makes sense (Selector will, of
course, not add any overhead to DeclarationName).

Jumping to the patch, it definitely qualifies as "beautiful code". I like
it a lot and you did an outstanding job on the interfaces. Here are some
random minor comments:

+/// DeclarationNameExtra - Common base of the MultiKeywordSelector and
+/// CXXSpecialName classes, both of which are private classes that can
+/// be stored by the AST's DeclarationName class.
+class DeclarationNameExtra {
This and DeclarationName is very clever. Can I con you into describing some
of the high level design decisions behind this in docs/InternalsManual.html?

Sure thing!

+class DeclarationName {
..
+ /// getExtra - Get the "extra" information associated with this
+ /// multi-argument selector or C++ special name.
+ DeclarationNameExtra *getExtra() const {
+ return reinterpret_cast<DeclarationNameExtra *>(Ptr & ~PtrMask);
+ }
Should assert that it is the right kind? I guess it doesn't matter too much
since it is private.

It's better to have the assert than chase down a bug where this fails.

+class DeclarationNameTable {
Please add a doxygen comment.

Oops, okay.

+std::string NamedDecl::getName() const {
...
+ default:
+ break;

What is the default case here? This seems like it should be an assert(0),
no?

Yes, it should be an assert(0).

Thanks for the review! Unless I hear howls of protest, I'll be making
the Selector change and committing all of this on Monday or Tuesday.

  - Doug

One option would be to move DeclarationName to Basic and use it to
completely replace Selector. Then everything that can name a
declaration will use a single class type, across all languages.

Is that the right thing to do independently of the layering issue? If so,
then it's interesting.

I know I'd feel more comfortable if we had just one notion of names
for the various declarations in Clang. On the other hand, Selectors
are only used in the (relatively small) subset of Clang that deals
with Objective-C, so it would be somewhat unfortunate to use
DeclarationNames there.

Is there a perf/memory cost to ObjC to do this? If not, I think the "conceptual overhead" is just fine.

Actually, I think I know what we should do: move all of the logic of
Selector into DeclarationName. Then, make Selector a subclass of
DeclarationName that can only be constructed from Objective-C
selectors. So we'll still have a single, unified notion of a
"declaration name" in Clang, and the Objective-C parts of the code can
still traffic in Selectors where it makes sense (Selector will, of
course, not add any overhead to DeclarationName).

I'm not sure how this will pan out. Are you ok with committing this patch first, and doing these changes as a follow on?

Thanks for the review! Unless I hear howls of protest, I'll be making
the Selector change and committing all of this on Monday or Tuesday.

Awesome, thanks again for tackling this Doug!

-Chris

One option would be to move DeclarationName to Basic and use it to
completely replace Selector. Then everything that can name a
declaration will use a single class type, across all languages.

Is that the right thing to do independently of the layering issue? If
so,
then it's interesting.

I know I'd feel more comfortable if we had just one notion of names
for the various declarations in Clang. On the other hand, Selectors
are only used in the (relatively small) subset of Clang that deals
with Objective-C, so it would be somewhat unfortunate to use
DeclarationNames there.

Is there a perf/memory cost to ObjC to do this? If not, I think the
"conceptual overhead" is just fine.

There shouldn't be any cost.

Actually, I think I know what we should do: move all of the logic of
Selector into DeclarationName. Then, make Selector a subclass of
DeclarationName that can only be constructed from Objective-C
selectors. So we'll still have a single, unified notion of a
"declaration name" in Clang, and the Objective-C parts of the code can
still traffic in Selectors where it makes sense (Selector will, of
course, not add any overhead to DeclarationName).

I'm not sure how this will pan out. Are you ok with committing this patch
first, and doing these changes as a follow on?

Sure.

  - Doug