Written name of conversion operator

typedef int i1;
typedef int i2;
struct S {
  operator i1() {
    return 0;
  }
};
int f() {
  S a;
  return a;
  return a.operator i1();
  return a.operator i2();
}

template <typename T>
struct R {
  int f(T x) {
    return x.operator i1();
  }
};

We have attempted to extract from AST the name of operator written in
the source but we have failed:

- using the DeclarationName we get the "wrong" name each time the name
is not the one of the canonical type

- using the TypeSourceInfo of CXXConversionDecl should be always correct
for operator declaration, but it fails on operator call if the name
specified in the call is different from the one in the declaration

Is there a know way to obtain the written name?

Unfortunately, no, and I'm not sure there's a straightforward way to make this possible, either. Possibly some extra storage on a DeclRefExpr / UnresolvedExpr?

John.

... and also in the three kinds of MemberExpr :frowning:

I'd like to propose you a whole different approach: non canonical
DeclarationName.

The equality comparison would become:

  /// operator== - Determine whether the specified names are identical..
  friend bool operator==(DeclarationName LHS, DeclarationName RHS) {
    return LHS.getCanonical() == RHS.getCanonical();
  }

The size of DeclarationName would not change.

About getCanonical() we have a few options from which to choose taking
in account speed/space tradeoff:

1) each object pointed by DeclarationName ptr contains a "Canonical"
field that getCanonical() implementation read unconditionally.

2) only c++ conversion operator id has a canonical field so
getCanonical() implementation uses a conditional on name kind

I considered this, but I think we really want to avoid it. The major problem is lookup, where conversion functions (and only conversion functions) have to be hashed by a different value. It's a very fundamental sort of thing to be messing around with and making less efficient just to preserve source information better.

John.

We have attempted to extract from AST the name of operator written in
the source but we have failed:

- using the DeclarationName we get the "wrong" name each time the name
is not the one of the canonical type

- using the TypeSourceInfo of CXXConversionDecl should be always correct
for operator declaration, but it fails on operator call if the name
specified in the call is different from the one in the declaration

Is there a know way to obtain the written name?

Unfortunately, no, and I'm not sure there's a straightforward way to make this possible, either. Possibly some extra storage on a DeclRefExpr / UnresolvedExpr?

... and also in the three kinds of MemberExpr :frowning:

I'd like to propose you a whole different approach: non canonical
DeclarationName.

The equality comparison would become:

/// operator== - Determine whether the specified names are identical..
friend bool operator==(DeclarationName LHS, DeclarationName RHS) {
   return LHS.getCanonical() == RHS.getCanonical();
}

This could work, and perhaps by adding appropriate specializations we could get DenseMaps to work. However, I don't think it's quite the right approach, because it only solves half of the problem: we'll be able to reproduce the spelling of the operator name from a non-canonical DeclarationName, but we won't have source location information for type named by the conversion function.

I have a different idea of how to address this issue, for which I had previously filed this bug report:

  http://llvm.org/bugs/show_bug.cgi?id=6357

Essentially, it suggests adding a DeclarationNameLoc type that, similarly to TypeLoc, contains information about the exact spelling of and the source locations for a declaration name. DeclarationNameLoc should remain small (IdentifierInfo* + SourceLocation for the common case), and use separate storage when we need to store extra information (e.g., a conversion operator would refer to the TypeSourceInfo * of the conversion type as written).

There would have to be a way---probably via the ASTContext---to extract a DeclarationName from a DeclarationNameLoc, for name-lookup purposes.

We would use DeclarationNameLoc within the ASTs (MemberExpr, DeclRefExpr, etc.) in lieu of DeclarationName.

2) only c++ conversion operator id has a canonical field so
getCanonical() implementation uses a conditional on name kind

FWIW, C++ constructors and destructors should also get source information.

  - Doug

You're more inclined toward:

struct DeclarationNameLoc {
  DeclarationName Name;
  char Extra[0];
};

and to have a field DeclarationNameLoc* NameLoc inside AST nodes

or

struct DeclarationNameLoc {
  DeclarationName Name;
  void* Extra;
};

and to have a field DeclarationNameLoc NameLoc inside AST nodes?

Another possibility is to have Extra data allocated at end of containing
node, but I think this might become too messy.

I have a different idea of how to address this issue, for which I had previously filed this bug report:

  http://llvm.org/bugs/show_bug.cgi?id=6357

Essentially, it suggests adding a DeclarationNameLoc type that, similarly to TypeLoc, contains information about the exact spelling of and the source locations for a declaration name. DeclarationNameLoc should remain small (IdentifierInfo* + SourceLocation for the common case), and use separate storage when we need to store extra information (e.g., a conversion operator would refer to the TypeSourceInfo * of the conversion type as written).

There would have to be a way---probably via the ASTContext---to extract a DeclarationName from a DeclarationNameLoc, for name-lookup purposes.

We would use DeclarationNameLoc within the ASTs (MemberExpr, DeclRefExpr, etc.) in lieu of DeclarationName.

You're more inclined toward:

struct DeclarationNameLoc {
DeclarationName Name;
char Extra[0];
};

and to have a field DeclarationNameLoc* NameLoc inside AST nodes

or

struct DeclarationNameLoc {
DeclarationName Name;
void* Extra;
};

and to have a field DeclarationNameLoc NameLoc inside AST nodes?

Something like the latter, since we absolutely must make sure that the common case of (IdentifierInfo*, SourceLocation) doesn't require an extra memory allocation. I would probably end up creating something similar to DeclarationName itself, with a bit-mangled pointer for the name and storage for the first two locations.

  struct DeclarationNameLoc {
    void *Ptr;
    SourceLocations Locs[2];
  };

Another possibility is to have Extra data allocated at end of containing
node, but I think this might become too messy.

I think that would be too messy.

  - Doug

Following your guideline I've done a little case study (I've excluded
some cases I don't know):

struct DeclarationNameLoc {
  union {
    struct {
      // Already in NamedDecl, *MemberExpr or *DeclRefExpr
      // SourceLocation NameLoc;
    } Identifier;
    struct {
      TypeSourceInfo* TInfo;
    } CXXConstructorName;
    struct {
      // Already in NamedDecl, *MemberExpr or *DeclRefExpr
      // SourceLocation TildeLoc;
      TypeSourceInfo* TInfo;
    } CXXDestructorName;
    struct {
      // Already in NamedDecl, *MemberExpr or *DeclRefExpr
      // SourceLocation OperatorLoc;
      TypeSourceInfo* TInfo;
    } CXXConversionFunctionName;
    struct {
      // Already in NamedDecl, *MemberExpr or *DeclRefExpr
      // SourceLocation OperatorLoc;
      SourceLocation OpLoc;
      SourceLocation LSquareLoc;
      // Not so useful
      // SourceLocation RSquareLoc;
    } CXXOperatorName;
    struct {
      SourceLocation LQuoteLoc;
      SourceLocation RQuoteLoc;
      // Already in NamedDecl, *MemberExpr or *DeclRefExpr
      // SourceLocation NameLoc;
    } CXXLiteralOperatorName;
    struct {
      // ??
    } CXXUsingDirective;
    struct {
      // ??
    } ObjCZeroArgSelector;
    struct {
      // ??
    } ObjCOneArgSelector;
    struct {
      // ??
    } ObjCMultiArgSelector;
  };
};

The kind of info contained is identified using the available
DeclarationName.

For the kinds examined we never need to add indirection to extra memory
and the space needed is at most 8 bytes.

I believe that DeclarationNameLoc should be added to:

FunctionDecl
DeclRefExpr
DependentScopeDeclRefExpr
CXXDependentScopeMemberExpr
MemberExpr
UnresolvedMemberExpr

(also it seems there is some intersection between DeclarationNameLoc and
PseudoDestructorExpr)

We might add to these classes a method like:

DeclarationName CLASS::getDeclarationNameLocs(SourceLocation& MainLoc,
DeclarationNameLoc& Locs);

to query for complete info.

I'm missing something?

Suggestions for improvement?

I'm not sure keeping the string's source location is important. It's required to be an empty string literal, so the only errors that can occur there are parse errors, which are caught before the DeclarationName is made. After that, the contents of the string are irrelevant, and the range is subsumed in the range of the operator keyword to the end of the identifier.

Sean

Thanks.

I've also done another mistake here, it should be:

    struct {
      // Already in NamedDecl, *MemberExpr or *DeclRefExpr
      // SourceLocation OperatorLoc;
      // Probably useless (see comment from Sean)
      // SourceLocation LQuoteLoc;
      // SourceLocation RQuoteLoc;
      SourceLocation NameLoc;
    } CXXLiteralOperatorName;

Your comment also suggest me that this also have to be changed so to
have proper source range:

    struct {
      // Already in NamedDecl, *MemberExpr or *DeclRefExpr
      // SourceLocation OperatorLoc;
      SourceLocation OpLoc;
      // Not so useful
      // SourceLocation LSquareLoc;
      SourceLocation RSquareLoc;
    } CXXOperatorName;

Something like the latter, since we absolutely must make sure that the common case of (IdentifierInfo*, SourceLocation) doesn't require an extra memory allocation. I would probably end up creating something similar to DeclarationName itself, with a bit-mangled pointer for the name and storage for the first two locations.

  struct DeclarationNameLoc {
    void *Ptr;
    SourceLocations Locs[2];
  };

Following your guideline I've done a little case study (I've excluded
some cases I don't know):

struct DeclarationNameLoc {
union {
   struct {
     // Already in NamedDecl, *MemberExpr or *DeclRefExpr
     // SourceLocation NameLoc;
   } Identifier;

Okay.

   struct {
     TypeSourceInfo* TInfo;
   } CXXConstructorName;
   struct {
     // Already in NamedDecl, *MemberExpr or *DeclRefExpr
     // SourceLocation TildeLoc;
     TypeSourceInfo* TInfo;
   } CXXDestructorName;
   struct {
     // Already in NamedDecl, *MemberExpr or *DeclRefExpr
     // SourceLocation OperatorLoc;
     TypeSourceInfo* TInfo;
   } CXXConversionFunctionName;

Looks good.

   struct {
     // Already in NamedDecl, *MemberExpr or *DeclRefExpr
     // SourceLocation OperatorLoc;
     SourceLocation OpLoc;
     SourceLocation LSquareLoc;
     // Not so useful
     // SourceLocation RSquareLoc;
   } CXXOperatorName;

Since we'll have space for two locations here, I suggest having the location of the first token in the operator (e.g., "delete" in "operator delete [ ]") and the last token in the operator (e.g., "]" in "operator delete [ ]"). That way, we can get the range of the tokens, and from that form the full range of the name.

   struct {
     SourceLocation LQuoteLoc;
     SourceLocation RQuoteLoc;
     // Already in NamedDecl, *MemberExpr or *DeclRefExpr
     // SourceLocation NameLoc;
   } CXXLiteralOperatorName;

We can just store the location of the start of the string literal, then re-parse the literal if we need more detailed location information (which will be very, very rare).

   struct {
     // ??
   } CXXUsingDirective;

Don't worry about this one; it's a hack in DeclarationName that doesn't need any representation in DeclarationNameLoc.

   struct {
     // ??
   } ObjCZeroArgSelector;
   struct {
     // ??
   } ObjCOneArgSelector;
   struct {
     // ??
   } ObjCMultiArgSelector;
};

Unless you're interested in Objective-C, you don't have to worry about this one. Just use a single SourceLocation that will point to the first identifier in the select, and that'll be fine.

What I'd like to do here is to keep the source locations of the identifiers in the selector: for selectors with 0 or 1 arguments, we'll just have a single SourceLocation. For selectors with 2 arguments, we'll have two SourceLocations. For selectors with > 2 arguments, we'll have a pointer to an array SourceLocations containing the locations of all of the identifiers in the selector. This affects ObjCMethodDecls and ObjCMessageExprs, and possibly a few other places. If you're interesting in Objective-C and want to implement this, that would be great; if not, I'll do it once we have a basic DeclarationNameLoc in place.

};

The kind of info contained is identified using the available
DeclarationName.

Sure.

For the kinds examined we never need to add indirection to extra memory
and the space needed is at most 8 bytes.

Right. I consider that acceptable for a (big!) improvement in source-location information. Only Objective-C selectors will need external storage (and that's not the common case).

I believe that DeclarationNameLoc should be added to:

FunctionDecl
DeclRefExpr
DependentScopeDeclRefExpr
CXXDependentScopeMemberExpr
MemberExpr
UnresolvedMemberExpr

Agreed. They don't all have to happen in the same commit :slight_smile:

(also it seems there is some intersection between DeclarationNameLoc and
PseudoDestructorExpr)

PseudoDestructorExpr is probably fine as-is.

We might add to these classes a method like:

DeclarationName CLASS::getDeclarationNameLocs(SourceLocation& MainLoc,
DeclarationNameLoc& Locs);

to query for complete info.

It would be nice if the thing we call DeclarationNameLoc could bundle the DeclarationName and extra location information together, so that it could provide convenience functions like getSourceRange() (covering the entire name), getNameAsString(), and so on. I suggest that you rename the class that you have currently named DeclarationNameLoc to StoredDeclarationNameLoc (or something like it), then have the function

  DeclarationNameLoc CLASS::getDeclNameLoc();

that gets one of these easy-to-use DeclarationNameLoc classes, and is likely to be used in many places where the AST is manipulated with source-location information.

DeclarationNameLoc might also become the primary storage mechanism for the expressions, rather than having separate DeclarationName/SourceLocation/StoredDeclarationNameLoc members.

  - Doug