libclang and MemberRefExpr

Hello,

When using the visitor functions in libclang, I noticed that for a member-access I get a MemberRefExpr containing a DeclRefExpr. The CXSourceRange for the DeclRefExpr covers the base expression and the CXSourceRange for the MemberRefExpr covers the whole expression. So the question is: how can I get the CXSourceRange covering the id-expression for the member itself? I.e., for “x.y”, how can I get the range for “y”?

Thanks,
Erik.

clang_getCursorLocation() will retrieve the location of "y" in the member expression "x.y". So long as the name is a simple identifier, you can just turn that into a range with clang_getRange().

However, if the name isn't a simple identifier ("x.operator"), then you can still get the location of the start of the member name ("operator"), but when you ask for the range, you'll only get the range of that one token.

Back when we designed libclang, Clang didn't even have the information about where the three tokens of "operator" were. Now, we actually have this information via DeclarationNameInfo, so it would make sense to add an API for specifically what you want. Here is a general API that (I think!) could fully solve this problem:

  CXSourceRange clang_getCursorReferenceNameRange(CXCursor C, unsigned NameFlags, unsigned PieceIndex);

where C is a cursor that references something else (e.g., a member reference, declaration reference, type reference, etc.), and the function returns the source range covering the reference itself. The two "unsigned" values would be for configurability:

  - NameFlags could be a bitset with three independent flags: WantQualifier (to ask it to include the nested-name-specifier, e.g., Foo:: in x.Foo::y, in the range), WantTemplateArgs (to ask it to include the explicit template arguments, e.g., <int> in x.f<int>, in the range), and WantSinglePiece (described below).

  - WantPiece/PieceIndex is my attempt at handling cases where the name itself isn't contiguous. For example, imagine the expression "a[y]", which ends up referring to an overloaded operator. The source range for the full operator name is, effectively, "[y]", since the name has been split into two parts. However, that's not necessarily useful, so WantPiece would indicate that we want a range covering only one piece of the name, where PieceIndex==0 indicates that we want the '[' and PieceIndex==1 indicates that we want the ']'.
  
  My real motivation for WantPiece/PieceIndex is Objective-C, where we have identifiers that are split, e.g., [foo aMethod:bar withWibble:wibble]. The method name here is aMethod:withWibble:, and a single source range for that method name just doesn't work.

  - Doug
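To make the NameFlags part of the proposal concrete, here is a minimal sketch of how the two widening flags might combine. It does not use libclang: Range, MemberRef, referenceNameRange and slice are hypothetical stand-ins, the flag values are assumptions, and the offsets are written by hand rather than read from an AST.

```cpp
#include <cassert>
#include <string>

// Flag names follow the proposal; the numeric values are assumptions.
enum NameFlags : unsigned {
  WantQualifier    = 0x1,  // include the nested-name-specifier ("Foo::")
  WantTemplateArgs = 0x2,  // include explicit template arguments ("<int>")
  WantSinglePiece  = 0x4   // not modeled in this sketch
};

// Half-open [begin, end) character offsets; a stand-in for CXSourceRange.
struct Range {
  unsigned begin;
  unsigned end;
};

// Hand-written offsets for one member reference (hypothetical type).
struct MemberRef {
  Range qualifier;
  Range name;
  Range templateArgs;
};

// Start from the bare name and widen the range for each requested flag.
Range referenceNameRange(const MemberRef &ref, unsigned flags) {
  Range r = ref.name;
  if (flags & WantQualifier)
    r.begin = ref.qualifier.begin;
  if (flags & WantTemplateArgs)
    r.end = ref.templateArgs.end;
  return r;
}

// Return the source text covered by a range.
std::string slice(const std::string &src, Range r) {
  return src.substr(r.begin, r.end - r.begin);
}
```

For the text x.Foo::f<int>, with the qualifier at [2, 7), the name at [7, 8) and the template arguments at [8, 13), passing no flags selects f, WantQualifier selects Foo::f, and both flags together select Foo::f<int>.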

Thanks. I worked around it by getting the tokens for the translation unit, then searching for the identifier, and checking if the cursor for that token was part of the member expression.

So for “operator” would you have 3 pieces? Or just 1?
Also, how would you know that there are no more pieces left? By returning a null-range?

How many ranges would be returned for a call to “aMethod:withWibble:”? Or, to put it differently, would the first range include the first colon, or would the colon be a separate range?

Anyway, sounds like a reasonable interface, so I can give it a go if nobody else is working on it.

– Erik.
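The token-based workaround mentioned at the start of this reply could look roughly like the sketch below. It is not libclang code: Token, Range and findMemberNameRange are hypothetical, and "the cursor for that token was part of the member expression" is approximated by checking that the token lies inside the expression's range.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-ins for libclang tokens and source ranges,
// using half-open [begin, end) character offsets.
struct Range {
  unsigned begin;
  unsigned end;
};

struct Token {
  std::string text;
  Range range;
  bool isIdentifier;
};

// Scan the tokens backwards and return the range of the last identifier
// spelled `member` that lies inside the member expression's range.
// Returns an empty range {0, 0} when nothing matches.
Range findMemberNameRange(const std::vector<Token> &tokens, Range expr,
                          const std::string &member) {
  for (auto it = tokens.rbegin(); it != tokens.rend(); ++it) {
    bool inside = it->range.begin >= expr.begin && it->range.end <= expr.end;
    if (inside && it->isIdentifier && it->text == member)
      return it->range;
  }
  return Range{0, 0};
}
```

Scanning from the end matters when the base and the member share a spelling, as in y.y: the member name is the last matching identifier inside the expression. This only covers simple identifiers, which is the case the workaround was used for.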

> So for "operator" would you have 3 pieces? Or just 1?

Probably just one.

> Also, how would you know that there are no more pieces left? By returning a null-range?

Yes.

> How many ranges would be returned for a call to "aMethod:withWibble:"? Or, to put it differently, would the first range include the first colon, or would the colon be a separate range?

Two ranges, IMO: one covering "aMethod:" and the other covering "withWibble:".

> Anyway, sounds like a reasonable interface, so I can give it a go if nobody else is working on it.

Nobody else is working on it. If you're interested in tackling it, that would be wonderful!

  - Doug
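The answers above (one range per selector piece, colon included, and a null range once the pieces run out) suggest a simple caller-side loop. The following is only a sketch under those assumptions: selectorPieceRange and collectPieces are hypothetical helpers, begin == end stands in for a null range, and the pieces are located by text search purely for illustration, since the real source locations for selector pieces would have to come from the AST.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Half-open [begin, end) offsets; begin == end plays the role of a null range.
struct Range {
  std::size_t begin;
  std::size_t end;
  bool isNull() const { return begin == end; }
};

// Hypothetical helper: range of the index-th selector piece (identifier plus
// its colon), or a null range once index runs past the last piece.
Range selectorPieceRange(const std::string &src,
                         const std::vector<std::string> &pieces,
                         unsigned index) {
  if (index >= pieces.size())
    return Range{0, 0};
  std::size_t pos = 0;
  for (unsigned i = 0; i <= index; ++i) {
    pos = src.find(pieces[i], pos);
    if (pos == std::string::npos)
      return Range{0, 0};
    if (i < index)
      pos += pieces[i].size();  // continue searching after this piece
  }
  return Range{pos, pos + pieces[index].size()};
}

// Caller-side loop: ask for piece 0, 1, 2, ... until a null range comes back.
std::vector<std::string> collectPieces(const std::string &src,
                                       const std::vector<std::string> &pieces) {
  std::vector<std::string> out;
  for (unsigned i = 0;; ++i) {
    Range r = selectorPieceRange(src, pieces, i);
    if (r.isNull())
      break;
    out.push_back(src.substr(r.begin, r.end - r.begin));
  }
  return out;
}
```

For [foo aMethod:bar withWibble:wibble], piece 0 covers aMethod: at [5, 13), piece 1 covers withWibble: at [17, 28), and piece 2 yields a null range, which ends the loop.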

After a bit of thought and some exploratory programming, I’m not sure whether this API would cover all cases. It would work for C/ObjC, but C++ is (as usual) a bit more tricky, especially with conversion operators. For example:

#include <string>

struct Something {
  operator const std::string &();
};

void foo() {
  Something s;
  s.operator const std::string &(); // ← this line
}

In the call to the conversion operator, the visitor will only give a MemberRefExpr as node, and no more detail on the right-hand side of the expression. But what is in there is a TypeRef. Actually, it can be any id-expression, so it might even be something like “some_template<some_type, 1 + 2 + 3>”. (And I do not know if C++0x is going to add some more cases…) So I am wondering if this function might be too limited to be really useful, or if it would still make sense for the other cases.

– Erik.

I think it handles this case properly. Here, the function would give the full range of “operator const std::string &” as the reference name range. One can iterate into the children of the MemberExpr to see the references to “std” and “string” inside there.

  - Doug

I’m giving it a shot, but I’m running into a couple of problems. For ObjC, ParseObjCMethodDecl does not store the identifier positions, nor the colons. Should I extend the Selector there, or is there another way to get the SourceLocations for those tokens?

Then for C++, consider:

struct Struct {
  void func();
  int operator;
};

void f()
{
  Struct inst;
  inst.func(); // 1
  inst[1]; // 2
  inst.operator; // 3
}

Now (1) and (3) are doable, but (2) is a bit tricky: the SourceLocations for the two pieces of the DeclRefExpr are stored in the CXXOperatorCallExpr node as the lparen/rparen locations in the CallExpr parent class. Should these two locations also be stored in the DeclRefExpr, like in case (3)? Or should we leave them where they are and handle case (2) in the same way as case (1)?

Regards,
Erik.

> I’m giving it a shot, but I’m running into a couple of problems. For ObjC, ParseObjCMethodDecl does not store the identifier positions, nor the colons. Should I extend the Selector there, or is there another way to get the SourceLocations for those tokens?

Right, this data isn’t available yet. We’d need to pass through a separate set of SourceLocations for the identifiers and colons.

I wasn’t actually expecting you to tackle selectors, but I think it’s great if you’re planning to work on those, too. I was mostly concerned that any solution to this problem can still work for selectors.

> Now (1) and (3) are doable, but (2) is a bit tricky: the SourceLocations for the two pieces of the DeclRefExpr are stored in the CXXOperatorCallExpr node as the lparen/rparen locations in the CallExpr parent class. Should these two locations also be stored in the DeclRefExpr, like in case (3)? Or should we leave them where they are and handle case (2) in the same way as case (1)?

I think that case (2) should be treated the same way as case (1). The fact that these are stored as CXXOperatorCallExprs indicates that the MemberRefExpr is synthesized.

  - Doug

> Right, this data isn’t available yet. We’d need to pass through a separate set of SourceLocations for the identifiers and colons.
>
> I wasn’t actually expecting you to tackle selectors, but I think it’s great if you’re planning to work on those, too. I was mostly concerned that any solution to this problem can still work for selectors.

Ok, I will do that in a second iteration.

> I think that case (2) should be treated the same way as case (1). The fact that these are stored as CXXOperatorCallExprs indicates that the MemberRefExpr is synthesized.

So to get the location of the brackets in case (2), you would need to call clang_getCursorReferenceNameRange on the CallExpr and then deduce that the MemberRefExpr is synthesised? Or should clang_getCursorReferenceNameRange check if it is called on a cursor pointing to a DeclRefExpr, walk up the AST nodes to find a CXXOperatorCallExpr, and then get the bracket locations from it? And if this last one is the proper behaviour, how can I get the parent node of an AST node?

Regards,
Erik.

Attached is a set of patches against svn trunk 133049.

0001: stores the location of the brackets for calls to array subscription expressions when the caller is a C++ operator. I tried to make the least intrusive patch possible, but I am not sure if there are more cases where the DeclRefExpr should store a DeclarationNameLoc than only the overloaded array subscription. I sent it to the list previously, but have not received feedback yet.
0002: the patch to add clang_getCursorReferenceNameRange, including a change to c-index-test which will print “SingleRefName=…” when the flag WantSinglePiece is used, and “RefName=…” for the separate pieces.
0003: a test-case for clang_getCursorReferenceNameRange.
0004: fixes for the other test-cases.

– Erik.

0001-Store-bracket-locations-for-array-subscription-expre.patch.gz (1.08 KB)

0002-Added-clang_getCursorReferenceNameRange-to-retreive-.patch.gz (2.94 KB)

0003-Test-for-clang_getCursorReferenceNameRange.patch.gz (1.09 KB)

0004-Fixed-tests-after-introducing-clang_getCursorReferen.patch.gz (49.3 KB)

> 0001: stores the location of the brackets for calls to array subscription expressions when the caller is a C++ operator. I tried to make the least intrusive patch possible, but I am not sure if there are more cases where the DeclRefExpr should store a DeclarationNameLoc than only the overloaded array subscription. I sent it to the list previously, but have not received feedback yet.

This looks pretty good, but please feed this information through template instantiation (in lib/Sema/TreeTransform.h, where we end up eventually calling CreateOverloadedArraySubscriptExpr).

> 0002: the patch to add clang_getCursorReferenceNameRange, including a change to c-index-test which will print "SingleRefName=.." when the flag WantSinglePiece is used, and "RefName=..." for the separate pieces.

Cool. A few comments here:

+enum CXCursor_RefNameFlags {
+ /**
+ * \brief Include the nested-name-specifier, e.g. Foo:: in x.Foo::y, in the
+ * range.
+ */
+ CXCursor_WantQualifier = 0x1,