libclang: Resolving dependent names

Hi,

I am using the libclang API to implement semantic syntax highlighting. However, there are a few rough edges that need polishing before the results are satisfying. Resolving dependent names (AST nodes whose type kind is ‘CXType_Dependent’) is one I’ve come across: for these nodes libclang does not provide the same depth of information as it does for non-dependent ones.

In particular, the following example shows what I mean (please mind the comments in the code):

// demo.cpp

#include <vector>

template <typename T>
bool foo() {
    std::vector<T> vec;
    vec._M_impl; // ‘vec’ is identified as part of ‘CXXDependentScopeMemberExpr’, but the access to member field ‘_M_impl’ is not; it is completely missing from the AST
    return vec.empty(); // ‘vec’ is identified as part of ‘CXXDependentScopeMemberExpr’, but the call to ‘empty’ is not; it is completely missing from the AST
}

bool bar() {
    std::vector<int> vec;
    vec._M_impl; // both ‘vec’ and the access to member field ‘_M_impl’ are identified as expected
    return vec.empty(); // both ‘vec’ and ‘empty’ are identified as expected
}

// output from clang -Xclang -ast-dump demo.cpp

|-FunctionTemplateDecl 0x564cf4e63c48 <demo.cpp:3:1, line:8:1> line:4:6 foo
| |-TemplateTypeParmDecl 0x564cf4e63af8 <line:3:11, col:20> col:20 referenced typename T
| `-FunctionDecl 0x564cf4e63ba0 <line:4:1, line:8:1> line:4:6 foo '_Bool (void)'
|   `-CompoundStmt 0x564cf4e64190 <col:12, line:8:1>
|     |-DeclStmt 0x564cf4e64038 <line:5:2, col:20>
|     | `-VarDecl 0x564cf4e63fd8 <col:2, col:17> col:17 referenced vec 'std::vector<T>':'vector<T>'
|     |-CXXDependentScopeMemberExpr 0x564cf4e64078 <line:6:2, col:6> '<dependent type>' lvalue
|     | `-DeclRefExpr 0x564cf4e64050 <col:2> 'std::vector<T>':'vector<T>' lvalue Var 0x564cf4e63fd8 'vec' 'std::vector<T>':'vector<T>'
|     `-ReturnStmt 0x564cf4e64178 <line:7:2, col:19>
|       `-CallExpr 0x564cf4e64150 <col:9, col:19> '<dependent type>'
|         `-CXXDependentScopeMemberExpr 0x564cf4e640f8 <col:9, col:13> '<dependent type>' lvalue
|           `-DeclRefExpr 0x564cf4e640d0 <col:9> 'std::vector<T>':'vector<T>' lvalue Var 0x564cf4e63fd8 'vec' 'std::vector<T>':'vector<T>'
`-FunctionDecl 0x564cf4e641e0 <line:10:1, line:14:1> line:10:6 bar '_Bool (void)'
  `-CompoundStmt 0x564cf4e76dd8 <col:12, line:14:1>
    |-DeclStmt 0x564cf4e76898 <line:11:2, col:22>
    | `-VarDecl 0x564cf4e646c8 <col:2, col:19> col:19 used vec 'std::vector<int>':'class std::vector<int, class std::allocator<int> >' callinit
    |   `-CXXConstructExpr 0x564cf4e76868 <col:19> 'std::vector<int>':'class std::vector<int, class std::allocator<int> >' 'void (void)'
    |-MemberExpr 0x564cf4e768f8 <line:12:2, col:6> 'struct std::_Vector_base<int, class std::allocator<int> >::_Vector_impl' lvalue ._M_impl 0x564cf4e6bb78
    | `-ImplicitCastExpr 0x564cf4e768d8 <col:2> 'struct std::_Vector_base<int, class std::allocator<int> >' lvalue <UncheckedDerivedToBase (_Vector_base)>
    |   `-DeclRefExpr 0x564cf4e768b0 <col:2> 'std::vector<int>':'class std::vector<int, class std::allocator<int> >' lvalue Var 0x564cf4e646c8 'vec' 'std::vector<int>':'class std::vector<int, class std::allocator<int> >'
    `-ReturnStmt 0x564cf4e76dc0 <line:13:2, col:19>
      `-CXXMemberCallExpr 0x564cf4e76d50 <col:9, col:19> '_Bool'
        `-MemberExpr 0x564cf4e76d18 <col:9, col:13> '<bound member function type>' .empty 0x564cf4e6fc20
          `-ImplicitCastExpr 0x564cf4e76da8 <col:9> 'const class std::vector<int, class std::allocator<int> >' lvalue
            `-DeclRefExpr 0x564cf4e76cf0 <col:9> 'std::vector<int>':'class std::vector<int, class std::allocator<int> >' lvalue Var 0x564cf4e646c8 'vec' 'std::vector<int>':'class std::vector<int, class std::allocator<int> >'

Is this really a technical limitation from the language's point of view, or an implementation detail that is still missing? 14.6.2 [temp.dep] defines dependent names as constructs whose semantics may differ from one instantiation to another. However, I am not sure I understand this correctly: surely whether something is a data member or a member function cannot change across instantiations?

I managed to work around this by tokenizing such dependent nodes and deducing their kind from their parent in the AST. That is, a dependent ‘MEMBER_REF_EXPR’ node (type kind ‘CXType_Dependent’) whose direct AST parent is a ‘CALL_EXPR’ is resolved as ‘CXCursor_CXXMethod’; otherwise it is resolved as ‘CXCursor_FieldDecl’. This covers my current use case and seems to work, but I think it would be better to have this functionality provided by the library, ideally as a more generic solution that also fits other use cases (e.g. nodes other than ‘MEMBER_REF_EXPR’). If this is possible I would be happy to contribute.
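
To make the heuristic concrete, here is a minimal sketch of the decision rule only. It is not libclang code: ‘Node’, ‘NodeKind’, ‘Guess’, ‘classify’ and ‘check_heuristic’ are hypothetical names made up for illustration. With the real API I would assume the same check is done inside the visitor passed to ‘clang_visitChildren’, which receives both the cursor and its parent, comparing ‘clang_getCursorKind’ and ‘clang_getCursorType(cursor).kind’ against the kinds named above.

```cpp
#include <cassert>

// Hypothetical stand-ins for the cursor kinds mentioned in the post.
enum class NodeKind { CallExpr, MemberRefExpr, Other };

// What the heuristic guesses for a member reference.
enum class Guess { Method, Field, NotApplicable };

// Minimal AST node model: kind, whether its type is dependent, and its parent.
struct Node {
    NodeKind kind;
    bool dependent;
    const Node* parent;  // nullptr for a root node
};

// A dependent member reference whose direct parent is a call expression is
// guessed to be a method; any other dependent member reference a field.
Guess classify(const Node& node) {
    if (node.kind != NodeKind::MemberRefExpr || !node.dependent)
        return Guess::NotApplicable;
    if (node.parent != nullptr && node.parent->kind == NodeKind::CallExpr)
        return Guess::Method;
    return Guess::Field;
}

// Exercises the rule on models of the two statements from demo.cpp.
void check_heuristic() {
    // Models `return vec.empty();`: the member reference sits under a call.
    const Node call{NodeKind::CallExpr, true, nullptr};
    const Node empty_ref{NodeKind::MemberRefExpr, true, &call};
    assert(classify(empty_ref) == Guess::Method);

    // Models `vec._M_impl;`: the member reference is not under a call.
    const Node body{NodeKind::Other, false, nullptr};
    const Node impl_ref{NodeKind::MemberRefExpr, true, &body};
    assert(classify(impl_ref) == Guess::Field);
}
```

Note that this is only a guess based on syntax: anything that is called through member-access syntax ends up in the ‘Method’ bucket.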

Cheers,
Adi

Ping :)

Hi,

I’m not an expert in clang internals, but I can see why the information available at this point is much more limited. Consider the following example:

-- example.cc

template <typename T>
struct A
{
    void foo(int);
};

template <typename T>
struct A<T*>
{
    struct B { void operator()(int); };
    B foo;
};

template <typename T>
void foo()
{
    A<T> a;
    a.foo(3);
}

template void foo<int>();  // first case
template void foo<int*>(); // second case

-- end of example.cc

That “a.foo(3)” will be “a.foo(3)” in the first case and “a.foo.operator()(3)” in the second. In the first case “foo” is a member function; in the second it is a data member.
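
For anyone who wants to check this, here is a runnable variant of the example above, as a sketch. The return values, the definitions of the members, and the names ‘call_foo’ and ‘check_both_cases’ are my additions for illustration; I renamed the function template so it does not share a name with the member.

```cpp
#include <cassert>
#include <type_traits>

// Primary template: foo is a member function.
template <typename T>
struct A {
    int foo(int) { return 1; }
};

// Partial specialization for pointers: foo is a data member of functor type B.
template <typename T>
struct A<T*> {
    struct B {
        int operator()(int) { return 2; }
    };
    B foo;
};

// `a.foo(3)` is a dependent call: what `foo` names is only known once T is.
template <typename T>
int call_foo() {
    A<T> a;
    return a.foo(3);
}

// The same name is a member function in one case and a data member in the other.
static_assert(std::is_member_function_pointer<decltype(&A<int>::foo)>::value,
              "A<int>::foo is a member function");
static_assert(std::is_member_object_pointer<decltype(&A<int*>::foo)>::value,
              "A<int*>::foo is a data member");

// Exercises both instantiations of the dependent call.
void check_both_cases() {
    assert(call_foo<int>() == 1);   // first case: A<int>::foo(3)
    assert(call_foo<int*>() == 2);  // second case: A<int*>::foo.operator()(3)
}
```

Both static_asserts compile, so inside the template the parser really cannot know which kind of member “foo” is before instantiation.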

Hope this helps.

Kind regards,

Roger

Hi Roger,

Thanks for your effort. Your example makes the quote from the standard perfectly clear.

I do realize that information about dependent names is limited because two-phase lookup has to complete before the full picture is available. But I was puzzled by the fact that we can still extract some of it by inspecting the AST: if a dependent-name node is a child of a call-expression node, then we know that the dependent name is being called, either as a member function or as a functor (which boils down to a call to operator()).

So it looks like some additional information could be made available without going through the second lookup phase. It may be a good candidate for a libclang interface extension.

Cheers,
Adi