libclang: token annotation

Hi everyone!

First of all, thank you for libclang ;)

I’m working on a semantic highlighting plugin for the Code::Blocks IDE, based on libclang.
So far everything works quite well, but I’ve noticed some issues with
the functions clang_annotateTokens and (this is probably more specific) clang_getCursor.

I’m using the svn version (rev. 157460), but the same happens with 3.1.

From the documentation:
“clang_getCursor() maps an arbitrary source location within a translation unit down to the most specific cursor that describes the entity at that location.”

This is where things get weird. Consider the following (valid) C++ code:

---------------------------------8<----------------------------------------

#ifdef _MY_MACRO_
#define NUM 2
#else
#define NUM 5
#endif

template <class T>
class A
{
#ifdef _MY_MACRO_ // (1)
    int a; // (2)
#else
    bool a;
#endif

public:
    static bool array[NUM]; // (3)
    operator T();
};

template <class T> // (4)
bool A<T>::array[NUM]; // (5)

bool g(A<bool> a)
{
    return a; // (6)
}

template <class T> // (4)
T f(A<T> a)
{
    return a;
}

---------------------------------8<----------------------------------------

Using clang_annotateTokens (clang_getCursor) or c-index-test -test-annotate-tokens (-cursor-at, resp.) everything but the following is OK:

(1) Inside any block (class, struct, namespace, … declaration, function body, etc.) preprocessor directives
are not annotated as such.
(2) If the macro _MY_MACRO_ is not defined, all tokens in “int a;” are annotated in the same way as the block they belong to;
shouldn’t they be annotated as, for example, CXCursor_InactiveCode (a new CXCursorKind value)?
(3) It’s weird, but here clang_annotateTokens behaves differently from c-index-test: the former annotates the ‘NUM’ token
as ‘VarDecl=array’ (wrong) and the latter as ‘macro expansion=NUM’, as expected…
(4) The whole line is annotated here with CXCursor_FirstInvalid (70), and this applies to all such ‘template’ lines.
(5) The whole line is marked as ‘VarDecl=array’ (IMHO ‘A’ should be ‘TemplateRef=A’ and ‘T’ should be ‘TypeRef=T’; ‘NUM’ behaves as in (3)).
(6) ‘a’ is annotated as ‘CallExpr=operator _Bool’; shouldn’t it be ‘DeclRefExpr=a’? Oddly, there is no such problem in the function ‘f’.

Is this a bug, or am I missing something?

Cheers,
Michal Staromiejski

Hi everyone!

First of all, thank you for libclang ;)

You're welcome :)

I'm working on a semantic highlighting plugin for the Code::Blocks IDE, based on libclang.
So far everything works quite well, but I've noticed some issues with
the functions clang_annotateTokens and (this is probably more specific) clang_getCursor.

I'm using the svn version (rev. 157460), but the same happens with 3.1.

From the documentation:
"clang_getCursor() maps an arbitrary source location within a translation unit down to the most specific cursor that describes the entity at that location."

As a side note, in case it isn't obvious: if the location cannot be resolved to a more specific entity, clang_getCursor returns the enclosing entity. For example, if you point at a comment inside a namespace, it will give you the namespace.

This is where things get weird. Consider the following (valid) C++ code:

---------------------------------8<----------------------------------------
#ifdef _MY_MACRO_
#define NUM 2
#else
#define NUM 5
#endif

template <class T>
class A
{
#ifdef _MY_MACRO_ // (1)
    int a; // (2)
#else
    bool a;
#endif

public:
    static bool array[NUM]; // (3)
    operator T();
};

template <class T> // (4)
bool A<T>::array[NUM]; // (5)

bool g(A<bool> a)
{
    return a; // (6)
}

template <class T> // (4)
T f(A<T> a)
{
    return a;
}
---------------------------------8<----------------------------------------

Using clang_annotateTokens (clang_getCursor) or c-index-test -test-annotate-tokens (-cursor-at, resp.) everything but the following is OK:

(1) Inside any block (class, struct, namespace, ... declaration, function body, etc.) preprocessor directives
are not annotated as such.

Please file a bug for this.

(2) If the macro _MY_MACRO_ is not defined, all tokens in "int a;" are annotated in the same way as the block they belong to;
shouldn't they be annotated as, for example, CXCursor_InactiveCode (a new CXCursorKind value)?

This is a known limitation: we don't identify source that is inactive due to preprocessor directives, so it resolves to the enclosing scope. It would be good to do as you suggest; please file a bug to track it.
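Until libclang reports inactive regions itself, a highlighting plugin could approximate them on its own side. The following is only a minimal sketch of that idea, not anything libclang provides: the names `CondFrame` and `inactiveLines` are hypothetical, only `#ifdef`, `#ifndef`, `#else`, `#endif`, `#define` and `#undef` are understood, and `#if`/`#elif` expressions are not evaluated at all.

```cpp
#include <set>
#include <sstream>
#include <string>
#include <vector>

// One entry per open #if/#ifdef/#ifndef block.
struct CondFrame {
    bool parentActive;  // was the enclosing region active?
    bool condition;     // did the opening directive's test succeed?
    bool active;        // is the branch we are currently in active?
};

// Returns one flag per line of `source`: true if the preprocessor would skip
// that line. `defined` holds the macros assumed to be defined up front.
std::vector<bool> inactiveLines(const std::string& source,
                                std::set<std::string> defined) {
    std::vector<bool> inactive;
    std::vector<CondFrame> stack;
    std::istringstream lines(source);
    std::string line;
    while (std::getline(lines, line)) {
        const bool active = stack.empty() || stack.back().active;
        std::istringstream in(line);
        char hash = 0;
        std::string word, name;
        if (!(in >> hash) || hash != '#') {  // ordinary source line
            inactive.push_back(!active);
            continue;
        }
        in >> word >> name;
        name = name.substr(0, name.find('('));  // drop a macro parameter list
        bool lineActive = active;
        if (word == "ifdef" || word == "ifndef" || word == "if") {
            // ASSUMPTION: "#if <expr>" is not evaluated and counts as true;
            // "#elif" is ignored entirely.
            const bool cond =
                word == "if" || (defined.count(name) != 0) == (word == "ifdef");
            stack.push_back({active, cond, active && cond});
        } else if (word == "else" && !stack.empty()) {
            CondFrame& top = stack.back();
            top.active = top.parentActive && !top.condition;
            lineActive = top.parentActive;
        } else if (word == "endif" && !stack.empty()) {
            lineActive = stack.back().parentActive;
            stack.pop_back();
        } else if (active && word == "define") {
            defined.insert(name);
        } else if (active && word == "undef") {
            defined.erase(name);
        }
        inactive.push_back(!lineActive);
    }
    return inactive;
}
```

Called on the class body from the example with an empty `defined` set, this flags the "int a;" line as inactive and leaves "bool a;" active. The plugin could then override whatever clang_annotateTokens reports for tokens on the flagged lines.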

(3) It's weird, but here clang_annotateTokens behaves differently from c-index-test: the former annotates the 'NUM' token
as 'VarDecl=array' (wrong) and the latter as 'macro expansion=NUM', as expected...

File a bug with a test case that uses clang_annotateTokens and reproduces this.

(4) The whole line is annotated here with CXCursor_FirstInvalid (70), and this applies to all such 'template' lines.

We should probably identify this as part of the 'array' declaration and have 'T' be a TemplateTypeParameter; please file a bug.

(5) The whole line is marked as 'VarDecl=array' (IMHO 'A' should be 'TemplateRef=A' and 'T' should be 'TypeRef=T'; 'NUM' behaves as in (3)).

I believe you are right; please file a bug.

(6) 'a' is annotated as 'CallExpr=operator _Bool'; shouldn't it be 'DeclRefExpr=a'? Oddly, there is no such problem in the function 'f'.

You are right, and, you guessed it, please file a bug.

Is this a bug, or am I missing something?

It's better to track each issue in its own bug, since the work involved varies greatly.

Thanks for the feedback!

Thanks for your quick reply!

As for (3), I’ve realized that with the CXTranslationUnit_DetailedPreprocessingRecord option macros are annotated as expected (c-index-test uses this option, which explains the different behavior).

For the other issues, I’ll file bugs as soon as possible.

Michal Staromiejski

Bugs filed, links for reference:

http://llvm.org/bugs/show_bug.cgi?id=12970

http://llvm.org/bugs/show_bug.cgi?id=12971

http://llvm.org/bugs/show_bug.cgi?id=12972

http://llvm.org/bugs/show_bug.cgi?id=12975

http://llvm.org/bugs/show_bug.cgi?id=12976

http://llvm.org/bugs/show_bug.cgi?id=12977

Regards
Michal Staromiejski

Cloned to radars, thanks!