Can I add TokenGroup.get_rtokens()?

Hi there,

I’ve only just started using clang as a parser and I want to be able to get Doxygen-style comments associated with #defines in C source code. The cursor.raw_comment attribute doesn’t seem to be populated for preprocessor defines, so in order to fetch Doxygen-style comments I’ve done this:

for cursor in cursor.get_children():
    if cursor.kind == clang.cindex.CursorKind.MACRO_DEFINITION:
        # Get position of the preprocessor define
        start = cursor.extent.start.offset
        end = cursor.extent.end.offset + 1

        # Get the extent from the source file. Not sure why
        # I need start-2 instead of start.
        extent = tu.get_extent(path_of_source, (0, start - 2))

        # Get the list of tokens before the pre-processor define
        tokens = clang.cindex.TokenGroup.get_tokens(tu, extent)

        # Reverse the generator so that we can walk backwards through
        # the token list to extract comments before the preprocessor
        # definition. This is painfully slow.
        tokens = reversed(list(tokens))

        comment = None
        for t in tokens:
            if t.spelling in ('#', 'define'):
                # Skip the tokens of the directive itself
                continue
            elif t.kind == clang.cindex.TokenKind.COMMENT:
                comment = t.spelling
            # Stop at the first token that is not part of the directive
            break

        if comment is not None and comment.startswith('/**'):
            # process comment for this preprocessor statement
            pass
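The backward scan itself does not depend on libclang, so it can be pulled into a small helper and exercised with stand-in tokens. The following is only a sketch: FakeToken and doc_comment_before are hypothetical names, and the string "COMMENT" stands in for clang.cindex.TokenKind.COMMENT, which would be passed as comment_kind when using the real bindings.

```python
from collections import namedtuple

# Hypothetical stand-in for clang.cindex.Token: only the two attributes
# used by the scan are modelled here.
FakeToken = namedtuple("FakeToken", ["kind", "spelling"])


def doc_comment_before(tokens, comment_kind="COMMENT"):
    """Return the '/**' comment directly preceding a #define, or None.

    `tokens` must be a list of the tokens that precede the macro name,
    in source order (a generator would have to be materialized first).
    """
    for tok in reversed(tokens):
        if tok.spelling in ("#", "define"):
            continue  # still inside the directive itself
        if tok.kind == comment_kind and tok.spelling.startswith("/**"):
            return tok.spelling
        return None  # any other token separates the comment from the macro
    return None


tokens = [
    FakeToken("KEYWORD", "int"),
    FakeToken("COMMENT", "/** Maximum size */"),
    FakeToken("PUNCTUATION", "#"),
    FakeToken("IDENTIFIER", "define"),
]
print(doc_comment_before(tokens))
```

Returning as soon as a non-directive token is seen keeps an unrelated comment further up the file from being attached to the macro.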

I’d like to get the token list in reverse order so that I can walk backwards looking for comments associated with the #define. Converting the generator to a list and reversing it has a lot of overhead in Python when the token list is large, so I’ve added a method called get_rtokens() to the TokenGroup class which returns the tokens in reverse order:

class TokenGroup(object):

    @staticmethod
    def get_rtokens(tu, extent):
        """Helper method to return all tokens in an extent in reverse order
        to avoid the expense of having to convert the returned generator
        to a list and then calling reverse() on it.
        """
        tokens_memory = POINTER(Token)()
        tokens_count = c_uint()

        conf.lib.clang_tokenize(tu, extent, byref(tokens_memory),
                byref(tokens_count))

        count = int(tokens_count.value)

        # If we get no tokens, no memory was allocated. Be sure not to return
        # anything and potentially call a destructor on nothing.
        if count < 1:
            return

        tokens_array = cast(tokens_memory, POINTER(Token * count)).contents

        token_group = TokenGroup(tu, tokens_memory, tokens_count)

        # Walk from the last token down to index 0 inclusive; a stop value
        # of 0 would leave out the first token in the extent.
        for i in xrange(count - 1, -1, -1):
            token = Token()
            token.int_data = tokens_array[i].int_data
            token.ptr_data = tokens_array[i].ptr_data
            token._tu = tu
            token._group = token_group

            yield token
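When walking an array backwards by index, the stop value of the range is exclusive, so reaching index 0 needs a stop of -1. Below is a small sketch of a lazy reverse walk over a ctypes array, written for Python 3 (range rather than xrange). iter_reversed is a hypothetical helper, not part of the bindings.

```python
from ctypes import c_int


def iter_reversed(array, count):
    """Yield array[count-1] down to array[0] without copying the array."""
    for i in range(count - 1, -1, -1):
        yield array[i]


values = (c_int * 4)(10, 20, 30, 40)
print(list(iter_reversed(values, len(values))))  # [40, 30, 20, 10]
print(list(range(len(values) - 1, 0, -1)))       # [3, 2, 1]: index 0 is missing
```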

I’m wondering whether anyone has accomplished this in a better way, or whether the get_rtokens() method would be useful to other people. What I really want to do is relate a cursor to a token and then navigate forwards and backwards from that token, but I can’t see how to do that.
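One possible way to relate a cursor to a token, sketched under assumptions: tokenize the file once, keep the tokens in a list sorted by start offset, and use a binary search on the cursor's start offset to find the position to navigate from. Tok and TokenIndex below are hypothetical names. With the real bindings, the offset would presumably come from token.extent.start.offset and the lookup key from cursor.extent.start.offset.

```python
import bisect
from collections import namedtuple

# Hypothetical stand-in for a token; `offset` plays the role of
# token.extent.start.offset in clang.cindex.
Tok = namedtuple("Tok", ["offset", "spelling"])


class TokenIndex(object):
    """Tokens of one file, tokenized once and searchable by byte offset."""

    def __init__(self, tokens):
        self.tokens = sorted(tokens, key=lambda t: t.offset)
        self.offsets = [t.offset for t in self.tokens]

    def index_at(self, offset):
        """Index of the last token starting at or before `offset`, or -1."""
        return bisect.bisect_right(self.offsets, offset) - 1

    def before(self, offset):
        """Yield tokens starting strictly before `offset`, nearest first."""
        i = bisect.bisect_left(self.offsets, offset) - 1
        while i >= 0:
            yield self.tokens[i]
            i -= 1


index = TokenIndex([
    Tok(0, "/** doc */"), Tok(11, "#"), Tok(12, "define"),
    Tok(19, "MAX"), Tok(23, "10"),
])
# Offset 19 stands in for the start offset of a MACRO_DEFINITION cursor.
print([t.spelling for t in index.before(19)])
```

Each lookup is a binary search over a list built once, which avoids re-tokenizing and reversing the start of the file for every macro.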


Brad Elliott