RFC: visiting cursors "backwards" ?

libclang allows me to traverse sibling cursors.
I have run into a use-case where I want to visit the previous sibling cursor. How may I do that ?

The use-case is this:

Imagine code such as:

   struct foo;
   // Documentation for 'bar'
   struct bar;

In Synopsis, I'd like to associate the comment with the following 'bar' declaration. Therefore, I'd like to tokenize the region between 'struct bar' and the previous cursor, so I can read in the comment and store it for later documentation extraction.
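To make the idea concrete, here is a minimal sketch of pulling comments out of the gap between two declarations. It deliberately does not use libclang: it scans a plain source string between two byte offsets. The function name `comments_between` and the offset-based interface are assumptions made for illustration only. With libclang itself, the same region would presumably be handed to `clang_tokenize` and the resulting tokens checked for `CXToken_Comment`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Collect the text of every comment that starts inside src[begin, end).
// Simplification (assumption): string and character literals are not
// recognised, which is tolerable only because the gap between two
// declarations rarely contains any.
std::vector<std::string> comments_between(const std::string &src,
                                          std::size_t begin, std::size_t end) {
  assert(begin <= end && end <= src.size());
  std::vector<std::string> found;
  std::size_t i = begin;
  while (i + 1 < end) {
    if (src[i] == '/' && src[i + 1] == '/') {
      // Line comment: runs to the end of the line, or of the region.
      std::size_t stop = src.find('\n', i);
      if (stop == std::string::npos || stop > end) stop = end;
      found.push_back(src.substr(i, stop - i));
      i = stop;
    } else if (src[i] == '/' && src[i + 1] == '*') {
      // Block comment: runs to the closing "*/", or to the region's end.
      std::size_t close = src.find("*/", i + 2);
      std::size_t stop =
          (close == std::string::npos || close + 2 > end) ? end : close + 2;
      found.push_back(src.substr(i, stop - i));
      i = stop;
    } else {
      ++i;
    }
  }
  return found;
}
```

For the example above, the region would run from the end of 'struct foo;' to the start of 'struct bar', and the only comment found in it is the one documenting 'bar'.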

I'd like to do this non-invasively, i.e. without the need for libclang to be aware of even the concept of documentation. As long as it gives me access to "all comments preceding cursor 'X'", I can work things out.

Thus: Is there a way to locate the previous sibling cursor for any given cursor ? (This may be a null cursor, if the given cursor is the first child of its parent. That's fine, as in that case I would start from the parent's start location itself.)

Thanks,
         Stefan

Personally, I'd rather see the "find the declaration to which this comment is attached" logic in the Clang core, exposed via libclang, rather than layered on top of Clang.

There's no way to locate the previous sibling directly, aside from walking through all of the parent's children again and keeping track of the previous child. We've generally avoided storing such a back link, since it would add another pointer to the size of every declaration node, and we try to keep the AST small.
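A minimal sketch of that "walk the children and remember the previous one" approach follows. The `Node` type is a hypothetical stand-in for a cursor, and `record_previous` and `PreviousMap` are names invented for this illustration; none of them are libclang API. In a real client the walk would presumably happen inside a `clang_visitChildren` callback, with the last-seen child kept in the client data.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a cursor: a name plus an ordered list of children.
struct Node {
  std::string name;
  std::vector<Node> children;
};

// Maps each node to its previous sibling; nullptr marks a first child.
using PreviousMap = std::map<const Node *, const Node *>;

// Walk the tree once, remembering the last child seen at each level.
// The extra bookkeeping lives in the client, not in the tree itself.
void record_previous(const Node &parent, PreviousMap &previous) {
  const Node *last = nullptr;
  for (const Node &child : parent.children) {
    assert(previous.count(&child) == 0);  // each node is visited exactly once
    previous[&child] = last;
    last = &child;
    record_previous(child, previous);
  }
}
```

A first child maps to a null entry, in which case the search for comments would start from the parent's own start location, as described earlier in the thread.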

  - Doug

OK. Hmm, I could indeed record "previous" cursors as I'm traversing the AST, allowing me to access the comments at the same time as translating the clang AST into the Synopsis ASG. That would avoid the need for libclang to have to know anything about comment <-> declaration associations.

The reason I'd very much prefer to keep this away from libclang is that these associations tend to be a matter of customization, and are hard to support at a low level. Consider this:

// E docs
enum E {
   e0, //< e0 docs
   e1}; //< e1 docs

I think the meaning (and logical attachment) of the comments is rather obvious. Yet, this is tricky to implement: Sometimes the comments precede the declaration they are associated with, sometimes they follow it. Sometimes they are even outside the parent cursor's range.

To support that inside libclang, one would have to parse the comment, using specific rules and markup. I'd rather leave such markup choices to the user who embeds documentation into his code.

So, I think I'd prefer to start by recording 'previous' cursors in my own code.

Thanks,
         Stefan

Yes, that'd work.

Because it's tricky to implement, it's a good candidate for going straight into Clang so that nobody has to do this work again. But, it's up to you.

  - Doug

OK, let's review it once I've implemented it. Then we can see whether it makes sense to be lowered into libclang.

     Stefan

I don't think the "who does this comment belong to" problem is
solvable in the general case:

enum E {
  e0,
  // e0 docs or e1 docs ?
  e1
};

The convention used by Doxygen is that /// comments and /** */
comments refer to the next declaration, whereas ///< and /**< */ refer
to the previous one. Of course, this is no help in the general
case.
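As an illustration of that convention only, the following sketch classifies a comment by its opening marker. The names `Attach` and `classify` are made up for this example, and the rules are deliberately simplified: degenerate cases such as an empty `/**/` comment are not treated specially.

```cpp
#include <string>

// Which declaration a comment documents, going by its opening marker.
enum class Attach { Next, Previous, None };

// Classify a comment following the Doxygen-style convention described above.
// Anything else (plain "//" or "/*") is treated as not being documentation.
Attach classify(const std::string &comment) {
  auto starts_with = [&comment](const char *prefix) {
    return comment.rfind(prefix, 0) == 0;
  };
  // Test the "previous" markers first, since "///<" also starts with "///".
  if (starts_with("///<") || starts_with("/**<")) return Attach::Previous;
  if (starts_with("///") || starts_with("/**")) return Attach::Next;
  return Attach::None;
}
```

Note that the `//<` markers used in the enum example earlier in the thread fall into the `None` bucket under these rules, which shows how much the answer depends on the convention a project chooses.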

Csaba

I agree. This is precisely why I'd like to push the handling up in the stack, to be able to provide hooks for customization.
The way Synopsis solves this is to store all comments that are in the vicinity of declarations, and then let users pick a strategy for converting them into documentation. This involves:

a) a filter that removes non-documentation comments (some use "///..." comments for docs, some use "/** ...*/", etc.)
b) a markup (such as javadoc, or doxygen, or ReStructuredText)

These transformations are performed on the Synopsis ASG, in Python.
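To give a feel for what a filter stage of kind (a) might do, here is a small sketch. It is not Synopsis code (Synopsis does this in Python on its ASG); the function `filter_docs` and its marker parameter are assumptions chosen for illustration, written in C++ only to match the other snippets in this thread.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Keep only the comments that begin with `marker` (for example "///") and
// strip the marker plus one optional following space from each of them.
std::vector<std::string> filter_docs(const std::vector<std::string> &comments,
                                     const std::string &marker) {
  std::vector<std::string> docs;
  for (const std::string &c : comments) {
    // Skip anything that does not start with the chosen marker.
    if (c.compare(0, marker.size(), marker) != 0) continue;
    std::size_t start = marker.size();
    if (start < c.size() && c[start] == ' ') ++start;
    docs.push_back(c.substr(start));
  }
  return docs;
}
```

Because the marker is a parameter, the choice between "///" and some other prefix stays with the user, which is the kind of customization hook argued for above. The markup stage (b) would then run on the filtered text.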

Thanks,
         Stefan

A first implementation of the approach seems to work reasonably well.

There are some shortcomings, though. For example:

struct Foo
{
   // some docs
   int member;
};

When visiting the FieldDecl for 'member', the cursor's range starts with 'member'. To recognize that the comment in the preceding line belongs to it, I need to extend the cursor's range to the entire declaration (instead of just the single declarator).
Is that possible ? Right now, clang_getLexicalParent() brings me to the ClassDecl for 'Foo'.
Is there a reason declarations aren't represented as cursors ? Or is there another way to get information (such as the extent) about a declaration ?

Thanks,
         Stefan