Regarding clang-c and comments...

Hi there,

I’m writing a very simple C parser (it dumps formatted symbols, something similar to ctags but actually working) using clang-c. That part is working well, but I’m hoping to add some support for source comments in the code.

At the moment I’m using the clang-c subset for this, but there doesn’t seem to be any API for accessing comments. Specifically, CXCursorKind doesn’t seem to have a kind for ‘comment’, so I’ve ended up doing something similar to:

for each cursor:
  get cursor info
  save info (specifically, offset, filename, display name)

for each saved symbol:
  find the closest previous symbol
  walk backwards through the file from position of symbol → position of previous symbol looking for closest comment
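
To make the second loop concrete, here is a minimal sketch in C of the comment search. It is only an illustration under some assumptions: the file contents are already in memory, the offsets are the byte offsets saved in the first loop, and the names CommentSpan and closest_comment_before are made up for this example. It also scans forwards over the gap instead of backwards, since comment delimiters and string literals are easier to recognise in that direction; the result (the last comment before the symbol) is the same.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type: byte range [start, end) of the comment found. */
typedef struct {
    int found;     /* 0 if no comment lies in the searched range */
    size_t start;  /* offset of the opening delimiter */
    size_t end;    /* offset one past the last byte of the comment */
} CommentSpan;

/*
 * Return the last comment that starts in src[lower, upper), where `lower` is
 * the offset of the previous symbol and `upper` the offset of the current one.
 * String and character literals are skipped so that "//" inside them is
 * not mistaken for a comment.
 */
CommentSpan closest_comment_before(const char *src, size_t lower, size_t upper)
{
    CommentSpan last = {0, 0, 0};
    size_t i = lower;

    assert(src != NULL && lower <= upper);

    while (i < upper) {
        char c = src[i];
        if (c == '/' && i + 1 < upper && src[i + 1] == '/') {
            /* Line comment: runs up to (not including) the newline. */
            size_t start = i;
            while (i < upper && src[i] != '\n')
                i++;
            last = (CommentSpan){1, start, i};
        } else if (c == '/' && i + 1 < upper && src[i + 1] == '*') {
            /* Block comment: runs through the closing delimiter. */
            size_t start = i;
            i += 2;
            while (i + 1 < upper && !(src[i] == '*' && src[i + 1] == '/'))
                i++;
            i = (i + 1 < upper) ? i + 2 : upper; /* unterminated: stop at upper */
            last = (CommentSpan){1, start, i};
        } else if (c == '"' || c == '\'') {
            /* Skip a string or character literal, honouring backslash escapes. */
            i++;
            while (i < upper && src[i] != c) {
                if (src[i] == '\\' && i + 1 < upper)
                    i++;
                i++;
            }
            i++; /* step over the closing quote */
        } else {
            i++;
        }
    }
    return last;
}
```

This still needs the file on disk (or the unsaved buffer) and both offsets, so it does not remove any of the problems listed below; it only shows the scanning step.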

This is quite annoying to do, and has a few problems (for example: symbols with no filename, unsaved files, and large projects, where every symbol has to be loaded into memory).

Is there a way to get comments passed through to the visitor function as elements?

If not in clang-c, then perhaps in the full clang API?

Finally, as a slightly better alternative, is there a way to query a cursor to get the previous lexical cursor? At the very least this would save me having to save the entire set of data in memory first…

Help much appreciated!

cheers,
Doug.

*cough* Alternatively, is there a different list for questions like this?

No, this is the right list. I'm pretty sure there are people out
there doing things similar to this, so I was expecting that someone
else would respond.

clang generally does not preserve comments in the AST in any way, so
as far as I know your approach is the only approach. With the C++ API,
you can register a callback that will let you save the locations of
comments, which might help a bit, but isn't substantially different
overall.

-Eli

As Eli noted, Clang's ASTs don't keep track of comments at all. I took a stab at this a few years ago, here

  http://llvm.org/viewvc/llvm-project?view=rev&revision=74704

but then reverted all of the code because it never actually got used.

For Clang's C API, there's another (fairly simple) approach: one could extend the PreprocessingRecord to include comments as a new kind of "PreprocessedEntity", and then provide cursors for those comment entities. It would then be fairly easy to provide an API function in the C API to retrieve the comment text.

No, there is no way to get the previous lexical cursor. This information is not stored in the AST (we don't want to pay a pointer per node for something that's rarely needed).
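
A caller can approximate this by remembering the previous symbol itself while visiting. The following is a minimal sketch, assuming the visitor reports declarations in source order within each file (an assumption, not something stated in this thread); PrevSymbol and search_from are hypothetical names, and the state would be passed through the visitor's client data.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical state carried across visitor callbacks. */
typedef struct {
    char file[260];       /* file of the most recently visited symbol */
    unsigned prev_offset; /* offset of that symbol within the file */
} PrevSymbol;

/*
 * Record the symbol at (file, offset) and return the offset at which the
 * comment search for it should start: the previous symbol's offset if that
 * symbol was in the same file and earlier, otherwise 0 (start of file).
 */
unsigned search_from(PrevSymbol *state, const char *file, unsigned offset)
{
    unsigned from = 0;

    assert(state != NULL && file != NULL);

    if (strcmp(state->file, file) == 0 && state->prev_offset <= offset)
        from = state->prev_offset;

    /* Remember this symbol for the next callback (truncating long paths). */
    strncpy(state->file, file, sizeof state->file - 1);
    state->file[sizeof state->file - 1] = '\0';
    state->prev_offset = offset;
    return from;
}
```

With this, only one symbol's worth of state is kept at a time instead of the whole symbol table, at the cost of falling back to the start of the file whenever the order assumption does not hold.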

  - Doug

Right, well thanks for the help.

Looks like for the time being it’s going to be a case of handling it manually.

Once the whole project is working I may come back and dig this thread back up again for a bit of help with the PreprocessedEntity approach you’ve described.

Personally I think it’d be very valuable to support comments as some kind of entity in the AST; it’s pretty much a fundamental requirement for building a Visual Studio-style editor with contextual tooltips (something I notice the Eclipse CDT is particularly poor at, probably for this reason).

Cheers,
Doug.

Okay!

Supporting that kind of editor requires more than just making comments available in the AST. We’d also want to parse them and associate them directly with AST nodes.