Using libClang for parsing Source and Comments

We have not found a tutorial or any documentation addressing our requirement to use libClang for parsing C++ source and full comments. We are currently using libClang 3.6 for testing; moving to 3.7 is possible, although we would rather wait for 3.8.

At the present time we have code to walk the AST and obtain cursors and tokens for a sample C++ header. The following are the initial problems we have encountered.

1) Locate C++ standard headers

On Debian we did not have to pass any parameters to libClang for it to locate the C++ standard header files.

On Windows, I had to pass six include lines which referenced GCC std header files. This works, but it will be cumbersome for our users. We talked with a few people at the last LLVM meeting, and it was mentioned there might be some enhancements for detecting the system headers.

Does anyone have ideas or suggestions about this?

2) We need to retrieve all the comments, and using "cursors" this has not worked. We can easily generate a list of the tokens, and walking these appears to show the comments; however, this feels very awkward and slow. We understand the basic idea of cursors and tokens. We are not sure about the pros and cons of each approach.

3) Is the -Wdocumentation flag passed to clang when building, or is this flag supposed to be passed to libClang? Will this provide us with more of the user comments? It appeared to have no effect.

Our thanks for any and all guidance,

Barbara Geller
Co-Founder of DoxyPress project

Hey Barbara,

see answers inline below from my experience with the Clang C-API working on
KDevelop.

1) Locate C++ standard headers

On Debian we did not have to pass any parameters to libClang for it to
locate the C++ standard header files.

On Windows, I had to pass six include lines which referenced GCC std
header files. This works, but it will be cumbersome for our users. We
talked with a few people at the last LLVM meeting, and it was mentioned
there might be some enhancements for detecting the system headers.

Does anyone have ideas or suggestions about this?

I think this is still unresolved, but would be glad to be told otherwise.

2) We need to retrieve all the comments, and using "cursors" this has
not worked. We can easily generate a list of the tokens, and walking
these appears to show the comments; however, this feels very awkward and
slow. We understand the basic idea of cursors and tokens. We are not
sure about the pros and cons of each approach.

We also iterate over the tokens in KDevelop to find comments not associated
with a cursor, such as

///FIXME this is broken
/**
* doxygen comment
*/
void foo();

If you only need access to the comments of cursors, i.e. the second comment
above, you should be fine with clang_Cursor_getParsedComment and without
tokens.

3) Is the -Wdocumentation flag passed to clang when building, or is
this flag supposed to be passed to libClang? Will this provide us with
more of the user comments? It appeared to have no effect.

You need to pass that as part of the argv to clang_parseTranslationUnit2.
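
As a rough illustration of passing the flag in the argument list, here is a sketch using the clang Python bindings (clang.cindex) instead of calling clang_parseTranslationUnit2 directly. Note that -Wdocumentation enables warnings about documentation comments, so any effect shows up in the translation unit's diagnostics and not as additional comments. The helper names `is_documentation_option` and `parse_with_doc_warnings` are hypothetical, and the sketch assumes the bindings are installed and can locate libclang.

```python
def is_documentation_option(option):
    """True if a diagnostic's enabling option belongs to -Wdocumentation*."""
    return bool(option) and option.startswith("-Wdocumentation")


def parse_with_doc_warnings(path, extra_args=()):
    """Parse `path` with -Wdocumentation and return the matching warnings."""
    # Imported lazily so the pure helper above works without libclang.
    from clang import cindex

    # The flag travels in the same argument list as -I, -std=... and so on.
    args = ["-x", "c++", "-Wdocumentation"] + list(extra_args)
    tu = cindex.Index.create().parse(path, args=args)
    return [
        (diag.location.line, diag.spelling)
        for diag in tu.diagnostics
        if is_documentation_option(diag.option)
    ]
```

An empty result from `parse_with_doc_warnings("sample.h")` would mean either that the flag was not picked up or simply that the comments in the file are well formed, so a deliberately broken \param name is a quick way to tell the two apart.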

HTH

Milian,

Hey Barbara,

see answers inline below from my experience with the Clang C-API working on
KDevelop.

1) Locate C++ standard headers

On Debian we did not have to pass any parameters to libClang for it to
locate the C++ standard header files.

On Windows, I had to pass six include lines which referenced GCC std
header files. This works, but it will be cumbersome for our users. We
talked with a few people at the last LLVM meeting, and it was mentioned
there might be some enhancements for detecting the system headers.

Does anyone have ideas or suggestions about this?

I think this is still unresolved, but would be glad to be told otherwise.

I am hoping someone with knowledge in this area can shed some light on this problem.

2) We need to retrieve all the comments, and using "cursors" this has
not worked. We can easily generate a list of the tokens, and walking
these appears to show the comments; however, this feels very awkward and
slow. We understand the basic idea of cursors and tokens. We are not
sure about the pros and cons of each approach.

We also iterate over the tokens in KDevelop to find comments not associated
with a cursor, such as

///FIXME this is broken
/**
  * doxygen comment
  */
void foo();

If you only need access to the comments of cursors, i.e. the second comment
above, you should be fine with clang_Cursor_getParsedComment and without
tokens.

It does seem like using the tokens is the appropriate way to handle parsing comments. Just wanted to make sure we were not missing something obvious.

3) Is the -Wdocumentation flag passed to clang when building, or is
this flag supposed to be passed to libClang? Will this provide us with
more of the user comments? It appeared to have no effect.

You need to pass that as part of the argv to clang_parseTranslationUnit2.

We are using clang_parseTranslationUnit2 and the flag did not have any effect. Is there a chance this was not supported in 3.6?

Thanks for your input.

Barbara

1) Locate C++ standard headers

On Debian we did not have to pass any parameters to libClang for it to
locate the C++ standard header files.

On Windows, I had to pass six include lines which referenced GCC std
header files. This works, but it will be cumbersome for our users. We
talked with a few people at the last LLVM meeting, and it was mentioned
there might be some enhancements for detecting the system headers.

Does anyone have ideas or suggestions about this?

If your tool can detect the headers, you can always pass in additional flags to the clang_parseTranslationUnit2 function. You could also ship the headers with your tool to avoid the users having to install the developer tools for the platform. It's even possible to embed headers in the executable itself. I'm doing that with some of the internal Clang headers, see [2][3].
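
One way the "detect the headers and pass additional flags" idea might look is sketched below in Python. It assumes a GCC-compatible compiler is installed and that running it with -v prints the usual "#include <...> search starts here:" list on stderr; the function names and the default compiler name `g++` are hypothetical, and nothing here is specific to DoxyPress or KDevelop.

```python
import subprocess


def parse_search_dirs(verbose_output):
    """Extract the <...> include directories from `compiler -v` output."""
    dirs = []
    in_list = False
    for line in verbose_output.splitlines():
        if line.startswith("#include <...> search starts here:"):
            in_list = True
        elif line.startswith("End of search list."):
            break
        elif in_list and line.startswith(" "):
            path = line.strip()
            # Some toolchains annotate framework directories; skip those.
            if path.endswith(" (framework directory)"):
                continue
            dirs.append(path)
    return dirs


def isystem_args(dirs):
    """Turn directories into '-isystem <dir>' pairs for the parse call."""
    args = []
    for directory in dirs:
        args += ["-isystem", directory]
    return args


def detect_system_include_args(compiler="g++"):
    """Ask an installed compiler for its search path (assumed on PATH)."""
    proc = subprocess.run(
        [compiler, "-E", "-x", "c++", "-", "-v"],
        input="", capture_output=True, text=True, check=False,
    )
    return isystem_args(parse_search_dirs(proc.stderr))
```

The resulting list would be appended to whatever arguments are already handed to clang_parseTranslationUnit2. This only helps when a compiler is present on the user's machine; shipping or embedding the headers, as described above, avoids that dependency.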

2) We need to retrieve all the comments, and using "cursors" this has
not worked. We can easily generate a list of the tokens, and walking
these appears to show the comments; however, this feels very awkward and
slow. We understand the basic idea of cursors and tokens. We are not
sure about the pros and cons of each approach.

There's a complete module for comment introspection [1]. I have not tried it myself yet but it looks quite extensive.

3) Is the -Wdocumentation flag passed to clang when building, or is
this flag supposed to be passed to libClang? Will this provide us with
more of the user comments? It appeared to have no effect.

If the comment introspection API does not provide the data you need you could try the CXTranslationUnit_DetailedPreprocessingRecord option for the clang_parseTranslationUnit2 function.

[1] http://clang.llvm.org/doxygen/group__CINDEX__COMMENT.html
[2] http://clang-developers.42468.n3.nabble.com/Builtin-headers-td4049705.html#message4049716
[3] http://clang-developers.42468.n3.nabble.com/Builtin-headers-td4049705.html#message4049772

+benjamin, who has added a new function that should also help on Windows (because it gets argv[0])