Status/Future of libclang

Hello,

I'm working on a C++ documentation generator "standardese" (GitHub - standardese/standardese: A (work-in-progress) nextgen Doxygen for C++). I'm using libclang for parsing.

Unfortunately I've run into an issue with it: the API isn't providing me all the information that I need. There are things that aren't exposed by the API, for example whether a constructor is explicit, a function constexpr, or the noexcept expression.

I was forced to fall back to parsing the source code of a cursor manually, tokenizing it with the help of Boost.Wave (because libclang doesn't give me the preprocessed tokens). This worked relatively well, but I have now run into an issue with template specializations (parsing `template <> class c<a < b> {};`), so my naive approach doesn't work anymore and I'd need to write a fully fledged parser.
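To illustrate why such a naive approach breaks, here is a minimal sketch in Python. It assumes the tokenizer treats every `<` as an opening bracket and every `>` as a closing one, and that the second `<` in `c<a < b>` is a less-than comparison. The function name and token lists are made up for illustration and are not taken from standardese.

```python
def naive_template_args_end(tokens, start):
    """Return the index of the '>' that closes the '<' at tokens[start].

    Naive rule: every '<' opens a level and every '>' closes one.
    Returns None if the brackets never balance.
    """
    depth = 0
    for i in range(start, len(tokens)):
        if tokens[i] == "<":
            depth += 1
        elif tokens[i] == ">":
            depth -= 1
            if depth == 0:
                return i
    return None


# A plain argument list: the brackets balance and the end is found.
simple = ["c", "<", "int", ">", "{", "}", ";"]

# A comparison inside the argument list: the second '<' is counted as an
# opening bracket, so the depth never returns to zero.
tricky = ["c", "<", "a", "<", "b", ">", "{", "}", ";"]

simple_end = naive_template_args_end(simple, 1)
tricky_end = naive_template_args_end(tricky, 1)
```

Telling the two meanings of `<` apart requires knowing whether `a` names a template, which is exactly the kind of information only a real parser has.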

I've also encountered a few things that might be bugs.
I can report those, but looking at https://llvm.org/bugs/buglist.cgi?product=clang&component=libclang&resolution=---&list_id=100919, almost all of the open bugs are unassigned and still marked NEW.

I would volunteer to maintain it myself, but I have no experience with the internal APIs or the clang project, and I lack the time to learn them *and* work on standardese.

So my question: Is there someone actively maintaining libclang? What are the future plans for the API?
If there is someone I'll provide active feedback and feature requests.

Jonathan

Just go for the C++ API: libtooling

You will have all you need.

David.

So you're basically saying: the state of libclang is intentional, use something else?

I can understand that libclang doesn't expose the entire internal API.
But I think it should at least expose enough information to completely recreate equivalent source code for a cursor.

Jonathan

> Just go for the C++ API: libtooling
>
> You will have all you need.

You will not have compatibility between versions.

> So you're basically saying: the state of libclang is intentional, use
> something else?

Jonathan, I have the same questions as you. I also have not encountered an
authoritative answer.

You might encounter the same problem I did if you want to determine the
default values of method parameters:

http://thread.gmane.org/gmane.comp.compilers.clang.devel/48738

Our solution is the same - try to parse the text ourselves because libclang
and the python bindings do not expose it.

Hopefully there is a more authoritative answer about the state of libclang.

Thanks,

Steve.

I've written small extensions to libclang and to the python bindings to get
a code generator I wrote working. If there is a small number of missing
features maybe you could do the same?

When I was writing my own libclang python bindings
(GitHub - rhdunn/libclangpy: A Python 2 and 3 compatible binding to the libclang 3.4 and earlier API), I had to write custom logic on top of
the direct bindings to work around bugs in libclang (e.g. CursorKind_LinkageSpec is
not mapped to the API, so `extern "C" ...` does not work) and to add some support
for the newer APIs on older versions of libclang.

I also wrote my own tests for the binding to make sure the APIs worked
consistently on the target and later versions of libclang.

My verdict was that it was a second class citizen to the unstable C++ API.

Thanks,
Reece

> I've written small extensions to libclang and to the python bindings to
> get a code generator I wrote working. If there is a small number of
> missing features maybe you could do the same?

Yes, I should start doing that.

> When I was writing my own libclang python bindings
> (GitHub - rhdunn/libclangpy: A Python 2 and 3 compatible binding to the libclang 3.4 and earlier API) I had to write custom logic on
> top of the direct bindings to fix bugs in libclang (e.g.
> CursorKind_LinkageSpec is not mapped to the API, so `extern "C" ...`
> does not work) and some support for the newer APIs on older versions of
> libclang.

Did that as well.

> I also wrote my own tests for the binding to make sure the APIs worked
> consistently on the target and later versions of libclang.

Any documentation on how to do that?

> My verdict was that it was a second class citizen to the unstable C++ API.

Seems like it.

> Any documentation on how to do that?

My approach was to have a custom test driver 'run(version, test)' that
called the test function. This would report a skip message if the function
raised a missing function or unsupported exception (for Unexposed* cursors)
and the libclang version is lower than the one being tested, otherwise it
would report a failure. I then created 'test_[APIType][VERSION]' classes,
e.g. 'test_TranslationUnit29', as well as tests for specific CursorKind
values.

This then allowed me to document libclang behaviour differences. For
libclang version bugs, I would test the libclang version that had the bug
and use the test (with an associated comment) to document the behaviour
difference.

I have 'match_type' and 'match_cursor' helpers that test those objects and
raise unsupported exceptions for unexposed type/cursor kinds.

With the differences documented, I could then work out how to support a
consistent behaviour based on the tests and support newer features (e.g.
access specifiers) on earlier versions.

Thanks,
Reece

> I was forced to go to manual parsing of the source code of a cursor by
> tokenizing it with the help of Boost.Wave (because libclang doesn't give
> me the preprocessed tokens). This worked relatively good but I now ran
> into an issue with template specializations (parsing
> `template <> class c<a < b> {};`), so my naive approach doesn't work
> anymore and I'd need to write a fully fledged parser.

Have you tried clang_tokenize [1]? You might also need to set the right options when creating the translation unit, such as CXTranslationUnit_DetailedPreprocessingRecord [2].

[1] clang: Token extraction and manipulation

[2] clang: Translation unit manipulation
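
As a rough sketch of the clang_tokenize suggestion, here is how it might look through the Python bindings (`clang.cindex`), where `TranslationUnit.PARSE_DETAILED_PROCESSING_RECORD` corresponds to CXTranslationUnit_DetailedPreprocessingRecord and `Cursor.get_tokens()` wraps clang_tokenize. The helper names `token_spellings` and `declaration_flags`, the file name, and the `-std=c++14` flag are assumptions made for illustration. As far as I know, the tokens come back as spelled in the source rather than macro-expanded, and the keyword scan is only a heuristic, not a replacement for proper API support.

```python
def token_spellings(source, filename="example.cpp", args=("-std=c++14",)):
    """Return the spellings of all tokens libclang reports for `source`."""
    # Imported lazily so the pure helper below works without libclang.
    from clang import cindex

    index = cindex.Index.create()
    tu = index.parse(
        filename,
        args=list(args),
        unsaved_files=[(filename, source)],
        options=cindex.TranslationUnit.PARSE_DETAILED_PROCESSING_RECORD,
    )
    # Tokens covering the extent of the translation unit cursor.
    return [tok.spelling for tok in tu.cursor.get_tokens()]


def declaration_flags(spellings):
    """Heuristically read declaration properties off token spellings."""
    # Specifiers such as 'explicit' and 'constexpr' precede the first '('.
    head = spellings[: spellings.index("(")] if "(" in spellings else spellings
    return {
        "explicit": "explicit" in head,
        "constexpr": "constexpr" in head,
        "noexcept": "noexcept" in spellings,
    }
```

In practice one would call `get_tokens()` on the cursor of a single declaration rather than on the whole translation unit, and pass those spellings to `declaration_flags`.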

libclang is a community project. If you spot any issues, simply fix them by
providing a patch on reviewboard. People are willing to review and so far the
most pressing issues we encountered got fixed upstream.

For the stuff that is not exposed, it's usually trivial to add the support to
libclang. Mostly a matter of writing a unit test (which is also pretty easy),
and then forwarding the C++ API via some C wrappers.

Cheers, hope to see some patches to review!

Okay, in a couple of weeks I'll hopefully have the time to do it.

Jonathan