One of the project ideas for GSoC 2014 is a clang-based tool to
generate documentation from doxygen-style comments in the source
code. I wanted to gauge interest in such a project, see if
someone is willing to mentor it, and provide a rough outline of
my idea of the project. Any feedback on this is very welcome.
2 Prior Work
• clang already understands doxygen-style comments to a degree and
attaches them to the AST:
• doxygen can already use clang as a backend
• cldoc already exists [https://github.com/jessevdk/cldoc]
3 Project Plan
3.1 Fully parse doxygen comments
Doxygen supports markdown, HTML entities, if/endif, post-definition
documentation, file scope doc, function groups, member groups, pages,
page hierarchies, examples, links, auto-links, and todo/bug lists (the
Some of those features might seem like overkill but they usually ended
up in doxygen because someone wanted them and they are actively used
in "the real world" (c).
The CommentParser should do its best to represent those in a useful
fashion in the CommentAST (especially link resolving) so tools further
down the chain can focus on their tasks only.
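To make the parsing task a little more concrete, here is a minimal
sketch of splitting a comment into its commands. It is not clang's
CommentParser; the names ParsedComment and parse_comment are
hypothetical, and it assumes the comment markers (`///`, `*`) have
already been stripped. It only handles brief, param, and return; real
doxygen input (markdown, groups, links, `\param[in]`) needs far more.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ParsedComment:
    """Hypothetical, heavily simplified stand-in for a comment AST node."""
    brief: str = ""
    params: dict = field(default_factory=dict)
    returns: str = ""


# A doxygen command can be introduced by either '\' or '@'.
_COMMAND = re.compile(r"[\\@](brief|param|returns?)\b\s*")


def parse_comment(text: str) -> ParsedComment:
    """Split comment text into brief, parameter, and return descriptions."""
    result = ParsedComment()
    # With one capture group, split() yields
    # [leading text, command, body, command, body, ...].
    pieces = _COMMAND.split(text)
    # Text before the first command is treated as the brief description.
    leading = " ".join(pieces[0].split())
    if leading:
        result.brief = leading
    for command, body in zip(pieces[1::2], pieces[2::2]):
        body = " ".join(body.split())  # collapse line breaks and runs of spaces
        if command == "brief":
            result.brief = body
        elif command == "param":
            name, _, description = body.partition(" ")
            result.params[name] = description
        else:  # "return" or "returns"
            result.returns = body
    return result
```

For example, `parse_comment(r"\brief Adds two numbers. \param a first operand")`
gives a brief of "Adds two numbers." and a single parameter "a".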
3.2 To represent intermediately or not
The actual documentation generation tool has two options:
• use libclang, work on the AST directly and spit out documentation.
• let clang produce some intermediate representation (XML?) and work
The first option seems to be the easy road but would tie the
generation directly into clang. It also seems harder to extend and
The second option is probably the most general approach. Generating XML
to represent the AST is actually proposed as its own GSoC project. Maybe
it would be possible to produce a reduced XML only containing
declarations and comments that could later be extended to feature the
full AST. Designing this schema is probably non-trivial and should be
well thought through.
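As a rough illustration of what a "reduced" XML could look like, the
sketch below serializes a few declarations together with their
comments. The element and attribute names (declarations, declaration,
kind, name, comment) are made up for this example and are not taken
from clang's existing schema; the input tuples stand in for whatever
the AST walk would deliver.

```python
import xml.etree.ElementTree as ET


def declarations_to_xml(declarations):
    """Serialize (kind, name, comment) tuples into a reduced XML document."""
    root = ET.Element("declarations")
    for kind, name, comment in declarations:
        # One element per declaration; kind and name become attributes.
        decl = ET.SubElement(root, "declaration", kind=kind, name=name)
        ET.SubElement(decl, "comment").text = comment
    return ET.tostring(root, encoding="unicode")


xml_text = declarations_to_xml([
    ("function", "add", "Adds two numbers."),
    ("class", "Matrix", "A dense matrix."),
])
```

A schema like this could later grow additional child elements (for
instance for inheritance or header dependencies) without breaking
consumers that only read declarations and comments.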
The slides on -Wdocumentation already mention the ability to produce
XML but I couldn't figure out yet how to get that to work. From
glancing at the schema in bindings/xml/comment-xml-schema.rng it looks
pretty useful already, but some features (header dependencies,
inheritance relationships) are AFAIK missing.
The main benefit of an intermediate representation would be to enable
us to build something akin to doxygen's "external projects" feature,
which is incredibly useful (not having it would be a deal-breaker for
some of my own projects).
3.3 Actual Generators
How should the actual generators be defined?
What we should strive for: doxygen configuration that is directly
portable, almost no configuration for the common case, and at least
simple HTML and LaTeX output for starters.
There are almost endless possibilities to do this and all of them have
3.3.1 The doxygen way
One driver (with many special cases) calls into generator
back-ends. All defined in C++. Almost no room for customization (css
in HTML, custom headers in LaTeX, nothing for the rest). Easy for the
3.3.2 Templating engines
Provide templates, have a driver that populates them. Likely the most
general approach, but different template engines for different output
formats with different capabilities. Complexity is in the driver.
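A minimal sketch of the template idea, using Python's standard
string.Template as a stand-in for a real template engine. The
templates, the entity dictionary layout, and the render function are
assumptions made for illustration only. Note how the per-format
escaping already ends up in the driver.

```python
from html import escape
from string import Template

# One template per output format; users could swap these out.
TEMPLATES = {
    "html": Template("<h1>$name</h1>\n<p>$brief</p>\n"),
    "latex": Template("\\subsection{$name}\n$brief\n"),
}


def _latex_escape(text):
    """Escape a few LaTeX special characters (deliberately incomplete)."""
    for char in "&%_#":
        text = text.replace(char, "\\" + char)
    return text


ESCAPERS = {"html": escape, "latex": _latex_escape}


def render(entity, fmt):
    """Driver: escape the entity's fields for fmt and fill the template."""
    esc = ESCAPERS[fmt]
    return TEMPLATES[fmt].substitute(
        name=esc(entity["name"]),
        brief=esc(entity["brief"]),
    )
```

Calling `render({"name": "Matrix<T>", "brief": "A dense matrix."}, "html")`
fills the HTML template with the escaped name and brief description.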
3.3.3 Database + Web-server
A special case for HTML. Provide a database and a web-frontend that
can be hosted. Seems interesting for fast search functions and live
documentation updates. clang-server where are you?
3.3.4 A shim for doxygen
Doxygen can already produce XML, but doesn't use it for anything
internally (and the XML isn't really that useful anyway). That
capability could be expanded, but for that the doxygen hackery would
It's a large project, but each stage provides functionality that could
be a contribution on its own.
4 Why not improve doxygen instead?
Doxygen is incredibly hard to hack on, burdened by backward
compatibility (going so far that it prevents obvious bugs from being
fixed), and supports a strange set of languages which are not really
C/C++ like, which makes a lot of changes impossible or very hard. The
support for templates is abysmal and hard to fix without eventually
introducing a full C++ AST. I'm not trying to bash doxygen here. I
have used it to build cool things and Dimitri is doing a better job at
maintenance than a lot of other OS devs.
5 Who am I anyway?
My name is Philipp Möller and I'm an MSc Computer Science student at
the University of Saarland, Germany. I have already participated in
GSoC 2011 with the CGAL (www.cgal.org) project. Why do I care about
documentation so much? I built a large part of the CGAL doxygen
documentation ([http://doc.cgal.org/latest/]), got to know almost all
the quirks and awesomeness of doxygen, and produced a few doxygen
patches in the process.
I'm used to making small contributions to open-source projects and
used to mailing-list communication.
Build a tool to generate documentation from doxygen-style comments
with clang that supports a lot of doxygen's features. What do you think
of this project idea? Would there be a mentor for this project?