Adding more HTML-related facilities in Doxygen comment parsing

Hi Reid,

What applications does this HTML5 validation enable? I've tried to skim
this thread to find the big picture, but I can't find it.

Clients that consume parsed comments can always use the parsed markup --
it is always safe to render. But Doxygen treats HTML as an indivisible
part of that markup. If clients that render parsed comments (in an IDE,
in HTML, in PDF, etc.) want to use the markup represented as HTML, they
must either trust the comments or sanitize the HTML first.
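
To make the "sanitize" option concrete: the bluntest form of
sanitization is to escape everything, so that embedded HTML is displayed
as text instead of being interpreted. Here is a minimal sketch of that.
escapeHtml is a hypothetical client-side helper I made up for
illustration; it is not part of Clang or libclang.

```cpp
#include <string>

// Hypothetical helper (not a Clang API): escape the characters that are
// significant in HTML so the text renders literally in a browser.
std::string escapeHtml(const std::string &text) {
  std::string out;
  out.reserve(text.size());
  for (char c : text) {
    switch (c) {
    case '&':  out += "&amp;";  break;
    case '<':  out += "&lt;";   break;
    case '>':  out += "&gt;";   break;
    case '"':  out += "&quot;"; break;
    case '\'': out += "&#39;";  break;
    default:   out += c;        break;
    }
  }
  return out;
}
```

Note that this throws the markup away rather than validating it. A
client that wants to keep harmless tags such as <b> or <em> needs to
know which tags and attributes are acceptable, which is exactly the
knowledge every client would otherwise have to duplicate.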

Why does Clang need to validate the HTML, rather than simply associating
comments with Decls and handing them over to a client who knows the details
of Doxygen and HTML?

Clang needs to parse Doxygen in order to give useful warnings (most
notably \param not matching any actual parameter in the function, but
there are lots of others). Since Clang needs to understand Doxygen
that much, it makes sense for Clang to parse all of it and represent
parsing results in a cooked representation that is easily consumable
by external clients, so that other clients don't have to concern
themselves with parsing, only with further processing and/or
rendering. We have two such intermediate representations -- a comment
AST, accessible through the C++ API and (somewhat less completely)
through libclang, and an XML representation with a well-defined schema
that is extended in a backwards-compatible way.
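
As an illustration of the \param warning mentioned above, here is a
small made-up example (the function and its names are mine, not from
any real code base). Compiling it with -Wdocumentation should make Clang
report that the documented parameter 'hieght' does not match any
parameter of the function.

```cpp
// The second \param name is misspelled on purpose.

/// Computes the area of a rectangle.
///
/// \param width  Width of the rectangle.
/// \param hieght Height of the rectangle.
int area(int width, int height) { return width * height; }
```

The code itself is valid and compiles either way; only the
documentation is wrong, which is why the check lives behind a
documentation warning flag rather than being an error.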

Through the C++ and libclang APIs, Clang also lets a client get the
raw, unparsed comment for a declaration and parse it with any other
algorithm, or even treat it as a non-comment (e.g., as a pragma to
guide static analysis).
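
A sketch of the "treat it as a non-comment" case, assuming the client
has already obtained the raw comment text as a string (in libclang that
would come from clang_Cursor_getRawCommentText). The "analyzer:" marker
and the extractDirective helper are invented for this example and do
not correspond to anything Clang recognizes.

```cpp
#include <string>

// Hypothetical client-side helper (not a Clang API): look for the made-up
// marker "analyzer:" in a raw comment and return the rest of that line,
// trimmed. Returns an empty string if the marker is absent.
std::string extractDirective(const std::string &rawComment) {
  const std::string marker = "analyzer:";
  const std::string::size_type npos = std::string::npos;

  std::string::size_type begin = rawComment.find(marker);
  if (begin == npos)
    return "";
  begin += marker.size();

  // The directive runs to the end of the line (or of the comment).
  std::string::size_type end = rawComment.find('\n', begin);
  std::string directive =
      rawComment.substr(begin, end == npos ? npos : end - begin);

  // Trim surrounding whitespace.
  const char *ws = " \t\r";
  std::string::size_type first = directive.find_first_not_of(ws);
  if (first == npos)
    return "";
  std::string::size_type last = directive.find_last_not_of(ws);
  return directive.substr(first, last - first + 1);
}
```

For example, given the raw text "// analyzer: nonnull-return" this
returns "nonnull-return", and for an ordinary documentation comment it
returns an empty string. None of the Doxygen parsing is involved here,
which is the point: the raw text remains available to clients that want
to interpret it their own way.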