Doxygen comments duplicated in Clang source files: OK to delete them?

As I clean up a lot of Clang's comments for Doxygen's consumption, I'm
hitting one issue repeatedly: a lot of the documentation is cut+pasted
between the header file and the associated source file. Apart from
confusing some configurations of Doxygen (which concatenate the docs,
giving duplication in the output), this inevitably leads to bitrot and
divergence.

Taking one example:
clang::TokenConcatenation::IsIdentifierStringPrefix has documentation
in the header file Lex/TokenConcatenation.h:
    /// \brief Return true if the spelling of the token is literally 'L', 'u',
    /// 'U', or 'u8'.
Whereas in the source file Lex/TokenConcatenation.cpp it is documented as:
    /// IsIdentifierStringPrefix - Return true if the spelling of the token
    /// is literally 'L', 'u', 'U', or 'u8'. Including raw versions.
The latter is closer to correct, but still wrong; if
LangOpts.CPlusPlus0x isn't enabled then it checks only for 'L', which
is entirely reasonable (given that that's the only valid
encoding-prefix from C++98).
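
To make the documentation problem concrete, here is a hypothetical, self-contained sketch of a prefix check that behaves as described above. It carries a single doc comment that states the language-mode dependence. It is not Clang's implementation. The free-function signature, which takes a std::string_view and a bool, and the exact set of raw prefixes accepted in C++0x mode are assumptions made for illustration.

```cpp
#include <string_view>

/// \brief Return true if \p Spelling is an identifier that can act as a
/// string-literal encoding prefix.
///
/// In C++98 mode (\p CPlusPlus0x false) only "L" qualifies. In C++0x mode
/// "u", "U" and "u8" also qualify, as do the raw forms "R", "LR", "uR",
/// "UR" and "u8R" (this raw set is an assumption of the sketch).
///
/// Hypothetical stand-in written for this example; not Clang's code.
constexpr bool isIdentifierStringPrefix(std::string_view Spelling,
                                        bool CPlusPlus0x) {
  if (Spelling == "L")
    return true; // the only encoding prefix in C++98
  if (!CPlusPlus0x)
    return false;
  // C++0x adds the new encoding prefixes and the raw-string flavours.
  constexpr std::string_view Prefixes[] = {"u",  "U",  "u8", "R",
                                           "LR", "uR", "UR", "u8R"};
  for (std::string_view P : Prefixes)
    if (Spelling == P)
      return true;
  return false;
}
```

Writing the mode dependence into the one canonical comment means a reader of either the header or the generated docs sees the same, accurate contract.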

Obviously I can fix up the documentation for that one function, but my
bigger-picture question is this: is there any good reason for me not
to delete the duplicated documentation from the source files in this
and other cases? Or a related question: can anyone shed any light on
why this duplication is there in the first place?

-- James

I think the duplication is really bad, and I have a moderately strong
preference for how to de-duplicate:

Public types, variables, functions, etc. should have Doxygen comments on
their declarations in the header.

Private types, variables, functions, etc. should have Doxygen comments in
the definitions only, either in the source file or in an out-of-line
definition in the header.

This isn't necessarily the C++ definition of "public" or "private", but
rather the intended interpretation for users of the code: if it's part of
the interface that is intended to be used by consumers, document it in the
header. If it's ostensibly part of the implementation, document it in the
source file.

Also, clearly private member variables or private inline functions can't
have this rule applied to them. But in those cases, there is no ambiguity
or choice in the matter -- everything is necessarily attached to the single
declaration/definition.
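
A minimal sketch of this placement rule, using a hypothetical Scanner class. So that it compiles as a single listing, it shows only the header case. The public member is documented once, on its declaration. The private helper is documented on its out-of-line definition. A file-local helper in a .cpp file would carry its doc comment on its definition in the same way.

```cpp
#include <string_view>

// Everything below would live in one hypothetical header, Scanner.h.
class Scanner {
public:
  /// \brief Return true if \p Text contains at least one ASCII digit.
  ///
  /// Part of the public interface, so it is documented here, once, on the
  /// declaration that consumers read.
  static constexpr bool hasDigit(std::string_view Text);

private:
  // Implementation detail; its doc comment is on the definition below.
  static constexpr bool isDigit(char C);
};

/// \brief Return true if \p C is an ASCII decimal digit ('0' to '9').
///
/// Private helper, so the doc comment sits on the out-of-line definition
/// rather than on the declaration inside the class.
constexpr bool Scanner::isDigit(char C) { return C >= '0' && C <= '9'; }

// Deliberately no doc comment: the declaration above is the canonical copy.
constexpr bool Scanner::hasDigit(std::string_view Text) {
  for (char C : Text)
    if (isDigit(C))
      return true;
  return false;
}
```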

Is this a pattern others are comfortable with?

Whenever we reach a consensus, James, would you be up for contributing a
patch adding advice about this to the LLVM coding conventions? We should
document the resolution of these issues.

-Chandler

+1 from me on all points.

[ Replying to one point only, for now. ]

>> As I clean up a lot of Clang's comments for Doxygen's consumption, I'm
>> hitting one issue repeatedly: a lot of the documentation is cut+pasted
>> between the header file and the associated source file. Apart from
>> confusing some configurations of Doxygen (which concatenate the docs,
>> giving duplication in the output), this inevitably leads to bitrot and
>> divergence.
>
> I think the duplication is really bad, and I have a moderately strong
> preference for how to de-duplicate:
>
> [...]
>
> Whenever we reach a consensus, James, would you be up for contributing a
> patch adding advice about this to the LLVM coding conventions? We should
> document the resolution of these issues.

No problem, when we appear to have consensus I'll put together a patch.

-- James

Sorry to be a contrarian again, but I find it really useful to have the copy-and-pasted doc comments on method implementations in the .cpp files as well as on the prototypes in the header. It is really common for me to be working on a .cpp file without the header open, and losing the doc comments is a big loss.

If the goal is to avoid bitrot and divergence, why not make the doc comment parser note divergence and warn about it?

-Chris
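
The divergence check suggested here could be prototyped outside the compiler as a plain text comparison. The sketch below is hypothetical: docCommentsDiverge, normalizeDocComment and their normalization rules are assumptions, not part of Clang's comment parser. It strips the /// markers, ignores blank lines and re-wrapping, and reports divergence only when the remaining words differ.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: drop the "///" marker from each line, skip blank
// lines, and collapse whitespace, so that re-wrapping a comment is not
// reported as divergence.
std::string normalizeDocComment(const std::string &Comment) {
  std::istringstream Lines(Comment);
  std::string Line, Out;
  while (std::getline(Lines, Line)) {
    std::string::size_type Start = Line.find_first_not_of(" \t");
    if (Start == std::string::npos)
      continue; // blank line
    if (Line.compare(Start, 3, "///") == 0)
      Start += 3; // strip the Doxygen marker
    std::istringstream Words(Line.substr(Start));
    std::string Word;
    while (Words >> Word) {
      if (!Out.empty())
        Out += ' ';
      Out += Word;
    }
  }
  return Out;
}

// Returns true when the header and source copies of a doc comment say
// different things once formatting differences are ignored.
bool docCommentsDiverge(const std::string &HeaderComment,
                        const std::string &SourceComment) {
  return normalizeDocComment(HeaderComment) !=
         normalizeDocComment(SourceComment);
}
```

Under these rules the IsIdentifierStringPrefix pair quoted at the start of the thread would be flagged, since the words themselves differ.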

>>> As I clean up a lot of Clang's comments for Doxygen's consumption, I'm
>>> hitting one issue repeatedly: a lot of the documentation is cut+pasted
>>> between the header file and the associated source file. Apart from
>>> confusing some configurations of Doxygen (which concatenate the docs,
>>> giving duplication in the output), this inevitably leads to bitrot and
>>> divergence.
>>
>> I think the duplication is really bad, and I have a moderately strong
>> preference for how to de-duplicate:
>
> Sorry to be a contrarian again, but I find it really useful to have the
> copy-and-pasted doc comments on method implementations in the .cpp files as
> well as on the prototypes in the header. It is really common for me to be
> working on a .cpp file without the header open, and losing the doc comments
> is a big loss.

That's very interesting (in that it's the first time I've heard this
argument, and it runs counter to my own experience of what works
well). I'll have to think about what's best to do there. One thought
is to keep whatever you want in the implementation but *not* to mark
it up as a Doxygen comment for extraction, though that might make your
hypothetical sync tool a little more complicated.

(I still need to look into how it is that my Doxygen runs all end up
with duplication in the output, whereas those on the Clang website
don't. That might or might not be because I'm using more recent
builds of Doxygen.)

> If the goal is to avoid bitrot and divergence, why not make the doc comment
> parser note divergence and warn about it?

Divergence is the smallest of several problems.

That's an interesting idea to address that one piece though. I think
we'd need something more than that -- we'd need a re-writer to sync
the comments, as otherwise the effort needed to do so will be
untenable. And of course then we'd either need to decide that one
place is canonical, or have some other way to resolve conflicts.

However: My primary goal is to make Clang's documentation so good that
it's possible, most of the time, to program to the published
interfaces without needing to rely on implementation details or having
to ask questions for even basic uses. I tend to be opposed to
anything that makes maintaining documentation more work for
developers; empirically they're already not keeping up with the
documentation workload, so I'd prefer to reduce the overhead and focus on
writing more useful material.

I'll go away and think about your stated preference for duplication,
possibly over something peaty.

-- James