[RFC] Policy for Doxygen comments

In recent months, I’ve noticed a significant uptick in PRs that include extensive Doxygen comments. I’m specifically referring to comments that include a lot of special commands such as @param and @return annotations for functions. My theory is that this is at least in part because coding agents are eager to add those.

While I believe documentation is important, I also strongly believe that:

  1. Comments should aim to document the “why” rather than the “what.”
  2. Incorrect documentation is worse than no documentation at all.

Based on this, I’m skeptical of the value proposition of these comments. More often than not, they are a verbose repetition of the function signature, providing little value. The cost is that as a reviewer, I have to check them for correctness when they’re added and again every time the code is modified. Building with -Wdocumentation could provide some relief, but as of now LLDB is pretty far from that. [1]

So far, I haven’t commented on this during code reviews, but it’s something I keep coming back to. I’d like to get some opinions from our community and hopefully settle on a policy (that I’d be happy to document) going forward.

Proposed PR: [llvm] Update policy for Doxygen comments in the Coding Standards by JDevlieghere · Pull Request #179898 · llvm/llvm-project · GitHub

[1] I’ve made some efforts in the past but it’s really tedious work with questionable value. We are -Wdocumentation clean at the SB API level and enforce that through the unit test.

2 Likes

Might want to add in LLDB to the title, I read this as a whole of LLVM proposal initially.

I have, the opposite way. Along the lines of:

If you’re doing a Dozygen comment that has a brief and arguments already, it should also describe the return value. Even if that’s a copy paste of part of the brief.

I don’t use the comments in a context outside the source files themselves myself, but I assumed it would be better to have them be “complete”?

If someone does make extensive use of the generated documentation / IDE integration / LLM indexing, I’d like to hear what effects the quality of that.

In my case, the author had to describe the return type in the brief anyway for it to make any sense. So if this were the policy I would have accepted the comments as they were. At least for a human reader of the source file, no information was missing.

Yes, I agree that if you specify an argument or return value using the corresponding keyword, it should be complete.

However I think the following is fine too, where you describe the return value in prose (and omit the arguments)

/// Get the foo for bar. Returns true if baz, false otherwise.

What I’m trying to avoid is something like this:

/// Get the foo for bar.
/// 
/// @param bar the bar.
/// 
/// @return True if baz, false otherwise.

In this contrived example it doesn’t look too bad, but if the function takes a bunch arguments and they each just repeat the name of the argument, just to document the return value, I would very much prefer the first example.

2 Likes

+1 for this. I don’t think this example is contrived, we see it a lot in the codebase and it really doesn’t any value to the readers. The prose version is just as precise and a lot more terse.

1 Like

I agree in this case. The arguments would basically be a large diversion, in the middle of the sentence, before the rest of the information.

I tried to come up with a way of describing what this looks like in practice but there’s too many corner cases and there’s a line crossed somewhere where the extra sections actually help readability, but I can’t say where for all cases.

So I think it would help to link to, or add, some statement of the purpose of these comments. So we can ask ourselves “does this comment achieve that”, regardless of form.

“Doxygen comments are supposed to <things> and as long as they achieve that, there is no required form.”

Edit: to prevent review comments being “this describes parameters too much” instead of “this does not inform the user how to use this function, because it does…”. The latter being more useful I think.

1 Like

I support the spirit of this proposal. Doxygen comments should provide additional details and context and avoid restating information that can be glanced from the function signature itself.

1 Like

Here are the existing guidelines in the LLVM Coding Standards: LLVM Coding Standards — LLVM 23.0.0git documentation

As I was trying to draft a proposed wording, I found myself restating a lot of what’s already described there. I think we could achieve my desired outcome by incorporating Adrian’s comment there. For example, by adding the following sentence:

Only document function parameters and return values when it provides additional information, such as intent, usage, or behavior that’s non-obvious. Avoid restating information that can easily be inferred from the function signature itself.

When only a single parameter needs documentation, consider doing so in the brief rather than documenting all parameters.

If folks here agree, I can create a PR and move this to LLVM category on Discourse and continue the discussion there with the broader community.

The existing guidance is good, and it does talk about duplication:

Avoid restating the information that can be inferred from the API name.

But it doesn’t clarify whether duplication within the comment itself is to be avoided.

A minimal documentation comment:

It does not say whether this is minimal in that, in this case, it’s all it needs to be and that’s fine, or -

A documentation comment that uses all Doxygen features in a preferred way:

Why this bigger comment is preferred specifically. Like does it mean “if you do use the other features, this is the preferred way” or “we prefer that you always use the other features, and do it this way”.

So yes, if it’s to align with your proposal, I think it needs to be more specific. What’s there could already mean what you want, but it’s up to the reader which way to take it.

And Adrian’s wording is a great starting point for that.

Only document function parameters and return values when it provides additional information, such as intent, usage, or behavior that’s non-obvious. Avoid restating information that can easily be inferred from the function signature itself.

If folks here agree, I can create a PR and move this to LLVM category on Discourse and continue the discussion there with the broader community.

Seems reasonable to me. I like the first sentence in particular :slight_smile:

-eric

I’ve moved this discussion from LLDB to LLVM and created a PR with my proposed change, which I believe captures the current state of the discussion: [llvm] Update policy for Doxygen comments in the Coding Standards by JDevlieghere · Pull Request #179898 · llvm/llvm-project · GitHub

Overall, strong +1 for making documentation concise and to the point.

Only include code examples, function parameters and return values when it
provides additional information, such as intent, usage, or behavior that’s
non-obvious.

What I usually suggest is to consider renaming function arguments (or giving them more meaningful data types where possible) so that it’s obvious what they are for and we don’t have to explain in comments.

1 Like

Thanks for the input everyone. I’ve merged the PR.