[RFC] Slight tweak to header inclusion policy to promote IWYU

(spin off from [NFC] Add explicit #include abi-breaking.h where its macros are used. by dfukalov · Pull Request #109453 · llvm/llvm-project · GitHub conversation with @mehdi_amini @dfukalov )

I think we’ve generally seen/accepted well-formed Include-What-You-Use changes historically, but the letter of the style guide suggests indirect inclusion is acceptable.

I think it’s “not a big deal, but would be better if the inclusions were explicit, rather than implicit” - I’m not likely to check someone’s #includes carefully to ensure they include everything they need explicitly, but I’m happy to accept patches that add such explicit inclusions to reduce spooky-action-at-a-distance maintenance issues.

(by maintenance issues I mean: Removing a header that is no longer explicitly used can break the build because something that header includes is implicitly used. Equally, removing the include from a header when it’s no longer needed in that header can break the build because someone including that header was relying on the indirect inclusion.
In the case of #ifdef and #if macros, these cases can become non-breaking silent behavior changes (-Wundef will catch the #if cases, but not the #ifdef cases))

Adding these missing includes doesn’t substantially impact compile time since compilers have fast-paths through re-included files with the usual macro-based header repetaed inclusion protetion.

So I’d propose just removing the “or indirectly” portion of the phrase " You must include all of the header files that you are using — you can include them either directly or indirectly through another header file." in LLVM Coding Standards — LLVM 20.0.0git documentation unless anyone has objections.

This wording would still not be entirely precise - I don’t think it’s necessary to include things that you indirectly depend on that are part of interfaces you include (if you include some header that defines a base class with, for instance, an abstract function that takes some type T and your derivted class implements that function and so depends on T - the implicit dependency via the base class header seems generally OK to me? Though I think everyone’s got some judgement on those sort of dependencies - I’m not sure how useful it is to make a really precise enumeration for these sort of cases… but open to other folks feeling differently) - I guess the Google style guide says even that closely connected indirect inclusion shouldn’t be used ( Google C++ Style Guide ) - so I’m open to that too.

Practically speaking, I think the change in policy wouldn’t affect things day-to-day, but allow this sort of cleanup to be done whenever someone felt the itch to make such changes, and in doing so would reduce the maintenance burden of the codebase for those removing unused inclusions based on human or automated local reasoning.

8 Likes

Most of your post seems to go in the direction of “don’t actively fight people if they want to explicitly include headers”, while the wording change you propose is basically “only direct includes are allowed”. Maybe I misunderstood what the resulting wording is supposed to be, it would be helpful if you could spell it out.

I can get behind the former, not the latter. Basically, the rules that would work for me are:

  • You are allowed to use explicit includes.
  • You should not request people to add explicit includes in PR reviews.
  • You should not add explicit includes in a PR that are not directly related to your changes, similar to how you should not reformat unrelated code. (We see this occasionally due to editors automatically inserting IWYU includes.)

A concern I have with any policy stricter than that is that it’s just not really feasible for human-generated code to follow IWYU. You need to use tooling for this. I like to retain the ability to write code in a plain text editor.

1 Like

That seems reasonable to me, but I would prefer that we’re a bit more explicit about the intent than just removing the sentence: is there a concise way to capture what your intent is in the doc so everyone is on the same page?

Perfect may be the enemy of good here: while the guidelines could be to be more explicit, that does not mean we have to be perfect about it: to me the guideline reflects the “intent” behind what we want the codebase to be instead and the kind of “cleanup patches” that would be welcome or not.
(For example: we had code formatting guidelines before the existence of clang-format, and we were still able to write code, and often not everyone would follow the guideline strictly but that worked)

@nikic Yeah, I don’t think any of us get this perfect without tooling - though that presumably applies to the reviewers and the author. A reasonable effort could be made/someone might mention if there’s sometihng egregiously missing - if someone eventually added some precommit tooling that fixed up our header inclusions before committing, that’d probably be OK & wouldn’t impact what tools we can use to write code.

is there a concise way to capture what your intent is in the doc so everyone is on the same page?

The Google style guide is pretty terse on it - could quote that directly, or paraphrase it.

I agree. We must not have a policy that suggests that we require IWYU-cleanliness, unless/until it’s checked automatically by tooling. The effort for humans to figure out which headers they need to include in order to comply with IWYU is significantly greater than the value (marginally reduced maintenance burden for future changes) which results from it.

We do have this functionality as part of clang-tidy include-cleaner. So, if we want to have a policy of IWYU-cleanliness in LLVM, the way to do so would be by setting up a clang-tidy presubmit with this check enabled.

The rest of the style guide, etc, is best effort & sometimes that effort’s pretty minimal - do you think the change I proposed would, in practice, be harmful? (& if so, I guess this rule would need a special carveout compared to all the others to make it clear that it’s acceptable to perform such cleanup but not required?)

1 Like

The message of that section seems to be that you shouldn’t include too much. I think we should reword it altogether, specifically:

  1. Say that header files should be “standalone”, or usable on their own.
  2. State what the policy is on IWYU (remember to remove no longer used includes as well).

I think we should state the requirements explicitly instead of being somewhat vague.

Wanted to add for consideration the case when a header relies on an includer to provide a certain header, for example

// includer.cpp
#include "llvm/ADT/ArrayRef.h"
#include "example.h"
// example.h
struct Test { llvm::ArrayRef<int> test; };

It’s not super easy to notice in practice, so it’s not really enforceable without tooling. Fixing such cases seems non-controversial.

Unfortunately, I don’t have any suggestions for the wording. Just wanted to mention such situations are also possible.

That requirement I think is currently pretty well captured by the existing wording at least - and I think fairly uncontentious, hopefully.

but, yeah, with any rewording we’d need to preserve that requirement

1 Like