[RFC] Lifting "include cleaner" (missing/unused include detection) out of clangd

clangd has a feature to warn when you #include headers but don’t use them.


It works well (it’s still off by default, but we’re planning to turn it on soon).
We’re also planning to add a warning when you use a symbol are missing #include.

Together these provide analysis similar to the include-what-you-use tool, with some differences[1].
We’ve had several requests to make this available for reuse:

  • @serge-sans-paille has been cleaning up LLVM and wants better tools. D121593
  • On the tracking bug for this feature, we get requests for a clang-tidy version of it
  • we’ve got at least 2 internal-only clang-tidy checks implementing special cases of IWYU. A generic upstream check would benefit everyone.
  • inside Google, a team wants to do ~completely automated continuous cleanups (and to ensure the policy enforced matches clang-tidy + clangd)

What do people think of lifting this up and generalizing a bit (to support suggesting new includes) as a library in clang-tools-extra? That way it could be shared between clangd, a new clang-tidy check, and our downstream deployment.


At alternative: there was an RFC in the past to merge IWYU itself into clang-tools-extra. It seems to have gotten stuck over the spectre of modules, and whether it should be incrementally rewritten.

Main differences from IWYU:

  • lightweight implementation suitable for fast analysis with preambles
  • conservative about suggesting changes to ensure a low false-warning rate (at the cost of missing some suggestions)
  • no attempts to work out when forward declarations are possible (but respects them)
  • enforces simplified (selectable) style rules rather than than trying to track compilation requirements exactly
  • because of the above, much simpler implementation

FWIW, the IWYU implementation is not suitable for clangd (lack of preamble support and generally does too much). It was also judged unsuitable for the continuous cleanups, where the policies need to be simple and reliable for transparency, and we don’t have confidence we can make IWYU reliable.

CC @kirillbobyrev @kadircet

5 Likes

(Happy to talk about design… We have thoughts on how an API should look and how to isolate the complexity. But wanted to start with the idea first)

What do people think of lifting this up and generalizing a bit (to support suggesting new includes) as a library in clang-tools-extra? That way it could be shared between clangd, a new clang-tidy check, and our downstream deployment.

Sounds great to me!

I’m not sure I understand why it needs to be shared between clangd and clang-tidy. Can’t it be removed from clangd once it’s in clang-tidy? Supporting it in both sounds redundant to me.

(BTW, moving it from clangd to clang-tidy would also have the advantage that it could be controlled per directory via .clang-tidy files. Today I can only turn it on or off globally.)

Unfortunately clang-tidy checks that make heavy use of the preprocessor can’t work fully in clangd. There’s some work like tracking IWYU directives and checking whether headers are self-contained that should be done while building the preamble rather than every time we refresh diagnostics.

Not sure how you’re turning it on/off, but you can use in-tree .clangd files or an If block to configure clangd per-directory. Configuration

I see, thanks for explaining. It might be a bit of a documentation challenge then to tell users what checks to enable where…

Oh cool, I somehow missed this. I was only aware of the global per-user config file. Thanks!

I put together a prototype over the last couple of days, borrowing bits from clangd.
It’s not terribly accurate (many cases to add) and the design is open for discussion.

But it supports a standalone tool, a clang-tidy check, and two clangd integrations:

https://reviews.llvm.org/D122677

The main API is walkUsed(), which takes an AST and recorded PP events, and enumerates uses in the main file as something like (ref location, referenced symbol, providing header) tuples.
There are functions for recording the PP events needed, and for matching the providing headers against recorded include directives.

Each tool is responsible for tracking which #includes was used at least once, and the mechanics of emitting diagnostics - fairly simple.

It works well

That’s simply not true.

Most importantly the include cleaner doesn’t understand concepts and reports headers contributing just concepts, but nothing else.

Less importantly because one could work around it proposes removal of headers that provide backward compatiblitly via feature test marcos. If all relevant features are provided by the current compiler, the include cleaner also proposes removal of headers.

In the current state the include cleaner a really nice idea, but harmful.

Did I mention overloaded operators? The include cleaner also misses overloaded operators.