[RFC] Lifting "include cleaner" (missing/unused include detection) out of clangd

sam-mccall · March 26, 2022, 12:54am

clangd has a feature to warn when you #include headers but don’t use them.

It works well (it’s still off by default, but we’re planning to turn it on soon).
We’re also planning to add a warning when you use a symbol are missing #include.

Together these provide analysis similar to the include-what-you-use tool, with some differences[1].
We’ve had several requests to make this available for reuse:

@serge-sans-paille has been cleaning up LLVM and wants better tools. D121593
On the tracking bug for this feature, we get requests for a clang-tidy version of it
we’ve got at least 2 internal-only clang-tidy checks implementing special cases of IWYU. A generic upstream check would benefit everyone.
inside Google, a team wants to do ~completely automated continuous cleanups (and to ensure the policy enforced matches clang-tidy + clangd)

What do people think of lifting this up and generalizing a bit (to support suggesting new includes) as a library in clang-tools-extra? That way it could be shared between clangd, a new clang-tidy check, and our downstream deployment.

At alternative: there was an RFC in the past to merge IWYU itself into clang-tools-extra. It seems to have gotten stuck over the spectre of modules, and whether it should be incrementally rewritten.

Main differences from IWYU:

lightweight implementation suitable for fast analysis with preambles
conservative about suggesting changes to ensure a low false-warning rate (at the cost of missing some suggestions)
no attempts to work out when forward declarations are possible (but respects them)
enforces simplified (selectable) style rules rather than than trying to track compilation requirements exactly
because of the above, much simpler implementation

FWIW, the IWYU implementation is not suitable for clangd (lack of preamble support and generally does too much). It was also judged unsuitable for the continuous cleanups, where the policies need to be simple and reliable for transparency, and we don’t have confidence we can make IWYU reliable.

CC @kirillbobyrev @kadircet

sam-mccall · March 26, 2022, 12:57am

(Happy to talk about design… We have thoughts on how an API should look and how to isolate the complexity. But wanted to start with the idea first)

hansw2000 · March 28, 2022, 10:15am

What do people think of lifting this up and generalizing a bit (to support suggesting new includes) as a library in clang-tools-extra? That way it could be shared between clangd, a new clang-tidy check, and our downstream deployment.

Sounds great to me!

stefan-haller · March 29, 2022, 3:07pm

I’m not sure I understand why it needs to be shared between clangd and clang-tidy. Can’t it be removed from clangd once it’s in clang-tidy? Supporting it in both sounds redundant to me.

(BTW, moving it from clangd to clang-tidy would also have the advantage that it could be controlled per directory via .clang-tidy files. Today I can only turn it on or off globally.)

sam-mccall · March 29, 2022, 3:54pm

Unfortunately clang-tidy checks that make heavy use of the preprocessor can’t work fully in clangd. There’s some work like tracking IWYU directives and checking whether headers are self-contained that should be done while building the preamble rather than every time we refresh diagnostics.

Not sure how you’re turning it on/off, but you can use in-tree .clangd files or an If block to configure clangd per-directory. Configuration

stefan-haller · March 29, 2022, 4:01pm

I see, thanks for explaining. It might be a bit of a documentation challenge then to tell users what checks to enable where…

Oh cool, I somehow missed this. I was only aware of the global per-user config file. Thanks!

sam-mccall · March 29, 2022, 9:12pm

I put together a prototype over the last couple of days, borrowing bits from clangd.
It’s not terribly accurate (many cases to add) and the design is open for discussion.

But it supports a standalone tool, a clang-tidy check, and two clangd integrations:

https://reviews.llvm.org/D122677

The main API is walkUsed(), which takes an AST and recorded PP events, and enumerates uses in the main file as something like (ref location, referenced symbol, providing header) tuples.
There are functions for recording the PP events needed, and for matching the providing headers against recorded include directives.

Each tool is responsible for tracking which #includes was used at least once, and the mechanics of emitting diagnostics - fairly simple.

hasselmm · December 14, 2023, 11:22am

It works well

That’s simply not true.

Most importantly the include cleaner doesn’t understand concepts and reports headers contributing just concepts, but nothing else.

Less importantly because one could work around it proposes removal of headers that provide backward compatiblitly via feature test marcos. If all relevant features are provided by the current compiler, the include cleaner also proposes removal of headers.

In the current state the include cleaner a really nice idea, but harmful.

hasselmm · December 14, 2023, 11:55am

Did I mention overloaded operators? The include cleaner also misses overloaded operators.

Topic		Replies	Views
Use cases related to #include refactoring clangd	3	488	June 1, 2022
Clang-tidy eliminate unused parts of headers clang-tidy clang	4	1206	July 7, 2023
[RFC] Add include-what-you-use tool to clang-tools-extra Clang Frontend	17	281	September 14, 2017
Finding redundant #includes Clang Frontend	8	74	December 15, 2009
[RFC] Deprecation warnings for clang-tidy clang-tidy	7	192	January 17, 2025

[RFC] Lifting "include cleaner" (missing/unused include detection) out of clangd

Related topics