[RFC] Exclude issues from system headers as early as posible

Issues:
Almost all checks currently match statements and prepare diagnostic information for issues that are in system headers only to be thrown away at the end due to lack --system-headers.

Example:

  • without unless(isExpansionInSystemHeader())
    47.4298 ( 12.6%) 13.7315 ( 12.9%) 61.1614 ( 12.7%) 61.1402 ( 12.7%) readability-identifier-naming

  • with unless(isExpansionInSystemHeader())
    23.7770 ( 7.6%) 9.3479 ( 8.1%) 33.1249 ( 7.8%) 33.1096 ( 7.8%) readability-identifier-naming

This is caused by complicated logic of check that hit performance in the files that end users is not interested in.

Proposed solution:

  • add some matcher function and free standing function to check if diagnostic will be excluded on ClangTidyCheck level and use it in checks.

Something like “functionDecl(unlessExcluded())” or this->isExcluded(MatchedDecl->getLocation())

Bigger projects could see significant performance boost. This could be also extended to take into account also --header-filter.

For example this is list of all warnings generated (from system files) in project I work for:
494615 modernize-use-nullptr
509163 modernize-use-auto
521390 modernize-use-override
756136 readability-implicit-bool-conversion
1067033 cppcoreguidelines-pro-bounds-pointer-arithmetic
1076988 modernize-avoid-c-arrays
1098023 cppcoreguidelines-special-member-functions
1376238 google-explicit-constructor
3072423 modernize-use-noexcept
6371033 readability-braces-around-statements
11445644 modernize-use-using
34328178 cppcoreguidelines-macro-usage
96856838 bugprone-reserved-identifier
97458338 readability-identifier-naming

For those few checks its already a quarter of trillion issues that are generated and thrown away.

Raising as requested here: âš™ D143851 [clang-tidy] Tweak 'rule of 3/5' checks to allow defaulting a destructor outside the class.
@njames93

5 Likes

I’d like to add some support for resolving this issue. I work on an open source project where this is a significant pain point. We recently had to split our clang-tidy CI job into two parts because it exceeded the 6 hour time limit on GitHub CI jobs. The two jobs post-split are 5h16 and 2h12. See for example Upgrade clang-tidy used in CI to LLVM 16 · CleverRaven/Cataclysm-DDA@559263e · GitHub

We added some profiling to measure which checks are the slowest, and got similar results to those above. The slowest few checks on the 5h16 job of the above-linked run are

  "time.clang-tidy.bugprone-reserved-identifier.wall": 2407.1580893993378,
  "time.clang-tidy.misc-unused-using-decls.wall": 1290.9960434436798,
  "time.clang-tidy.bugprone-stringview-nullptr.wall": 1205.962524652481,
  "time.clang-tidy.bugprone-use-after-move.wall": 999.7972841262817,
  "time.clang-tidy.modernize-use-transparent-functors.wall": 974.8725790977478,
  "time.clang-tidy.modernize-avoid-c-arrays.wall": 829.8043878078461,
  "time.clang-tidy.modernize-replace-auto-ptr.wall": 806.1057415008545,
  "time.clang-tidy.modernize-deprecated-ios-base-aliases.wall": 753.617425441742,
  "time.clang-tidy.performance-unnecessary-value-param.wall": 717.0466673374176,
  "time.clang-tidy.modernize-use-using.wall": 681.6415565013885

That’s 40 minutes just for bugprone-reserved-identifier.

I would like to start using readability-identifier-naming but it’s just not an option when it’s so expensive; it would probably push us back over the 6 hour mark.

(For reference, this project has roughly half a million lines of C++)

Thank you for a list of checks, I got some ideas.
Are you able to test “custom version” of clang-tidy ?
Or test llvm from main branch ?

Edit: Nwm, i managed to run clang-tidy on this project…

1 Like

I probably would be able to test a custom version of clang-tidy, but if you managed to get it running, that’s probably easier.

I did some quick testing,
When it comes to bugprone-reserved-identifier check, locally it run for 931 seconds (looks like server used in github is very slow), after some minor optimizations is down to 850 seconds, after excluding most of system headers is down to 538 seconds. I will check if I can do something more, and prepare a patch.

Edit: Did more refactoring, get down from 931 seconds to 265 seconds, and that still before excluding system headers & before some other fixes.
Edit2: ~50 seconds, after excluding system headers, that ~95% reduction, if I manage to deliver this.

1 Like

I had been working on a global way to support this in our branch of clang-tidy downstream. It was just a short 30 line patch that disables all ast matcher and ast visitor checks from looking in system headers and headers that don’t match the header filter. I was seeing analysis times for files in clang-sema dropping by up to 92%.

There are definitely some issues that prevent this work being upstreamed as is. The main on me that it does run the slight risk that it may prevent a warning being emitted, most notably this could happen when running unity builds or doing other funky things when including files, The number of warnings suppressed output will also be incorrect, not sure if that is a real issue though.
Would you suggest this work could be upstreamed but in such a way that the behaviour is opt-in(most likely through a command-line option).

Question is, if such up-stream based filtering would still allow things like:
callExpr(hasDeclaration(namedDecl(hasName("::std::move"))) to work. Simply because in some checks is safe to exclude all system headers, but in some checks like unused-using it may cause lot of false-positives issues. Thats a reason why I wanted enable this in per-check basic, but still something like traverse(TK_IgnoreSystemCode could be helpful. Other thing are issues from files that does not match HeaderFilter, they also could be excluded early.

Perhaps enable this as default, and allow individual checks to override the behavior when they need it because they check something specific from system headers?

We need to then also make sure that we test the checks will talk system headers instead of putting everything in the same test file.

I’ll put up the preliminary work as a WIP patch, and then we can discuss further

https://reviews.llvm.org/D150126

This is basically the patch we are using downstream, obviously has non of the extra boilerplate to selectively enable/disable this behaviour.
This also won’t help with preprocessor based checks.
Feel free to comment on the implemenation and how it could be improved.
Parts of it are a little hacky as I wanted out patch to be as minimal as possible

That ticket also goes into the performance of certain clang-tidy checks and hits on some of the same things as this discussion.