Keeping LLVM Clang-Format Clean

As a member of the clang-format team, I maintain a “Clang-Formatted Status Page for the LLVM project”

https://clang.llvm.org/docs/ClangFormattedStatus.html

The page is generated by checking every file in the LLVM tree for its clang-formatted status (excluding lit tests for obvious reasons)

The purpose of which is to encourage LLVM users to use clang-format and report clang-format bugs and to generally follow the LLVM coding style (or whatever style is in your subproject) - (it takes a while to run on my machine so I tend to run it ~ once a month)

The list of “clang-format-clean files” also acts as a regression suite for clang-format, which means the clang-format team can test changes with a significant amount of code to ensure we don’t break anything.

clang-format -verbose -n -files ./clang/docs/tools/clang-formatted-files.txt

What we see is that between runs there are significant amounts of files that were previous 100% clang-formatted that are no longer still.

This suggests in my view a couple of things:

  1. NFC commits and those without code reviews have NO clang-format on commit checks and this can lead to clang-format errors creeping in.
    https://reviews.llvm.org/rG99adacbcb7895114a62266c8a15e794bacd2380c

  2. Some reviews are not firing the llvm-pre-merge-checks that should have shown clang-format errors
    https://reviews.llvm.org/rG876b5ea96bf5890074aec61bc2c6a37b2cdc0617

  3. Some files become unclean due to the clang-format team fixing a bug (yah us!) e.g. clang/lib/Headers/openmp_wrappers/__clang_openmp_device_functions.h
    these files have not been part of a code review/commit since the last scan.

index 279fb26fbaf7..6b50ce7277ba 100644
--- a/clang/lib/Headers/openmp_wrappers/__clang_openmp_device_functions.h
+++ b/clang/lib/Headers/openmp_wrappers/__clang_openmp_device_functions.h
...
-inline void operator delete(void *ptr)OPENMP_NOEXCEPT { ::free(ptr); }
+inline void operator delete(void *ptr) OPENMP_NOEXCEPT { ::free(ptr); }
...

Since the 3rd Jan 2022 the following 48 files (out of 8407 total, see below) have become “clang-format unclean”, Its a reasonable amount in one month and that renders this list as a regression test somewhat useless.

Since the move to github issues the level of change in clang-format has risen as we have new contributors, we really need this regression suite to ensure changes are thoroughly tested to
ensure we are not breaking at least LLVM by being able to run on a large project.

I’d like to gather input on how we might propose we tackle 1) and 2), or if at all.

One solution could be handled by using a github action to comment back on the phabricator commit if it fails clang-format on landing.

If such an action could use the most recent trunk build of clang-format this would help catch some cases of 3) if it checked the whole file and not just the diff

A more drastic solution could be that files that were previously clean as denoted by ./clang/docs/tools/clang-formatted-files.txt must remain clang-format clean and any commit breaking that could be rejected. (ouch, harsh!)

Or alternatively, any file that is present in the clang-formatted-files.txt can be automatically reformatted on commit to maintaining its clean status.

Any thoughts?

MyDeveloperDay

List of Previously Clean Files not Unclean

clang-tools-extra/clang-tidy/modernize/UseTrailingReturnTypeCheck.cpp
clang-tools-extra/clangd/InlayHints.cpp
clang-tools-extra/clangd/Selection.cpp
clang/lib/Headers/openmp_wrappers/__clang_openmp_device_functions.h
compiler-rt/lib/dfsan/dfsan_new_delete.cpp
compiler-rt/lib/memprof/memprof_new_delete.cpp
compiler-rt/lib/scudo/standalone/wrappers_cpp.cpp
flang/lib/Semantics/data-to-inits.h
libc/src/sys/mman/linux/munmap.cpp
lldb/source/Plugins/Process/FreeBSDKernel/ProcessFreeBSDKernel.h
llvm/include/llvm/BinaryFormat/AMDGPUMetadataVerifier.h
llvm/include/llvm/Support/TimeProfiler.h
llvm/include/llvm/Support/VirtualFileSystem.h
llvm/lib/CodeGen/LiveDebugValues/LiveDebugValues.h
llvm/lib/ExecutionEngine/JITLink/EHFrameSupport.cpp
llvm/lib/ExecutionEngine/JITLink/ELFLinkGraphBuilder.h
llvm/lib/Target/AVR/AVRFrameLowering.cpp
llvm/lib/Target/RISCV/RISCVGatherScatterLowering.cpp
llvm/lib/TextAPI/Architecture.cpp
llvm/lib/TextAPI/TextStubCommon.h
llvm/lib/Transforms/IPO/AttributorAttributes.cpp
llvm/lib/Transforms/Instrumentation/CGProfile.cpp
llvm/tools/llvm-reduce/llvm-reduce.cpp
llvm/tools/llvm-split/llvm-split.cpp
llvm/unittests/ADT/SimpleIListTest.cpp
llvm/unittests/Frontend/OpenMPIRBuilderTest.cpp
llvm/unittests/Transforms/Utils/ValueMapperTest.cpp
mlir/include/mlir/ExecutionEngine/CRunnerUtils.h
mlir/lib/Conversion/GPUCommon/GPUToLLVMConversion.cpp
mlir/lib/Dialect/Arithmetic/IR/ArithmeticOps.cpp
mlir/lib/Dialect/Arithmetic/Transforms/ExpandOps.cpp
mlir/lib/Dialect/LLVMIR/IR/LLVMDialect.cpp
mlir/lib/Dialect/LLVMIR/IR/LLVMTypeSyntax.cpp
mlir/lib/Dialect/SparseTensor/Transforms/Sparsification.cpp
mlir/lib/IR/AsmPrinter.cpp
mlir/lib/IR/Diagnostics.cpp
mlir/lib/IR/OperationSupport.cpp
mlir/lib/Parser/AsmParserImpl.h
mlir/lib/Parser/AsmParserState.cpp
mlir/lib/Parser/AttributeParser.cpp
mlir/lib/Parser/DialectSymbolParser.cpp
mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
mlir/lib/Tools/PDLL/Parser/Parser.cpp
mlir/lib/Tools/mlir-lsp-server/MLIRServer.cpp
mlir/tools/mlir-tblgen/AttrOrTypeFormatGen.cpp
mlir/tools/mlir-tblgen/FormatGen.h
openmp/runtime/src/kmp_lock.h

I feel the last option could be done very simply with the following, but I’m wondering how upset people would get with such an NFC commit?

git reset HEAD
git checkout --
git pull --rebase
clang-format -verbose -n -files ./clang/docs/tools/clang-formatted-files.txt |& grep "code should" | cut -d: -f1 | sort -u | xargs clang-format -i
git ls-files -m | xargs git add
git commit -m "NFC re-clang-format files that were previously clang-formatted"
git push

@goncharov FYI

Thx for looking into this topic. I would very much like to have some automatic tooling enforcing only clean clang-format commits. So far we don’t really have a way to enforce this with contributors having push access to the repo (as required for our current workflows).

  1. Some reviews are not firing the llvm-pre-merge-checks that should have shown clang-format errors
    https://reviews.llvm.org/rG876b5ea96bf5890074aec61bc2c6a37b2cdc0617

This was part of D116549 and there I do see the harbormaster results. If you click through to the build results, you can see a failed build with the error message “clang-format: FAILED”.

Regarding cleaning up the entire code base, I don’t have a strong opinion, just some thoughts:

  1. Every in-flight patch might need manual rebasing and everyone who has downstream patches would need to merge them as well. However I can’t quantify how much work that would create for folks.
  2. You can exclude this commit from git blame so at least that issue is solved.

Skimming the status page, it looks like some sub-projects are close to 100% (flang, libc, mlir). Perhaps you can get those communities to accept the challenge to reach 100% and stay there!

I personally would have no problem with your script keeping previously clean files from going bad.

In general the LLVM project policy is not to pro-actively clang-format files; on the other hand, if there are files that haven’t been touched in a long time, it seems unlikely that a pro-active reformat would harm anyone. This approach might want its own separate RFC though.

I’m definitely not asking to clang-format the tree., I’m aware of the implications in terms of impact to up and down stream forks. But because of a combination of the premerge checks and general due diligence we’d eventually get there over time. But its the sliding backwards that’s the problem because at present we are hovering around 50% clean and not making forward progress because almost as many files become unclean as become clean each month.