RFC: Support specifying checks as a list in the config file


Currently one can only specify checks as a string in the .clang-tidy file. This is unintuitive and comes with some drawbacks, as explained in this Github issue.

I have a proposal for a WIP patch to add support for specifying the Checks as either a string or a list of strings.

What do you think, does it sound reasonable? It’s my first time dealing with YAML in the project so I’m sure the code can be improved. I’m mostly interested in knowing if people support the idea in general. I also have some follow-up questions:

  • Should we deprecate the string format, and have it only as a list after a couple releases? That would greatly simplify the code.

  • What format should it use when outputting to YAML via --dump-config?

  • What format do we want to keep internally?

Looking forward to hearing your thoughts!

1 Like

What is wrong with what is proposed in the comments of the github issue?

They are just workarounds, and as I wrote at the end, it’s not at all obvious that you are supposed to add a comma after the comment - otherwise clang-tidy will silently ignore the rest of the list, potentially removing active checks after the comment. Removing a check will go unnoticed in CI, so it can take a while before people realize what happened and then spend even more time figuring out how to solve it (the fact that a comma is needed is unintuitive and undocumented). By that time, a lot of technical debt has probably already creeped in such that is no longer possible to re-enable the checks.

We should not choose interfaces to match our backend - we should choose what makes sense and makes the tool user-friendly. To specify a list of things, the correct syntax in YAML is a list.

Anyhow we got this merged now:

The remaining question I had is if we should deprecate (remove it in clang-tidy X) the old syntax - supporting both is probably confusing to the user and adds complexity in our code. IMO if we always aim at backwards compatibility we will eventually end up in the same situation as C++: many ways of doing the same thing, a lot of technical debt accumulated that cannot be removed, and eventually people switching to other more modern and expressive alternatives.

The comma required after the comment just feels like an LLVM yaml parser bug. Is there a yaml specification that clarifies the use of a comment in this location?

We use something like this:

# Exclude clang-analyzer checks that don't apply
Checks: >

and have no problem with it. If your main complaint is that you can’t have a comment in the middle I don’t see the burden of having the comment at the top as we do.

Is there a yaml specification that clarifies the use of a comment in this location?

I don’t think so.

I don’t see the burden of having the comment at the top as we do.

The rationale is the same as why one should have variables in code as close as possible to where they are used: locality of reference. In your example, if one day you remove a check, you will most likely forget to remove the comment at the very top that explains something about the check. In our case, we have plenty of comments documenting why certain checks are disabled, links to bugs, etc. It becomes very hard to read to have all that information at the very top.

If we take a step back to the basics I think it’s a simple reasoning. This is a list of checks. YAML supports lists natively. Therefore a YAML list should be the first and most intuitive choice to specify checks. Why should things be more complicated than they need to be?

1 Like

Besides, the string way of specifying checks is error-prone. If you miss a comma anywhere in the list of checks, all the checks below it are silently disabled. This is not caught in CI:

Checks: >

Here, clang-tidy will silently ignore the last 2 checks, enabling only the first one. Using a YAML list, the comma is not even needed and you always get the 3 checks.

This sounds like a separate bug – clang-tidy not issuing an error for unrecognized check names.

Sure, but it’s a workaround to the core issue.

There’s more issues, for example users need to spend time reading about how to do multi-line strings in YAML, learn about > versus |, reason about newlines or spaces, etc. Really a lot of effort just to declare a list.

Could you provide a motivation as to why a string is better than a list to represent a list?

Using a list for the check patterns seems like the obvious thing to do, and I’ve seen plenty of complaints about the current syntax, and plenty of people getting it wrong without noticing. So thank you for fixing this!

My vote goes to the list. Deprecation of the comma-separated string syntax sounds reasonable to me, but probably needs some more community buy-in.