[RFC] Deprecate old key-value format in CheckOptions

Hi!

Since ⚙ D128337 [clang-tidy] Extend spelling for CheckOptions, clang-tidy supports a much more readable way of specifying the CheckOptions field, using native YAML syntax:

CheckOptions: 
  x: y

Instead of the old key-value way:

CheckOptions: [
{ key: x, value: y},
]

The old format has been removed from the documentation, and --dump-config flag also dumps the config in the new format.

However we still keep the internal code to handle both versions with the goal of backwards compatibility. This adds extra complexity and maintenance cost, making it harder to improve or extend. It’s also confusing to have multiple ways of specifying the same thing.

Thus I would like to propose to deprecate the old key-value format, possibly printing a warning and creating supporting tools to automate the migration (or maybe --dump-config is already enough?).

Deprecation would mean that we warn in clang-tidy 18 and fully remove support in clang-tidy 20. What do you think?

2 Likes

Hi,

When for simple options I agree that new format of config is easier, it’s not easier for checks.

Examples (Clang 16):

Old way:

- key: something-forbidden-apis.ForbiddenFunctions
  value: |
     ---
      - name:             abort
        warningText:      forbidden api call 'abort'
        headers:
        - cstdlib
        - stdlib.h
      - name:             printf
        warningText:      forbidden api call 'printf'

New way (dump):

something-forbidden-apis.ForbiddenFunctions: "---\n - name:             abort\n   warningText:      forbidden api call 'abort'\n   headers:\n   - cstdlib\n   - stdlib.h\n - name:             printf\n   warningText:      forbidden api call 'printf'\n"

Of course dump were broken even before with:

- key:             something-forbidden-apis.ForbiddenFunctions
  value:           "---\n - name:             abort\n   warningText:      forbidden api call 'abort'\n   headers:\n   - cstdlib\n   - stdlib.h\n - name:             printf\n

This mean that not only basic clang-tidy config file need to switch to new format, but also invidual checks need to switch into new format. And to make config output readable some support would be required for multi-line format.

Other problem is that current config format is custom one, it’s not yaml, its not json, who knows what it is, and as a result you get dump like:

misc-definitions-in-headers.HeaderFileExtensions: h,hh,hpp,hxx
bugprone-suspicious-include.ImplementationFileExtensions: 'c;cc;cpp;cxx'

and start asking your self, what it is ? string or list.

For me clang-tidy just should switch from yaml to json (full json), but now we stuck somewere in between with this trash. Thats why I do not plan to use new format and stick with yaml thats more readable. And this is a reason why old format shouldn’t be deprecated.

For me to make this work it would need to work with structures in readable way.
Both in dump and in input file with easy support for checks (some examples).

If you look over internet then all examples are using old yaml format.
In summary: -1 from me for now

Thanks for the example, that’s very enlightening! I think the clear problem is abuse of primitive types. In particular, we are abusing the string type for everything. In your example, the value for that option is a string, which itself represents a dict. That’s why it’s printed badly when dumped. I wonder if the patch by @njames93 to support nested dicts could solve that?

I agree, perhaps we could just switch to json and use a proper 3rd-party library to parse it, but it sounds like a much larger change. I don’t even know what policies we have for incorporating 3rd-party libraries…

@PiotrZSL I came back to this and realized that we don’t need to use the ugly way, everything should remain identical:

something-forbidden-apis.ForbiddenFunctions: |
     ---
      - name:             abort
        warningText:      forbidden api call 'abort'
        headers:
        - cstdlib
        - stdlib.h
      - name:             printf
        warningText:      forbidden api call 'printf'

I.e. you are using multiline strings for the value, which is possible regardless of using the old/new scheme. WDYT?

For me such approach would be fine. (if it will work).

Also good to have some data: currently there seems to be 80k+ hits of the old pattern out in the wild:

Ok, I tested this localy, works fine.
In this case I’m OK with deprecating this legacy config.

To make migration easy we should:

  • change --dump-config to --dump-config=simple --dump-config=full, with default being full, but in simple mode we wouldn’t print default options for checks.
  • Maybe remove “user” printing from --dump-config, do we need this ?
  • Make --dump config properly print text with new lines, in such case use | character.

With all of that migration will be very easy, as user could just use --config-file and --dump-config. I think that some of our tests still use old format.