Hash for Clang-Tidy findings

Hi,

Would anyone be interested in having a unique identifier for Clang-Tidy findings? Hashing the bug and its context, similarly to what was introduced for the Clang Static Analyzer in https://reviews.llvm.org/D10305, would be useful e.g. to list new defects compared to a baseline and to recognize that a finding is the same even if it has moved within the source file. The hash would ensure that the issues in the following two examples are treated as the same finding (which they are, since the only difference between the two files is an unrelated change that shifts the second finding by 2 lines):

// test.cpp (version 1)
// 'clang-tidy test.cpp -checks=bugprone-string-constructor' output now is:
// test.cpp:4:15: warning: string constructor parameters are probably swapped; expecting string(count, character) [bugprone-string-constructor]
// std::string str('x', 50);
// ^ ~~~~ ~~~
// 50 'x'
// planned string to be hashed: bugprone-string-constructor$void bugproneStringConstruct()$15$std::stringstr('x',50);
#include <string>

void bugproneStringConstruct() {
  std::string str('x', 50);
}

// test.cpp (version 2)
// 'clang-tidy test.cpp -checks=bugprone-string-constructor' output now is:
// test.cpp:6:15: warning: string constructor parameters are probably swapped; expecting string(count, character) [bugprone-string-constructor]
// std::string str('x', 50);
// ^ ~~~~ ~~~
// 50 'x'
// planned string to be hashed: bugprone-string-constructor$void bugproneStringConstruct()$15$std::stringstr('x',50);
#include <string>

void bugproneStringConstruct() {
  // This function does nothing, but raises clang-tidy's
  // bugprone-string-constructor checker warning.
  std::string str('x', 50);
}

If such a hash would serve the community well, are there any thoughts on the implementation? Following the Clang Static Analyzer, the hash would be md5('checker name$enclosing context$column of the finding$source code line text').
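
To make the proposed key concrete, here is a minimal sketch of how the string to be hashed might be assembled. The names stripWhitespace, makeHashInput and fnv1a64 are hypothetical, and FNV-1a is only a stand-in for MD5 so that the sketch needs nothing beyond the standard library; it is not what the analyzer uses.

```cpp
#include <cctype>
#include <cstdint>
#include <string>

// Remove all whitespace from a source line, so that re-indentation or
// re-spacing does not change the key. (Hypothetical helper.)
std::string stripWhitespace(const std::string &Line) {
  std::string Out;
  for (unsigned char C : Line)
    if (!std::isspace(C))
      Out.push_back(static_cast<char>(C));
  return Out;
}

// Assemble 'checker name$enclosing context$column$source line text'.
// The line number is deliberately not part of the key.
std::string makeHashInput(const std::string &Checker,
                          const std::string &Context, unsigned Column,
                          const std::string &LineText) {
  return Checker + "$" + Context + "$" + std::to_string(Column) + "$" +
         stripWhitespace(LineText);
}

// Stand-in digest (64-bit FNV-1a), NOT MD5. It only keeps the sketch
// self-contained; the proposal itself uses MD5.
std::uint64_t fnv1a64(const std::string &S) {
  std::uint64_t H = 0xcbf29ce484222325ULL; // FNV offset basis
  for (unsigned char C : S) {
    H ^= C;
    H *= 0x100000001b3ULL; // FNV prime
  }
  return H;
}
```

Calling makeHashInput("bugprone-string-constructor", "void bugproneStringConstruct()", 15, "  std::string str('x', 50);") gives the same string for both versions of test.cpp, because nothing in the key depends on the line number.
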
I thought about two possible solutions:

  1. The hash could be part of the diagnostic message, e.g.: test.cpp:4:15: warning: string constructor parameters are probably swapped; expecting string(count, character) [bugprone-string-constructor] #0679de2a8c11b9a0e88c7e517b7301fd#
  2. The hash could be generated by an external Clang tool after the analysis, based on the location and checker information provided by the original diagnostic message. The major drawback of this approach is that Clang-Tidy's output has to be parsed, and the source file has to be reparsed in order to find the enclosing context.
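
For the second option, the parsing step could look roughly like the sketch below. It assumes the one-line "file:line:column: warning: message [check-name]" layout shown in the examples above; the Finding struct and the parseDiagnostic function are hypothetical names, not part of Clang-Tidy.

```cpp
#include <regex>
#include <string>

// Fields that can be recovered from one diagnostic line. (Hypothetical type.)
struct Finding {
  std::string File;
  unsigned Line = 0;
  unsigned Column = 0;
  std::string Check;
};

// Parse "file:line:col: warning: message [check-name]".
// Returns false if the text does not have that shape, e.g. for the
// source-snippet and caret lines that follow a diagnostic.
bool parseDiagnostic(const std::string &Text, Finding &Out) {
  static const std::regex Pattern(
      R"(^(.+):(\d+):(\d+): (?:warning|error): .* \[([-A-Za-z0-9_.,]+)\]$)");
  std::smatch M;
  if (!std::regex_match(Text, M, Pattern))
    return false;
  Out.File = M[1].str();
  Out.Line = static_cast<unsigned>(std::stoul(M[2].str()));
  Out.Column = static_cast<unsigned>(std::stoul(M[3].str()));
  Out.Check = M[4].str();
  return true;
}
```

This recovers the file, line, column and checker name, but not the enclosing context, which is why the source file would still have to be reparsed.
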

Thanks for your comments,
Lorinc

[Please reply *only* to the list and do not include my email directly
in the To: or Cc: of your reply; otherwise I will not see your reply.
Thanks.]

In article <CAOkYadgTmH0KChJ-BqS1+=KjymFKH8mwse7uiQRsQ5tS45Lmqg@mail.gmail.com>,
    Lőrinc Balog via cfe-dev <cfe-dev@lists.llvm.org> writes:

> Anyone would be interested in having a unique identifier for Clang-Tidy
> findings? Hashing the bug and its context in a similar way as it was
> introduced in Clang Static Analyzer in https://reviews.llvm.org/D10305

As long as you also follow the advice in the comments that source
lines are "normalized" w.r.t. whitespace and comments so that adding a
comment or changing whitespace (tabs to space, etc.) doesn't change
the hash, then I'd like to see this enhancement.
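
A minimal sketch of what such normalization might look like for a single source line is below. normalizeLine is a hypothetical helper; it assumes comments do not span lines and does not handle raw string literals or digit separators such as 1'000.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Drop comments and all whitespace from one source line. String and
// character literals are copied verbatim, so "//" inside a literal is not
// mistaken for a comment.
std::string normalizeLine(const std::string &Line) {
  std::string Out;
  char Quote = 0; // 0 outside literals, else the opening quote character
  for (std::size_t I = 0; I < Line.size(); ++I) {
    char C = Line[I];
    if (Quote) {
      Out.push_back(C);
      if (C == '\\' && I + 1 < Line.size())
        Out.push_back(Line[++I]); // keep the escaped character as-is
      else if (C == Quote)
        Quote = 0; // literal ends
      continue;
    }
    if (C == '"' || C == '\'') {
      Quote = C;
      Out.push_back(C);
    } else if (C == '/' && I + 1 < Line.size() && Line[I + 1] == '/') {
      break; // line comment: ignore the rest of the line
    } else if (C == '/' && I + 1 < Line.size() && Line[I + 1] == '*') {
      std::size_t End = Line.find("*/", I + 2);
      if (End == std::string::npos)
        break;     // comment is not closed on this line
      I = End + 1; // the loop increment then steps past the closing '/'
    } else if (!std::isspace(static_cast<unsigned char>(C))) {
      Out.push_back(C);
    }
  }
  return Out;
}
```

With this, "  std::string str('x', 50);" and "std::string str( 'x', 50 ); // swapped?" both normalize to the same text, so adding the comment or changing the spacing would leave the hash unchanged.
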