Replacing the non-portable diff invocation coded as %diff_plist (and friends)

Various tests in clang/test/Analysis/ invoke diff with a non-portable -I option. I do not believe that a requirement for a diff utility that supports such an extension is intended by the community. I believe that replacing diff -I with diff on files that have been normalized using grep -v is a reasonable solution. A minor detail is that the output being checked is produced without a newline at the end of the file. As part of the proposed solution, the file is rehabilitated as a text file (as required by grep) by adding such a newline. The same is done for the reference expected files that are missing a newline at the end of the file. The solution is sketched out below for feedback.

The normalization is in many cases encoded in a lit substitution %diff_plist. A sample application of the proposed change would look like the following. See further below for additional rationale on the particulars.

Replace the RUN line:
-// RUN: cat %t.plist | %diff_plist %S/Inputs/expected-plists/unix-fns.c.plist -
+// RUN: %normalize_plist <%S/Inputs/expected-plists/unix-fns.c.plist >%t.expected.sed.plist && echo >>%t.plist && %normalize_plist <%t.plist | diff -u %t.expected.sed.plist -

Replace the lit substitution:
-# Diff command used by Clang Analyzer tests (when comparing .plist files
+# Filtering command used by Clang Analyzer tests (when comparing .plist files

with reference output)

-config.substitutions.append((’%diff_plist’,

  • ‘diff -u -w -I “/” -I “.:” -I “version”’))
    +config.substitutions.append((’%normalize_plist’,
  • “grep -Ev ‘%s|%s|%s’” %
  • (’^[[:space:]]. version .*$’,
  • ‘^[[:space:]]/.$’,
  • ‘^[[:space:]].:.$’)))

Why not use more RUN lines?
The output being checked is line-number sensitive. Keeping the same number of RUN lines minimizes unnecessary noise and risk of errors.

Why not use sed?
sed can be used to normalize the lines that are expected to vary; however, the some of the expected-output files have these lines omitted.
The grep replacement is more compatible with the behaviour of diff -I. The filtered files are nevertheless named with “sed” for the connotation that substitution took place.

Why remove -w from diff?
It appears that the -w option was being used for its effect (in some implementations) of ignoring the difference between a file that ends with a newline and one which does not. Following normalization of both files to end in a newline, this is no longer necessary. Since the effect is non-portable, keeping the option (and thus allowing the non-portable effect to occur, and perhaps become relied upon during development) is potentially harmful.

Why not add the newline via cat?
The newline can be added (in this context) using echo | cat file - as opposed to using echo >>file; however, the lit implementation of cat does not operate as expected on the standard input.

Hi,

Yup, sounds reasonable!

Traditionally these tests were using FileCheck. The point of replacing FileCheck with diff was that FileCheck failure reports for incorrectly produced plist files on tests were very hard to understand, especially when it was happening on a remote buildbot. This was happening because plists are XML files and most bugs in them usually look like large (10-20) lines suddenly appearing or disappearing, with the most important information being in the innermost XML tag in the middle of the chunk. Because FileCheck reports only the first line that doesn’t match, it made it very annoying to debug. Diffs, on the other hand, are outright perfect for the task.

An external normalization script doesn’t anyhow contradict this purpose, and if i understand correctly it was in fact one of the possible solutions that was considered back in

Hi,

Yup, sounds reasonable!

Traditionally these tests were using FileCheck. The point of replacing FileCheck with diff was that FileCheck failure reports for incorrectly produced plist files on tests were very hard to understand, especially when it was happening on a remote buildbot. This was happening because plists are XML files and most bugs in them usually look like large (10-20) lines suddenly appearing or disappearing, with the most important information being in the innermost XML tag in the middle of the chunk. Because FileCheck reports only the first line that doesn’t match, it made it very annoying to debug. Diffs, on the other hand, are outright perfect for the task.

Thanks. The information on why FileCheck is not used here is helpful to know.

– HT

I reverted this series of patches in r363007, since there were multiple issues with the new approach on Windows:

  • The heuristic for detecting path strings was incomplete, since it didn’t account for drive letter style paths.
  • The -w option appeared to be necessary for ignoring CRLF.
  • The ‘$’ portion of the new pattern doesn’t appear to match CRLF lines.

This was enough that fixing forward was prohibitively difficult, so I reverted them.

I reverted this series of patches in r363007, since there were multiple issues with the new approach on Windows:

  • The heuristic for detecting path strings was incomplete, since it didn’t account for drive letter style paths.
  • The -w option appeared to be necessary for ignoring CRLF.
  • The ‘$’ portion of the new pattern doesn’t appear to match CRLF lines.

This was enough that fixing forward was prohibitively difficult, so I reverted them.

Thanks. Would you mind sending me an example path string, and I’ll look into updating the path string detection. The -b option appears to take care of CRLF, and I’ll add ‘[[:space:]]*’ before ‘$’ to account for CRLF.

The lines with paths looked like this:
C:\src\llvm-project\clang\test\Analysis\unix-fns.c

The lines with paths looked like this:
C:\src\llvm-project\clang\test\Analysis\unix-fns.c

Thanks. Looks like just the CRLF problem then, the line format otherwise matches one of the patterns that was present in the patch already (’^[[:space:]].:.$’).

– HT