[cfe-commits] [PATCH] pretty display of unprintable source and correct caret/range/fixup alignment

I'm still hopeful that this change can make it in in time for the 3.1 branch tomorrow. If anyone happens to see this who can review these patches before the branch I'd be grateful. Hopefully there are no major issues so it will be able to go in.

The first patch is just to add support for reversed colors in raw_ostreams in llvm. The second patch uses it to print out unprintable characters, and also adjusts text diagnostic ranges and fix-its for both the escaped representations and for characters that aren't a single column wide. The last patch is unimportant and for another issue: printing diagnostics for source that contains `\0`

- Thanks

0001-Add-reverseColor-to-raw_ostream.patch (4.93 KB)

0001-Nicer-display-of-unprintable-source-and-fix-caret-di.patch (36.5 KB)

0002-fix-display-of-source-lines-with-null-characters.patch (1.96 KB)

The 2 small patches look good.

The biggest issue I have with the big patch is that the foo_l functions are non-portable. They don't exist on linux and other non-BSD unices. I find it worrying to have locale stuff in clang, is there really no other way to get this info?

Apart from that there are some stylistic issues:
- Don't pass around pairs of char*, use StringRef instead. You can also replace some loops with its convenience functions, e.g. StringRef::rfind. The stuff from ConvertUTF.h is an exception as it's contributed code, but its good to do the conversion to from StringRef to char* as late as possible to keep typecasting at a minimum.
- Replace 'const std::string &' with StringRef
- The preferred style is to put const in front of the type: "const int*" instead of "int const*"
- Opening braces always go on the same line as the declaration.
- Be consistent about spacing, "if (" instead of "if(", always put spaces after commas, …

- Ben

I'm still hopeful that this change can make it in in time for the 3.1 branch tomorrow. If anyone happens to see this who can review these patches before the branch I'd be grateful. Hopefully there are no major issues so it will be able to go in.

The first patch is just to add support for reversed colors in raw_ostreams in llvm. The second patch uses it to print out unprintable characters, and also adjusts text diagnostic ranges and fix-its for both the escaped representations and for characters that aren't a single column wide. The last patch is unimportant and for another issue: printing diagnostics for source that contains `\0`

The 2 small patches look good.

The biggest issue I have with the big patch is that the foo_l functions are non-portable. They don't exist on linux and other non-BSD unices. I find it worrying to have locale stuff in clang, is there really no other way to get this info?

wcswidth* is the only api I know of for computing the terminal display column width of text. I think I could switch the iswprint_l and wcswidth_l functions for non-_l versions. iswprint and wcswidth technically use the current locale but I don't think the locale will actually change the results. I've seen isprint used elsewhere so I guess functions based on the current locale would be okay.

But I do have to convert from UTF-8 to the host's wchar_t encoding in order to use these functions. Currently I'm creating a locale where the narrow character encoding is UTF-8 so I can convert from UTF-8 to the wchar_t encoding. And since I go ahead and use that for the other locale based functions. I could assume that the wchar_t is UTF-16 or UTF-32 depending on the sizeof(wchar_t), and use the ConvertUTF8ToUTFXX functions. As far as I know this holds for the platforms we care about. Another alternative would be the

Apart from that there are some stylistic issues:
- Don't pass around pairs of char*, use StringRef instead. You can also replace some loops with its convenience functions, e.g. StringRef::rfind. The stuff from ConvertUTF.h is an exception as it's contributed code, but its good to do the conversion to from StringRef to char* as late as possible to keep typecasting at a minimum.
- Replace 'const std::string &' with StringRef
- The preferred style is to put const in front of the type: "const int*" instead of "int const*"
- Opening braces always go on the same line as the declaration.
- Be consistent about spacing, "if (" instead of "if(", always put spaces after commas, …

- Ben

Okay, this should address the small issues:

0001-Nicer-display-of-unprintable-source-and-fix-caret-di.patch (35.9 KB)

iswprint does turn out to return different results; it's saying all non-ascii characters are unprintable.

I can implement a generic version of this stuff for platforms that don't have xlocale. In fact, since we don't yet have support for printing out non-ascii text on Windows I can replace my current Windows implementation with the generic implementation.

Here's the patch with that done:

0001-Nicer-display-of-unprintable-source-and-fix-caret-di.patch (34 KB)

It is not a portable assumption. At least NetBSD explicitly does not
make any such promise, since it can lose information. You can assume it
iff __STDC_ISO_10646__ is defined.

Joerg

Okay. The current version of the patch with a generic implementation
does not make this assumption.

- Seth

Would it be possible to move the platform specific stuff down to llvm ?

Also, instead of passing "SmallVector<int,200>"'s around could you create a class to encapsulate that functionality ?

-Argyrios

For function arguments it should use SmallVectorImpl instead of SmallVector<int, 200>. Its probably bad to return a SmallVector since you’d incur a copy on the return. If possible it would probably be better to pass in a SmallVector to be modified instead.

~Craig

Okay, here are patches for review with those changes. Built and 'make check’ed and 'make test’ed on OS X. Still building on Windows.

let me know if these are okay to commit.

  • Thanks

0001-platform-support-for-counting-column-widths-and-chec.patch (4.08 KB)

0001-Nicer-display-of-unprintable-source-and-fix-caret-di.patch (32.3 KB)

0002-fix-display-of-source-lines-with-null-characters.patch (1.96 KB)

This style is kind of hard to read as it makes arguments look like local variables at first glance. If it’s a line length problem, putting “static void” on a separate line is probably a better choice so that the argument can still line up with the the parenthese.

+static void selectInterestingSourceRegion(

  • std::string &SourceLine,
  • std::string &CaretLine,
  • std::string &FixItInsertionLine,
  • unsigned Columns,
  • const SourceColumnMap &map) {

It was a line length problem, but one of the other changes eliminated it.

0003-fix-parameter-formatting.patch (1.43 KB)

Oh, and the Windows build finished and looks fine.

byteToColumn and columnToByte bodies probably shouldn’t be buried inside SourceColumnMap. The indentation is confusing because of it.

Believe the general still is to keep the operator at the end of the previous line when splitting lines

  • assert(CaretStart!=(unsigned)-1 && CaretEnd!=(unsigned)-1

  • && SourceStart!=(unsigned)-1 && SourceEnd!=(unsigned)-1);

  • std::pair<SmallString<16>,bool> res

  • = printableTextForNextCharacter(line, &i, DiagOpts.TabStop);

This is a nit, but the ‘else’ isn’t needed since the ‘if’ ends in a return.

  • return std::make_pair(expandedCP, false);
  • } else {
  • // If next character is valid UTF-8, and printable
  • return std::make_pair(SmallString<16>(original_begin, cp_end), true);
  • }

~Craig

0004-pull-byteToColumn-and-columnToByte-out-of-SourceColu.patch (5.79 KB)

I have a patch I think should go in 3.1: A change in the clang
FrontEnd text diagnostics to display 'unprintable' source such as
source with a wrong encoding or characters that can't be displayed in
the terminal. Additionally range highlighting, fixups and caret
locations are corrected both for characters that take multiple columns
(e.g., wide east asian characters) and for the displayed 'unprintable'
characters.

This patch to the frontend in clang relies on an addition to llvm that
supports platform specific column width calculation and is_print
checks. (The area this code was added too appears 'unowned' on the
list of source code owners)

These changes to text diagnostics display are complimentary to the
improved support for unicode source added since clang 3.0.

Patches are attached, and emails from reviewers that suggested changes
(which have been applied) are quoted below.

0001-Nicer-display-of-unprintable-source-and-fix-caret-di.patch (32.2 KB)

0002-fix-display-of-source-lines-with-null-characters.patch (1.97 KB)

0003-Add-tests-for-diagnostics-on-unprintable-source.patch (1.64 KB)

0001-platform-support-for-counting-column-widths-and-chec.patch (4.1 KB)

Hi Seth,

I have a patch I think should go in 3.1: A change in the clang
FrontEnd text diagnostics to display 'unprintable' source such as
source with a wrong encoding or characters that can't be displayed in
the terminal. Additionally range highlighting, fixups and caret
locations are corrected both for characters that take multiple columns
(e.g., wide east asian characters) and for the displayed 'unprintable'
characters.

This patch to the frontend in clang relies on an addition to llvm that
supports platform specific column width calculation and is_print
checks. (The area this code was added too appears 'unowned' on the
list of source code owners)

These changes to text diagnostics display are complimentary to the
improved support for unicode source added since clang 3.0.

Patches are attached, and emails from reviewers that suggested changes
(which have been applied) are quoted below.

Thank you for working on this. I think it's a great fix for top-of-tree Clang. However, since we're not talking about fixing a regression and the vast majority of users are unlikely to ever run into this problem, I do not think we should take this patch for 3.1.

  - Doug

Okay, I'll commit to trunk in a bit then.

Thanks,
Seth

More nits for the llvm patch:

-Use lower case for the first letter of functions, as dictated by llvm conventions.
-For columnWidth accept a StringRef as parameter which is more versatile, and copy it inside the function to get a null-terminated c string. You are already copying the string anyway.
-For managing the ‘locale_holder’ object, use llvm::ManagedStatic, which handles lifetime of such objects and allows them to be created lazily when needed.

-Argyrios

These are addressed in the committed version.

llvm: r154944/fdc97cd2
clang: r154947/70712b27