Assertion on valid use of highlighting whitespace

Hi all,

Whilst fixing bug 7377, I ran into an assertion when trying to highlight code in the following warning:

flag ' ' results in undefined behavior in 'p' conversion specifier

triggered by: printf("% p", p);

Assertion failed: (StartColNo <= EndColNo && "Trying to highlight whitespace??"), function HighlightRange, file /Volumes/Data/Users/tcare/Projects/llvm/tools/clang/lib/Frontend/TextDiagnosticPrinter.cpp, line 138.

The assertion is triggered by trying to highlight the format specifier ("% p")

This assertion seems valid, except for this rare case. Maybe there are others?

Any ideas on how to fix this?


I think there might be other things broken in this function too.

I’m having problems trying to get highlighting/fixits working for parts of printf statements (rather than the whole thing).

Currently, if I give a SourceLocation within a printf formatting string, it will highlight it from the beginning until the end of the format string (rather than the end I specified). For example:

/tmp/fixit.c:13:15: warning: precision used with ‘n’ conversion specifier,
resulting in undefined behavior [-Wformat]
printf("%100.100n", (int*) 0);

When what I expect is:

/tmp/fixit.c:13:15: warning: precision used with ‘n’ conversion specifier,
resulting in undefined behavior [-Wformat]
printf("%100.100n", (int*) 0);

This ends up breaking the fixits, which delete the rest of the format string rather than the invalid part.

The offending line is lib/Frontend/TextDiagnosticPrinter.cpp:116

// Add in the length of the token, so that we cover multi-char tokens.
EndColNo += Lexer::MeasureTokenLength(End, SM, *LangOpts);

I’m sure this line has importance, but I don’t think it is correct.

I’m a bit lost in this section! Does anyone have any ideas?


The mental model for a source range is that it is a pair of source locations [B, E), where both B and E are expected to point at the beginning of the token. That line above is relexing the token at the location E to find the end of the token. Since you’ve adjusted the ranges to point at specific characters, when you’re getting is E adjusted to the end of the “token” where you’re pointing… which is in the middle of a string literal, hence the somewhat odd highlighting behavior.

We would need to introduce a new notion (e.g., a new “CharacterSourceRange”) that forgoes this adjustment and treats the ending source location in the range as character-level positions in the source rather than token-level positions. CharacterSourceRange would only be used in limited places when needed, such as FixItHint and perhaps Diagnostic, and all affected clients would need to be updated.

  • Doug

I’ll implement this.

r106338 should do this. Please let me know if you run into any problems with it.


Hi Chris,

Works great except now I’m getting off by 1 problems at the end of the string:

/tmp/fixit.c:41:12: warning: flag ‘0’ is ignored when flag ‘-’ is present
printf("%0-f", 1.23);

Should my SourceLocation for the end byte be pointing to the ‘f’ or the ‘"’? It’s currently pointing to the ‘f’.


You tell me. Right now, the ranges are set up for half open ranges [start, end). If you want fully inclusive ranges [start, end] I can make that work, just let me know what feels most natural from a client standpoint.


Please keep it as half-open!

  • Doug

Ok. Tom, please add one! :slight_smile:


OK no problem. I’ll update SemaChecking to handle this.