Crash on token spelling for a comment (found in -verify)

While writing some new tests, I found a simple example that crashes clang under -verify. I’m a little concerned that this could hit real code, especially in tools such as IDE plugins that may care more about the spelling of tokens.

The repro is simple:

// \

This causes the verify mode to try to get the spelling of this comment token. As long as the following line is empty, this crashes by running off the end inside the loop around Lexer.cpp:306-310. getCharAndSizeNoWarn is called with Ptr at the trailing ‘\’, and it cannot fulfill its contract because there is no valid character left in the token for it to return. Instead, it returns a character from beyond the end of the token, along with a size that puts Ptr one unit /beyond/ ‘End’. The loop can then never satisfy its exit condition, so it walks into memory that it shouldn’t.
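To make the failure mode concrete, here is a tiny model of the loop described above. It is a sketch, not clang’s code: the names getCharAndSizeUnbounded and overshootPastEnd are hypothetical, and the getter only understands a plain backslash-newline splice (the real getCharAndSizeNoWarn also handles whitespace after the backslash, "\r\n" and trigraphs). The model assumes the comment token is the first 5 bytes of the buffer "// \\\n\n"; at the backslash, the getter skips the splice and hands back the newline of the empty line, so Ptr lands one byte past End.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical, simplified stand-in for getCharAndSizeNoWarn: a backslash
// immediately followed by '\n' is a line splice, so it is skipped and the
// character *after* it is returned. Nothing here knows where the token ends.
static char getCharAndSizeUnbounded(const char *Ptr, unsigned &Size) {
  if (Ptr[0] == '\\' && Ptr[1] == '\n') {
    Size = 3;  // backslash + newline + the returned character
    return Ptr[2];
  }
  Size = 1;
  return Ptr[0];
}

// Walks a token of length Len the way the spelling loop does and reports how
// far Ptr ends up past End. A loop that exits only on Ptr == End would never
// stop once it overshoots; this demo tests Ptr < End so that it terminates.
static std::ptrdiff_t overshootPastEnd(const char *TokStart, std::size_t Len,
                                       std::string &Spelling) {
  const char *End = TokStart + Len;
  const char *Ptr = TokStart;
  while (Ptr < End) {
    unsigned Size;
    Spelling.push_back(getCharAndSizeUnbounded(Ptr, Size));
    Ptr += Size;
  }
  return Ptr - End;
}
```

In this model the spelling also picks up a character that is not part of the token at all: the newline of the following empty line.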

Just wondering if anyone has some nice ideas about how to fix this. I assume this code is rather perf-critical, so I don’t want to go mucking with it too ham-fistedly. The ‘obvious’ fix from my perspective would be to do the walk forward at the previous character rather than when we’re actually at the ‘\’, but this interferes with the fast path. The alternative seems to be to have getCharAndSize[NoWarn] return a boolean indicating whether or not it was able to read a character, but that might have similar problems.
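The second option (a getter that reports whether it produced a character) could be sketched roughly as follows. This is again a hypothetical, simplified model rather than a patch against Lexer.cpp. It makes three assumptions: the names getCharAndSizeBounded and getSpellingBounded are invented for illustration, only plain "\\\n" splices are modeled, and the getter is passed End. The getter skips any splices that fit inside the token and returns false when the token ends inside a splice, so the caller steps exactly onto End and a Ptr != End exit test is safe again. Whether the extra End comparisons are acceptable on the hot path is exactly the open perf question, and this sketch does not answer it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical bounded getter: never looks at or consumes bytes at or past End.
// Returns false when only line splices remain before End (no character to
// produce); Size is still set so the caller can advance exactly onto End.
// Simplification: only "\\\n" splices are modeled (no whitespace between the
// backslash and the newline, no "\r\n", no trigraphs), unlike the real lexer.
static bool getCharAndSizeBounded(const char *Ptr, const char *End, char &C,
                                  unsigned &Size) {
  const char *P = Ptr;
  // Skip any run of backslash-newline splices that lies fully inside the token.
  while (P + 1 < End && P[0] == '\\' && P[1] == '\n')
    P += 2;
  Size = static_cast<unsigned>(P - Ptr);
  if (P == End)
    return false;  // the token ends inside a splice: nothing to return
  C = *P;
  Size += 1;
  return true;
}

// Spelling loop built on the bounded getter. Size never carries Ptr past End,
// so the Ptr != End exit test always terminates.
static std::string getSpellingBounded(const char *TokStart, std::size_t Len) {
  std::string Result;
  Result.reserve(Len);
  const char *End = TokStart + Len;
  for (const char *Ptr = TokStart; Ptr != End;) {
    char C;
    unsigned Size;
    if (getCharAndSizeBounded(Ptr, End, C, Size))
      Result.push_back(C);
    assert(Size > 0 && "every step must make progress");
    Ptr += Size;
  }
  return Result;
}
```

In this model, the repro token "// \\\n" spells as "// " and "a\\\nb" spells as "ab", and in both cases the loop stops exactly at End.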

Ideas welcome; otherwise I’ll just have a tinker and see what the perf results look like (pointers to any standard clang perf benchmarks would be nice).

- David

I saw this issue when calling clang -E on MFC code:

Oh, thanks! So it is a real(er) bug. I’m still interested in any opinions on a good fix.

- David