End SourceLocation for multi-line Tokens?

How would one get the end of a Token if it spans several lines, such as a BCPL comment? I can clearly get the start position and length, but the token may contain any number of line breaks. I suppose I could use the preprocessor to get the text of the token, count the number of line breaks, and calculate the end line number and column, but is there an easier way?

Thanks,

e

Are you asking about something like:

foo\
bar??/
baz

? If so, the source location for the token will point to the start of foo. Lexer::getTokenLength will return the full extent of the token. Adding it to the start location will give you a location that points to the space after the 'z'.
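The arithmetic above can be sketched on a plain character buffer, without any Clang types. The helper `tokenEndOffset` and the two constants are hypothetical names made up for illustration, and the raw length of 15 is counted by hand from the spelling above rather than obtained from the lexer.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not a Clang API): the offset one past the last
// character of a token, given its start offset and raw spelling length.
std::size_t tokenEndOffset(std::size_t startOffset, std::size_t rawLength) {
    return startOffset + rawLength;
}

// The three-line spelling from the example, followed by " ;".
// "?\?/" spells the characters ??/ while keeping older compilers from
// treating them as a trigraph inside this string literal.
const std::string kExampleBuffer = "foo\\\nbar?\?/\nbaz ;";

// Raw length counted by hand: "foo" + '\' + newline + "bar" + "??/" +
// newline + "baz" = 3 + 1 + 1 + 3 + 3 + 1 + 3 = 15 characters.
const std::size_t kExampleTokenLength = 15;
```

With a start offset of 0, `tokenEndOffset(0, kExampleTokenLength)` is 15, which indexes the space after the 'z' in `kExampleBuffer`.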

-Chris

Same thing works for those. An example is in HTMLRewriter.cpp, html::SyntaxHighlight:

     unsigned TokOffs = SourceMgr.getFullFilePos(Tok.getLocation());
     unsigned TokLen = Tok.getLength();
...
     case tok::comment:
       HighlightRange(RB, TokOffs, TokOffs+TokLen, BufferStart,
                      "<span class='comment'>", "</span>");

Note that the lexer doesn't return comments unless specifically in keep-comment-mode (which corresponds to -C when using -E mode in the driver). See html::SyntaxHighlight as an example that cons's up a lexer to pull out comments on the fly.

-Chris

To get the end position, you need to get a SourceLocation for the end of the token, then ask the SourceManager (SM) for the line/col of that location. This will properly handle the various craziness that C includes.

See this post for info on getting the source location for the end of a token:
http://lists.cs.uiuc.edu/pipermail/cfe-dev/2008-April/001501.html
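For intuition only, here is a minimal standalone sketch of the "offset to line/col" step on a raw buffer. It uses no Clang API: `LineCol`, `lineColAtOffset`, and `tokenEnd` are hypothetical names, and the sketch assumes lines end in '\n' only and that every character occupies one column. Real code should ask the SourceManager instead, as described above.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for a line/column pair; both are 1-based.
struct LineCol {
    unsigned line;
    unsigned column;
};

// Walks the buffer from the beginning up to `offset`, counting physical
// newlines. Assumption: lines end in '\n' only, and tabs count as one column.
LineCol lineColAtOffset(const std::string &buffer, std::size_t offset) {
    LineCol pos{1, 1};
    for (std::size_t i = 0; i < offset && i < buffer.size(); ++i) {
        if (buffer[i] == '\n') {
            ++pos.line;
            pos.column = 1;
        } else {
            ++pos.column;
        }
    }
    return pos;
}

// Position one past the end of a token: start offset plus raw length,
// then mapped to a line and column.
LineCol tokenEnd(const std::string &buffer, std::size_t startOffset,
                 std::size_t rawLength) {
    return lineColAtOffset(buffer, startOffset + rawLength);
}
```

For the buffer `"/* a\n b */ int x;"`, the comment starts at offset 0 and is 10 characters long, so `tokenEnd(buffer, 0, 10)` gives line 2, column 6, the position just past the closing `*/`.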

Incidentally, improvements to the internals document are very welcome :)

-Chris