wrong getLocEnd() returns

Hi clang devs,

Using clang 3.3, I noticed that some classes (clang::FloatingLiteral,
for example) lack end-location information and return the start
location instead.

As an example, this source
void function() {
float f = 0.0;
}

gives an AST like this (I kept only the interesting part):
CompoundStmt 0x5457228 <test1.c:1:17, line:3:1>
`-DeclStmt 0x5457210 <line:2:1, col:14> // This one is OK
  `-VarDecl 0x5457180 <col:1, col:11> f 'float' // It should end at 13
    `-ImplicitCastExpr 0x54571f8 <col:11> 'float' <FloatingCast> // Same here
      `-FloatingLiteral 0x54571d8 <col:11> 'double' 0.000000e+00 // Same here

This is a problem for me because I can't trust the value returned by
getLocEnd() (I want to rewrite sources and need precise locations).
Patching this didn't seem hard, so I gave it a try and succeeded for
FloatingLiteral.

Are there good reasons NOT to make this change for all classes?
Would you be interested in such patches?
Are there other ways to retrieve precise end locations?

Thanks,
  Nicolas

For the most part, Clang's SourceLocations are token-granular; therefore, the "end location" is the location of the last token in a statement. To actually get the location after the last character, you'll need to use Lexer::getLocForEndOfToken.
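
To illustrate the idea outside of Clang, here is a minimal standalone sketch. It is a toy model, not Clang code: endOfTokenAt is a hypothetical helper with a deliberately simplistic notion of a token, and offsets are zero-based. In Clang itself, Lexer::getLocForEndOfToken does the equivalent work by re-lexing the token at the given SourceLocation, which is why it also needs the SourceManager and LangOptions.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Toy model, NOT Clang code: returns the offset one past the end of the
// token that begins at `start`. Only identifiers, simple numeric literals
// and single-character punctuation are recognized (an assumption made to
// keep the sketch short).
std::size_t endOfTokenAt(const std::string &src, std::size_t start) {
  assert(start < src.size() && "offset must point at a token");
  auto isWord = [](unsigned char c) {
    return std::isalnum(c) || c == '_' || c == '.';
  };
  std::size_t end = start;
  if (isWord(src[start])) {
    // Identifier or number: consume the whole run of word characters.
    while (end < src.size() && isWord(src[end]))
      ++end;
  } else {
    // Punctuation: treated as a one-character token.
    ++end;
  }
  return end;
}
```

For the line `float f = 0.0;`, the literal starts at offset 10 (column 11) and the helper returns 13 (column 14), which is one past the final `0`.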

Hope that helps,
Jordan

Hi Jordan,

What bothers me is that there is no simple way to detect the end of a
statement/declaration. At least I'm no longer blocked: I can compute
it manually.

Thanks,
  Nicolas

All the getLocEnd locations should be correct. The Exprs that return the start location are those known to consist of a single token. If there are any incorrect end locations, that's a bug and these should be fixed, but in the example you gave both VarDecl and its subexpressions have the correct ending location, which is the beginning of the last token in the expression.

The idea is that it's easy to go from the last token location to the "one character past the end" location, but not the other way around. At the very least, doing it this way optimizes storage for single-token expressions.
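
The easy direction can be sketched as follows. This is a standalone toy that assumes zero-based offsets; TokenRange, tokenLength and textOf are hypothetical names, not Clang types or functions. A range stores only the starts of its first and last tokens, so a single-token expression needs just one location, and the full character extent is recovered on demand by measuring the last token.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Toy stand-in for a token-granular range (NOT clang::SourceRange):
// both fields are offsets of the FIRST character of a token.
struct TokenRange {
  std::size_t firstTok; // start of the first token
  std::size_t lastTok;  // start of the last token (== firstTok if only one)
};

// Length of the token starting at `pos`. Deliberately simplistic: a run of
// alphanumerics, '_' and '.', or else a single punctuation character.
static std::size_t tokenLength(const std::string &src, std::size_t pos) {
  auto isWord = [](unsigned char c) {
    return std::isalnum(c) || c == '_' || c == '.';
  };
  std::size_t end = pos;
  while (end < src.size() && isWord(src[end]))
    ++end;
  return end == pos ? 1 : end - pos;
}

// Source text covered by the range, including every character of the
// last token (i.e. up to the "one character past the end" position).
std::string textOf(const std::string &src, TokenRange r) {
  assert(r.firstTok <= r.lastTok && r.lastTok < src.size());
  std::size_t onePastEnd = r.lastTok + tokenLength(src, r.lastTok);
  return src.substr(r.firstTok, onePastEnd - r.firstTok);
}
```

With `float f = 0.0;`, the range {0, 10} (matching the VarDecl's <col:1, col:11>) yields `float f = 0.0`, and the single-token range {10, 10} yields `0.0`.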

Jordan