GotoStmt::getSourceRange, bug?

I think there is a bug in the current implementation of getSourceRange for GotoStmt:

   class GotoStmt : public Stmt {
     LabelStmt *Label;
     SourceLocation GotoLoc;
   public:
     ...
     virtual SourceRange getSourceRange() const {
       return SourceRange(GotoLoc, Label->getLocEnd());
     }
     ...
   };

I believe that getting the end SourceLocation using Label->getLocEnd() is incorrect. While the target LabelStmt is referenced by GotoStmt, it isn't a substatement in the tree. Consider the following snippet of code:

   l1: goto l1;

Here, the GotoStmt is a substatement of the LabelStmt for "l1". The implementation of getSourceRange() in LabelStmt is as follows (notice the call to SubStmt->getLocEnd):

   class LabelStmt : public Stmt {
     IdentifierInfo *Label;
     Stmt *SubStmt;
     SourceLocation IdentLoc;
   public:
     ...
     virtual SourceRange getSourceRange() const {
       return SourceRange(IdentLoc, SubStmt->getLocEnd());
     }
    ....
   };

There is an unbounded recursion here. While LabelStmt is correctly calling getLocEnd for its substatement, GotoStmt is incorrectly calling getLocEnd for the target LabelStmt.

The question is that do we have enough extent information recorded in GotoStmt to accurately recreate the full source range for a goto statement?

The question is that do we have enough extent information recorded in
GotoStmt to accurately recreate the full source range for a goto
statement?

Not currently. It looks like we need to add a "LabelLoc" to the AST (this is already passed into the actions, so it's trivial to do). Make sense?

Thanks for finding this,

snaroff

Cool. I've committed a patch. I'm terribly familiar with the mechanics of SourceLocations, so if someone could audit the patch that would be great.

In particular, I'm not so certain about the following line in GotoStmt:getSourceRange:

   return SourceRange(GotoLoc, LabelLoc);

This extent will only include from the start of the goto statement to the start of the label token (and not the end of the label token). Is this what we want?

It is. It will derive the end of the label token through the magic of the routine below. This enables us to compute the range of identifiers (which are quite common) without caching <beginloc,endloc> pairs.

Since the range protocol is typically used for error/diagnostics, performance isn’t a problem…

snaroff

/// GetTokenLength - Given the source location of a token, determine its length.
/// This is a fully general function that uses a lexer to relex the token.
unsigned TextDiagnosticPrinter::GetTokenLength(SourceLocation Loc) {
// If this comes from a macro expansion, we really do want the macro name, not
// the token this macro expanded to.
Loc = SourceMgr.getLogicalLoc(Loc);
const char *StrData = SourceMgr.getCharacterData(Loc);

// TODO: this could be special cased for common tokens like identifiers, ‘)’,
// etc to make this faster, if it mattered. This could use
// Lexer::isObviouslySimpleCharacter for example.

// Create a lexer starting at the beginning of this token.
Lexer TheLexer(Loc, *ThePreprocessor, StrData);
Token TheTok;
TheLexer.LexRawToken(TheTok);
return TheTok.getLength();
}