Stmt::getLocEnd() behaviour

I've been playing with Clang to see what it can do, and thinking of
writing a Rewriter tutorial. I was thinking of rewriting statements
of the form:
   if (expr)
     stmt;
to:
   if (expr)
   {
      stmt;
   }
as a way to enforce some coding standards I've seen. I coupled a
RecursiveASTVisitor with a rewriter, and when visiting an IfStmt, I
expected getLocEnd() to point at the end of the if statement. What I
get instead is somewhere inside the stmt. Is that expected behaviour?
For example with the following input file test1.c:
int i = 4;
int j = 0;

int main()
{
  if (i < 4)
     j++;

  while (i < 6)
     i++;

  return 0;
}

I get the output:
int i = 4;
int j = 0;

int main()
{
  <Start If>if (i < 4)
     j<End If>++;

  <Start While>while (i < 6)
     i<End While>++;

  return 0;
}

so that the getLocEnd points to before the ++ operator.
The code I used was:
#include <iostream>

#include "llvm/Support/Host.h"
#include "llvm/Support/raw_ostream.h"

#include "clang/Frontend/CompilerInstance.h"
#include "clang/Basic/TargetOptions.h"
#include "clang/Basic/TargetInfo.h"
#include "clang/Basic/FileManager.h"
#include "clang/Basic/SourceManager.h"
#include "clang/Lex/Preprocessor.h"
#include "clang/Basic/Diagnostic.h"
#include "clang/AST/RecursiveASTVisitor.h"
#include "clang/AST/ASTConsumer.h"
#include "clang/Parse/ParseAST.h"
#include "clang/Rewrite/Rewriters.h"
#include "clang/Rewrite/Rewriter.h"

clang::Rewriter Rewrite;

// RecursiveASTVisitor is is the big-kahuna visitor that traverses
// everything in the AST.
class MyRecursiveASTVisitor
    : public clang::RecursiveASTVisitor<MyRecursiveASTVisitor>
{
public:
  bool VisitStmt(clang::Stmt *s);
};

// Override Statements which includes expressions and more
bool MyRecursiveASTVisitor::VisitStmt(clang::Stmt *s)
{
  if (clang::isa<clang::IfStmt>(s))
  {
    Rewrite.InsertText(s->getLocStart(), "<Start If>", true, true);
    Rewrite.InsertText(s->getLocEnd(), "<End If>", true, true);
  }
  else
  if (clang::isa<clang::WhileStmt>(s))
  {
    Rewrite.InsertText(s->getLocStart(), "<Start While>", true, true);
    Rewrite.InsertText(s->getLocEnd(), "<End While>", true, true);
  }

  return true; // returning false aborts the traversal
}

class MyASTConsumer : public clang::ASTConsumer
{
public:
    virtual bool HandleTopLevelDecl(clang::DeclGroupRef d);
};

bool MyASTConsumer::HandleTopLevelDecl(clang::DeclGroupRef d)
{
  MyRecursiveASTVisitor rv;
  typedef clang::DeclGroupRef::iterator iter;

  for (iter b = d.begin(), e = d.end(); b != e; ++b) {
    rv.TraverseDecl(*b);
  }

  return true; // keep going
}

int main()
{
  using clang::CompilerInstance;
  using clang::TargetOptions;
  using clang::TargetInfo;
  using clang::FileEntry;

  CompilerInstance ci;
  ci.createDiagnostics(0,NULL);

  TargetOptions to;
  to.Triple = llvm::sys::getDefaultTargetTriple();
  TargetInfo *pti = TargetInfo::CreateTargetInfo(ci.getDiagnostics(), to);
  ci.setTarget(pti);

  ci.createFileManager();
  ci.createSourceManager(ci.getFileManager());
  ci.createPreprocessor();
  ci.getPreprocessorOpts().UsePredefines = false;
  MyASTConsumer *astConsumer = new MyASTConsumer();
  ci.setASTConsumer(astConsumer);

  ci.createASTContext();

  // Initialize rewriter
  Rewrite.setSourceMgr(ci.getSourceManager(), ci.getLangOpts());

  const FileEntry *pFile = ci.getFileManager().getFile("test1.c");
  ci.getSourceManager().createMainFileID(pFile);
  ci.getDiagnosticClient().BeginSourceFile(ci.getLangOpts(),
                                           &ci.getPreprocessor());
  clang::ParseAST(ci.getPreprocessor(), astConsumer, ci.getASTContext());
  ci.getDiagnosticClient().EndSourceFile();

  // Now output rewritten source code
  const clang::RewriteBuffer *RewriteBuf =
           Rewrite.getRewriteBufferFor(ci.getSourceManager().getMainFileID());
  llvm::errs() << std::string(RewriteBuf->begin(), RewriteBuf->end());

  return 0;
}

A few more questions:
How do I find the points inside the if statement to insert braces?
(after the expression and after the statement)
How best to check if the statement within the if is a CompoundStmt?

Thanks!
Robert Ankeney

Arguably a bug... it's a side-effect of getLocEnd on if statements
forwarding to the contained statement, which is an expression in this
case. That said, getting from one SourceLocation to the other isn't
too hard with MeasureTokenLength.

-Eli

I coupled a
RecursiveASTVisitor with a rewriter, and when visiting an IfStmt, I
expected getLocEnd() to point at the end of the if statement. What I
get instead is somewhere inside the stmt. Is that expected behaviour?

My recollection is that getLocEnd() returns the start of the last token, not the end of it, which generally means that you have to always do a scan to find the end of the current token from the location anyways.

That said, considering that braces appear to use the location of the `}' as the end location, the fact that a statement expression doesn't reset the end location to the location of the `;' is probably a bug.

A few more questions:
How do I find the points inside the if statement to insert braces?
(after the expression and after the statement)

You can get the location of the then statement as an open location, and the locEnd (modulo what I said above) as a close location.

How best to check if the statement within the if is a CompoundStmt?

if (isa<CompoundStmt>(ifstmt.getThen()))

As a final note, I might advise that it's probably better to run the output through your favorite code prettifier than to compute things like indentation in a clang pass.