Problem with retrieving the BinaryOperator RHS end location

I am just trying to print the whole BinaryOperator statement text verbatim, but I cannot get the location of the end of the statement.


bool VisitBinaryOperator(BinaryOperator  *Stmt) {
         const char       *startBuf;
         const char       *endBuf;
         SourceManager    *SM;

         SM = &TheRewriter.getSourceMgr(); // based on Rewriter
         startBuf = SM->getCharacterData( Stmt->getLHS()->getBeginLoc());
         endBuf   = SM->getCharacterData( Stmt->getRHS()->getEndLoc());
         fprintf( stderr, "write: \"%.*s\"\n", (int) (endBuf - startBuf), startBuf);
         return( true);
}

For an input of:


void   x( void)
{
    void  (*y)(void);

    y = x;
}

I get


write: "y = "

This is the dump of the AST:


FunctionDecl 0x19d3bd8 </home/src/srcL/llvm-clang-samples/y.m:1:1, line:6:1> line:1:8 referenced x 'void (void)'
`-CompoundStmt 0x19d3ec8 <line:2:1, line:6:1>
   |-DeclStmt 0x19d3e38 <line:3:4, col:20>
   | `-VarDecl 0x19d3dd8 <col:4, col:19> col:12 used y 'void (*)(void)'
   `-BinaryOperator 0x19d3ea8 <line:5:4, col:8> 'void (*)(void)' '='
     |-DeclRefExpr 0x19d3e50 <col:4> 'void (*)(void)' lvalue Var 0x19d3dd8 'y' 'void (*)(void)'
     `-ImplicitCastExpr 0x19d3e90 <col:8> 'void (*)(void)' <FunctionToPointerDecay>
       `-DeclRefExpr 0x19d3e70 <col:8> 'void (void)' Function 0x19d3bd8 'x' 'void (void)'

This is my first foray into clang rewriting...

Ciao

Nat!

See http://clang.llvm.org/docs/InternalsManual.html#sourcerange-and-charsourcerange

The "EndLoc" of an AST node is generally the location of the final
token (generally, SourceRanges are ranges of tokens, inclusive of both
endpoints), so you need to expand the end of the region by the length
of the last token.
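
To make the "inclusive token range" point concrete, here is a minimal sketch on a plain string, with no Clang involved. The names `measureTokenLength` and `tokenRangeText` are made up for illustration, and the toy tokenizer only knows identifiers, numbers, and single punctuation characters. With real SourceLocations, Clang's `Lexer::MeasureTokenLength`, `Lexer::getLocForEndOfToken`, or `Lexer::getSourceText` should do the equivalent job.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Toy stand-in for a lexer query: length of the token that starts at `pos`.
// A run of identifier/number characters counts as one token; anything else
// is treated as a one-character token.
std::size_t measureTokenLength(const std::string &src, std::size_t pos) {
    if (pos >= src.size()) return 0;
    auto isWord = [](unsigned char c) { return std::isalnum(c) || c == '_'; };
    if (!isWord(src[pos])) return 1;
    std::size_t end = pos;
    while (end < src.size() && isWord(src[end])) ++end;
    return end - pos;
}

// `beginTok` and `endTok` are the START offsets of the first and last token,
// which is how an inclusive token range describes a span. To get the text,
// the end has to be pushed forward by the length of the last token.
std::string tokenRangeText(const std::string &src, std::size_t beginTok,
                           std::size_t endTok) {
    std::size_t endChar = endTok + measureTokenLength(src, endTok);
    return src.substr(beginTok, endChar - beginTok);
}
```

For the string `"    y = x;"`, cutting from offset 4 up to offset 8 gives `"y = "`, which is the truncated text from the first post, while `tokenRangeText(src, 4, 8)` gives `"y = x"`.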

Thanks, that was the trick. I added "endBuf += Lexer::MeasureTokenLength( Stmt->getEndLoc(), *SM, TheRewriter.getLangOpts());" to my function

and I am now getting the complete line.

Thanks again

Nat!

It turns out the suggested fix, at least as I implemented it, works for simple cases, but not in general.

endBuf  = SM->getCharacterData( Stmt->getRHS()->getEndLoc());
endBuf += Lexer::MeasureTokenLength( Stmt->getEndLoc(), *SM, TheRewriter.getLangOpts());

I believe the problem is that the SourceManager I am using
to "measure" the length is looking at the unmodified source code. But
the RewriteBuffer may already contain changes.

So in my example, if I am looking at `x = yyy` in the source but
in the rewrite buffer it's now `x=yyy_renamed`, it will not take the
extra characters into account.

I should be measuring the contents of the RewriteBuffer instead,
with likely another SourceManager, but I lack the expertise to set
this up.

Ciao
    Nat!
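
A side note on the "original source vs. rewrite buffer" worry: as far as I understand it, an edit buffer can keep accepting offsets in original coordinates and translate them itself, by remembering how much earlier edits grew or shrank the text. The sketch below shows that idea only. `ToyEditBuffer` and its methods are hypothetical names, and this is not how Clang's RewriteBuffer is actually implemented.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Toy edit buffer: holds the edited text plus, for each edit, how much it
// changed the size, keyed by the edit's offset in the ORIGINAL text.
class ToyEditBuffer {
public:
    explicit ToyEditBuffer(std::string original) : text_(std::move(original)) {}

    // Translate an original offset into an offset in the edited text by
    // adding up the size changes of all edits strictly before it.
    std::size_t mapped(std::size_t origOffset) const {
        long shift = 0;
        for (const auto &entry : deltas_) {
            if (entry.first >= origOffset) break;
            shift += entry.second;
        }
        return static_cast<std::size_t>(static_cast<long>(origOffset) + shift);
    }

    // Insert `s` in front of the character that was at `origOffset`.
    void insertAt(std::size_t origOffset, const std::string &s) {
        text_.insert(mapped(origOffset), s);
        deltas_[origOffset] += static_cast<long>(s.size());
    }

    const std::string &text() const { return text_; }

private:
    std::string text_;
    std::map<std::size_t, long> deltas_;  // original offset -> size change
};
```

With `"a = b + c;"`, inserting `"long_"` at original offset 4 and then `"big_"` at original offset 8 still lands the second insert in front of `c`, even though `c` has moved in the edited text. So measuring token lengths in the unmodified source is not wrong in itself, as long as every edit is expressed in original coordinates.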

Can you post a more complete SSCCE of what you are trying to do?

Thanks,

Stephen

I think this is the simplest code that exhibits my problem.

This code rewrites direct struct access like p->x into a C function call getX( p).


//------------------------------------------------------------------------------
// Tooling sample. Demonstrates:
//
// * How to write a simple source tool using libTooling.
// * How to use RecursiveASTVisitor to find interesting AST nodes.
// * How to use the Rewriter API to rewrite the source code.
//
// Eli Bendersky (eliben@gmail.com)
// This code is in the public domain
//------------------------------------------------------------------------------
#include <sstream>
#include <string>

#include "clang/AST/AST.h"
#include "clang/AST/ASTConsumer.h"
#include "clang/AST/RecursiveASTVisitor.h"
#include "clang/Frontend/ASTConsumers.h"
#include "clang/Frontend/CompilerInstance.h"
#include "clang/Frontend/FrontendActions.h"
#include "clang/Rewrite/Core/Rewriter.h"
#include "clang/Tooling/CommonOptionsParser.h"
#include "clang/Tooling/Tooling.h"
#include "llvm/Support/raw_ostream.h"

using namespace clang;
using namespace clang::driver;
using namespace clang::tooling;

static llvm::cl::OptionCategory ToolingSampleCategory("Tooling Sample");

// By implementing RecursiveASTVisitor, we can specify which AST nodes
// we're interested in by overriding relevant methods.
class MyASTVisitor : public RecursiveASTVisitor<MyASTVisitor> {
public:
   MyASTVisitor(Rewriter &R) : TheRewriter(R) {}

   bool VisitMemberExpr(MemberExpr *Expr) {
     SourceManager   *SM;
     int             length;
     const char      *startBuf;
     const char      *endBuf;
     const char      *opBuf;

     SM       = &TheRewriter.getSourceMgr();
     startBuf = SM->getCharacterData( Expr->getBeginLoc());
     opBuf    = SM->getCharacterData( Expr->getOperatorLoc());
     endBuf   = SM->getCharacterData( Expr->getEndLoc());
     length   = endBuf - startBuf;

     std::string  front( startBuf, opBuf - startBuf);

     length  += Lexer::MeasureTokenLength( Expr->getEndLoc(), *SM, TheRewriter.getLangOpts());

     std::string  origin( startBuf, length);
     std::string  replace;

     replace = std::string( "getX(") + front + std::string( ")");

     fprintf( stderr, "rewrite: \"%s\" -> \"%s\"\n", origin.c_str(), replace.c_str());
     TheRewriter.ReplaceText( Expr->getBeginLoc(), length, replace);

     return true;
   }

private:
   Rewriter &TheRewriter;
};

// Implementation of the ASTConsumer interface for reading an AST produced
// by the Clang parser.
class MyASTConsumer : public ASTConsumer {
public:
   MyASTConsumer(Rewriter &R) : Visitor(R) {}

   // Override the method that gets called for each parsed top-level
   // declaration.
   bool HandleTopLevelDecl(DeclGroupRef DR) override {
     for (DeclGroupRef::iterator b = DR.begin(), e = DR.end(); b != e; ++b) {
       // Traverse the declaration using our AST visitor.
       Visitor.TraverseDecl(*b);
       (*b)->dump();
     }
     return true;
   }

private:
   MyASTVisitor Visitor;
};I changed my theory is,

// For each source file provided to the tool, a new FrontendAction is created.
class MyFrontendAction : public ASTFrontendAction {
public:
   MyFrontendAction() {}
   void EndSourceFileAction() override {
     SourceManager &SM = TheRewriter.getSourceMgr();
     llvm::errs() << "** EndSourceFileAction for: "
                  << SM.getFileEntryForID(SM.getMainFileID())->getName() << "\n";

     // Now emit the rewritten buffer.
     TheRewriter.getEditBuffer(SM.getMainFileID()).write(llvm::outs());
   }

   std::unique_ptr<ASTConsumer> CreateASTConsumer(CompilerInstance &CI,
                                                  StringRef file) override {
     llvm::errs() << "** Creating AST consumer for: " << file << "\n";
     TheRewriter.setSourceMgr(CI.getSourceManager(), CI.getLangOpts());
     return llvm::make_unique<MyASTConsumer>(TheRewriter);
   }

private:
   Rewriter TheRewriter;
};

int main(int argc, const char **argv) {
   CommonOptionsParser op(argc, argv, ToolingSampleCategory);
   ClangTool Tool(op.getCompilations(), op.getSourcePathList());

   return Tool.run(newFrontendActionFactory<MyFrontendAction>().get());
}

With input


struct _x
{
    struct _x  *x;
    int        y;
};

int   getXXY( struct _x *p)
{
    return( p->x->x->y);
}

it produces


RecordDecl 0x27cdad8 </usr/local/llvm/srcL/llvm-clang-samples/z.c:1:1, line:5:1> line:1:8 struct _x definition
|-FieldDecl 0x282a620 <line:3:4, col:16> col:16 x 'struct _x *'
`-FieldDecl 0x282a680 <line:4:4, col:15> col:15 y 'int'
rewrite: "p->x->x->y" -> "getX(p->x->x)"
rewrite: "p->x->x" -> "getX(p->x)"
rewrite: "p->x" -> "getX(p)"
FunctionDecl 0x282a7e0 </usr/local/llvm/srcL/llvm-clang-samples/z.c:8:1, line:11:1> line:8:7 getXXY 'int (struct _x *)'
|-ParmVarDecl 0x282a6f0 <col:15, col:26> col:26 used p 'struct _x *'
`-CompoundStmt 0x282aa00 <line:9:1, line:11:1>
   `-ReturnStmt 0x282a9f0 <line:10:4, col:22>
     `-ImplicitCastExpr 0x282a9d8 <col:10, col:22> 'int' <LValueToRValue>
       `-ParenExpr 0x282a9b8 <col:10, col:22> 'int' lvalue
         `-MemberExpr 0x282a988 <col:12, col:21> 'int' lvalue ->y 0x282a680
           `-ImplicitCastExpr 0x282a970 <col:12, col:18> 'struct _x *' <LValueToRValue>
             `-MemberExpr 0x282a940 <col:12, col:18> 'struct _x *' lvalue ->x 0x282a620
               `-ImplicitCastExpr 0x282a928 <col:12, col:15> 'struct _x *' <LValueToRValue>
                 `-MemberExpr 0x282a8f8 <col:12, col:15> 'struct _x *' lvalue ->x 0x282a620
                   `-ImplicitCastExpr 0x282a8e0 <col:12> 'struct _x *' <LValueToRValue>
                     `-DeclRefExpr 0x282a8c0 <col:12> 'struct _x *' lvalue ParmVar 0x282a6f0 'p' 'struct _x *'
** EndSourceFileAction for: /usr/local/llvm/srcL/llvm-clang-samples/z.c
struct _x
{
    struct _x  *x;
    int        y;
};

int   getXXY( struct _x *p)
{
    return( getX(p)(p->x)>x->x));
}

From a few more tests along the way, my current pet theory is that the change from member access to a function call is tripping something up. If I rewrite just the right-hand side of the expression, it works OK.

Ciao

Nat!
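
One possible reading of the garbled output: the visitor reaches the outermost MemberExpr first, all three expressions start at the same `p`, and every ReplaceText call passes a length measured in the original source. If each call removes that many characters at the same start position of the already edited text, the previous replacement gets cut up. The toy model below uses a plain `std::string` instead of the Rewriter. That is an assumption about the behaviour, not a description of the RewriteBuffer internals, and `replaceAt` and `simulateNestedReplace` are made-up names.

```cpp
#include <cstddef>
#include <string>

// Remove `origLength` characters at `offset` in the CURRENT text and put
// `replacement` there. The length comes from the original source, so it no
// longer matches what is in the text once an earlier edit has changed it.
void replaceAt(std::string &text, std::size_t offset, std::size_t origLength,
               const std::string &replacement) {
    text.replace(offset, origLength, replacement);
}

// Replays the three "rewrite:" log lines from the post, outermost first.
std::string simulateNestedReplace() {
    std::string line = "return( p->x->x->y);";
    const std::size_t start = line.find("p->");  // shared start of all three
    replaceAt(line, start, 10, "getX(p->x->x)"); // was "p->x->x->y"
    replaceAt(line, start, 7, "getX(p->x)");     // was "p->x->x"
    replaceAt(line, start, 4, "getX(p)");        // was "p->x"
    return line;
}
```

`simulateNestedReplace()` returns `return( getX(p)(p->x)>x->x));`, the same line as in the rewritten output above, which suggests the overlapping replacements are the culprit rather than the token measuring.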

I accidentally added some garbage in the middle of the source while writing the mail, here is the corrected source.

Sorry for that.


//------------------------------------------------------------------------------
// Tooling sample. Demonstrates:
//
// * How to write a simple source tool using libTooling.
// * How to use RecursiveASTVisitor to find interesting AST nodes.
// * How to use the Rewriter API to rewrite the source code.
//
// Eli Bendersky (eliben@gmail.com)
// This code is in the public domain
//------------------------------------------------------------------------------
#include <sstream>
#include <string>

#include "clang/AST/AST.h"
#include "clang/AST/ASTConsumer.h"
#include "clang/AST/RecursiveASTVisitor.h"
#include "clang/Frontend/ASTConsumers.h"
#include "clang/Frontend/CompilerInstance.h"
#include "clang/Frontend/FrontendActions.h"
#include "clang/Rewrite/Core/Rewriter.h"
#include "clang/Tooling/CommonOptionsParser.h"
#include "clang/Tooling/Tooling.h"
#include "llvm/Support/raw_ostream.h"

using namespace clang;
using namespace clang::driver;
using namespace clang::tooling;

static llvm::cl::OptionCategory ToolingSampleCategory("Tooling Sample");

// By implementing RecursiveASTVisitor, we can specify which AST nodes
// we're interested in by overriding relevant methods.
class MyASTVisitor : public RecursiveASTVisitor<MyASTVisitor> {
public:
   MyASTVisitor(Rewriter &R) : TheRewriter(R) {}

   bool VisitMemberExpr(MemberExpr *Expr) {
     SourceManager   *SM;
     int             length;
     const char      *startBuf;
     const char      *endBuf;
     const char      *opBuf;

     SM       = &TheRewriter.getSourceMgr();
     startBuf = SM->getCharacterData( Expr->getBeginLoc());
     opBuf    = SM->getCharacterData( Expr->getOperatorLoc());
     endBuf   = SM->getCharacterData( Expr->getEndLoc());
     length   = endBuf - startBuf;

     std::string  front( startBuf, opBuf - startBuf);

     length  += Lexer::MeasureTokenLength( Expr->getEndLoc(), *SM, TheRewriter.getLangOpts());

     std::string  origin( startBuf, length);
     std::string  replace;

     replace = std::string( "getX(") + front + std::string( ")");

     fprintf( stderr, "rewrite: \"%s\" -> \"%s\"\n", origin.c_str(), replace.c_str());
     TheRewriter.ReplaceText( Expr->getBeginLoc(), length, replace);

     return true;
   }

private:
   Rewriter &TheRewriter;
};

// Implementation of the ASTConsumer interface for reading an AST produced
// by the Clang parser.
class MyASTConsumer : public ASTConsumer {
public:
   MyASTConsumer(Rewriter &R) : Visitor(R) {}

   // Override the method that gets called for each parsed top-level
   // declaration.
   bool HandleTopLevelDecl(DeclGroupRef DR) override {
     for (DeclGroupRef::iterator b = DR.begin(), e = DR.end(); b != e; ++b) {
       // Traverse the declaration using our AST visitor.
       Visitor.TraverseDecl(*b);
       (*b)->dump();
     }
     return true;
   }

private:
   MyASTVisitor Visitor;
};

// For each source file provided to the tool, a new FrontendAction is created.
class MyFrontendAction : public ASTFrontendAction {
public:
   MyFrontendAction() {}
   void EndSourceFileAction() override {
     SourceManager &SM = TheRewriter.getSourceMgr();
     llvm::errs() << "** EndSourceFileAction for: "
                  << SM.getFileEntryForID(SM.getMainFileID())->getName() << "\n";

     // Now emit the rewritten buffer.
     TheRewriter.getEditBuffer(SM.getMainFileID()).write(llvm::outs());
   }

   std::unique_ptr<ASTConsumer> CreateASTConsumer(CompilerInstance &CI,
                                                  StringRef file) override {
     llvm::errs() << "** Creating AST consumer for: " << file << "\n";
     TheRewriter.setSourceMgr(CI.getSourceManager(), CI.getLangOpts());
     return llvm::make_unique<MyASTConsumer>(TheRewriter);
   }

private:
   Rewriter TheRewriter;
};

int main(int argc, const char **argv) {
   CommonOptionsParser op(argc, argv, ToolingSampleCategory);
   ClangTool Tool(op.getCompilations(), op.getSourcePathList());

   return Tool.run(newFrontendActionFactory<MyFrontendAction>().get());
}

For ease of reproduction I put everything together into a small github project.

https://github.com/mulle-nat-stash/cfe-mailinglist

Assuming I didn't goof completely, I can think of three ways this could be fixable:

* abort visiting after the first substitution and reparse everything (I don't know how to do this, but I can probably find out)

* hack the AST on the fly to represent the change from MemberExpr to a FunctionCall (I don't know if this would help at all)

* get a second SourceManager on the RewriteBuffer and calculate extents with it (I don't know how and if this is possible either)

Ciao

Nat!

I just made a branch on my tooling repo to attack your issue. See https://github.com/firolino/clang-tool/blob/nat-example/src/transformer/nattransformer.cc

Just execute “bin/clang-tool …/examples/simple.cc --”. It transforms

int getXXY(struct _x *p)
{
    p->x->y;
    return( p->x->x->y);
}

to

int getXXY(struct _x *p)
{
    getX(p)->y;
    return( getX(getX(p))->y);
}

Hopefully it helps you. I had the same issues in the past as well, mostly because of ReplaceText, so I stopped using it :)

I copied your code, replacing ReplaceText with InsertText and RemoveText, so my function now reads:


bool VisitMemberExpr(MemberExpr *Expr) {
   TheRewriter.InsertText(Expr->getSourceRange().getBegin(), "getX(");
   TheRewriter.InsertText(Expr->getOperatorLoc(), ")");
   TheRewriter.RemoveText(SourceRange(Expr->getOperatorLoc(), Expr->getSourceRange().getEnd().getLocWithOffset(-1)));
   return true;
}

and it works. Thanks a lot Firat!

Nat!
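
A rough sketch of why the insert/remove variant composes for nested expressions: each edit touches only a small, separate piece of the original text (a point insert, or the `->x` itself), so no edit has to overwrite the result of another one. The code models that with edits keyed on original offsets and applied in one pass. `ToyRewriter` and `wrapMemberX` are hypothetical names, the model only wraps `->x` accesses, and the real Rewriter works differently inside.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Toy rewriter: records inserts and removals against ORIGINAL offsets and
// only builds the edited text at the end.
class ToyRewriter {
public:
    explicit ToyRewriter(std::string original)
        : original_(std::move(original)), removed_(original_.size(), false) {}

    // Queue `s` to appear in front of the original character at `offset`.
    void insertText(std::size_t offset, const std::string &s) {
        inserts_[offset] += s;
    }

    // Mark `length` original characters starting at `offset` as deleted.
    void removeText(std::size_t offset, std::size_t length) {
        for (std::size_t i = offset; i < offset + length && i < removed_.size(); ++i)
            removed_[i] = true;
    }

    // Walk the original text once, emitting inserts and surviving characters.
    std::string result() const {
        std::string out;
        for (std::size_t i = 0; i <= original_.size(); ++i) {
            auto it = inserts_.find(i);
            if (it != inserts_.end()) out += it->second;
            if (i < original_.size() && !removed_[i]) out += original_[i];
        }
        return out;
    }

private:
    std::string original_;
    std::vector<bool> removed_;
    std::map<std::size_t, std::string> inserts_;
};

// Turn one `base->x` into `getX(base)`: `beginOffset` is where the base
// expression starts, `opOffset` is where its "->" starts.
void wrapMemberX(ToyRewriter &rw, std::size_t beginOffset, std::size_t opOffset) {
    rw.insertText(beginOffset, "getX(");
    rw.insertText(opOffset, ")");
    rw.removeText(opOffset, 3);  // drop the three characters "->x"
}
```

On `"p->x->x->y"`, calling `wrapMemberX(rw, 0, 4)` for the outer `p->x->x` and then `wrapMemberX(rw, 0, 1)` for the inner `p->x` yields `getX(getX(p))->y`, matching the output Firat posted, and the order of the two calls does not matter.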