AST UnaryOperator subexpression

Hi,

I am trying to walk the AST tree and wrap all the dereferenced expressions
"*expr" with a macro, e.g. "*DEREF_EXPR(expr)", but I get strange behavior with
the following code snippet that should do the wrapping:

const UnaryOperator *pUnaryOperator = dyn_cast<const UnaryOperator>(pExpr);
assert(pUnaryOperator);
if (UnaryOperator::Deref == pUnaryOperator->getOpcode())
{
    const Expr *pSubExpr = pUnaryOperator->getSubExpr();
    mRewriter.InsertCStrBefore(pSubExpr->getSourceRange().getBegin(),
                               "DEREF_EXPR(");
    mRewriter.InsertCStrAfter(pSubExpr->getSourceRange().getEnd(), ")");
}

When the subexpression of the dereference operator (*) is another compound
expression, e.g. "(bar+1)" in the following code sample

void foo(void) {
    unsigned *bar = 0;
    *(bar+1) = 1;
}

I get the expected result

void foo(void) {
    unsigned *bar = 0;
    *DEREF_EXPR((bar+1)) = 1;
}

But when the subexpression is a leaf token

void foo(void) {
    unsigned *bar = 0;
    *bar = 1;
}

I get

void foo(void) {
    unsigned *bar = 0;
    *DEREF_EXPR()bar = 1;
}

instead of the expected

void foo(void) {
    unsigned *bar = 0;
    *DEREF_EXPR(bar) = 1;
}

I would assume that pSubExpr->getSourceRange().getBegin() and
pSubExpr->getSourceRange().getEnd() source locations should point BEFORE and
AFTER the subexpression, but it does not work as expected. In the case of a
leaf-token subexpression, both point to the same source location (before the
token). What am I doing wrong?

Thanks,

Petr

> I am trying to walk the AST tree and wrap all the dereferenced expressions
> "*expr" with a macro, e.g. "*DEREF_EXPR(expr)", but I get strange behavior with
> the following code snippet that should do the wrapping:

Ok!

> const UnaryOperator *pUnaryOperator = dyn_cast<const UnaryOperator>(pExpr);
> assert(pUnaryOperator);
> if (UnaryOperator::Deref == pUnaryOperator->getOpcode())
> {
>    const Expr *pSubExpr = pUnaryOperator->getSubExpr();
>    mRewriter.InsertCStrBefore(pSubExpr->getSourceRange().getBegin(),
>                               "DEREF_EXPR(");
>    mRewriter.InsertCStrAfter(pSubExpr->getSourceRange().getEnd(), ")");
> }

This is almost exactly right.

> When the subexpression of the dereference operator (*) is another compound
> expression, e.g. "(bar+1)" in the following code sample
>
> void foo(void) {
>    unsigned *bar = 0;
>    *(bar+1) = 1;
> }
>
> I get the expected result
>
> void foo(void) {
>    unsigned *bar = 0;
>    *DEREF_EXPR((bar+1)) = 1;
> }

Actually, you're getting the wrong result here. The trick is that it is inserting the new ")" *before* the old one. Because the new and old tokens are the same, it happens to render correctly.

> But when the subexpression is a leaf token
>
> void foo(void) {
>    unsigned *bar = 0;
>    *bar = 1;
> }
>
> I get
>
> void foo(void) {
>    unsigned *bar = 0;
>    *DEREF_EXPR()bar = 1;
> }
>
> instead of the expected
>
> void foo(void) {
>    unsigned *bar = 0;
>    *DEREF_EXPR(bar) = 1;
> }

Right. This occurs because 'bar' is not a ')'. In both cases, you're inserting the token *before* the last token in the range.
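
To make the effect visible without Clang, here is a minimal toy model of what the snippet in the question does. It is only a sketch: `ToyRange` and `wrapNaively` are made-up names, plain byte offsets stand in for source locations, and the only thing borrowed from the real API is the convention that "end" is the offset of the first character of the last token.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical stand-in for a source range: both fields are byte offsets
// into the buffer, and `end` is the offset of the FIRST character of the
// last token in the range (not one past it).
struct ToyRange {
    std::size_t begin;
    std::size_t end;
};

// Mimics the snippet in the question: insert `open` at `begin` and `close`
// at `end`, with no adjustment for the length of the last token.
std::string wrapNaively(std::string src, ToyRange r,
                        const std::string &open, const std::string &close) {
    assert(r.begin <= r.end && r.end <= src.size());
    src.insert(r.end, close);   // later offset first, so `begin` stays valid
    src.insert(r.begin, open);
    return src;
}
```

With `"*bar = 1;"` and the range `{1, 1}`, this gives `*DEREF_EXPR()bar = 1;`. With `"*(bar+1) = 1;"` and the range `{1, 7}`, closing with `"]"` instead of `")"` gives `*DEREF_EXPR((bar+1]) = 1;`, which shows the inserted text landing in front of the original `)`.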

> I would assume that pSubExpr->getSourceRange().getBegin() and
> pSubExpr->getSourceRange().getEnd() source locations should point BEFORE and
> AFTER the subexpression, but it does not work as expected. In the case of a
> leaf-token subexpression, both point to the same source location (before the
> token). What am I doing wrong?

Actually they don't. The trick here is that "end" points to the *start* of the last token in the range. This makes construction of the source ranges much more clean and simple, but pushes some logic into the clients. Basically, to insert text after the end of the range, you have to add the length of the last token. Luckily, this is really easy to get :)

The HTML rewriter uses code like this:

   SourceManager &SM = ...
   SourceLocation E = Range.getEnd();

   // If E is a macro expansion, we want the instantiation location. You
   // should determine how you want to handle this. There are many possible
   // strategies.
   E = SM.getLogicalLoc(E);

   // Add the size of the end token.
   E = E.getFileLocWithOffset(Lexer::MeasureTokenLength(E, SM));
   mRewriter.InsertCStrAfter(E, ")");

Please let me know if that doesn't work.

-Chris
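
For readers without a Clang tree at hand, the "add the length of the last token" step can be sketched in a toy model. This is an illustration only: `toyTokenLength` is a hypothetical and very crude stand-in for `Lexer::MeasureTokenLength` (word-like runs are one token, any other character is a one-character token), `wrapExpr` is a made-up helper, and plain byte offsets stand in for source locations. Macro expansions are not modelled at all.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical, simplified token measurer: a run of letters, digits, and
// underscores counts as one token; any other character is a token of
// length 1. A real lexer handles far more cases.
std::size_t toyTokenLength(const std::string &src, std::size_t pos) {
    assert(pos < src.size());
    auto isWord = [](char c) {
        return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
    };
    if (!isWord(src[pos])) return 1;
    std::size_t len = 0;
    while (pos + len < src.size() && isWord(src[pos + len])) ++len;
    return len;
}

// `begin` is the offset of the first token of the expression and
// `endTokenStart` is the offset of the START of its last token, matching
// the range convention described above.
std::string wrapExpr(std::string src, std::size_t begin,
                     std::size_t endTokenStart) {
    assert(begin <= endTokenStart && endTokenStart < src.size());
    // Step past the last token before inserting the closing parenthesis.
    std::size_t pastEnd = endTokenStart + toyTokenLength(src, endTokenStart);
    src.insert(pastEnd, ")");          // later offset first
    src.insert(begin, "DEREF_EXPR(");
    return src;
}
```

Here `wrapExpr("*bar = 1;", 1, 1)` yields `*DEREF_EXPR(bar) = 1;`, and `wrapExpr("*(bar+1) = 1;", 1, 7)` yields `*DEREF_EXPR((bar+1)) = 1;`, this time with the new `)` placed after the original one.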

Chris Lattner <clattner@...> writes:

> I would assume that pSubExpr->getSourceRange().getBegin() and
> pSubExpr->getSourceRange().getEnd() source locations should point
> BEFORE and
> AFTER the subexpression, but it does not work as expected. In case
> of leaf token
> subexpression both point to the same source location (before the
> token). What am
> I doing wrong?

> Actually they don't. The trick here is that "end" points to the
> *start* of the last token in the range. This makes construction of
> the source ranges much more clean and simple, but pushes some logic
> into the clients. Basically, to insert text after the end of the
> range, you have to add the length of the last token. Luckily, this is
> really easy to get :)

OK, makes sense. I was confused because my sample inserted the same token as was
already present at the end (")"), so I did not notice the difference.

>    // If E is a macro expansion, we want the instantiation location.
>    // You should determine how you want to handle this. There are many
>    // possible strategies.
>    E = SM.getLogicalLoc(E);
>
>    // Add the size of the end token.
>    E = E.getFileLocWithOffset(Lexer::MeasureTokenLength(E, SM));
>    mRewriter.InsertCStrAfter(E, ")");

> Please let me know if that doesn't work.

Simple fix, and it works as expected. I guess I was looking in the wrong places
when looking for relevant code samples before.

> -Chris

Thanks for your help!

-Petr