ASTMatcher for assert()

Hello,

I am struggling to replace "assert(c)" macro calls by "BAR(c)" where c
is some integer expression and BAR is a function.

The problem manifests itself in RefactoringTool::applyAllReplacements
which returns false, thereby causing the tool to skip replacements.

I am using the following ASTMatcher:

StatementMatcher makeAssertMatcher() {
  return callExpr(callee(functionDecl(hasName("__builtin_expect")))).bind("AssertBindId");
}

Note: callExpr(callee(functionDecl(hasName("assert"))) won't work here
because assert() is a macro according to the assert.h and cassert
headers.

Here's an example match using clang-query:

example.cpp:

  #include <assert.h>
  int main() {
    assert(false);
    return 0;
  }

$ clang-query> match callExpr(callee(functionDecl(hasName("__builtin_expect"))))

  Match #1:

  example.cpp:4:3: note: "AssertBindId" binds here
    assert(false);
    ^~~~~~~~~~~~~
  /usr/include/assert.h:93:6: note: expanded from macro 'assert'
      (__builtin_expect(!(e), 0) ? __assert_rtn(__func__, __FILE__,
__LINE__, #e) : (void)0)
       ^~~~~~~~~~~~~~~~~~~~~~~~~
  1 match.

But I cannot find the correct way to rewrite the match; the following
replacement is skipped (i.e. see line 295 in Refactoring.cpp):

  const CallExpr *E = Result.Nodes.getNodeAs<CallExpr>("AssertBindId");
  SourceManager &SM = *Result.SourceManager;
  SourceLocation LocBegin = E->getLocStart();
  Replace->insert(tooling::Replacement(SM, LocBegin,
    /* number of letters in assert */ 6, "BAR"));

What is the correct way (if any) to rewrite call expressions of macros
such as "assert()"?

Best,
Alex

You need to get the expansion location:
SourceLocation ExpLocBegin = SM.getExpansionLoc(LocBegin);
Replace->insert(tooling::Replacement(SM, ExpLocBegin, 6, "BAR"));

Something like this should do the trick...
Cheers,
/Manuel

Thanks for the effective response. This helped on the right track and
I've successfully used SourceManager::getImmediateExpansionRange() to
insert BAR and its surrounding parenthesis correctly.

As a directly related issue: The proposed technique of using the
expansion range does the trick but of course requires that "-cc1
-rewrite-macros" is _not_ invoked prior to rewriting "assert()"
macros. However, this is problematic when other rewrites in the same
RefactoringTool _do_ rely on macros having been rewritten. This poses
the question whether these things can coexist in the same
RefactoringTool. In particular, is it possible to call the
RewriteMacros functionality only after certain matches have completed?

To ask the same thing differently, what is the preferred way to
compose ASTMatchers and cc1 drivers such as RewriteMacros?

Thanks for the effective response. This helped on the right track and
I've successfully used SourceManager::getImmediateExpansionRange() to
insert BAR and its surrounding parenthesis correctly.

As a directly related issue: The proposed technique of using the
expansion range does the trick but of course requires that "-cc1
-rewrite-macros" is _not_ invoked prior to rewriting "assert()"
macros. However, this is problematic when other rewrites in the same
RefactoringTool _do_ rely on macros having been rewritten. This poses
the question whether these things can coexist in the same
RefactoringTool. In particular, is it possible to call the
RewriteMacros functionality only after certain matches have completed?

To ask the same thing differently, what is the preferred way to
compose ASTMatchers and cc1 drivers such as RewriteMacros?

I wouldn't combine them. Why not write all tools to work on the
non-macro-rewritten code? (you can get macros yourself via the PPCallbacks
interface).

[...] Why not write all tools to work on the non-macro-rewritten code?
(you can get macros yourself via the PPCallbacks interface).

As far as I understand, PPCallbacks operate on a different level of
abstraction than ASTMatchers. The RefactoringTool I have in mind
rewrites two things: (1) "assert()" macros and (2) control-flow
conditions in "if-then-else", "while" and "for" statements. The
rewriting of both (1) and (2) is now implemented as ASTMatchers.
Unfortunately, (2) does not work correctly unless macros are rewritten
beforehand. To see this, consider "#define compare(a, b) do { if ((a)
< (b)) swap(a, b) } while(0);".

I presumed for now that it does not make sense, generally speaking, to
invoke ASTMatchers from within PPCallbacks because the source code
inside a macro definition may not be well-formed. This is why I was
merely looking for a way to combine ASTMatchers and cc1 drivers.

Best,
Alex

> [...] Why not write all tools to work on the non-macro-rewritten code?
> (you can get macros yourself via the PPCallbacks interface).

As far as I understand, PPCallbacks operate on a different level of
abstraction than ASTMatchers. The RefactoringTool I have in mind
rewrites two things: (1) "assert()" macros and (2) control-flow
conditions in "if-then-else", "while" and "for" statements. The
rewriting of both (1) and (2) is now implemented as ASTMatchers.
Unfortunately, (2) does not work correctly unless macros are rewritten
beforehand. To see this, consider "#define compare(a, b) do { if ((a)
< (b)) swap(a, b) } while(0);".

I'm considering it :slight_smile: So what's the problem rewriting this inside the macro?

I presumed for now that it does not make sense, generally speaking, to
invoke ASTMatchers from within PPCallbacks because the source code
inside a macro definition may not be well-formed. This is why I was
merely looking for a way to combine ASTMatchers and cc1 drivers.

No, my idea on how to use PPCallbacks was to just use them to collect
information about the macros, and then use that information to generate the
replacements.

[W]hat's the problem rewriting this inside the macro?

The problem I encountered was that Clang would not automatically
invoke the ASTMatchers inside the macro definitions. The comments I
found before I posted on this mailing list indicated that this is by
design and that it can only be circumvented by rewriting macros
beforehand, i.e. -rewrite-macros. But your previous remark gives me
hope that the situation might have changed, or that I have just missed
the right mailing list comments.

[Use PPCallbacks to] collect information about the macros, and then use that information to generate the replacements.

I'm afraid that with my limited expertise in Clang I would have to
look at an example that illustrates the technique you've in mind. It
may be even that I have not articulated clearly enough and that we are
thinking of two different things here.

> [W]hat's the problem rewriting this inside the macro?

The problem I encountered was that Clang would not automatically
invoke the ASTMatchers inside the macro definitions. The comments I
found before I posted on this mailing list indicated that this is by
design and that it can only be circumvented by rewriting macros
beforehand, i.e. -rewrite-macros. But your previous remark gives me
hope that the situation might have changed, or that I have just missed
the right mailing list comments.

Well, the macro has to be actually used :wink: Even if it's a library, I'd hope
you have at least a test that uses the macro... (same for templates btw).

> [Use PPCallbacks to] collect information about the macros, and then use
that information to generate the replacements.

I'm afraid that with my limited expertise in Clang I would have to
look at an example that illustrates the technique you've in mind. It
may be even that I have not articulated clearly enough and that we are
thinking of two different things here.

I'm, not sure you need that kind of information. See above. Code that is in
macros will be visited at every expansion location.

Well, the macro has to be actually used :wink: Even if it's a library, I'd hope you have at least a test that uses the macro... (same for templates btw).

Indeed, here's a small example:

#define foobar while(true) { }
int main() { foobar }

The ASTMatchers (around 500 LOC) that are supposed to rewrite this
kind of code are located here:

  https://github.com/ahorn/native-symbolic-execution-clang

A few notes:

Each ASTMatcher checks whether
Result.Context->getSourceManager().isInWrittenMainFile(); otherwise,
the tool would try to rewrite system headers etc. But as a sanity
check, I had also removed the isInWrittenMainFile() checks entirely.
In either case, the
"while" loop in the macro will not be rewritten. The case without
macros works great and I highly suspect that I am doing something
wrong that would be obvious to a Clang developer.

Best,
Alex

In article <CAOsfVvnVuADDsCCVqCfwR_9jGSR-R_iiSCtiWELCsg7Dv99_eg@mail.gmail.com>,
    Manuel Klimek <klimek@google.com> writes:

In article <CAN1LFUMPxQHi8rzft-j50NDPJkJeet6zkPqLGhxfV35wM3LW+w@mail.gmail.com>,
    Alex Horn <alex.horn@cs.ox.ac.uk> writes:

A few notes:

Each ASTMatcher checks whether
Result.Context->getSourceManager().isInWrittenMainFile(); otherwise,
the tool would try to rewrite system headers etc.

I'm thinking of contributing a matcher that allows you to match nodes
based on the associated source file so that you can easily discard
matches outside the directory hierarchy(ies) of interest. This seems
like a common problem that all refactoring tool authors will encounter
sooner or later.

> Well, the macro has to be actually used :wink: Even if it's a library, I'd
hope you have at least a test that uses the macro... (same for templates
btw).

Indeed, here's a small example:

#define foobar while(true) { }
int main() { foobar }

The ASTMatchers (around 500 LOC) that are supposed to rewrite this
kind of code are located here:

  https://github.com/ahorn/native-symbolic-execution-clang

Ok, I think the problem here is (as opposed to the first example of
assert(true)) you now want to refactor *inside* the macro, so you have to
use the spelling location.
Thus, you'll need to introduce the same line of code as in my first reply
on this thread, but replace expansion with spelling:
SourceLocation SpellLocBegin = SM.getSpellingLoc(LocBegin);

Generally, for any macro location, you have to decide what to do. What I
usually do is that I get a range from the AST node, and use
Lexer::makeFileCharRange to get the location range that captures the AST
range (if possible). This is pretty complex code, as macro expansion is
rather involved, so I wouldn't try to roll it on my own.

Cheers,
/Manuel

In article <
CAN1LFUMPxQHi8rzft-j50NDPJkJeet6zkPqLGhxfV35wM3LW+w@mail.gmail.com>,
    Alex Horn <alex.horn@cs.ox.ac.uk> writes:

> A few notes:
>
> Each ASTMatcher checks whether
> Result.Context->getSourceManager().isInWrittenMainFile(); otherwise,
> the tool would try to rewrite system headers etc.

I'm thinking of contributing a matcher that allows you to match nodes
based on the associated source file so that you can easily discard
matches outside the directory hierarchy(ies) of interest. This seems
like a common problem that all refactoring tool authors will encounter
sooner or later.

I think it's a good idea, but will not solve the general problem of
surprise when encountering source locations. They are quite hard to
understand, and the problem that one needs to decide whether to do
something with the expansion of spelling location (or something in between)
remains.

Cheers,
/Manuel

In article <CAOsfVv=aBdSiMydayJkqiUEuRDzsZXUG0ArWyNHPHoDraUdZaA@mail.gmail.com>,
    Manuel Klimek <klimek@google.com> writes:

In article <CAOsfVv=
aBdSiMydayJkqiUEuRDzsZXUG0ArWyNHPHoDraUdZaA@mail.gmail.com>,
    Manuel Klimek <klimek@google.com> writes:

>
> > I'm thinking of contributing a matcher that allows you to match nodes
> > based on the associated source file so that you can easily discard
> > matches outside the directory hierarchy(ies) of interest. [...]
>
> I think it's a good idea, but will not solve the general problem of
> surprise when encountering source locations.

I take it you are referring to the discussion on this thread of:

- do I refactor the location where a macro is invoked?
- do I refactor the location where a macro is defined?

Are there any other surprises relating to source location other than
those relating to the site of a macro invocation vs. the site of a
macro definition?

Well, the problem that there are not only two locations to think about -
macro expansion can have arbitrarily many steps, and which level of
expansion is needed usually depends on what part of the AST one is looking
at

a.h:
#define A(x) a(x)
b.h:
#define B(x) b(x)
c.cc:
B(c());

Now depending which part of a(b(c())) you want to look at, you'll get a
different file.

In article <CAOsfVvm9hiRBifj1VbotCHMUdck_vg_2wSy6bgPAxxE7UHnwOg@mail.gmail.com>,
    Manuel Klimek <klimek@google.com> writes:

> I take it you are referring to the discussion on this thread of:
>
> - do I refactor the location where a macro is invoked?
> - do I refactor the location where a macro is defined?
>
> Are there any other surprises relating to source location other than
> those relating to the site of a macro invocation vs. the site of a
> macro definition?
>

Well, the problem that there are not only two locations to think about -
macro expansion can have arbitrarily many steps, and which level of
expansion is needed usually depends on what part of the AST one is looking
at

a.h:
#define A(x) a(x)
b.h:
#define B(x) b(x)
c.cc:
B(c());

Now depending which part of a(b(c())) you want to look at, you'll get a
different file.

Ah, yes, very good point. (I tend to think of macros mentally in a
single-level of expansion, but obviously it's more involved than that.)

So... maybe we need some mechanism that let you drill down through macro
expansion one expansion at a time?

I want to play more with refactoring applied to macros, so I should do
some spike coding and see what's involved and what would be useful
beyond my own experiment.

Dear Manuel,

Thanks for those suggestions. If the mechanism for the refactoring
inside macros ever changes and you happen to remember our little
discussion don't hesitate to get in touch and I'd gladly try those
changes out should that be helpful for the designers of the library.

All the best,
Alex

Dear Manuel,

Thanks for those suggestions. If the mechanism for the refactoring
inside macros ever changes and you happen to remember our little
discussion don't hesitate to get in touch and I'd gladly try those
changes out should that be helpful for the designers of the library.

Based on this discussion I'm not sure what needs to change :slight_smile: In general,
source locations already model all expansions of macros, and we had good
success writing code transformations through that.

Cheers,
/Manuel

Dear Manuel,

I see what you mean :slight_smile: and "change" is probably not the right word
since, as you say, all the features are already there. To use these in
the right way comes probably totally natural to everyone who works
with Clang long enough --- and arguably that is all that matters in
the end.

As a newcomer to Clang, however, it can be challenging to decide when
one "type of location" (spelling, expansion, plain-old getLocStart
etc.) is the right thing to use and how they relate to each other.
Your answers all helped me tremendously as I stumbled in the dark on
this but (because of the way my brain works) it bothers me that I have
written code that seems to do something useful but I don't understand
why it works and when it doesn't. Since Clang source-to-source
transformations must account for probably very obscure corner cases
there may be no easy answer to all this but if you happen to know of
an article that explains the design behind source locations, this
might be a good way to end our discussion and it may also help others
who might stumble across our discussion in the future.

Thanks for all your support and your generous interpretation of my questions.

Best,
Alex

Dear Manuel,

I see what you mean :slight_smile: and "change" is probably not the right word
since, as you say, all the features are already there. To use these in
the right way comes probably totally natural to everyone who works
with Clang long enough --- and arguably that is all that matters in
the end.

As a newcomer to Clang, however, it can be challenging to decide when
one "type of location" (spelling, expansion, plain-old getLocStart
etc.) is the right thing to use and how they relate to each other.
Your answers all helped me tremendously as I stumbled in the dark on
this but (because of the way my brain works) it bothers me that I have
written code that seems to do something useful but I don't understand
why it works and when it doesn't. Since Clang source-to-source
transformations must account for probably very obscure corner cases
there may be no easy answer to all this but if you happen to know of
an article that explains the design behind source locations, this
might be a good way to end our discussion and it may also help others
who might stumble across our discussion in the future.

Ah, yes, I've planned to write such an article for a while, but alas, so
far the priorities haven't panned out ...
The best we currently have (afaik) is
http://clang.llvm.org/docs/IntroductionToTheClangAST.html and the talk I
gave at the euro llvm last year (the link has the video).

Cheers & thx for the feeback!,
/Manuel