Writing C++ refactoring tools in Common Lisp

Hey folks,

I’ve written a new Common Lisp environment/compiler over the past two years.
It uses LLVM as the back-end (by exposing the LLVM C++ library to Common Lisp) and interoperates with C++ in that it has a template library like boost::python for exposing arbitrary C++ classes/functions to Common Lisp.

I want to do some automated refactoring to improve my C++ code so I’m exposing the Clang ASTMatcher and Refactoring libraries.

I also think it would be really cool to write refactoring tools in a dynamic language like Lisp - without all the exhausting boilerplate required by the C++ approach.

So I’m exposing the Clang AST library to this Common Lisp environment with the goal of writing a general tool for writing C++ refactoring tools in Common Lisp.

My goal is to mimic C++ ASTMatchers like:

recordDecl(hasDescendant(
ifStmt(hasTrueExpression(
expr(hasDescendant(
ifStmt()))))))

Using S-expressions: (record-decl (has-descendent (if-stmt (has-true-expression (expr (has-descendant( if-stmt)))))))

And the source-to-source translation code will be small lambda functions that use the resulting MatchResults.

That way I don’t have to write a lot of documentation :-).

Currently I’m looking at calling the ASTMatcher/Dynamic/VariantValue interface to build ASTMatchers from S-expressions - does that sound like a good idea - or is there a better approach I should be exploring?

Best,

.Chris.

Christian Schafmeister
Associate Professor
Chemistry Department
Temple University

Hey folks,

I’ve written a new Common Lisp environment/compiler over the past two
years.
It uses LLVM as the back-end (by exposing the LLVM C++ library to Common
Lisp) and interoperates with C++ in that it has a template library like
boost::python for exposing arbitrary C++ classes/functions to Common Lisp.

I want to do some automated refactoring to improve my C++ code so I’m
exposing the Clang ASTMatcher and Refactoring libraries.

I also think it would be really cool to write refactoring tools in a
dynamic language like Lisp - without all the exhausting boilerplate
required by the C++ approach.

C++ refactoring tools actually don't require that much boilerplate...

So I’m exposing the Clang AST library to this Common Lisp environment with
the goal of writing a general tool for writing C++ refactoring tools in
Common Lisp.

My goal is to mimic C++ ASTMatchers like:

*recordDecl(hasDescendant(*
* ifStmt(hasTrueExpression(*
* expr(hasDescendant(*
* ifStmt()))))))*

Using S-expressions: *(record-decl (has-descendent (if-stmt
(has-true-expression (expr (has-descendant( if-stmt))))))) *

And the source-to-source translation code will be small lambda functions
that use the resulting MatchResults.

That way I don’t have to write a lot of documentation :-).

Currently I’m looking at calling the ASTMatcher/Dynamic/VariantValue
interface to build ASTMatchers from S-expressions - does that sound like a
good idea - or is there a better approach I should be exploring?

Well, if your boost shim can already match "arbitrary C++ classes /
functions", then it should in principle be able to expose the AST matchers
as is (without the need to go through the dynamic parsing).
After all, the matchers are mostly just template functions :slight_smile:

Otherwise I think going the dynamic matcher route is a good approach.

Cheers,
/Manuel

Sorry, my statement sounded like a criticism - I never mean to criticize - these languages are what they are.
I was just speaking about my feelings from writing a few simple refactoring tools in C++ and my motivations for implementing a Common Lisp compiler and writing refactoring tools in Common Lisp. I’m a new convert to Common Lisp - I just discovered it two years ago and most of my experience with it is writing a self-hosting compiler.

I was trying to figure out how to do that at first but the ASTMatchers looked more exotic than just simple functions.

For instance “namedDecl” has the type internal::VariadicDynCastAllOfMatcher<Decl,NamedDecl>

VariadicDynCastAllOfMatcher is defined as:

template <typename SourceT, typename TargetT>
class VariadicDynCastAllOfMatcher
: public llvm::VariadicFunction<
BindableMatcher, Matcher,
makeDynCastAllOfComposite<SourceT, TargetT> > {
public:
VariadicDynCastAllOfMatcher() {}
};

llvm::VariadicFunction has many templated overloads of “operator()”
I haven’t written template code to automatically wrap variadic functions (as in functions that take … ellipses).

I can wrap functions that take ArrayRef’s though - is there a function like namedDecl(ArrayRef) for ASTMatchers?

I’ve only been getting deep into C++ template programming for about two months so I’m still struggling with reading complex template code.

Hey folks,

I’ve written a new Common Lisp environment/compiler over the past two
years.
It uses LLVM as the back-end (by exposing the LLVM C++ library to Common
Lisp) and interoperates with C++ in that it has a template library like
boost::python for exposing arbitrary C++ classes/functions to Common Lisp.

I want to do some automated refactoring to improve my C++ code so I’m
exposing the Clang ASTMatcher and Refactoring libraries.

I also think it would be really cool to write refactoring tools in a
dynamic language like Lisp - without all the exhausting boilerplate
required by the C++ approach.

C++ refactoring tools actually don't require that much boilerplate…

Sorry, my statement sounded like a criticism - I never mean to criticize -
these languages are what they are.

No worries, I have no problems with criticism - and I didn't take it that
way. My remark was really just meant as that - I think writing a
refactoring tool has suprisingly little overhead. The main problem is
actually compilation, which is why I'd be very interested in seeing more
interpreted integrations.

I was just speaking about my feelings from writing a few simple
refactoring tools in C++ and my motivations for implementing a Common Lisp
compiler and writing refactoring tools in Common Lisp. I’m a new convert
to Common Lisp - I just discovered it two years ago and most of my
experience with it is writing a self-hosting compiler.

So I’m exposing the Clang AST library to this Common Lisp environment
with the goal of writing a general tool for writing C++ refactoring tools
in Common Lisp.

My goal is to mimic C++ ASTMatchers like:

*recordDecl(hasDescendant(*
* ifStmt(hasTrueExpression(*
* expr(hasDescendant(*
* ifStmt()))))))*

Using S-expressions: *(record-decl (has-descendent (if-stmt
(has-true-expression (expr (has-descendant( if-stmt))))))) *

And the source-to-source translation code will be small lambda functions
that use the resulting MatchResults.

That way I don’t have to write a lot of documentation :-).

Currently I’m looking at calling the ASTMatcher/Dynamic/VariantValue
interface to build ASTMatchers from S-expressions - does that sound like a
good idea - or is there a better approach I should be exploring?

Well, if your boost shim can already match "arbitrary C++ classes /
functions", then it should in principle be able to expose the AST matchers
as is (without the need to go through the dynamic parsing).
After all, the matchers are mostly just template functions :slight_smile:

I was trying to figure out how to do that at first but the ASTMatchers
looked more exotic than just simple functions.

For instance “namedDecl” has the type
internal::VariadicDynCastAllOfMatcher<Decl,NamedDecl>
VariadicDynCastAllOfMatcher is defined as:

template <typename SourceT, typename TargetT>
class VariadicDynCastAllOfMatcher
    : public llvm::VariadicFunction<
        BindableMatcher<SourceT>, Matcher<TargetT>,
        makeDynCastAllOfComposite<SourceT, TargetT> > {
public:
  VariadicDynCastAllOfMatcher() {}
};

llvm::VariadicFunction has many templated overloads of "operator()"
I haven’t written template code to automatically wrap variadic functions
(as in functions that take … ellipses).

I can wrap functions that take ArrayRef’s though - is there a function
like namedDecl(ArrayRef<ASTMatcher>) for ASTMatchers?

I’ve only been getting deep into C++ template programming for about two
months so I’m still struggling with reading complex template code.

Yes, I'd actually be suprised if this was possible without major
jump-through-hooping - the problem is that the compiler generates the
template code only when they get instantiated - so unless you have a
backend that takes the lisp code and generates C++ code from that that
integrates with the template code, I wouldn't know how to do it (and that
round-trip would seem like it kills a lot of the benefits of having lisp in
the first place - after all, you'd still get crappy C++ error messages :P)

Agreed.

Maybe the best way to do this is to write a Registry class like the one in Dynamic/Registry.h and define “constructMatcher” and “constructBoundMatcher” that take Lisp symbols and assemble VariantMatchers that way.

In article <CAOsfVvm6WO_iKa9iyVUKCqdmXVUk3R7KkdGNK-gae8hjaXcshA@mail.gmail.com>,
    Manuel Klimek <klimek@google.com> writes:

[...] My remark was really just meant as that - I think writing a
refactoring tool has suprisingly little overhead. The main problem is
actually compilation, which is why I'd be very interested in seeing more
interpreted integrations.

I think an interpreter that allowed you to write source-to-source
transformations in some kind of interpretive DSL would be a great boon
to exploring different refactoring options for C++!

I didn't find it that hard to make a simple C++ refactoring tool using
libtooling, but I can see that having an interpretive DSL that let you
try things interactively would be good for explorative prototyping.
You could always provide a way to spit out libtooling style code from
the interpreted expression.

+Sam, who has written the dynamic registry stuff for ideas…

+Sam, who has written the dynamic registry stuff for ideas...

Hey folks,

I’ve written a new Common Lisp environment/compiler over the past two
years.
It uses LLVM as the back-end (by exposing the LLVM C++ library to
Common Lisp) and interoperates with C++ in that it has a template library
like boost::python for exposing arbitrary C++ classes/functions to Common
Lisp.

I want to do some automated refactoring to improve my C++ code so I’m
exposing the Clang ASTMatcher and Refactoring libraries.

I also think it would be really cool to write refactoring tools in a
dynamic language like Lisp - without all the exhausting boilerplate
required by the C++ approach.

C++ refactoring tools actually don't require that much boilerplate…

Sorry, my statement sounded like a criticism - I never mean to criticize
- these languages are what they are.

No worries, I have no problems with criticism - and I didn't take it that
way. My remark was really just meant as that - I think writing a
refactoring tool has suprisingly little overhead. The main problem is
actually compilation, which is why I'd be very interested in seeing more
interpreted integrations.

I was just speaking about my feelings from writing a few simple
refactoring tools in C++ and my motivations for implementing a Common Lisp
compiler and writing refactoring tools in Common Lisp. I’m a new convert
to Common Lisp - I just discovered it two years ago and most of my
experience with it is writing a self-hosting compiler.

So I’m exposing the Clang AST library to this Common Lisp environment
with the goal of writing a general tool for writing C++ refactoring tools
in Common Lisp.

My goal is to mimic C++ ASTMatchers like:

*recordDecl(hasDescendant(*
* ifStmt(hasTrueExpression(*
* expr(hasDescendant(*
* ifStmt()))))))*

Using S-expressions: *(record-decl (has-descendent (if-stmt
(has-true-expression (expr (has-descendant( if-stmt))))))) *

And the source-to-source translation code will be small lambda
functions that use the resulting MatchResults.

That way I don’t have to write a lot of documentation :-).

Currently I’m looking at calling the ASTMatcher/Dynamic/VariantValue
interface to build ASTMatchers from S-expressions - does that sound like a
good idea - or is there a better approach I should be exploring?

Well, if your boost shim can already match "arbitrary C++ classes /
functions", then it should in principle be able to expose the AST matchers
as is (without the need to go through the dynamic parsing).
After all, the matchers are mostly just template functions :slight_smile:

I was trying to figure out how to do that at first but the ASTMatchers
looked more exotic than just simple functions.

For instance “namedDecl” has the type
internal::VariadicDynCastAllOfMatcher<Decl,NamedDecl>
VariadicDynCastAllOfMatcher is defined as:

template <typename SourceT, typename TargetT>
class VariadicDynCastAllOfMatcher
    : public llvm::VariadicFunction<
        BindableMatcher<SourceT>, Matcher<TargetT>,
        makeDynCastAllOfComposite<SourceT, TargetT> > {
public:
  VariadicDynCastAllOfMatcher() {}
};

llvm::VariadicFunction has many templated overloads of "operator()"
I haven’t written template code to automatically wrap variadic functions
(as in functions that take … ellipses).

I can wrap functions that take ArrayRef’s though - is there a function
like namedDecl(ArrayRef<ASTMatcher>) for ASTMatchers?

I’ve only been getting deep into C++ template programming for about two
months so I’m still struggling with reading complex template code.

Yes, I'd actually be suprised if this was possible without major
jump-through-hooping - the problem is that the compiler generates the
template code only when they get instantiated - so unless you have a
backend that takes the lisp code and generates C++ code from that that
integrates with the template code, I wouldn't know how to do it (and that
round-trip would seem like it kills a lot of the benefits of having lisp in
the first place - after all, you'd still get crappy C++ error messages :P)

Agreed.

Maybe the best way to do this is to write a Registry class like the one
in Dynamic/Registry.h and define “constructMatcher” and
“constructBoundMatcher” that take Lisp symbols and assemble VariantMatchers
that way.

Most of them are simple functions, but there are also a lot that are
"polymorphic". That is, they return a proxy type that allows late binding
of the real matcher type.
During the implementation of the dynamic matcher registry I changed all
these polymorphic/adaptative types to provide extra type information that
allowed the registry to instantiate all the required functions.

What about this?
Instead of wrapping the actual matchers the way it is done in C++, you
could make meta matchers that really only know their name and their
arguments.
Basically all matchers would be functions that return an object that has a
name() method and a list of their arguments.
Then you can generate the C++ matcher string and use the C++ matcher parser
to generate the actual matcher.
This way you only wrap a few [simple] functions from C++ and are shielded
from the template metaprogramming craziness.
Hadn't thought this through, so it might be too simple to work =P.

_Sam

I managed to hack/rewrite parts of Registry.cpp, Registry.h, Marshaller.h and Diagnostics.h so that they work with Common Lisp Symbols and Conses.
I can now assemble ASTMatchers from s-expressions.
I exposed the entire ASTMatcher interface in one go - so I can write any ASTMatcher using S-expressions that can be written with the C++ DSL.

Using the C++ DSL you would use:
fieldDecl(hasType(recordDecl(isDerivedFrom(recordDecl(hasName(“GCObject”)))))).bind(“all”)

In Common Lisp I would use:
(:bind :all (:field-decl (:has-type (:record-decl (:is-derived-from (:record-decl (:has-name “GCObject”))))))))

Question:

How do you get proficient at writing these matchers? I still struggle with it.
Do you have some way of looking at the AST dumps and then inverting them in your head into queries?
I’ll build something like the “clang-query” tool and practice.

I’m going to expose the refactoring classes/functions next and try some interactive source-to-source translation of C++ in Common Lisp.

Best,

.Chris.

So, I managed to hack/rewrite parts of Registry.cpp, Registry.h,
Marshaller.h and Diagnostics.h so that they work with Common Lisp Symbols
and Conses.
I can now assemble ASTMatchers from s-expressions.
I exposed the entire ASTMatcher interface in one go - so I can write any
ASTMatcher using S-expressions that can be written with the C++ DSL.

Using the C++ DSL you would use:

fieldDecl(hasType(recordDecl(isDerivedFrom(recordDecl(hasName(“GCObject”)))))).bind(“all”)

In Common Lisp I would use:
(:bind :all (:field-decl (:has-type (:record-decl (:is-derived-from
(:record-decl (:has-name "GCObject"))))))))

Question:

How do you get proficient at writing these matchers? I still struggle
with it.

In theory, they should read as english (with some bad grammar =P).
In your example, it reads as:
"A <fieldDecl> that a <hasType> that is a <recordDecl> which
<isDerivedFrom> ..."
In my experience, the hardest part is figuring out which matchers to use
(which basically requires knowing about them), not how to assemble them.

Do you have some way of looking at the AST dumps and then inverting them in

your head into queries?

I don't see what you mean by inverting.
There are two types of matchers: filters and traversal.
The filters check properties on the nodes, like "hasName()".
The traversal matchers help you reach other nodes in the AST, like
"hasType()".

There are various trees that are interconnected; expressions, declarations
and types. Matchers allow you to go from one to the other.
Eg. you can go from an expression (like &a->foo) to its type, from a type
to some expression (like getting the initializer for a member field), etc.
In each tree you can also go up and down the hierarchies. Eg. you can go
from Foo type to Foo, or from "1+foo->bar()" to "1" and "foo->bar()".

Thanks - that helps.

I got that from the first talk that I saw Chandler Carruth present on the ASTMatcher library:
http://www.youtube.com/watch?v=mVbDzTM21BQ at 14:40

It sounded fascinating and planted this idea of exposing the ASTMatcher library in the Common Lisp system/compiler that I was just starting to write.

I still have no clear idea what he was talking about “…invert the AST and turn it into a set of matchers…” but it sounded compelling - hilarious huh?

Two types of matchers - filter & traversal - that part I get - what about Narrowing matchers?
The traversal matchers somewhat correspond to methods of the node classes that they traverse from - right?
Like Expr nodes have a Expr::getType() method and an “expr()” matcher can have a “hasType()” matcher as an argument - as in “expr(hasType(…))”?

Can you help me understand the docs on http://clang.llvm.org/docs/LibASTMatchersReference.html?
There are the rows with three columns “Return Type” “Name” “Parameters”.
I understand the type of the parameters. What does the “Return type” tell us? Does it mean that you can only use that matcher X as a parameter of matcher Y if matcher Y accepts matcher Xs “Return Type” (or a more derived type) as a “Parameters” type?

Ok, that is very helpful - thank you.

Best,

.Chris.

In article <BC8D58EA-DA81-418B-A429-9F260BC1DF88@verizon.net>,
    Christian Schafmeister <chris.schaf@verizon.net> writes:

Does it mean that you can only use that matcher X as a parameter of
matcher Y if matcher Y accepts matcher Xs "Return Type" (or a more derived
type) as a "Parameters" type?

I believe this is correct.

In article <BC8D58EA-DA81-418B-A429-9F260BC1DF88@verizon.net>,
    Christian Schafmeister <chris.schaf@verizon.net> writes:

> Does it mean that you can only use that matcher X as a parameter of
> matcher Y if matcher Y accepts matcher Xs "Return Type" (or a more
derived
> type) as a "Parameters" type?

I believe this is correct.

Yep, this is correct.

Generally, you'll want to use "clang -ast-dump" or "clang-check -ast-dump"
to learn how the AST of a file looks; that helps in learning what ast
matchers you'll need. Then I'm using clang-query to refine my ast matchers.

Cheers,
/Manuel