Source to source transformations questions in branches/tooling

Hi,

I've been trying some source to source transformations using the
matcher/callback framework in the tooling branch. They are mostly working
but I have a couple of questions and would like some feedback on whether I
am doing everything as intended. Will you please help me?

There are two transformations I am trying to make: adding the override
keyword to virtual functions where we can and replacing shared_ptr by
value function parameters with const references. I have attached the two
cpp files to this post. They are also available from
https://bitbucket.org/phildunstan/clang_tools/src/5d63a291d377/add-override-specifier
https://bitbucket.org/phildunstan/clang_tools/src/5d63a291d377/replace-shared_ptr-parameters

add-override-specifier.cpp
Adds the override specifier to functions that override virtual
functions from a base class and don't already have the override specifier.

I have written the following matcher to find candidates functions.
        AST_MATCHER(clang::CXXMethodDecl, IsOverrideMissing)
        {
            // does this virtual function override a virtual function from
a base class
            return (Node.isVirtual() && (Node.size_overridden_methods() >
0) && !Node.hasAttr<OverrideAttr>());
        }

In my callback however I don't know how to find the right source location
to insert the override keyword. If there is no body I can use the end of
the function declaration. However if there is a body I don't know how to
get the end of the declaration section before the body. For example in the
following example I want to insert the keyword before the comment. How do I
get the source location of this point?
        void f() const // overrides a base class virtual function
        {
        }

replace-shared_ptr-parameters.cpp
Replace function parameters that specify a shared_ptr instance by
value with shared_ptr by const reference.

Should I be changing the type from shared_ptr<X> to const shared_ptr<X>& or
is there a way to do this by modifying the AST to make the parameter const
and by reference and then re-emit the source?

I have this working for normal functions, methods and template functions
where instantiations of that template exist. What type of matcher should I
write to match a template function that doesn't have an instantiation.
eg template <typename T> f(shared_ptr<T>);

Additional Questions:

It appears that I need to specify all of the system c++ include files when
I run the tool on a file, but I don't need to provide them when I compile
the with the clang compiler that I have built from the tooling branch. Is
this expected behavior?
replace-shared_ptr-parameters input.cpp -- -std=c++11 -Wall -g -O0
-I/usr/include/c++/4.6 -I/usr/include/c++/4.6/x86_64-linux-gnu
-I/usr/include/c++/4.6/backward -I/usr/lib/gcc/x86_64-linux-gnu/4.6/include
-I/usr/local/include -I/usr/lib/gcc/x86_64-linux-gnu/4.6/include-fixed
-I/usr/include/x86_64-linux-gnu -I/usr/include

I'd like to run these tools over our complete code base where the changes
will affect both header and source files. What is the recommended way to
limit changes to only my files and not system header files? I expect I can
limit them to the main file specified by the source manager, however I
assume that this would prevent changes to my header files.

How do I prevent a change from being made multiple times while in the same
translation unit compilation? eg as a result of template or macro
instantiations or multiple header includes. Is there a recommended way to
solve this or is it up to the callback to be safe in the case that it is
run twice on the same piece of code?

Thanks for the excellent work you guys are doing,
Phil

Hi,

I’ve been trying some source to source transformations using the
matcher/callback framework in the tooling branch. They are mostly working
but I have a couple of questions and would like some feedback on whether I
am doing everything as intended. Will you please help me?

There are two transformations I am trying to make: adding the override
keyword to virtual functions where we can and replacing shared_ptr by
value function parameters with const references. I have attached the two
cpp files to this post. They are also available from
https://bitbucket.org/phildunstan/clang_tools/src/5d63a291d377/add-override-specifier
https://bitbucket.org/phildunstan/clang_tools/src/5d63a291d377/replace-shared_ptr-parameters

add-override-specifier.cpp
Adds the override specifier to functions that override virtual
functions from a base class and don’t already have the override specifier.

I have written the following matcher to find candidates functions.
AST_MATCHER(clang::CXXMethodDecl, IsOverrideMissing)
{
// does this virtual function override a virtual function from
a base class
return (Node.isVirtual() && (Node.size_overridden_methods() >
0) && !Node.hasAttr());
}

In my callback however I don’t know how to find the right source location
to insert the override keyword. If there is no body I can use the end of
the function declaration. However if there is a body I don’t know how to
get the end of the declaration section before the body. For example in the
following example I want to insert the keyword before the comment. How do I
get the source location of this point?
void f() const // overrides a base class virtual function
{
}

I’ll leave answering that to Daniel.

replace-shared_ptr-parameters.cpp
Replace function parameters that specify a shared_ptr instance by
value with shared_ptr by const reference.

Should I be changing the type from shared_ptr to const shared_ptr& or
is there a way to do this by modifying the AST to make the parameter const
and by reference and then re-emit the source?

If you can match the location of the shared_ptr and the type, you can emit a replacement to “const X&” (no need to go through the AST here). Is there a part of the matching that’s problematic?

I have this working for normal functions, methods and template functions
where instantiations of that template exist. What type of matcher should I
write to match a template function that doesn’t have an instantiation.
eg template f(shared_ptr);

Ugh. Templates that are never instantiated anywhere are really hard to handle, as in your example all the types are dependent. -ast-dump-xml and friends also don’t really help here. I’ll put it on my list to write a tool that helps answer that kind of question.

Additional Questions:

It appears that I need to specify all of the system c++ include files when
I run the tool on a file, but I don’t need to provide them when I compile
the with the clang compiler that I have built from the tooling branch. Is
this expected behavior?
replace-shared_ptr-parameters input.cpp – -std=c++11 -Wall -g -O0
-I/usr/include/c++/4.6 -I/usr/include/c++/4.6/x86_64-linux-gnu
-I/usr/include/c++/4.6/backward -I/usr/lib/gcc/x86_64-linux-gnu/4.6/include
-I/usr/local/include -I/usr/lib/gcc/x86_64-linux-gnu/4.6/include-fixed
-I/usr/include/x86_64-linux-gnu -I/usr/include

Hm, this should work in principle. Can you add ‘-v’ output to both the tool run and the clang run from your self-built clang compiler and attach them?

A different solution would be to use the CompilationDatabase. If your project uses cmake, it works out of the box; if not, the easiest thing is to create a compilation database. See the docs here:
http://clang.llvm.org/docs/JSONCompilationDatabase.html

I’d like to run these tools over our complete code base where the changes
will affect both header and source files. What is the recommended way to
limit changes to only my files and not system header files? I expect I can
limit them to the main file specified by the source manager, however I
assume that this would prevent changes to my header files.

I simply do match on the path / name of the files for that. Usually project files have easily detectable paths.
Also note that what I do is key whether I want to replace something on the file of the original declaration of the ‘thing’ I want to replace, so that you don’t get inconsistent states.

How do I prevent a change from being made multiple times while in the same
translation unit compilation? eg as a result of template or macro
instantiations or multiple header includes. Is there a recommended way to
solve this or is it up to the callback to be safe in the case that it is
run twice on the same piece of code?

That’s what the refactoring support in the tooling library is for:
http://llvm.org/viewvc/llvm-project/cfe/branches/tooling/include/clang/Tooling/Refactoring.h?view=markup

The idea is that you run over all TUs you want to refactor in the same process. We’re working on getting this multi-threaded so that it scales better.

Another possibility is to output simple diffs from your tool, and use a little python script to slurp in the changes, unique them and apply them (that’s what we currently do for really large scale MapReduce style refactorings).

Thanks for the excellent work you guys are doing,

Thanks :slight_smile:

/Manuel

Hi Philip,

Hi,

I’ve been trying some source to source transformations using the
matcher/callback framework in the tooling branch. They are mostly working
but I have a couple of questions and would like some feedback on whether I
am doing everything as intended. Will you please help me?

There are two transformations I am trying to make: adding the override
keyword to virtual functions where we can and replacing shared_ptr by
value function parameters with const references. I have attached the two
cpp files to this post. They are also available from
https://bitbucket.org/phildunstan/clang_tools/src/5d63a291d377/add-override-specifier
https://bitbucket.org/phildunstan/clang_tools/src/5d63a291d377/replace-shared_ptr-parameters

add-override-specifier.cpp
Adds the override specifier to functions that override virtual
functions from a base class and don’t already have the override specifier.

I have written the following matcher to find candidates functions.
AST_MATCHER(clang::CXXMethodDecl, IsOverrideMissing)
{
// does this virtual function override a virtual function from
a base class
return (Node.isVirtual() && (Node.size_overridden_methods() >
0) && !Node.hasAttr());
}

In my callback however I don’t know how to find the right source location
to insert the override keyword. If there is no body I can use the end of
the function declaration. However if there is a body I don’t know how to
get the end of the declaration section before the body. For example in the
following example I want to insert the keyword before the comment. How do I
get the source location of this point?
void f() const // overrides a base class virtual function
{
}

You could put the attribute to the location of the function’s body (FunctionDecl::getBody()->getLocStart()). This only works if the function has a body but would be very similar to the end of a body-less FunctionDecl. It does not play nicely with code styles where you put the open brace on a new line, though. You could move back one character (SourceLocation::getLocWithOffset(-1)). That would effectively not add the attribute if you have trailing single line comments like the one above. It might be possible to use Lexer::GetBeginningOfToken() on to move to the front of such a single-line comment, but that would need some experimenting.

It might be preferable to know the end of the function’s parameter list (or the location of both parenthesis), but I think this information is currently not stored in the AST and cannot be easily calculated for functions without parameters :-(. We could open a bug for this.

Philip Dunstan wrote:

Hi,

I've been trying some source to source transformations using the
matcher/callback framework in the tooling branch. They are mostly working
but I have a couple of questions and would like some feedback on whether I
am doing everything as intended. Will you please help me?

Hi Philip,

I'm in the same position as you, having used the tooling API and had some
feedback:

http://thread.gmane.org/gmane.comp.compilers.clang.devel/21996

I also just published a follow up blog post:

http://www.kdab.com/assessing-the-automatic-qt-4-to-5-porting-tool/

There are two transformations I am trying to make: adding the override
keyword to virtual functions where we can and replacing shared_ptr by
value function parameters with const references. I have attached the two
cpp files to this post. They are also available from
https://bitbucket.org/phildunstan/clang_tools/src/5d63a291d377/add-

override-specifier

https://bitbucket.org/phildunstan/clang_tools/src/5d63a291d377/replace-

shared_ptr-parameters

At least the override one seems to be generally useful. It's starting to
look more likely that a repo for things like this (and some of the generic
parts of my work such as method renaming) should be in a shared location in
the clang tree (possibly just in an example).

I'd like to run these tools over our complete code base where the changes
will affect both header and source files. What is the recommended way to
limit changes to only my files and not system header files? I expect I can
limit them to the main file specified by the source manager, however I
assume that this would prevent changes to my header files.

The approach I took was to specify the top of the source tree manually as a
command line argument and then try to verify that I only change files in the
correct paths.

Thanks,

Steve.

Philip Dunstan wrote:

Hi,

I’ve been trying some source to source transformations using the
matcher/callback framework in the tooling branch. They are mostly working
but I have a couple of questions and would like some feedback on whether I
am doing everything as intended. Will you please help me?

Hi Philip,

I’m in the same position as you, having used the tooling API and had some
feedback:

http://thread.gmane.org/gmane.comp.compilers.clang.devel/21996

I also just published a follow up blog post:

http://www.kdab.com/assessing-the-automatic-qt-4-to-5-porting-tool/

There are two transformations I am trying to make: adding the override
keyword to virtual functions where we can and replacing shared_ptr by
value function parameters with const references. I have attached the two
cpp files to this post. They are also available from
https://bitbucket.org/phildunstan/clang_tools/src/5d63a291d377/add-
override-specifier
https://bitbucket.org/phildunstan/clang_tools/src/5d63a291d377/replace-
shared_ptr-parameters

At least the override one seems to be generally useful. It’s starting to
look more likely that a repo for things like this (and some of the generic
parts of my work such as method renaming) should be in a shared location in
the clang tree (possibly just in an example).

Yep. Perfect for the examples section, methinks.

Cheers,
/Manuel

Manuel Klimek wrote

Replace function parameters that specify a shared_ptr instance by
value with shared_ptr by const reference.

Should I be changing the type from shared_ptr<X> to const shared_ptr<X>&
or
is there a way to do this by modifying the AST to make the parameter
const
and by reference and then re-emit the source?

If you can match the location of the shared_ptr and the type, you can emit
a replacement to "const X&" (no need to go through the AST here). Is there
a part of the matching that's problematic?

Nothing problematic. It just feels like I might lose something with a simple
string replacement. I am not sure what might so wrong, so I might just be
being a little over cautious here.

Manuel Klimek wrote

It appears that I need to specify all of the system c++ include files
when
I run the tool on a file, but I don't need to provide them when I compile
the with the clang compiler that I have built from the tooling branch. Is
this expected behavior?

Hm, this should work in principle. Can you add '-v' output to both the
tool
run and the clang run from your self-built clang compiler and attach them?

I've added the outputs with '-v'. The first is clang built from the tooling
branch, the second is my tool built against the same tooling branch. The
only difference looks like the include path from the clang build directory,
and explicitly specifying this has fixed the errors from the tool. Who
should be responsible for adding this path to the tools header search list?

/home/phil/Development/tooling/build/Debug+Asserts/bin/clang++ -c
testdata.cpp -v -std=c++11

clang version 3.2 (branches/tooling 158516)
Target: x86_64-unknown-linux-gnu
Thread model: posix
"/home/phil/Development/tooling/build/Debug+Asserts/bin/clang" -cc1 -triple
x86_64-unknown-linux-gnu -emit-obj -mrelax-all -disable-free -main-file-name
testdata.cpp -mrelocation-model static -mdisable-fp-elim -fmath-errno
-masm-verbose -mconstructor-aliases -munwind-tables -target-cpu x86-64
-target-linker-version 2.22 -momit-leaf-frame-pointer -v -coverage-file
testdata.o -resource-dir
/home/phil/Development/tooling/build/Debug+Asserts/bin/../lib/clang/3.2
-fmodule-cache-path /var/tmp/clang-module-cache -internal-isystem
/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../../include/c++/4.6
-internal-isystem
/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../../include/c++/4.6/x86_64-linux-gnu
-internal-isystem
/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../../include/c++/4.6/backward
-internal-isystem /usr/local/include -internal-isystem
/home/phil/Development/tooling/build/Debug+Asserts/bin/../lib/clang/3.2/include
-internal-externc-isystem /usr/include/x86_64-linux-gnu
-internal-externc-isystem /include -internal-externc-isystem /usr/include
-std=c++11 -fdeprecated-macro -fdebug-compilation-dir
/home/phil/Development/clang_tools/add-override-specifier -ferror-limit 19
-fmessage-length 138 -mstackrealign -fgnu-runtime -fobjc-runtime-has-arc
-fobjc-runtime-has-weak -fobjc-fragile-abi -fcxx-exceptions -fexceptions
-fdiagnostics-show-option -fcolor-diagnostics -o testdata.o -x c++
testdata.cpp
clang -cc1 version 3.2 based upon LLVM 3.2svn default target
x86_64-unknown-linux-gnu
ignoring nonexistent directory "/include"
#include "..." search starts here:
#include <...> search starts here:
/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../../include/c++/4.6

/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../../include/c++/4.6/x86_64-linux-gnu
/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../../include/c++/4.6/backward
/usr/local/include

/home/phil/Development/tooling/build/Debug+Asserts/bin/../lib/clang/3.2/include
/usr/include/x86_64-linux-gnu
/usr/include
End of search list.

./add-override-specifier testdata.cpp -- -v -std=c++11

Processing:
/home/phil/Development/clang_tools/add-override-specifier/testdata.cpp.
clang version 3.2 (branches/tooling 158516)
Target: x86_64-unknown-linux-gnu
Thread model: posix
clang Invocation:

"/home/phil/Development/clang_tools/add-override-specifier/add-override-specifier"
"-cc1" "-triple" "x86_64-unknown-linux-gnu" "-fsyntax-only" "-disable-free"
"-main-file-name" "testdata.cpp" "-mrelocation-model" "static"
"-mdisable-fp-elim" "-fmath-errno" "-masm-verbose" "-mconstructor-aliases"
"-munwind-tables" "-target-cpu" "x86-64" "-target-linker-version" "2.22"
"-momit-leaf-frame-pointer" "-v" "-resource-dir"
"/home/phil/Development/clang_tools/add-override-specifier/../lib/clang/3.2"
"-fmodule-cache-path" "/var/tmp/clang-module-cache" "-internal-isystem"
"/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../../include/c++/4.6"
"-internal-isystem"
"/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../../include/c++/4.6/x86_64-linux-gnu"
"-internal-isystem"
"/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../../include/c++/4.6/backward"
"-internal-isystem" "/usr/local/include" "-internal-isystem"
"/home/phil/Development/clang_tools/add-override-specifier/../lib/clang/3.2/include"
"-internal-externc-isystem" "/usr/include/x86_64-linux-gnu"
"-internal-externc-isystem" "/include" "-internal-externc-isystem"
"/usr/include" "-std=c++11" "-fdeprecated-macro" "-fdebug-compilation-dir"
"/home/phil/Development/clang_tools/add-override-specifier" "-ferror-limit"
"19" "-fmessage-length" "138" "-mstackrealign" "-fgnu-runtime"
"-fobjc-runtime-has-arc" "-fobjc-runtime-has-weak" "-fobjc-fragile-abi"
"-fcxx-exceptions" "-fexceptions" "-fdiagnostics-show-option"
"-fcolor-diagnostics" "-x" "c++"
"/home/phil/Development/clang_tools/add-override-specifier/testdata.cpp"

clang -cc1 version 3.2 based upon LLVM 3.2svn default target
x86_64-unknown-linux-gnu
ignoring nonexistent directory
"/home/phil/Development/clang_tools/add-override-specifier/../lib/clang/3.2/include"
ignoring nonexistent directory "/include"
#include "..." search starts here:
#include <...> search starts here:
/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../../include/c++/4.6

/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../../include/c++/4.6/x86_64-linux-gnu
/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../../include/c++/4.6/backward
/usr/local/include
/usr/include/x86_64-linux-gnu
/usr/include
End of search list.
In file included from
/home/phil/Development/clang_tools/add-override-specifier/testdata.cpp:1:
In file included from
/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../../include/c++/4.6/iostream:39:
In file included from
/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../../include/c++/4.6/ostream:39:
In file included from
/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../../include/c++/4.6/ios:38:
In file included from
/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../../include/c++/4.6/iosfwd:41:
In file included from
/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../../include/c++/4.6/bits/postypes.h:41:
In file included from
/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../../include/c++/4.6/cwchar:45:
/usr/include/wchar.h:40:11: fatal error: 'stdarg.h' file not found
# include <stdarg.h>
          ^
1 error generated.
Error while processing
/home/phil/Development/clang_tools/add-override-specifier/testdata.cpp.

Manuel Klimek wrote

A different solution would be to use the CompilationDatabase. If your
project uses cmake, it works out of the box; if not, the easiest thing is
to create a compilation database. See the docs here:
http://clang.llvm.org/docs/JSONCompilationDatabase.html

I will definitely look into this. We use scons rather than cmake, but it
should no problem extracting this information.

Manuel Klimek wrote

I simply do match on the path / name of the files for that. Usually
project
files have easily detectable paths.
Also note that what I do is key whether I want to replace something on the
file of the original declaration of the 'thing' I want to replace, so that
you don't get inconsistent states.

Interesting. That is good to know.

Manuel Klimek wrote

That's what the refactoring support in the tooling library is for:
http://llvm.org/viewvc/llvm-project/cfe/branches/tooling/include/clang/Tooling/Refactoring.h?view=markup

The idea is that you run over all TUs you want to refactor in the same
process. We're working on getting this multi-threaded so that it scales
better.

Another possibility is to output simple diffs from your tool, and use a
little python script to slurp in the changes, unique them and apply them
(that's what we currently do for really large scale MapReduce style
refactorings).

Thanks, I will definitely look into that

Phil

Daniel Jasper wrote

In my callback however I don't know how to find the right source location
to insert the override keyword. If there is no body I can use the end of
the function declaration. However if there is a body I don't know how to
get the end of the declaration section before the body. For example in
the
following example I want to insert the keyword before the comment. How do
I
get the source location of this point?
       void f() const // overrides a base class virtual function
       {
       }

You could put the attribute to the location of the function's body
(FunctionDecl::getBody()->getLocStart()). This only works if the function
has a body but would be very similar to the end of a body-less
FunctionDecl. It does not play nicely with code styles where you put the
open brace on a new line, though. You could move back one character
(SourceLocation::getLocWithOffset(-1)). That would effectively not add the
attribute if you have trailing single line comments like the one above. It
might be possible to use Lexer::GetBeginningOfToken() on to move to the
front of such a single-line comment, but that would need some
experimenting.

It might be preferable to know the end of the function's parameter list
(or
the location of both parenthesis), but I think this information is
currently not stored in the AST and cannot be easily calculated for
functions without parameters :-(. We could open a bug for this.

I am not sure the end of the function's parameter list is sufficient as
there might be const specifiers and other attributes there that I need to
know about. Some of these ideas might be enough to work on the code base I
want to apply it to though. I will try a few of these ideas out on the
weekend.

Thanks for the help,
Phil

Manuel Klimek wrote

Replace function parameters that specify a shared_ptr instance by
value with shared_ptr by const reference.

Should I be changing the type from shared_ptr to const shared_ptr&
or
is there a way to do this by modifying the AST to make the parameter
const
and by reference and then re-emit the source?

If you can match the location of the shared_ptr and the type, you can emit
a replacement to “const X&” (no need to go through the AST here). Is there
a part of the matching that’s problematic?

Nothing problematic. It just feels like I might lose something with a simple
string replacement. I am not sure what might so wrong, so I might just be
being a little over cautious here.

We’ve been doing simple string replacements for large scale code transformations on google’s repo for about 1.5 years now :slight_smile:

Manuel Klimek wrote

It appears that I need to specify all of the system c++ include files
when
I run the tool on a file, but I don’t need to provide them when I compile
the with the clang compiler that I have built from the tooling branch. Is
this expected behavior?

Hm, this should work in principle. Can you add ‘-v’ output to both the
tool
run and the clang run from your self-built clang compiler and attach them?

I’ve added the outputs with ‘-v’. The first is clang built from the tooling
branch, the second is my tool built against the same tooling branch. The
only difference looks like the include path from the clang build directory,
and explicitly specifying this has fixed the errors from the tool. Who
should be responsible for adding this path to the tools header search list?

Hah, you’re cheating :slight_smile:
Clang tools need the clang builtin headers just like clang. So the best way to run them is to put them into the same binary directory where clang is located.

Or you convince Chandler on IRC that we want to have the builtin headers compiled into the binary for tools :wink:

Cheers,
/Manuel