RFC: Adding a rename refactoring tool to clang

Folks,

At Google we are working on a tool and set of APIs for refactoring C++ programs based on LibTooling. Particularly, we have targeted rename refactoring for C++ as our first step.

In our first iteration we want to offer two things:
1- A command line tool similar to clang-format which will semantically rename a symbol (specified by a position in a file) in a set of input files.
2- An API for doing the above task which can effectively be used to provide this functionality for any editor (Emacs, Vim, CodeMirror, etc.).

== Renaming capabilities
In the first iteration, we are offering the following features for the rename refactoring:
=== Supported C++ constructs:

  • Global and local variables (including function arguments)
  • Functions
  • C and CXX record types (structs/unions/classes)
    • For classes this includes renaming the constructor and destructor
  • User defined types
  • Enumerations (names and constants)
  • Record member variables and methods
  • Namespace specifiers
  • Template parameters
  • Lambda captures
  • Overloaded operators

=== Unsupported C++ constructs:

  • Macros
  • Symbols in comments

== Command line program
=== Current support:

  • Input from stdin, output to stdout
  • Input and output from/to disk
  • Option to specify include path
  • Option to predefine macros

=== Possible improvements:

  • Multiple files from stdin
  • Making backups of renamed files

We think this tool should reside in clang-tools-extra.

Please let us know what you think. Any comment and feedback is appreciated.

Thank you.

ΛMIN シ | amshali@ | Google Inc.

From: "Amin Shali" <amshali@google.com>
To: cfe-dev@cs.uiuc.edu
Cc: "Matthew Plant" <mplant@google.com>
Sent: Friday, July 25, 2014 3:01:48 PM
Subject: [cfe-dev] RFC: Adding a rename refactoring tool to clang

Folks,

At Google we are working on a tool and set of APIs for refactoring
C++ programs based on LibTooling. Particularly, we have targeted
rename refactoring for C++ as our first step.

In our first iteration we want to offer two things:
1- A command line tool similar to clang-format which will
semantically rename a symbol (specified by a position in a file) in
a set of input files.
2- An API for doing the above task which can effectively be used to
provide this functionality for any editor (Emacs, Vim, CodeMirror,
etc.).

Sounds neat.

== Renaming capabilities
In the first iteration, we are offering the following features for
the rename refactoring:
=== Supported C++ constructs:
- Global and local variables (including function arguments)
- Functions
- C and CXX record types (structs/unions/classes)
- For classes this includes renaming the constructor and destructor
- User defined types
- Enumerations (names and constants)
- Record member variables and methods
- Namespace specifiers
- Template parameters
- Lambda captures

What does it mean to rename a lambda capture (or an operator)?

- Overloaded operators
=== Unsupported C++ constructs:
- Macros
- Symbols in comments

== Command line program
=== Current support:
- Input from stdin, output to stdout
- Input and output from/to disk
- Option to specify include path
- Option to predefine macros
=== Possible improvements:

- Multiple files from stdin

- Making backups of renamed files

Making a backup of the files that have been changed would be a nice feature. As would generating a patch instead of making the change in place.

-Hal
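
To make the patch idea above concrete, here is a rough sketch of how a rename result could be emitted as a patch rather than written in place. It is only an illustration: makeRenamePatch and splitLines are hypothetical helpers, not part of the proposed tool, and the output is in the style of a zero-context unified diff. It assumes a rename never changes the number of lines in a file.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Split text into lines (without the trailing newline characters).
std::vector<std::string> splitLines(const std::string &text) {
  std::vector<std::string> lines;
  std::istringstream in(text);
  for (std::string line; std::getline(in, line);)
    lines.push_back(line);
  return lines;
}

// Emit a zero-context, unified-diff-style patch (hypothetical helper).
// A rename only rewrites identifiers, so both versions are assumed to
// have the same number of lines.
std::string makeRenamePatch(const std::string &path, const std::string &before,
                            const std::string &after) {
  const std::vector<std::string> oldLines = splitLines(before);
  const std::vector<std::string> newLines = splitLines(after);
  assert(oldLines.size() == newLines.size());

  std::ostringstream out;
  out << "--- a/" << path << "\n+++ b/" << path << "\n";
  for (std::size_t i = 0; i < oldLines.size(); ++i) {
    if (oldLines[i] == newLines[i])
      continue;  // untouched line, nothing to report
    // One single-line hunk per changed line; line numbers are 1-based.
    out << "@@ -" << i + 1 << " +" << i + 1 << " @@\n"
        << "-" << oldLines[i] << "\n"
        << "+" << newLines[i] << "\n";
  }
  return out.str();
}
```

Calling makeRenamePatch with the file contents before and after the rename yields text that can be reviewed before anything on disk changes, which also covers much of what a backup would be used for.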

Folks,

At Google we are working on a tool and set of APIs for refactoring C++
programs based on LibTooling. Particularly, we have targeted rename
refactoring for C++ as our first step.

In our first iteration we want to offer two things:
1- A command line tool similar to clang-format which will semantically
rename a symbol (specified by a position in a file) in a set of input files.

Do you intend to provide one command-line tool per refactoring, or one tool
that has subcommands for each operation? For command-line / scripting use,
it'd seem useful to also be able to specify a symbol by name (at least for
names that are not function-local). (For instance, say I want to rename all
*Decl classes in clang to *Declaration. I'd like to be able to perform a
semantic grep for those, then run them through a command-line rename tool.)

2- An API for doing the above task which can effectively be used to provide
this functionality for any editor (Emacs, Vim, CodeMirror, etc.).

== Renaming capabilities
In the first iteration, we are offering the following features for the
rename refactoring:
=== Supported C++ constructs:
         - Global and local variables (including function arguments)
         - Functions
         - C and CXX record types (structs/unions/classes)
           - For classes this includes renaming the constructor and destructor
         - User defined types
         - Enumerations (names and constants)
         - Record member variables and methods
         - Namespace specifiers
         - Template parameters
         - Lambda captures
         - Overloaded operators
=== Unsupported C++ constructs:
         - Macros
         - Symbols in comments

== Command line program
=== Current support:
         - Input from stdin, output to stdout
         - Input and output from/to disk
         - Option to specify include path
         - Option to predefine macros
=== Possible improvements:
         - Multiple files from stdin
         - Making backups of renamed files

We think this tool should reside in clang-tools-extra.

Please let us know what you think. Any comment and feedback is
appreciated.

Do you have any thoughts on how to handle templates? For instance:

struct A { typedef int Foo; };
struct B { typedef char Foo; };
template<typename T> struct X { typename T::Foo bar; };
X<A> a;
X<B> b;

Suppose we want to rename A::Foo to A::Goo. Our obvious choices are:
1) Don't rename the use inside X; breaks X<A>.
2) Do rename the use inside X; breaks X<B>.
3) Rename the use inside X and the use inside B; will surprise users.
4) Report an error and refuse to rename A::Foo. Tell users they need to
rename B::Foo too. (And if the user requests that both are renamed at the
same time, do the right thing.)

I think option 4 is the best choice, but it might be a little tricky to get
right.
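
A minimal sketch of what the check behind option 4 might look like, assuming the tool already knows which declarations each dependent name resolves to across the instantiations it has seen. DependentUse and missingRenames are made-up names for illustration, and collecting that resolution data is the tricky part that is not shown here.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// A dependent name inside a template (e.g. T::Foo in X<T>) together with
// the declarations it resolves to across all known instantiations.
// Hypothetical record; a real tool would derive this from the AST.
struct DependentUse {
  std::string spelling;              // e.g. "T::Foo in X<T>"
  std::set<std::string> resolvesTo;  // e.g. {"A::Foo", "B::Foo"}
};

// Returns the declarations that would also have to be renamed for the
// request to be safe. An empty result means the rename can proceed.
std::set<std::string> missingRenames(const std::vector<DependentUse> &uses,
                                     const std::set<std::string> &requested) {
  std::set<std::string> missing;
  for (const DependentUse &use : uses) {
    assert(!use.resolvesTo.empty());  // every recorded use resolves somewhere

    // Does this use site refer to anything the user asked to rename?
    bool touchesRequest = false;
    for (const std::string &decl : use.resolvesTo)
      if (requested.count(decl))
        touchesRequest = true;
    if (!touchesRequest)
      continue;

    // If so, every other declaration reachable from the same use site has
    // to be renamed together with it, or some instantiation breaks.
    for (const std::string &decl : use.resolvesTo)
      if (!requested.count(decl))
        missing.insert(decl);
  }
  return missing;
}
```

For the example above, requesting only A::Foo would report B::Foo as missing, while requesting both at once would return an empty set and let the rename go ahead.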

How do you intend to behave when renaming an entity shared by multiple
source files? That might influence the checks you're able to perform for
the above case.

Hey everyone, I can answer some of the questions as well.

Hal:

What does it mean to rename a lambda capture (or an operator)?
It means that we can rename symbols that appear in lambda captures. For operators, we are able to change which operator was being overloaded. For example, we can change all instances of “foo::operator ++” to “foo::operator --”. This functionality is pretty limited but works in most cases.
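
As a small illustration of the two cases just described (the names here are invented for the example and do not come from the tool): renaming the local variable offset has to touch its declaration, the capture list and the lambda body, and changing which operator is overloaded has to touch the declaration and every call site.

```cpp
#include <cassert>

struct foo {
  int value = 0;
  // Changing which operator is overloaded would turn this declaration,
  // and every call site below, from ++ into --.
  foo &operator++() {
    ++value;
    return *this;
  }
};

int addOffset(int input) {
  int offset = 10;  // renaming `offset` here ...
  // ... must also update the capture list and the use in the lambda body.
  auto add = [offset](int x) { return x + offset; };
  return add(input);
}

int bumpTwice() {
  foo f;
  ++f;  // call site of foo::operator++
  ++f;  // another call site
  assert(f.value == 2);
  return f.value;
}
```

Note that swapping ++ for -- only changes the spelling of the operator; the body of the overload stays the same, so the result can read oddly even though it still compiles.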

Also, I really like the idea of generating a patch.

Richard:
Right now, there is only one command line tool, aptly called clang-rename. The type of renaming it does is much simpler than what you want: given a location in the code that refers to a symbol, all semantically (and syntactically) equivalent instances of that symbol will be renamed. So no, the tool cannot do a semantic grep, but there are plenty of tools to rename things based on regexps.

In the case that you specify, no renaming will take place, unfortunately. Failing to rename at least the use in “typename T::Foo bar” is a bug, but one that can be fixed later. It was thought that erring on the side of not renaming was a better idea than making a guess :). Templates are tricky, and they’re one of the few things that are not handled as thoroughly as they could be.

Which brings me to how renaming takes place: we find a USR at a location, then we find every instance of that USR (and equivalent USRs for constructors/destructors), and then we rename based on those locations. Therefore, any USR that is not dependent on file name (which is almost all of them) can be used to find locations in other files. Thus, renaming in multiple source files works very smoothly.
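
Roughly, the three steps above can be sketched like this. This is a toy model and not the actual clang-rename code: the Occurrence record, the helper names and the USR strings used with it are all made up, and a real implementation would collect occurrences by walking Clang's AST.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One reference to a symbol found in a file (hypothetical record).
struct Occurrence {
  std::string usr;     // USR naming the entity being referred to
  std::size_t offset;  // byte offset of the identifier in the file
  std::size_t length;  // length of the identifier as currently spelled
};

// Step 1: find the USR of whatever sits at the given offset ("" if none).
std::string usrAtOffset(const std::vector<Occurrence> &occs,
                        std::size_t offset) {
  for (const Occurrence &o : occs)
    if (offset >= o.offset && offset < o.offset + o.length)
      return o.usr;
  return "";
}

// Steps 2 and 3: find every occurrence with that USR and rewrite it.
std::string renameByUsr(std::string text, std::vector<Occurrence> occs,
                        const std::string &usr, const std::string &newName) {
  // Apply edits from the back of the file so earlier offsets stay valid.
  std::sort(occs.begin(), occs.end(),
            [](const Occurrence &a, const Occurrence &b) {
              return a.offset > b.offset;
            });
  for (const Occurrence &o : occs) {
    if (o.usr != usr)
      continue;  // same spelling but a different entity: leave it alone
    assert(o.offset + o.length <= text.size());
    text.replace(o.offset, o.length, newName);
  }
  return text;
}
```

Because matching is done on the USR rather than on the spelling, a local variable that happens to share a name with a global is left untouched. The same lookup can be repeated per file, which is why multiple source files are not a special case.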

I hope that addressed your questions sufficiently.

-Matt

FWIW, I have long argued that we should busy-box all of our tools that pull
in the full Clang parser, semantic analyzer, and AST. I think only
clang-format really makes sense as a separate tool...

But in case it wasn't clear, I think that should be done totally separately
from adding any renaming support. This isn't a new problem, and if we want
to solve it, we shouldn't conflate solving it with getting a renaming tool.

Hi Amin,

this is interesting.

Do you plan to keep parsed files in memory/a cache/indexed? If you e.g. want to rename a couple of constructs in the LLVM code base, it seems to make a big difference if you need to wait 5-10 minutes until a rename has taken place or if it is almost instant, especially for editor integration.

I am asking this as performing actions/analysis across multiple translation units is a recurring problem, and it would be interesting to see if/how your work would (maybe not today?) integrate with that kind of service.

Cheers,
Tobias

In article <CAGCO0Kh=usFmr1_Uak7d8wNAaPuHnC_kxHQbdy7oXhojDSeGyg@mail.gmail.com>,
    Chandler Carruth <chandlerc@google.com> writes:

FWIW, I have long argued that we should busy-box all of our tools that pull
in the full Clang parser, semantic analyzer, and AST. I think only
clang-format really makes sense as a separate tool...

For the refactoring tools, I think this makes sense, i.e.

clang-refactor rename ....
clang-refactor modernize ...

etc.

What other tools are there besides refactoring tools and clang-format?

You guys at google keep reading my mind and doing the projects that I
wish I had time to do :-).

Keep going!

In article <CAKw0c6BWWAp4tDYixTTA2xsprDTFan6m6c3SuNG-DGdykApS3Q@mail.gmail.com>,
    Amin Shali <amshali@google.com> writes:

At Google we are working on a tool and set of APIs for refactoring C++
programs based on LibTooling. Particularly, we have targeted rename
refactoring for C++ as our first step.

In our first iteration we want to offer two things:
1- A command line tool similar to clang-format which will semantically
rename a symbol (specified by a position in a file) in a set of input
files.
2- An API for doing the above task which can effectively be used to provide
this functionality for any editor (Emacs, Vim, CodeMirror, etc.).

This is totally awesome!

Please take advantage of the refactoring test suite I've created which
has lots of test cases for Rename, among others.

<https://github.com/LegalizeAdulthood/refactor-test-suite>

This is an ongoing effort but I've got 98 test cases for Rename alone
in there.

I am in the process of adding C++11 oriented test cases as time permits.

Issues and pull requests are most welcome on github.

You brought up an interesting case here. Although we are concerned about such a case, it is not our intent to implement a cache/index service for parsed files as part of this project. I think this could be an opportunity for a service to grow here that can be used not just by refactoring but also by other things.

Thank you Richard. I am glad to see that there is already an effort on testing refactoring logic. We will look into this and use it.

Template specialization is also an option:

struct A { typedef int Goo; };
struct B { typedef char Foo; };
template<typename T> struct X { typename T::Foo bar; };
template<> struct X<A> { typename A::Goo bar; };
X<A> a;
X<B> b;

For template specialization the renaming tool would have to generate code, which is a little beyond what I would want a renaming tool to do, and determine which type to specialize to, which could quickly become a very complicated problem.

Hi there,

Folks,

At Google we are working on a tool and set of APIs for refactoring C++
programs based on LibTooling. Particularly, we have targeted rename
refactoring for C++ as our first step.

In our first iteration we want to offer two things:
1- A command line tool similar to clang-format which will semantically
rename a symbol (specified by a position in a file) in a set of input files.
2- An API for doing the above task which can effectively be used to
provide this functionality for any editor (Emacs, Vim, CodeMirror, etc.).

This is something that C++ developers are definitely lacking. Just imagine
how neat it could be to rename some legacy class or method name in the LLVM
or clang codebase that has been bugging you ever since. :-) If it can be
used broadly, it could become a "game changer" in C/Objective C/C++ code
quality, e.g. for large frameworks like Qt, KDE, the Linux kernel.

I have one question that might be important to others that are interested,
too:
Will this only work on a codebase that can be compiled (or at least
strictly parsed) with clang, or is it more "fuzzy", so it will work with
e.g. a codebase that uses windows.h (that, afaik, can not be compiled with
clang - please correct me if I'm wrong)?
This might be a general LibTooling-question, though.

Greetings,

Daniel Albuschat

Hey Daniel,

Thanks for your optimism! I sure hope it’s prescient.

The code does not need to be compile-able, but it does need to be at least some-what parseable. From what I’ve seen, some errors will not prevent clang from building an AST, albeit an incomplete one. However, the renaming tool will only be able to rename symbols that are contained in the AST. If the AST doesn’t contain some symbols, perhaps because some includes were missing and gosh darnit clang just has no idea what the heck a DWORD is, then those symbols won’t be renamed.

Basically, in order for it to be renamed, a symbol must be understood by clang. And how Clang works around various error conditions is beyond my knowledge - you’ll have to go check the source!

Incidentally, I seem to recall that there is a section in the dragon book that talks about error recovery in parsers. I didn’t read that section (or book), but I imagine someone working on clang has.

I’m confident that did not answer your question entirely, so I apologize.

-Matt

I have one question that might be important to others that are interested,
too:
Will this only work on a codebase that can be compiled (or at least
strictly parsed) with clang, or is it more "fuzzy", so it will work with
e.g. a codebase that uses windows.h (that, afaik, can not be compiled with
clang - please correct me if I'm wrong)?

clang can parse windows.h, see
http://clang.llvm.org/docs/MSVCCompatibility.html

In article <CAMfx8+eLTwdAbHw64L-O71hChrsKJKBOsgB-cuYaXN4xU7Qp5Q@mail.gmail.com>,
    Matthew Plant <mplant@google.com> writes:

The code does not need to be compile-able, but it does need to be at least
some-what parseable.

In this regard, clang's rename won't be any worse than any other
language's rename.

For instance, refactoring tools are pretty mature in .NET/Java, but
none of them successfully rename an identifier whose type can't be parsed.
You're fundamentally missing the semantic information needed to decide
where else this identifier needs to be changed.

As far as C++ refactoring tools go, *anything* based on clang's
parser is going to be light years ahead of other tools that are based
on ad-hoc homegrown parsers. I've been kicking the tires of C++
refactoring tools since roughly 2007 when I started refactoring a crusty
old code base. Obviously the older tools use their own parser. Just
having a parser that gets the job done correctly is difficult, never
mind keeping that parser up to date with the C++11 and C++14
standards.

This is where clang-based refactoring tools are really going to shine.
Leveraging the *production quality* parser and resulting lossless AST
just puts you miles ahead in the game.

Have you evaluated the MSVC+Visual Assist plugin? While it's not IntelliJ, I have heard good things about it. I think that's a professional level baseline which other tools could work towards matching/beating.

This is OT for the list, but you and others are welcome to follow-up with me offlist for the refactoring stuff we are doing where I work.

The refactoring work I have experience with ends up relying on things which go sorta outside the typical libclang tooling realm. You really have a lot to track beyond just a single source file and dealing with that can get tricky.

Clang can parse windows.h. :-) Besides, windows.h is plain C, which makes
it easy. The C++ headers are the hard part.

My apologies if this is getting too off-topic for the list. I'm happy
to continue the conversation in private or on usenet or wherever is
more appropriate.

In article <53D9279E.8080702@pathscale.com>,
    "C. Bergström" <cbergstrom@pathscale.com> writes:

> This is where clang-based refactoring tools are really going to shine.
> Leveraging the *production quality* parser and resulting lossless AST
> just puts you miles ahead in the game.

Have you evaluated the MSVC+Visual Assist plugin? While it's not IntelliJ,
I have heard good things about it. I think that's a professional level
baseline which other tools could work towards matching/beating..

Visual AssistX isn't bad and I use it regularly. However, it still
has a long way to go as a refactoring tool. VAX started as improved
IntelliSense and code navigation in VC6 days. That is still the area
where it primarily excels, but with MS improving code navigation on
their own I think they realized they needed to be more than code
navigation in order to continue gaining customers. They have added
limited refactoring support in more recent versions.

I posted a review of a recent build here:
<http://legalizeadulthood.wordpress.com/2014/06/13/refactoring-test-results-for-vax-10-8-2036-0-built-2014-05-22/>

Of the different refactoring tools out there that seem interesting, I
need to go look at QtCreator since it has quite a bunch of interesting
refactorings according to its docs. When I posted the results linked
above, someone suggested that I look at it. Some coworkers had been
using it but it supports more refactorings than I was led to believe
from talking with them. It's on my list to evaluate.

I had previously known about the Eclipse CDT support, but I couldn't
find anyone to recommend and/or champion it as a refactoring tool and
it didn't seem like anyone was actively pushing CDT forward.

It's unfortunate that the different tools use different terminology for
the same transformation. There are a bunch of refactorings that are
specific to C/C++ for which we could all benefit from using the same
name. I've written up some of these on my blog over the years:
<http://legalizeadulthood.wordpress.com/category/refactoring/>

In the list below I've tried to choose the same name for the same
operation between tools, but it may not be the name they use in their
own documentation. I know of the following refactoring tools for C++:

clang-modernize: free tool by some very cool d00dz :-)
  <http://clang.llvm.org/extra/clang-modernize.html>
Refactorings:
  Replace Explicit For Loop with Range For Loop
  Replace 0/NULL with nullptr
  Replace Type Declaration with auto
  Decorate Virtual Methods with override
  Replace Pass by Reference with Pass By Value
  Replace auto_ptr with unique_ptr

Visual AssistX: commercial tool by Whole Tomato.
  <http://wholetomato.com>
Refactorings:
  Rename
  Extract Function/Method
  Move Implementation to/from Source/Header
  Change Signature

CodeRush for Visual Studio: commercial tool by Developer Xpress.
  Discontinued
  <https://www.devexpress.com/Products/CodeRush/cpp11.xml>
Refactorings:
  Supported a bunch, but they were quite buggy (I filed over 300
  bugs and this was the impetus for creating the refactoring tool
  test suite.)

Qt Creator: free tool by Qt project
  <https://qt-project.org/search/tag/qt~creator>
Refactorings:
  <https://qt-project.org/doc/qtcreator-3.0/creator-editor-refactoring.html>
  Rename
  Add Curly Braces
  Move Declaration Out of Condition
  Negate Condition
  Split Declaration
  Split If Statement
  Swap Operands (e.g. a > b becomes b < a)
  Convert Literal to Decimal
  Convert Literal to Hexadecimal
  Convert Literal to Octal
  Convert to ObjectiveC String Literal
  Replace char Literal with QLatin1Char Literal
  Replace C-style String Literal with QLatin1String Literal
  Mark as Translatable
  Reorder Parameters
  Extract Function
  Extract Constant as Function Parameter
  Convert Symbol to CamelCase
  Complete Switch Statement
  Reformat Pointers or References
  Move Implementation to/from Source/Header
  Optimize For Loop

Eclipse CDT: free tool by eclipse.org
  <http://www.eclipse.org>
Refactorings:
  <http://help.eclipse.org/juno/index.jsp?topic=%2Forg.eclipse.cdt.doc.user%2Ftasks%2Fcdt_t_rename.htm>
  <https://wiki.eclipse.org/images/a/a1/C%2B%2B_Refactoring_-_Now_for_Real.pdf>
  Rename
  Extract Constant
  Extract Function
  Extract Local Variable
  Move Implementation to/from Source/Header