C++11 migration tools

Hi everyone,

While it would be nice for everyone to start using C++11 right away, there is a huge volume of existing code that could also benefit from the new features. Just this week, someone offered a tool to add the override keyword to methods missing it, and I think we can expect more C++11-related refactoring tools in the near future. I think it would be a good idea to talk about how to coordinate similar efforts, which tools we would like to implement, what these tools should look like, and so on. As I’m new to Clang, your input will be especially useful for me, but a general discussion would probably benefit everyone working on something similar.

Desirable features of a source translation system include:

  • Permitting work on parallel migration tools without blocking in either direction.
  • A system for naming these tools and optionally turning them off or on. This means we would need a system to resolve the order of changes should they interact.
  • A subsystem for setting parameters of these tools.
  • A possible interactive mode for decisions a tool can’t make well (e.g. generated variable naming). This would be the first interactive component of Clang, I’m told.
  • A standard interface for these tools (e.g. they should be implemented as a FrontendAction, return tooling::Replacements, and can require that no errors occur during syntax checking).
  • A standard testing convention (e.g. both tests expecting conversions and tests expecting no change, to be run on the FrontendAction).

In particular, I am working on a tool to convert existing C++ for loops to take advantage of the new C++11 range-based syntax. I can imagine similar tools to replace const with constexpr, macro hacks with static_assert, and potentially other common refactorings.

Thoughts? Suggestions?

-Sam

Sam Panzer <panzer@google.com> writes:

In particular, I am working on a tool to convert existing C++ for loops to
take advantage of the new C++11 range-based syntax. I can imagine similar
tools to replace const with constexpr, macro hacks with static_assert, and
potentially other common refactorings.

Thoughts? Suggestions?

You really must watch this presentation, if you haven't already:

    http://www.youtube.com/watch?v=yuIOGfcOH0k

I’m that intern :slight_smile:

-Sam

In case anyone wanted to take a look, the attached patch includes the tool I’ve been working on. I create a new binary, c++migrate, which attempts to convert for loops in the source files given to it. Most of my focus has been on the FrontedAction, so I skirted all of the issues mentioned above by keeping the frontend interaction minimal (i.e. I just call Tooling::ClangTool::run), and the changes are just reported on standard output, if there are any to be made.

The tool can currently convert for loops that range over (1) statically allocated arrays, and (2) Clang-style iterator-based loops (with begin and end iterators defined). All loop variables need to be declared within the loop’s initialization step in order for it to be converted, though this requirement can potentially be eliminated. I’m working on converting iterator-based loops that call someContainer.end() on each iteration, since they’re probably the common case in many codebases.

Just for fun, I ran the tool over the 41 .cpp files in lib/Sema, and my tool found 71 convertible loops in 17 files. There is plenty more work to go, because it clearly missed some easy ones.

Any input or feedback is welcome!

-Sam

for-migration.patch (48.9 KB)

In case anyone wanted to take a look, the attached patch includes the tool I’ve been working on. I create a new binary, c++migrate, which attempts to convert for loops in the source files given to it. Most of my focus has been on the FrontedAction, so I skirted all of the issues mentioned above by keeping the frontend interaction minimal (i.e. I just call Tooling::ClangTool::run), and the changes are just reported on standard output, if there are any to be made.

The tool can currently convert for loops that range over (1) statically allocated arrays, and (2) Clang-style iterator-based loops (with begin and end iterators defined). All loop variables need to be declared within the loop’s initialization step in order for it to be converted, though this requirement can potentially be eliminated. I’m working on converting iterator-based loops that call someContainer.end() on each iteration, since they’re probably the common case in many codebases.

Just for fun, I ran the tool over the 41 .cpp files in lib/Sema, and my tool found 71 convertible loops in 17 files. There is plenty more work to go, because it clearly missed some easy ones.

Any input or feedback is welcome!

High-level observations:

  1. the handling of the rewrites; any reason not to use the Tooling/Refactoring stuff? Currently in the patch it looks to me like the files are not rewritten, but dumped to stdout
  2. is the reason not to use the matchers here that they’re not landed in mainline yet?

Cheers,
/Manuel

I tend to agree that this should use the Tooling/Refactoring stuff.

However, I’m curious about the best way to structure the location of migration candidates: AST matchers vs. visitor.

I can almost see the visitor pattern working really well here as each different construct can have a pile of migration logic dropped in… But if there is a need to connect dots between more distant constructs, that wouldn’t work so well… Not at all sure what would be best here.

I tend to agree that this should use the Tooling/Refactoring stuff.

However, I’m curious about the best way to structure the location of migration candidates: AST matchers vs. visitor.

I can almost see the visitor pattern working really well here as each different construct can have a pile of migration logic dropped in… But if there is a need to connect dots between more distant constructs, that wouldn’t work so well… Not at all sure what would be best here.

I’ve used a combination before - use matchers for the stuff where they work well, then write a very small easy-to-understand visitor if you need more. I think that brings down code size by quite a bit - obviously just a gut feeling here.

A C++11 migration tool is an excellent idea. Some specific suggestions:

  - It should be based on the Tooling/Refactoring infrastructure: this is our way forward for building source-to-source translation tools, and we should use it for our own refactoring tools (especially new ones we're just starting on now). Tooling handles (or will handle) the nitty-gritty details of staging rewrites, de-duplicating rewrites that occur in headers, and making sure that the re-written source still parses afterward.

  - It should be a single tool (C++11 migration) with a number of migration rules (for-each loops, override, nullptr), which we can expand over time. Just running the tool migrates everything we can, and it can have options for toggling individual features ("don't add override") or for targeting a specific common subset among compiles (migrate so that it works with Clang 3.1, GCC 4.6, and MSVC10). Bundling everything together makes for a much nicer user experience, because you write one command line and out pops a C++11 code-base.

  - It should live in a separate repository of Clang tools alongside (but not a part of) the main Clang repository.

  - It should "always" be good enough to take the LLVM+Clang codebase, translate it to C++11, then build a working compiler from that C++11 code-base. It's far better to have fewer/less aggressive transforms that never break code than it is to have eager transforms that do break code. And we have a perfect code-base to validate on.

As a way forward, I think your tool to convert existing C++ for loops to the range-based for syntax would make a great first transformation, but we should put it into the tooling/refactoring context and label it as "the C++11 migration tool". Once the framework is in place, I bet other transformations would be contributed pretty quickly.

  - Doug

Thank you for giving a concrete proposal!

One other feature to add would be a degree of confidence for any given change (Certain success, will probably work, risky), so that a command-line flag could control how aggressive any transformations are. To be more specific, one option is modeling the tool after diagnostics:

  • Transformations are grouped in a new tool, source-migrate
  • Each transformation should be named, so it can be turned on/off, such as -range-for-loop or -noconst-expr
  • Each transformation should be grouped, so that all C++11 transformations could be enabled or disabled in one go, e.g. -Tnocxx11
  • Each transformation should have some way to specify compiler requirements, such as requiring C++ mode.

I think this is a good starting place for a code migration framework - I will try setting up a tool with proper command-line flags.

-Sam

Thank you for giving a concrete proposal!

One other feature to add would be a degree of confidence for any given change (Certain success, will probably work, risky), so that a command-line flag could control how aggressive any transformations are.

I don't think that we should admit any transformation unless we know that the resulting code type-checks. The tooling/refactoring stuff should provide that for us, and we'll need to be careful with some transformations (like the for-range loop transformation) that could break the semantics of the program if we're not careful. Migration tools aren't useful if they're going to break my code without telling me.

To be more specific, one option is modeling the tool after diagnostics:
- Transformations are grouped in a new tool, source-migrate
- Each transformation should be named, so it can be turned on/off, such as -range-for-loop or -noconst-expr
- Each transformation should be grouped, so that all C++11 transformations could be enabled or disabled in one go, e.g. -Tnocxx11
- Each transformation should have some way to specify compiler requirements, such as requiring C++ mode.

I think this is a good starting place for a code migration framework - I will try setting up a tool with proper command-line flags.

Yes, this seems completely reasonable.

  - Doug

FWIW, I’m a bit hesitant about lumping all migrations into a single tool… My preference is for a tool that is at least somewhat focused on a migration domain – in this case, C++11.

FWIW, I’m a bit hesitant about lumping all migrations into a single tool… My preference is for a tool that is at least somewhat focused on a migration domain – in this case, C++11.

I completely misread the post I replied to. Yes, I agree with Chandler: the tool should be a C++11 migration tool, so it has a specific, coherent purpose. It should have options for adopting specific C++11 features.

  • Doug

Douglas Gregor wrote:

Thoughts? Suggestions?

A C++11 migration tool is an excellent idea. Some specific suggestions:

- It should be a single tool (C++11 migration) with a number of migration
rules (for-each loops, override, nullptr), which we can expand over time.
Just running the tool migrates everything we can, and it can have options
for toggling individual features ("don't add override")

This is also the idea for the Qt porting tool I wrote. It's kind of hard to
get the command line stuff to a sensible level:

GitHub - KDABLabs/Qt4to5 if you haven't seen it already.

or for targeting a
specific common subset among compiles (migrate so that it works with Clang
3.1, GCC 4.6, and MSVC10). Bundling everything together makes for a much
nicer user experience, because you write one command line and out pops a
C++11 code-base.

If I was porting a Qt based codebase to use the override keyword, I wouldn't
want to insert the override keyword into the code, but the Q_DECL_OVERRIDE
macro instead:

https://qt.gitorious.org/qt/qtbase/blobs/master/src/corelib/global/qcompilerdetection.h

(and similar for some other features). So at some level, it might make sense
to make that configurable.

From my point of view, apart from a command line tool, I want to create a Qt

gui to do all of these kinds of tasks - Not just porting to Qt 5, but also
adding c++11 features where appropriate, and of course with toggling.

We also discussed the issue of where the Qt 4 to 5 porting tool (even the
command line one above) should live, and I think that this extra repo could
be a good place for it.

Can I add it to such a repo when it exists?

Thanks,

Steve.

How is it conceptually different than the Objective-C modernizer? Why not have both in one tool?

-Chris

How is it conceptually different than the Objective-C modernizer?

It isn’t.

Why not have both in one tool?

Because that tool wouldn’t have any focus. A “C++11 migrator” is something we can rally behind as a useful tool with an obvious scope, as is an “Objective-C modernizer”. But a generic source migrator is an abstraction with no obvious usefulness or scope.

Obviously, these two tools will share nearly all of their infrastructure, except for the actual migration rules. And, in doing so, we’ll demonstrate how best to create your own source-to-source translation tool.

  • Doug

One thing I've been wondering is:

How do we preserve these invariants once we've established them? In
the case of something like the override-adding migration it seems like
that could be implemented as a trivial style checker, enforced at
compile time once you migrate the codebase - and with a builtin fixit.
Once you have such a style checker, the migration is really just a
matter of running the style checker in auto-fix mode - at which point
there's no need for an explicit migration tool.

I'm not sure if this is true of the for loop-ifier, though - I assume
it might be too expensive to run at compile time as a style
enforcement/auto-formatter. (though we have the for condition warning
in-place now that's doing work across the body of a for loop - so
perhaps such a check isn't too expensive to do up-front on every
build)

- David

I'm specifically thinking that they should be the same *executable* - of course they'd be different "tools" with different audiences. The interface to the two tools should be virtually identical. Why is it better to have an array of similar tools (that do different rewrites) as an array of executables instead of as a single executable with a mode switch?

-Chris

Because different users want different tools. The two language migrators presumably aren't the only tools that would go into this side repository; should the eventual code-reformatting tool go into the same executable as well? Are you looking to optimize for build time here, or is there some other reason?

  - Doug

I’m trying to find the right level of abstraction here. A similar argument could be that we should have one tool for each static analyzer checker, because some users would want one but not all checks. What is the downside of having all the rewriters be in the same tool?

-Chris

Because that tool wouldn’t have any focus. A “C++11 migrator” is something we can rally behind as a useful tool with an obvious scope, as is an “Objective-C modernizer”. But a generic source migrator is an abstraction with no obvious usefulness or scope.

Obviously, these two tools will share nearly all of their infrastructure, except for the actual migration rules. And, in doing so, we’ll demonstrate how best to create your own source-to-source translation tool.

I’m specifically thinking that they should be the same executable - of course they’d be different “tools” with different audiences. The interface to the two tools should be virtually identical. Why is it better to have an array of similar tools (that do different rewrites) as an array of executables instead of as a single executable with a mode switch?

Because different users want different tools. The two language migrators presumably aren’t the only tools that would go into this side repository; should the eventual code-reformatting tool go into the same executable as well? Are you looking to optimize for build time here, or is there some other reason?

I’m trying to find the right level of abstraction here. A similar argument could be that we should have one tool for each static analyzer checker, because some users would want one but not all checks.

From a usage perspective, most users will want to run “the static analyzer” and get the default set of checks in one run. They may eventually fine-tune which checkers get run, but most won’t, and they benefit from having all of the checkers bundled into one tool with an obvious purpose.

Similarly for the migration tools. Most users want to migrate to C++11, or migrate to modern Objective-C. The languages are different enough that it doesn’t seem we’d benefit much from having one tool capable of migrating to modern Objective-ARC++11 in one shot… although you’re wearing me down on that particular issue. A code “modernizer” that covers the various languages doesn’t fall afoul of my justification below…

What is the downside of having all the rewriters be in the same tool?

That tool has no obvious purpose or scope; it’s just a grab bag of rewriters that happened to land in the tree. That’s very hard to communicate to potential users. It also doesn’t fit well with the tooling model, which intends to make it easier to build one-off, specialized tools, which are likely to be easier to use than a single executable with a pile of command-line options.

I do also worry about quality: it’s far easier to qualify and test a specialized tool than it is to test all of the emergent combinations from an overly-general tool, especially if the robustness of various pieces varies wildly (which I expect it will). IIRC, the ARC migrator and ObjC modernizer needed some massaging to make all of the rewrites play nicely together, and that’s a fairly small set of rules written by a very small number of people.

  • Doug