Qt 4 to Qt 5 porting tool with the tooling branch

Hi there,

A few weeks ago I was getting started with the tooling branch to do some
porting from Qt 4 to Qt 5 (I was off topic though :)):

http://thread.gmane.org/gmane.comp.compilers.clang.devel/20433/focus=20575

I wrote a blog post with some results here:

http://www.kdab.com/automated-porting-from-qt-4-to-qt-5/

I also wrote the following to colleagues, and thought I'd get some fact
checking on it from the people who might know, so that the new API in the
branch can be documented better:

[Begin copy-paste]

The tool works by using the CMake integration to find out what command
should be used to compile the file passed to it for porting. It then parses
that file and runs a visitor pattern over the parsed AST. While visiting, it
checks to see if the node matches what was specified. When it gets a match
it invokes the run() method of a callback.

The run() method is the one that sets up a replacement for the matched AST
node with some text. The text can be the new method being ported to, or some
other generated string.

The matched nodes can be extracted by their id and type. The type takes a
while to get used to because there are many of them describing all the
different parts of C++ syntax.

http://clang.llvm.org/doxygen/classclang_1_1Decl.html

Once we have the replacement text, we add it to the Replacements structure,
which is processed after matching to to the actual changes.

Knowing how to write a matcher or what type to use to extract a node by id
can be tricky, but can be made easier by running 'clang -cc1 -ast-dump
myfile.cpp' to see what types the parser sees.

After that, the matches are actually nested function calls (the
capitalisation style for functions seems funny to use Qt developers). Nested
matches are usually specializations.

So

    Expression()

matches any expression. As does

    Expression(Expression())

The inner one is redundant.

    Expression(Function())

matches an expression which is a function, not an expression which contains
inside it a function.

child relationships are found with has()

    Expression(has(Function()))

finds for example

    foo(bar())

where foo() would be the Expression() and bar() would be the Function().

To use these in the callback, we would write

    Id("expr", Expression(has(Id("func", Function()))))

and then

    const Expr *Outer =
        Result.Nodes.getStmtAs<Expr>("expr");
    const Expr *Inner =
        Result.Nodes.getStmtAs<Expr>("func");

Match expressions next to each other are by default grouped in an AllOf
expression.

So

    Method(
      HasName("dataChanged"),
      OfClass(
        IsDerivedFrom("QAbstractItemView")
      )
    )

means 'A method with the name dataChanged where that method is on a class
derived from QAbstractItemView'

You do get interesting compile errors sometimes when writing matcher
constructs. With enough playing around and reading other matchers I've been
able to get it to do what I want each time eventually, and the result was
obvious why it was correct.

Apart from the code I've attached, to see what's possible (built in), you
can
reference the following:

# This is the one referenced in the youtube talk above:
http://llvm.org/svn/llvm-project/cfe/branches/tooling/tools/remove-cstr-
calls/RemoveCStrCalls.cpp

# Most of the implementation
http://llvm.org/svn/llvm-
project/cfe/branches/tooling/include/clang/ASTMatchers/ASTMatchers.h

# Unit tests
http://llvm.org/svn/llvm-
project/cfe/branches/tooling/unittests/ASTMatchers/ASTMatchersTest.cpp

[End copy-paste]

I was also wondering if there is an easy way to only modify the code inside
the repo and not included headers? In my tool I require the user to pass the
source-dir on the command line, and then manually check paths to see if they
are below it, but this is error-prone.

Additionally, many of the concepts I encoded are generic, not Qt specific
(such as renaming methods, enums etc). It seems that these could be re-
usable, so we could either add some version of this to the clang examples,
and/or see if we can define a higher level collection of ways to refactor
the code. I presume google also has some collection/repo of such things
already?

Finally, the suggestion here:

http://thread.gmane.org/gmane.comp.compilers.clang.devel/20433/focus=20759

was not quite what I needed to use to correctly match the code used. The
matcher I used was this:

  Finder.addMatcher(
    Id("call",
      Call(
        Callee(Function(HasName(QtEscapeFunction))),
        HasArgument(
          0,
          AnyOf(
            BindTemporaryExpression(has(Id("ctor", ConstructorCall()))),
            BindTemporaryExpression(has(Id("operator",
OverloadedOperatorCall()))),
            Id("operator", OverloadedOperatorCall()),
            Id("ctor", ConstructorCall()),
            Id("expr", Expression())
          )
        )
      )
    ),
    &Callback);

So I had to add some extra BindTemporaryExpression() into the expression
which I would have preferred to be automatic, like I think some implicit
cast stuff is.

Thanks for the guidance so far! :),

Steve.

Hi Stephen,

first, really cool blog post, and thanks for providing feedback & docs and helping us making the tools more awesome - we really appreciate this!

Hi there,

A few weeks ago I was getting started with the tooling branch to do some
porting from Qt 4 to Qt 5 (I was off topic though :)):

http://thread.gmane.org/gmane.comp.compilers.clang.devel/20433/focus=20575

I wrote a blog post with some results here:

http://www.kdab.com/automated-porting-from-qt-4-to-qt-5/

I also wrote the following to colleagues, and thought I’d get some fact
checking on it from the people who might know, so that the new API in the
branch can be documented better:

[Begin copy-paste]

The tool works by using the CMake integration to find out what command
should be used to compile the file passed to it for porting. It then parses
that file and runs a visitor pattern over the parsed AST. While visiting, it
checks to see if the node matches what was specified. When it gets a match
it invokes the run() method of a callback.

The run() method is the one that sets up a replacement for the matched AST
node with some text. The text can be the new method being ported to, or some
other generated string.

The matched nodes can be extracted by their id and type. The type takes a
while to get used to because there are many of them describing all the
different parts of C++ syntax.

http://clang.llvm.org/doxygen/classclang_1_1Decl.html

Once we have the replacement text, we add it to the Replacements structure,
which is processed after matching to to the actual changes.

Knowing how to write a matcher or what type to use to extract a node by id
can be tricky, but can be made easier by running ‘clang -cc1 -ast-dump
myfile.cpp’ to see what types the parser sees.

After that, the matches are actually nested function calls (the
capitalisation style for functions seems funny to use Qt developers). Nested

Yea, that’s not set yet - please chime in with your color of the bikeshed on the review thread for the ast matchers :wink: I seriously don’t care, so I want people who care about one way or the other to jump in with good arguments of why their preferred way is superior :smiley:

matches are usually specializations.

So

Expression()

matches any expression. As does

Expression(Expression())

The inner one is redundant.

Expression(Function())

matches an expression which is a function, not an expression which contains
inside it a function.

child relationships are found with has()

Expression(has(Function()))

finds for example

foo(bar())

where foo() would be the Expression() and bar() would be the Function().

To use these in the callback, we would write

Id(“expr”, Expression(has(Id(“func”, Function()))))

and then

const Expr *Outer =
Result.Nodes.getStmtAs(“expr”);
const Expr *Inner =
Result.Nodes.getStmtAs(“func”);

Match expressions next to each other are by default grouped in an AllOf
expression.

So

Method(
HasName(“dataChanged”),
OfClass(
IsDerivedFrom(“QAbstractItemView”)
)
)

means ‘A method with the name dataChanged where that method is on a class
derived from QAbstractItemView’

You do get interesting compile errors sometimes when writing matcher
constructs. With enough playing around and reading other matchers I’ve been
able to get it to do what I want each time eventually, and the result was
obvious why it was correct.

Apart from the code I’ve attached, to see what’s possible (built in), you
can
reference the following:

This is the one referenced in the youtube talk above:

http://llvm.org/svn/llvm-project/cfe/branches/tooling/tools/remove-cstr-
calls/RemoveCStrCalls.cpp

Most of the implementation

http://llvm.org/svn/llvm-
project/cfe/branches/tooling/include/clang/ASTMatchers/ASTMatchers.h

Unit tests

http://llvm.org/svn/llvm-
project/cfe/branches/tooling/unittests/ASTMatchers/ASTMatchersTest.cpp

[End copy-paste]

All accurate.

I was also wondering if there is an easy way to only modify the code inside
the repo and not included headers? In my tool I require the user to pass the
source-dir on the command line, and then manually check paths to see if they
are below it, but this is error-prone.

If with “manually”, you mean “do a regexp match on the path”, then yep, that’s what I also do.
Why is that error prone?

Additionally, many of the concepts I encoded are generic, not Qt specific
(such as renaming methods, enums etc). It seems that these could be re-
usable, so we could either add some version of this to the clang examples,
and/or see if we can define a higher level collection of ways to refactor
the code. I presume google also has some collection/repo of such things
already?

Unfortunately not. Most of the engineers using those tools at Google have pretty non-standard and complex problems, at which point other costs start to dwarf writing the matchers and replacements.

But we are starting to work on a more principled approach that includes ready-made refactorings, targeted towards small scale use cases (“I want to rename what is under my cursor”, we actually have a prototype for this in the branch).

Finally, the suggestion here:

http://thread.gmane.org/gmane.comp.compilers.clang.devel/20433/focus=20759

was not quite what I needed to use to correctly match the code used. The
matcher I used was this:

Finder.addMatcher(
Id(“call”,
Call(
Callee(Function(HasName(QtEscapeFunction))),
HasArgument(
0,
AnyOf(
BindTemporaryExpression(has(Id(“ctor”, ConstructorCall()))),
BindTemporaryExpression(has(Id(“operator”,
OverloadedOperatorCall()))),
Id(“operator”, OverloadedOperatorCall()),
Id(“ctor”, ConstructorCall()),
Id(“expr”, Expression())
)
)
)
),
&Callback);

So I had to add some extra BindTemporaryExpression() into the expression
which I would have preferred to be automatic, like I think some implicit
cast stuff is.

Yep, the temporary stuff is not yet perfect. Ideas welcome, preferably when we have it in mainline :wink: (there’s a review thread currently open - afaik nobody answered yet).

Cheers,
/Manuel

Manuel Klimek wrote:

Hi Stephen,

first, really cool blog post, and thanks for providing feedback & docs and
helping us making the tools more awesome - we really appreciate this!

No problem. Clang based porting tools is a cool party :).

After that, the matches are actually nested function calls (the
capitalisation style for functions seems funny to use Qt developers).
Nested

Yea, that's not set yet - please chime in with your color of the bikeshed
on the review thread for the ast matchers :wink: I seriously don't care, so I
want people who care about one way or the other to jump in with good
arguments of why their preferred way is superior :smiley:

Is the review thread on cfe-commits? I'm not really familiar with how
reviews are done in this community yet. Does this mean that the content of
the branch is being rebased and applied to trunk?

I don't think I'll jump in on this bikeshed aspect though :). The case is
something I noted for colleagues who use Qt all day, where the convention is
methods/functions = lowercase first letter. If you want to be 'compatible'
with Qt, they would need to be changed, but otherwise it's not important :).

[End copy-paste]

All accurate.

Great, thanks.

I was also wondering if there is an easy way to only modify the code
inside the repo and not included headers? In my tool I require the user
to pass the
source-dir on the command line, and then manually check paths to see if
they
are below it, but this is error-prone.

If with "manually", you mean "do a regexp match on the path", then yep,
that's what I also do.
Why is that error prone?

There could be aspects of relative/absolute paths, symlinks, os-specific
issues, bugs in the regexp, etc.

Additionally, many of the concepts I encoded are generic, not Qt specific
(such as renaming methods, enums etc). It seems that these could be re-
usable, so we could either add some version of this to the clang
examples, and/or see if we can define a higher level collection of ways
to refactor the code. I presume google also has some collection/repo of
such things already?

Unfortunately not. Most of the engineers using those tools at Google have
pretty non-standard and complex problems, at which point other costs start
to dwarf writing the matchers and replacements.

But we are starting to work on a more principled approach that includes
ready-made refactorings, targeted towards small scale use cases ("I want
to rename what is under my cursor", we actually have a prototype for this
in the branch).

The ClangRename example?

Yes, that's also useful in the context of an IDE I think, but in terms of
our goals of a 'fire and forget' tool that does the boring parts of a
porting job, it's not really so important to rename things by symbol under
cursor or at a location in a file.

Finally, the suggestion here:

http://thread.gmane.org/gmane.comp.compilers.clang.devel/20433/focus=20759

was not quite what I needed to use to correctly match the code used. The
matcher I used was this:

Finder.addMatcher(
   Id("call",
     Call(
       Callee(Function(HasName(QtEscapeFunction))),
       HasArgument(
         0,
         AnyOf(
           BindTemporaryExpression(has(Id("ctor", ConstructorCall()))),
           BindTemporaryExpression(has(Id("operator",
OverloadedOperatorCall()))),
           Id("operator", OverloadedOperatorCall()),
           Id("ctor", ConstructorCall()),
           Id("expr", Expression())
         )
       )
     )
   ),
   &Callback);

So I had to add some extra BindTemporaryExpression() into the expression
which I would have preferred to be automatic, like I think some implicit
cast stuff is.

Yep, the temporary stuff is not yet perfect. Ideas welcome, preferably
when we have it in mainline :wink: (there's a review thread currently open -
afaik nobody answered yet).

I'd like to have a look if you can link it.

Thanks,

Steve.

Manuel Klimek wrote:

Hi Stephen,

first, really cool blog post, and thanks for providing feedback & docs and
helping us making the tools more awesome - we really appreciate this!

No problem. Clang based porting tools is a cool party :).

After that, the matches are actually nested function calls (the
capitalisation style for functions seems funny to use Qt developers).
Nested

Yea, that’s not set yet - please chime in with your color of the bikeshed
on the review thread for the ast matchers :wink: I seriously don’t care, so I
want people who care about one way or the other to jump in with good
arguments of why their preferred way is superior :smiley:

Is the review thread on cfe-commits? I’m not really familiar with how
reviews are done in this community yet. Does this mean that the content of
the branch is being rebased and applied to trunk?

Yep. The content of the branch is “rebased” (well, as far as you can do that in svn) continuously anyway (usually about once per week, or when somebody screams at me because the compile breaks).
http://lists.cs.uiuc.edu/pipermail/cfe-commits/Week-of-Mon-20120611/059010.html

I don’t think I’ll jump in on this bikeshed aspect though :). The case is
something I noted for colleagues who use Qt all day, where the convention is
methods/functions = lowercase first letter. If you want to be ‘compatible’
with Qt, they would need to be changed, but otherwise it’s not important :).

Fair enough…

[End copy-paste]

All accurate.

Great, thanks.

I was also wondering if there is an easy way to only modify the code
inside the repo and not included headers? In my tool I require the user
to pass the
source-dir on the command line, and then manually check paths to see if
they
are below it, but this is error-prone.

If with “manually”, you mean “do a regexp match on the path”, then yep,
that’s what I also do.
Why is that error prone?

There could be aspects of relative/absolute paths, symlinks, os-specific
issues, bugs in the regexp, etc.

Additionally, many of the concepts I encoded are generic, not Qt specific
(such as renaming methods, enums etc). It seems that these could be re-
usable, so we could either add some version of this to the clang
examples, and/or see if we can define a higher level collection of ways
to refactor the code. I presume google also has some collection/repo of
such things already?

Unfortunately not. Most of the engineers using those tools at Google have
pretty non-standard and complex problems, at which point other costs start
to dwarf writing the matchers and replacements.

But we are starting to work on a more principled approach that includes
ready-made refactorings, targeted towards small scale use cases (“I want
to rename what is under my cursor”, we actually have a prototype for this
in the branch).

The ClangRename example?

Yes, that’s also useful in the context of an IDE I think, but in terms of
our goals of a ‘fire and forget’ tool that does the boring parts of a
porting job, it’s not really so important to rename things by symbol under
cursor or at a location in a file.

Good point.

Finally, the suggestion here:

http://thread.gmane.org/gmane.comp.compilers.clang.devel/20433/focus=20759

was not quite what I needed to use to correctly match the code used. The
matcher I used was this:

Finder.addMatcher(
Id(“call”,
Call(
Callee(Function(HasName(QtEscapeFunction))),
HasArgument(
0,
AnyOf(
BindTemporaryExpression(has(Id(“ctor”, ConstructorCall()))),
BindTemporaryExpression(has(Id(“operator”,
OverloadedOperatorCall()))),
Id(“operator”, OverloadedOperatorCall()),
Id(“ctor”, ConstructorCall()),
Id(“expr”, Expression())
)
)
)
),
&Callback);

So I had to add some extra BindTemporaryExpression() into the expression
which I would have preferred to be automatic, like I think some implicit
cast stuff is.

Yep, the temporary stuff is not yet perfect. Ideas welcome, preferably
when we have it in mainline :wink: (there’s a review thread currently open -
afaik nobody answered yet).

I’d like to have a look if you can link it.

See link above. Feedback welcome!

Cheers,
/Manuel