Criticism over a clang tool to reorder fields on constructor declarations

Hello,

In the past month I’ve been working in a clang tool to refactor the fields which are declared on a class constructor to match the order declared on the header file, hence resolving the annoying Wreorder warning. After a series of difficulties finally I have something that is close to the ideal, and would like to someone more acquainted with clang internals to check if it’s the best form to implement this.

Also, I’ve got some problems that are still remaining:

1 - I’ve been following the samples I’ve managed to find through the internet, but every time I try to run the tool using the following command line (I’m intentionally not providing any include paths as I don’t want to std to be parsed):

…/cpp-fixya <SOURCE.CPP> –

I get the following errros:

: CommandLine Error: Argument ‘version’ defined more than once!
: CommandLine Error: Argument ‘print-all-options’ defined more than once!
: CommandLine Error: Argument ‘print-options’ defined more than once!
: CommandLine Error: Argument ‘help-hidden’ defined more than once!
: CommandLine Error: Argument ‘help’ defined more than once!
: CommandLine Error: Argument ‘debug-only’ defined more than once!
: CommandLine Error: Argument ‘debug-buffer-size’ defined more than once!
: CommandLine Error: Argument ‘debug’ defined more than once!
cpp-fixya: CommandLine Error: Argument ‘version’ defined more than once!
cpp-fixya: CommandLine Error: Argument ‘print-all-options’ defined more than once!
cpp-fixya: CommandLine Error: Argument ‘print-options’ defined more than once!
cpp-fixya: CommandLine Error: Argument ‘help-hidden’ defined more than once!
cpp-fixya: CommandLine Error: Argument ‘help’ defined more than once!
cpp-fixya: CommandLine Error: Argument ‘debug-only’ defined more than once!
cpp-fixya: CommandLine Error: Argument ‘debug-buffer-size’ defined more than once!
cpp-fixya: CommandLine Error: Argument ‘debug’ defined more than once!

2 - I haven’t found a way to determine the class name for nested classes. How would be the best way to detect and generate the correct naming to a nested class? I’ve been using:

d->getNameAsString()

but it only returns the enclosing class

3 - How do I determine if a class declaration is on a header or a source file? I mean that because if it’s a source-only declaration (i.e. private classes) the c++ compiler wont accept constructors in the form:

Constructor::Constructor() : <init_declrs> {
}

only in the form:

Constructor() : <init_declrs> {
}

Here is the code:

https://gist.github.com/3300494

Thanks in advance,

Victor

Hi Victor,

if I remember correctly we've been talking in IRC - have all your
issues been resolved?

Cheers,
/Manuel

Hello Manuel,

Sadly, no. I’m struggling with some problems now:

1 - How do I detect and generate the class name for nested classes? I’m using CXXConstructorDecl getNameAsString() but it only returns to me the name of the enclosing class.

2 - I would like to not replace constructor initializers that were set up with macros with their values:

#define MACRO 10

Class::Class() : myval(MACRO)

after I rewrite the outputted code is:

Class::Class() : mvyal(10)

Is there a way to avoid this? I’ve tried to use Compiler.getPreprocessor().SetMacroExpansionOnlyInDirectives() but this simply removes the myval definition from the generated AST

3 - Variables that are default initialized on the header are stored with it. This is a problem as the compiler complains when I rewrite the source file with the initialization on the source but it was previously defined on the header. Is there to detect if the default initializer is on the method definition or in the method declaration?

4 - I’ve found a bug on the DeclPrinter class, and fixed it. The attachment is the patch.

5 - The current way i’m using to replace the old constructor declaration with the new declaration is with a new range:

SourceRange rng(d->getSourceRange().getBegin(), d->getBody()->getSourceRange().getBegin().getLocWithOffset(-1));

This was the only way I’ve managed to get the correct end of the end of the initializer list. This is an inconvenience as I would like to not process the body and hence make the tool faster, but if I disable body parsing I can’t get its SourceRange.

2012/8/17 Manuel Klimek <klimek@google.com>

DeclPrinter.patch (521 Bytes)

Hi,

interesting questions. I'm new to clang, so if there is a better solution, let me know...

1 - How do I detect and generate the class name for nested classes? I'm
using CXXConstructorDecl getNameAsString() but it only returns to me the
name of the enclosing class.

Have you tried getQualifiedNameAsString()?
Maybe you're pointing an Element of the Enclosing class?

Also, maybe getParent() can point you at the correct Class Node.

regards,

Jens Weller

Hello Manuel,

Sadly, no. I'm struggling with some problems now:

1 - How do I detect and generate the class name for nested classes? I'm
using CXXConstructorDecl getNameAsString() but it only returns to me the
name of the enclosing class.

As Jends said, getQualifiedNameAsString.

2 - I would like to not replace constructor initializers that were set up
with macros with their values:

#define MACRO 10

Class::Class() : myval(MACRO)

after I rewrite the outputted code is:

Class::Class() : mvyal(10)

Is there a way to avoid this? I've tried to use
Compiler.getPreprocessor().SetMacroExpansionOnlyInDirectives() but this
simply removes the myval definition from the generated AST

Source locations have the notion of an expansion and a spelling
location. To get the code that was originally written, you'll need to
fiddle around with those. In your case, MACRO is the expansion
location of the literal 10.

3 - Variables that are default initialized on the header are stored with it.
This is a problem as the compiler complains when I rewrite the source file
with the initialization on the source but it was previously defined on the
header. Is there to detect if the default initializer is on the method
definition or in the method declaration?

I would assume so - I don't where to look of the top of my head though.

4 - I've found a bug on the DeclPrinter class, and fixed it. The attachment
is the patch.

I suggest you send an email to cfe-commits with the subject [PATCH]
Fix problem with DeclPrinter, containing the patch and what was wrong
(and ideally an automated regression test :wink:

5 - The current way i'm using to replace the old constructor declaration
with the new declaration is with a new range:

SourceRange rng(d->getSourceRange().getBegin(),
d->getBody()->getSourceRange().getBegin().getLocWithOffset(-1));

This was the only way I've managed to get the correct end of the end of the
initializer list. This is an inconvenience as I would like to not process
the body and hence make the tool faster, but if I disable body parsing I
can't get its SourceRange.

As far as I understand (and I might well be wrong there) skipping body
parsing has some very concrete use cases, and source-to-source
transformations is not on of them...

Cheers,
/Manuel

2012/8/17 Manuel Klimek <klimek@google.com>

Hello Manuel,

Sadly, no. I’m struggling with some problems now:

1 - How do I detect and generate the class name for nested classes? I’m
using CXXConstructorDecl getNameAsString() but it only returns to me the
name of the enclosing class.

As Jends said, getQualifiedNameAsString.

Thanks! It worked

2 - I would like to not replace constructor initializers that were set up
with macros with their values:

#define MACRO 10

Class::Class() : myval(MACRO)

after I rewrite the outputted code is:

Class::Class() : mvyal(10)

Is there a way to avoid this? I’ve tried to use
Compiler.getPreprocessor().SetMacroExpansionOnlyInDirectives() but this
simply removes the myval definition from the generated AST

Source locations have the notion of an expansion and a spelling
location. To get the code that was originally written, you’ll need to
fiddle around with those. In your case, MACRO is the expansion
location of the literal 10.

3 - Variables that are default initialized on the header are stored with it.
This is a problem as the compiler complains when I rewrite the source file
with the initialization on the source but it was previously defined on the
header. Is there to detect if the default initializer is on the method
definition or in the method declaration?

I would assume so - I don’t where to look of the top of my head though.

4 - I’ve found a bug on the DeclPrinter class, and fixed it. The attachment
is the patch.

I suggest you send an email to cfe-commits with the subject [PATCH]
Fix problem with DeclPrinter, containing the patch and what was wrong
(and ideally an automated regression test :wink:

5 - The current way i’m using to replace the old constructor declaration
with the new declaration is with a new range:

SourceRange rng(d->getSourceRange().getBegin(),
d->getBody()->getSourceRange().getBegin().getLocWithOffset(-1));

This was the only way I’ve managed to get the correct end of the end of the
initializer list. This is an inconvenience as I would like to not process
the body and hence make the tool faster, but if I disable body parsing I
can’t get its SourceRange.

As far as I understand (and I might well be wrong there) skipping body
parsing has some very concrete use cases, and source-to-source
transformations is not on of them…

Cheers,
/Manuel

Manuel, I think I’ve been thinking the wrong way. As the only thing I need is to reorder the initialization, I do not need to mess with the original source like I was doing and just concatenate the reordered, original CXXCtorInitializer string data. So I’ve been wondering how I’m supposed to recover the source string based on a SourceLocation. Should I use SourceManager’s getCharacterData? There is a way to recover the entire string based on a sourceRange?

Cheers,

Victor

> Hello Manuel,
>
> Sadly, no. I'm struggling with some problems now:
>
> 1 - How do I detect and generate the class name for nested classes? I'm
> using CXXConstructorDecl getNameAsString() but it only returns to me
> the
> name of the enclosing class.

As Jends said, getQualifiedNameAsString.

Thanks! It worked

> 2 - I would like to not replace constructor initializers that were set
> up
> with macros with their values:
>
> #define MACRO 10
>
> Class::Class() : myval(MACRO)
>
> after I rewrite the outputted code is:
>
> Class::Class() : mvyal(10)
>
> Is there a way to avoid this? I've tried to use
> Compiler.getPreprocessor().SetMacroExpansionOnlyInDirectives() but this
> simply removes the myval definition from the generated AST

Source locations have the notion of an expansion and a spelling
location. To get the code that was originally written, you'll need to
fiddle around with those. In your case, MACRO is the expansion
location of the literal 10.

> 3 - Variables that are default initialized on the header are stored with
> it.
> This is a problem as the compiler complains when I rewrite the source
> file
> with the initialization on the source but it was previously defined on
> the
> header. Is there to detect if the default initializer is on the method
> definition or in the method declaration?

I would assume so - I don't where to look of the top of my head though.

>
> 4 - I've found a bug on the DeclPrinter class, and fixed it. The
> attachment
> is the patch.

I suggest you send an email to cfe-commits with the subject [PATCH]
Fix problem with DeclPrinter, containing the patch and what was wrong
(and ideally an automated regression test :wink:

>
> 5 - The current way i'm using to replace the old constructor declaration
> with the new declaration is with a new range:
>
> SourceRange rng(d->getSourceRange().getBegin(),
> d->getBody()->getSourceRange().getBegin().getLocWithOffset(-1));
>
> This was the only way I've managed to get the correct end of the end of
> the
> initializer list. This is an inconvenience as I would like to not
> process
> the body and hence make the tool faster, but if I disable body parsing I
> can't get its SourceRange.

As far as I understand (and I might well be wrong there) skipping body
parsing has some very concrete use cases, and source-to-source
transformations is not on of them...

Cheers,
/Manuel

Manuel, I think I've been thinking the wrong way. As the only thing I need
is to reorder the initialization, I do not need to mess with the original
source like I was doing and just concatenate the reordered, original
CXXCtorInitializer string data. So I've been wondering how I'm supposed to
recover the source string based on a SourceLocation. Should I use
SourceManager's getCharacterData? There is a way to recover the entire
string based on a sourceRange?

RefactoringTool in lib/Tooling has some examples of how to deal with
that. They're not perfect yet, as source locations can be a little
fiddly to handle. In general, SourceManager's getCharacterData sounds
like a good way to go as long as it works for your use case.

Cheers,
/Manuel

Hi Manuel,

my rewrite basing only in source replacements is complete, but now I’m facing some strange side effects on some files. Consider the example:

header ----------

#include <B.h> // B defines a simple class named B

class A {
public:
A();

float x;
B b;
};

source ---------------

}

the strange behavior that I’m having is that if in the header file “x” is declared before “b”, when processing the source location for the initializer list on CXXConstructorDecl*, the source location for it get’s completely wrong pointing to right before the constructor name: “A::”. If i declare “b” before “x” then the source location points to "A::A() : " which was expected. Also, the method getNumCtorInitializers on this declaration is returning to me 2 ( should be 1 as only x is being initialized ), but if decide to initialize b I’m still getting 2. Also, node that the class I’m parsing does not have a base class, and I’m providing the correct includes as the tool run fine and exits returning 0. I’m a bit lost here, any comments are welcome. I could upload on gist the current tool source, cmake and sample code I’m using to test it if someone is willing to try.

I’ve also tried to dump the ast and verify it but the initalizer list is not showing up so I don’t know what might be happening.

Cheers,

Victor

2012/8/17 Manuel Klimek <klimek@google.com>

The AST for the constructor looks like this:

  CXXConstructorDecl <A.cc:2:1-3:1>
    NestedNameSpecifier <2:1-2> A::
      RecordType <2:1>
    DeclarationName <2:4> A
    FunctionProtoType <-6>
      BuiltinType void <>
    CXXCtorInitializer <2:10-13>
      FieldDeclRef <2:10> A::x
      ImplicitCastExpr <2:12>
        IntegerLiteral <2:12> 0
    CXXCtorInitializer <2:4>
      FieldDeclRef <2:4> A::b
    CompoundStmt <2:15-3:1>

Reversing the order of x and b just reverses the order of the two
CXXCtorInitializer in the AST.

So it looks like clang always creates implicit initializers for
members that aren't explicitly initialized, but it has no way of
flagging them as implicit. I don't know the expected way of
determining which are implicit, maybe checking for a NULL getInit().

CXXCtorInitializer::isWritten "Returns true if this initializer is
explicitly written in the source code."

-- James

Thanks, I missed that. I see RecursiveASTVisitor is already checking
that, but only to skip over getInit(). Would it make sense for RAV to
skip over the CXXCtorInitializer completely unless
shouldVisitImplicitCode() is true?

But why the source location for the first initializer is completely messed up? I’ve just tried to dump the construct xml using clang -dump-ast-xml, and that’s what I’ve got:

(CompoundStmt 0x4c666d8 )

for the folowing constructor:

Plane::Plane (const Vector3& normal, float d) : d(d) {
this->normal.set(normal).nor();
}

Somehow I’m not being able to recover the initializer list, nor the correct SourceLocation for it.

Victor

2012/8/19 Philip Craig <philipjcraig@gmail.com>

But why the source location for the first initializer is completely messed
up?

Because that's the implicit initializer. It can't have a real source
location, so it points to the constructor name instead. (Maybe this is
a clang bug and it should use an invalid source location instead, I
don't know.) So the suggestion is to skip over that one by testing
isWritten().

I've just tried to dump the construct xml using clang -dump-ast-xml, and
that's what I've got:

That uses DumpXML.cpp, which only has a stub for CXXCtorInitializer,
so it won't show them. I'm using a tool based on RecursiveASTVisitor
to dump it instead.

Try clang -ast-dump instead. It does show all of the
CXXCtorInitializer (in a partially pretty-printed form). And you can
see by changing the member order that the initializer order also
changes.

Hello,

thanks for everyone that have helped me to sort out the problems. Finally I’ve managed to make it work correctly to all cases (at least the ones I’ve found), and I’m glad to share it here if someone is interested:

https://github.com/scooterman/clang-tools

There are some minor things that could improve the tool. It would be desirable to not change all constructors, but as while parsing I’m receiving the initializers on the declared order already this would require me to get the original source as string, strip all spaces and newlines, and compare the resulting string with the newly ordered one. I don’t know if its worth to do that anyway.

Now the next one I’m thinking to implement is a tool to automatically add default values to variables.

Thanks again!

2012/8/19 Philip Craig <philipjcraig@gmail.com>