[clang-tidy] Some possible contributions

Hi Piotr,

I think that it would be great to add your checks to clang-tidy.
I've had a quick look at colobot-lint and, IMO, it would be nice to add
some other checks, e.g. NakedNew, NakedDelete, ImplicitBoolCast may be of
interest.

BTW, does your InconsistentDeclarationParameterNameRule handles unused
parameters (i.e. those that have names in the declaration, but are marked
/*unused*/ or something along these lines in the definition)?

Regards,
Marek Kurdej

---------- Wiadomość przekazana dalej ----------

Hello,

Thanks for all replies so far. I’m glad that you agree that my code would be useful. I’ll post some patches for review sometime during this weekend.

I also got a private reply from Eugene Zelenko, who pointed me to his feature request for localizing variables: . This feature request would be partly implemented by my check for old C-style functions. To fully implement the proposed behavior, I would need to extend my code with FixIt hints to enable automated refactoring. Since that would require quite a bit of work, I would propose to first commit only my basic check for finding such functions and once I find enough time, I would add support for refactoring. Regarding NakedNew and NakedDelete, I wasn’t sure if there were similar checks already implemented or proposed for implementation. I think that if I should post a patch here, I would join these two checks to single check called for instance ManualMemoryManagement. As for ImplicitBoolCast, I think it’s also worth to consider. It’s very useful in case of such unfortunate “features” of C++ as implicit bool → float casts, which caused quite a bit of problems in Colobot project at one point. However, this check also restricts pointer → bool casts, so it would find issue with such code: void foo(int* ptr) { if (ptr) // warning: implicit int* → bool cast { /* … */ } } I personally don’t like such style, so I added this restriction, but there is nothing really wrong in such code. I guess it’s more a matter of taste, so perhaps this conversion can be enabled/disabled with appropriate config option. No, it does not. Unused (unnamed) parameters are currently ignored in comparison. I wanted to check for such parameters which names are commented out, but unless I’m missing something, there is no easy way right now to match parameter declaration to a comment next to it, as we have access to comments only during the preprocessing phase. Of course, I can sort of “hack” it, by getting character buffer from SourceManager and scanning characters of source code, but it seems a very crude way to handle this. (Well, I was forced to use such mechanism for my BlockPlacement check, but that’s another topic.) Best regards, Piotr Dziwinski

I posted these two patches for review:

  • inconsistent declaration parameter name: - old style function: Should I also add reviewers to these reviews? Best regards, Piotr Dziwinski

Yes, normally you should look who has committed recently and add them as reviewers. I’ve added alexfh (Alexander Kornienko), cc’ed, who works on clang-tidy.

niedz., 30 sie 2015, 12:49 Piotr Dziwinski użytkownik <piotrdz@gmail.com> napisał:

Hi Piotr,

Nice that you have some clang-tidy checks to contribute. I’m glad to review the patches. Please assign clang-tidy reviews to me and put cfe-commits to the “Subscribers” field.

On a side note: it’s convenient to be able to see more context in the diffs. Depending on your setup, you could use diff -U10000, svn diff --diff-cmd diff -x -U10000, git diff -U10000 or a similar option, or use the arcanist tool, which I personally find convenient.

Hello again!

Unfortunately, I did not have as much time as I thought I had, so my work on porting my checks from colobot-lint to clang-tidy came to a standstill.

However, I did not forget about it, and I had some time recently to work on these patches:

As I mentioned previously, my check for implicit bool casts, may not be suited to some coding styles. Personally, I like to avoid implicit bool casts whenever possible, as I find code that relies on them more difficult to read. Sometimes this even leads to hidden bugs, or if not bugs, then at least inconsitent code.

I would like to hear, what you think about that. Should some of these styles, e.g. relying on pointer to bool conversion in “if (pointer) {}”, be allowed? Maybe I should add config option to enable/disable my check in such cases?

As regards old-style function check, or more broadly: localizing variables, (http://reviews.llvm.org/D12473), it turned out to be more difficult than I thought at first, and still have not made much progress. However, in the meantime, working on other pieces of code, I have become more familiar with Clang codebase, so I think it will be easier for me now. This is what I’ll focus on from now on.

Best regards,
Piotr Dziwinski

[Please reply *only* to the list and do not include my email directly
in the To: or Cc: of your reply; otherwise I will not see your reply.
Thanks.]

In article <561A524F.209@gmail.com>,
    Piotr Dziwinski via cfe-dev <cfe-dev@lists.llvm.org> writes:

As I mentioned previously, my check for implicit bool casts, may not be
suited to some coding styles. Personally, I like to avoid implicit bool
casts whenever possible, as I find code that relies on them more
difficult to read. Sometimes this even leads to hidden bugs, or if not
bugs, then at least inconsitent code.

Yes, I would like this check. I agree that sometimes implicit cast
to bool reveals problems in the code. I think if you check for a few
common idioms that you can avoid too many false positives.

I would like to hear, what you think about that. Should some of these
styles, e.g. relying on pointer to bool conversion in "if (pointer) {}",
be allowed? Maybe I should add config option to enable/disable my check
in such cases?

I did some snooping of these situations in my simplify-bool-expr check.

In my check, I simplified

  if (p) { return true; } else { return false; }

to

  return p != nullptr;

This turned the implicit cast to bool into an explicit comparison
resulting in a bool.

The only idioms that I think you need to allow for are:

  if (p) x; else y;
  if (!p) x; else y;
  p ? e : f;
  !p ? e : f;

where p is some pointer. Maybe there are some other common idioms
that exploit implicit conversion to bool, but that's all I could think
of right now.

That list is pretty small, so it shouldn't be hard to handle. I do
recommend the idea of adding a configuration option to switch the
behavior to warn even in these cases. Some people do prefer the more
explicit check against nullptr.

As regards old-style function check, or more broadly: localizing
variables, (http://reviews.llvm.org/D12473), it turned out to be more
difficult than I thought at first, and still have not made much
progress.

Totally agree. If I could think of an easy way to implement it, I
would have tried it by now :).

However, in the meantime, working on other pieces of code, I
have become more familiar with Clang codebase, so I think it will be
easier for me now. This is what I'll focus on from now on.

The implicit-conversion-to-bool check might be a good stepping stone
in working with the clang-tidy infrastructure. I would like to see
this check. Although it's easier for you to code, I'll be honest and
say that I'd get more value out of the localizing variables check. If
that one could be highly reliable, I'd even have an automated process
that applied this on nightly builds every time the code changed.

[Please reply *only* to the list and do not include my email directly
in the To: or Cc: of your reply; otherwise I will not see your reply.
Thanks.]

As you wish.

I would like to hear, what you think about that. Should some of these
styles, e.g. relying on pointer to bool conversion in "if (pointer) {}",
be allowed? Maybe I should add config option to enable/disable my check
in such cases?

I did some snooping of these situations in my simplify-bool-expr check.

In my check, I simplified

   if (p) { return true; } else { return false; }

to

   return p != nullptr;

This turned the implicit cast to bool into an explicit comparison
resulting in a bool.

The only idioms that I think you need to allow for are:

   if (p) x; else y;
   if (!p) x; else y;
   p ? e : f;
   !p ? e : f;

where p is some pointer. Maybe there are some other common idioms
that exploit implicit conversion to bool, but that's all I could think
of right now.

That list is pretty small, so it shouldn't be hard to handle. I do
recommend the idea of adding a configuration option to switch the
behavior to warn even in these cases. Some people do prefer the more
explicit check against nullptr.

Yes, I was thinking along the same lines. It seems that these are the most common cases, where people rely on evaluating pointer as boolean. I'll provide a config option and conditionally exclude these cases.

As regards old-style function check, or more broadly: localizing
variables, (http://reviews.llvm.org/D12473), it turned out to be more
difficult than I thought at first, and still have not made much
progress.

Totally agree. If I could think of an easy way to implement it, I
would have tried it by now :).

I'm working on it now and I think I see how to approach this. I decided on writing something like a context-sensitive RecursiveASTVisitor, where I can keep track of variable declarations and their uses in different scopes. At the end, I just go over the collected data, and propose replacements as necessary. Sound complicated but it should do the job.

However, in the meantime, working on other pieces of code, I
have become more familiar with Clang codebase, so I think it will be
easier for me now. This is what I'll focus on from now on.

The implicit-conversion-to-bool check might be a good stepping stone
in working with the clang-tidy infrastructure. I would like to see
this check. Although it's easier for you to code, I'll be honest and
say that I'd get more value out of the localizing variables check. If
that one could be highly reliable, I'd even have an automated process
that applied this on nightly builds every time the code changed.

Thanks, it's nice to hear that my work is being appreciated. I think I will be able to make that check reliable enough for production use. Once I finish it, I'll test it myself by running it on Colobot codebase (100k+ LOC). This should prove to be a suitable field test for it.

Best regards,
Piotr Dziwinski

[Please reply *only* to the list and do not include my email directly
in the To: or Cc: of your reply; otherwise I will not see your reply.
Thanks.]

In article <562004D4.20009@gmail.com>,
    Piotr Dziwinski via cfe-dev <cfe-dev@lists.llvm.org> writes:

> [Please reply *only* to the list and do not include my email directly
> in the To: or Cc: of your reply; otherwise I will not see your reply.
> Thanks.]
As you wish.

Sorry about that, but ever since they changed the mailing list software,
I've been losing replies to my posts if they aren't sent to me through
the mailing list. I try to set the Reply-To, but the mailing list
software keeps deciding it knows better and changes it.

Yes, I was thinking along the same lines. It seems that these are the
most common cases, where people rely on evaluating pointer as boolean.

Those are so idiomatic and prevalent throughout the community that I
think warning on those would be more irksome than helpful. However,
for all those cases where my int was implicitly converted to bool and
vice-versa, I think that would be really useful because in my experience
those most often represent coding mistakes.

[localizing variables]

I'm working on it now and I think I see how to approach this. I decided
on writing something like a context-sensitive RecursiveASTVisitor, where
I can keep track of variable declarations and their uses in different
scopes. At the end, I just go over the collected data, and propose
replacements as necessary. Sound complicated but it should do the job.

Yeah, I was thinking that it would need to be it's own visitor as well.
I was thinking of how I could do it in the clang-tidy tooling framework
and it just wasn't coming together in my head. I think the approach
you've outlined is a good one.

Thanks, it's nice to hear that my work is being appreciated. I think I
will be able to make that check reliable enough for production use. Once
I finish it, I'll test it myself by running it on Colobot codebase
(100k+ LOC). This should prove to be a suitable field test for it.

Oh, it is very much appreciated! Not only your contributions but
everyone who contributes to clang and associated tools is very much
appreciated.

If you want a real torture case for localizing variables you can
try iterated dynamics, my fork of the open source fractint fractal
renderer. It has a CMake based build and unfortunately too many
long functions with all the variables declared at the top C style.
<https://github.com/LegalizeAdulthood/iterated-dynamics>

Yes, and it's coming along nicely. I already have a sort-of working version on my github fork:
https://github.com/piotrdz/clang-tools-extra/blob/master/clang-tidy/readability/LocalizingVariablesCheck.cpp

It still needs work on corner cases and some hard cases like switch statements, which I've left out for now. And of course more tests. I will try to cover as much as possible in unit tests, but it's just impossible to test everything. The real verification will be running it on some production code. I will look at the project you gave as example, too.

Best regards,
Piotr Dziwinski

[Please reply *only* to the list and do not include my email directly
in the To: or Cc: of your reply; otherwise I will not see your reply.
Thanks.]

In article <5626C26F.4040009@gmail.com>,
    Piotr Dziwinski via cfe-dev <cfe-dev@lists.llvm.org> writes:

It still needs work on corner cases and some hard cases like switch
statements, which I've left out for now. And of course more tests. I
will try to cover as much as possible in unit tests, but it's just
impossible to test everything. The real verification will be running it
on some production code. I will look at the project you gave as example,
too.

Areas where I have found refactoring tools to fail are with "exotic"
types like pointer to function, pointer to member data and pointer to
member function. Sometimes things freak out with unions, too.

I looked at your test file on github and it's already covering the
idea of multiple variables declared in a single statement where one of
them is localized later in the scope, so that looks good.

Hello,

I want to let you know that I’m still here and still working on clang-tidy checker for localizing variables.
Sorry for the long delay, the time flies so fast.

Anyway, I have some good and some bad news.

The good news is that I got my checker to work correctly when it comes to handling nested scopes and control statements, including all the corner cases that I predicted.
I even made provisions for things like introducing localized for loop counters and inserting braces where necessary in switch-case blocks.

The bad news is that I ran into two issues that I couldn’t resolve on my own, both of which are critical for the checker to be useful in real production code.

Issue #1. Handling multiple variables declared in one declaration statement.

The case of moving a single declaration is easy enough - I can just cut the whole declaration as written in program source and paste it somewhere else.
But if I have to cut out only selected parts of a multiple declaration and move them, I need some way of parsing the declaration to separate parts of the statement.
Let’s take for example:

long int a, *b, c[10];

I want to obtain the necessary source locations from AST so that I can parse this declaration into four component strings:

  • type identifier: “long int”
  • first variable declaration: “a”
  • second variable declaration: “*b”
  • third variable declaration: “c[10]”

Once I have that, it will be easy to independently cut out code between identified source locations and create insertions by putting together the strings from source code.

I tried different methods of achieving that but given the interface of DeclStmt and Decl, I cannot find a way to do that reliably for all possible cases.
I can obtain source locations for subsequent variable declarations, but these always point to beginning of variable name as it appears in code.
What I need is actual source location separating “long int” from “a” and further locations of separating commas between declarations.
Theoretically, I could use Lexer to find comma tokens, but there are also other contexts where comma may appear, such list of arguments of function pointer.

Issue #2. Merging multiple FixIt hint insertions in the same location.

For example, given:

int a; int b;
// …
call(a, b); // first use of “a” and “b”

My checker would generate two diagnostics:

  • “variable “a” declaration can be moved closer to its point of use” with the following FixIt hint:

int b;
// …
int a; call(a, b);

  • “variable “b” declaration can be moved closer to its point of use” with the following FixIt hint:

int a;
// …
int b; call(a, b);

The end result of applying these FixIt hints is:

// …
int b; call(a, b);

So the removals are handled correctly, but in case of insertions, the second one overrides the first one. What I want is to have both insertions present, one after the other.

So far, I have worked around this problem by clumping together such problematic FixIt hints to one of the diagnostics, so that they are applied all at once.
However, this can be quite confusing for the user who would see seemingly unrelated FixIt hints under a diagnostic referring to only one variable.
Needless to say, this also makes the code much more complicated than it needs to be as I have to store information first and emit diagnostics later.

I don’t know how to resolve this, as this is a more a problem with applying FixIt hints in general.
Perhaps this is actually the intended behavior in some cases. Even so, I would like to have a way to address my use case, without impacting other use cases.

I would appreciate any pointers of how to tackle these two issues.
I’m afraid that without solving them, my checker will remain in “always-beta” state.

Meanwhile, if you are interested in current version of the code, you can check out my Github fork:
https://github.com/piotrdz/clang-tools-extra

Best regards,
Piotr Dziwinski

Hello,

I want to let you know that I’m still here and still working on clang-tidy checker for localizing variables.
Sorry for the long delay, the time flies so fast.

Anyway, I have some good and some bad news.

The good news is that I got my checker to work correctly when it comes to handling nested scopes and control statements, including all the corner cases that I predicted.
I even made provisions for things like introducing localized for loop counters and inserting braces where necessary in switch-case blocks.

The bad news is that I ran into two issues that I couldn’t resolve on my own, both of which are critical for the checker to be useful in real production code.

Issue #1. Handling multiple variables declared in one declaration statement.

The case of moving a single declaration is easy enough - I can just cut the whole declaration as written in program source and paste it somewhere else.
But if I have to cut out only selected parts of a multiple declaration and move them, I need some way of parsing the declaration to separate parts of the statement.
Let’s take for example:

long int a, *b, c[10];

I want to obtain the necessary source locations from AST so that I can parse this declaration into four component strings:

  • type identifier: “long int”
  • first variable declaration: “a”
  • second variable declaration: “*b”
  • third variable declaration: “c[10]”

Once I have that, it will be easy to independently cut out code between identified source locations and create insertions by putting together the strings from source code.

I tried different methods of achieving that but given the interface of DeclStmt and Decl, I cannot find a way to do that reliably for all possible cases.
I can obtain source locations for subsequent variable declarations, but these always point to beginning of variable name as it appears in code.
What I need is actual source location separating “long int” from “a” and further locations of separating commas between declarations.
Theoretically, I could use Lexer to find comma tokens, but there are also other contexts where comma may appear, such list of arguments of function pointer.

I believe you’ll need to whip out the lexer for this currently.

Issue #2. Merging multiple FixIt hint insertions in the same location.

For example, given:

int a; int b;
// …
call(a, b); // first use of “a” and “b”

My checker would generate two diagnostics:

  • “variable “a” declaration can be moved closer to its point of use” with the following FixIt hint:

int b;
// …
int a; call(a, b);

  • “variable “b” declaration can be moved closer to its point of use” with the following FixIt hint:

int a;
// …
int b; call(a, b);

The end result of applying these FixIt hints is:

// …
int b; call(a, b);

So the removals are handled correctly, but in case of insertions, the second one overrides the first one. What I want is to have both insertions present, one after the other.

So far, I have worked around this problem by clumping together such problematic FixIt hints to one of the diagnostics, so that they are applied all at once.
However, this can be quite confusing for the user who would see seemingly unrelated FixIt hints under a diagnostic referring to only one variable.
Needless to say, this also makes the code much more complicated than it needs to be as I have to store information first and emit diagnostics later.

I don’t know how to resolve this, as this is a more a problem with applying FixIt hints in general.
Perhaps this is actually the intended behavior in some cases. Even so, I would like to have a way to address my use case, without impacting other use cases.

I think that has bitten us at some point, too - if this is still a problem, I’d consider it a bug in clang-tidy’s fixit handling.

[Please reply *only* to the list and do not include my email directly
in the To: or Cc: of your reply; otherwise I will not see your reply.
Thanks.]

In article <CAOsfVvk=ZYrJF+N9R=SqyT-zzVZMguxKQur88X9=-Bs3kWUJ-Q@mail.gmail.com>,
    Manuel Klimek via cfe-dev <cfe-dev@lists.llvm.org> writes:

> long int a, *b, c[10];
>
> I want to obtain the necessary source locations from AST so that I can
> parse this declaration into four component strings:
> - type identifier: "long int"
> - first variable declaration: "a"
> - second variable declaration: "*b"
> - third variable declaration: "c[10]"

I believe you'll need to whip out the lexer for this currently.

IMO, the proper place to fix this is in the AST itself. I would like
to see the AST node should provide an iterator API that allows you to
walk over all the identifiers (SourceLocation begin/end for each
identifier) in a single declaration and obtain their actual types:

a -> long int
b -> long int *
c -> long int[10]

This is really important information for working with declarations and
should be available from the tooling infrastructure.

At one point I tried to look into how declarations were parsed, but
not being that familiar with the parser code, I couldn't quickly find
where such information could be stashed off for latery querying. It
seems to me that during parsing we must already have this information,
but it isn't directly encoded in the resulting AST.

Well, in a way, we do have this information, as part of QualType and Type components of AST which I can access for these declarations.
This means that I can generate the necessary insertions, like "long int *b;", using QualType::getAsString() (*) and this is what I actually do in the current version of the checker.

But the crux of the problem is the correct generation of removals in FixIt hints.
An example would be when I need to move only the declaration of "b" from this multiple declaration statement, leaving declarations of "a" and "c" where they are.
This is where I need to know the locations where the cuts are to be made.
Knowing the actual type from QualType may help in that, but I still need some help from either the AST or Lexer to extract these locations.

(*) - note this doesn't always work as we want. From what I understand, this string is meant to be displayed as a prettified version of type name in compiler diagnostic,
not to regenerate the actual code as it was typed in the program.
Also, this doesn't help in case of "exotic" types such as arrays or function pointers, as I would have to insert variable name somewhere in the middle of it.

Ideally, I would like a mechanism to somehow parse and use the original code as written in the source, cutting out parts of it and moving them where necessary, keeping all information intact.

Best regards,
Piotr Dziwinski

[Please reply *only* to the list and do not include my email directly
in the To: or Cc: of your reply; otherwise I will not see your reply.
Thanks.]

In article <566F436A.2080902@gmail.com>,
    Piotr Dziwinski via cfe-dev <cfe-dev@lists.llvm.org> writes:

> IMO, the proper place to fix this is in the AST itself. I would like
> to see the AST node should provide an iterator API that allows you to
> walk over all the identifiers (SourceLocation begin/end for each
> identifier) in a single declaration and obtain their actual types:
>
> a -> long int
> b -> long int *
> c -> long int[10]
> [...]

Well, in a way, we do have this information, as part of QualType and
Type components of AST which I can access for these declarations.

Great!

But the crux of the problem is the correct generation of removals in
FixIt hints.
[...]
This is where I need to know the locations where the cuts are to be made.

Yes, this is what I tried to find in the parser before, but I was
unsuccessful being unfamiliar with the parser code.

Hi Piotr,

regarding your issue #1: I have already implemented a transformer which performs exactly this, including a possible initializer. I can provide you tomorrow with a simple snippet guiding you in the right direction. Basically, I am matching for DeclStmt and iterating through its DeclGroup. The group members are VarDecl as you can see by using “clang-check -ast-dump file.c --”. Once you have the VarDecl there is any type and initialzer information you need. Unfortunately, this works only for init-declarator lists, which are defined in a scope and not globally (a DeclStmt/Statement just exists in a scope). See my post: http://lists.llvm.org/pipermail/cfe-dev/2015-November/046262.html

I am going to debug this. Meanwhile, I think instead of matching for DeclStmt and iterating through the DeclGroup, looking for all VarDecl’s with the same line location should work too.

I am going to debug this. Meanwhile, I think instead of matching for DeclStmt and iterating through the DeclGroup, looking for all VarDecl’s with the same line location should work too.

Hmm, this won’t work, since you can spread the statement over multiple lines. Anyway, I will prepare the Group-Example for you.

Thanks, I could use your code. My checker only handles local variables declared in functions so that parser or transformer doesn’t have to work for globals necessarily.

Best regards,
Piotr Dziwinski

[Please reply *only* to the list and do not include my email directly
in the To: or Cc: of your reply; otherwise I will not see your reply.
Thanks.]

In article <56708712.6050402@gmail.com>,
    Piotr Dziwinski via cfe-dev <cfe-dev@lists.llvm.org> writes:

Thanks, I could use your code. My checker only handles local variables
declared in functions so that parser or transformer doesn't have to work
for globals necessarily.

I remember now that I was looking into this for a check that would
split a multiple variable declaration into multiple single variable
declarations.

This would simplify your job of localizing variable declarations if
you could depend on each variable declaration being a single variable
declaration instead of a multiple variable declaration.

With the suggestion made by Firat Kasmis, I should be able to write
this check now, which would handle globals and struct/class fields.

Thoughts?