[analyzer] moving alpha.IdenticalExpression to core

Hello Clang developers!

We wanted to see if alpha.IdenticlExpression can be moved to core. It has been in StaticAnalyzer for a while. We have run alpha.IdenticalExpression on 451 projects in debian and evaluated the results.

Identical expression found 127 warnings - 117 true positives and 10 of them seem to be false positives. All 10 false positives are however due to code is hidden by preprocessor (macros and ifdefs) so we can’t do anything about them. For example:

#ifdef clang

#define MACRO(X)

#endif

….

If (x > 10) {

MACRO(10);

} else {

MACRO(1000);

}

=> 2.c:9:5: warning: true and false branches are identical

There was a warning that is not “perfect”. Reduced:

if (x==10 || x == 20 || x == 10)

2.c:3:24: warning: identical expressions on both sides of logical operator

if (x==10 || x == 20 || x == 10)


Maybe it could be clarified sometime.

Our conclusion is that this checker is ready for core.

Do you have some opinions?

Best regards,

Daniel Marjamäki

Hello,

I had a look at a few of this checker's warnings and i generally liked it. I think it's worth working on. I've seen worse false positives than that, but i'd gladly welcome fixes.

This checker is a simple syntactic check that finds nice typos and copy-paste errors such as "a || a" or "a == a" or "if (a) { b } else { b }". The checker is working through a mechanism that is very similar to CloneDetector: it tries to "understand" (a-la CloneDetector's "hash") the AST of both sides/branches and see if those "hashes" are identical. This may lead us into trying to re-use CloneDetection framework instead of re-doing the same thing. (+Raphael)

I've seen false positives because of AST statement kinds that he doesn't "understand", in particular in C++ he wasn't clever enough to discriminate between constexprs such as "std::is_same<A, B>::value || std::is_same<A, C>::value". It would be very important to ensure the checker understands most of the stuff in C++ and Objective-C, so i guess we cannot turn it on by default until those are fixed.

I've got an alternative approach in mind though: instead of traversing the AST manually, make the checker compare results of Stmt::printPretty() (as strings) for both sides/branches. This might be super simple and reliable and enough for this checker; side effects of expressions, of course, would still need to be considered (as in Expr::hasSideEffects()). I think it should be the preferred way to go, unless unexpected problems show up.

The example with macros can be suppressed by adding some sort of "((void)0);" into one of the branches; i can imagine it being annoying on some projects, but i guess it's not super bad.

I agree that the "not perfect" error message you've found is worth improving, which should be easy.

Thanks for the response.

I will not do the necessary fixes right now but will put this in our todo list for now so we can look at this later.

CloneDetector: it tries to "understand" (a-la CloneDetector's "hash") the AST of both sides/branches and see if those "hashes" are identical.

This may lead us into trying to re-use CloneDetection framework instead of re-doing the same thing. (+Raphael)

As far as I know, the CloneDetector has lots of false positives.

I've seen false positives because of AST statement kinds that he doesn't "understand", in particular in C++ he wasn't clever enough to discriminate between constexprs such as "std::is_same<A, B>::value || std::is_same<A, C>::value". It would be very important to ensure the checker understands most of the stuff in C++ and Objective-C, so i guess we cannot turn it on by default until those are fixed.

Yes I agree that is very important.

I've got an alternative approach in mind though: instead of traversing the AST manually, make the checker compare results of

Stmt::printPretty() (as strings) for both sides/branches.

That has advantages. Maybe that would allow that we warn about identical code when macros are used. As far as I know the checker bails out when it see that macros are used so this won't generate a warning:

    If (a==HOURS_PER_DAY || a==HOURS_PER_DAY)

However that is how the Cppcheck checker worked before and we changed it to AST. The Cppcheck stringification is not rock-solid, details (indentation, macronames, etc) are lost. For this checker the indentation should be ignored but other details are important.

I agree that the "not perfect" error message you've found is worth improving, which should be easy.

Yes.

Best regards,
Daniel Marjamäki

Thanks for the response.

I will not do the necessary fixes right now but will put this in our todo list for now so we can look at this later.

CloneDetector: it tries to "understand" (a-la CloneDetector's "hash") the AST of both sides/branches and see if those "hashes" are identical.

This may lead us into trying to re-use CloneDetection framework instead of re-doing the same thing. (+Raphael)

It'd be great if we can centralize the implementation and reuse as much as possible the CloneDetection infrastructure. IIRC, Raphael has opened the API recently for clients as we were investigating whether we can reuse it in some other parts in clang.

As far as I know, the CloneDetector has lots of false positives.

I'd leave to Raphael to give details about that :wink: