[PATCH] -Wconversion-null

Hello,

the attached patch adds the option -Wconversion-null. It is pretty much the
same as http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35669 .

The purpose of the option can be seen in this testcase:

#include <stdlib.h>
void foo( int a )
    {
    }
int main()
    {
    foo( NULL );
    int a = NULL;
    int b = 1;
    short c = b + 1;
    (void)a;
    (void)c;
    return 0;
    }

$ clang++ -Wconversion null.cpp -c -fno-caret-diagnostics
null.cpp:9:10: warning: implicit conversion of NULL constant to integer
[-Wconversion]
null.cpp:10:13: warning: implicit conversion of NULL constant to integer
[-Wconversion]
null.cpp:12:15: warning: implicit conversion loses integer precision: 'int'
to 'short' [-Wconversion]

There are two obviously incorrect uses of NULL in this testcase. They are
currently warned about only with -Wconversion, which however also triggers
other conversion warnings about code that is not obviously incorrect. There
are probably no realistic scenarios where a conversion from NULL to integer
would be intended, but short<->int conversions may be wanted, for example for
space-saving reasons, and casts that would silence those warnings may not be
deemed worth the decreased readability. In short, the benefits
of -Wconversion may be questionable depending on the codebase, but warning
about incorrect usage of NULL should not be (and such usage may not be as
obvious as in this testcase).

This can be solved by introduction of -Wconversion-null, which only warns
about most probably incorrect usage of NULL, and is enabled by -Wconversion
or -Wall.
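
For illustration, with the patch applied one would expect the same testcase
compiled with only the new flag to produce just the two NULL diagnostics
(hypothetical output; assuming the diagnostic text and group name stay as in
the patch):

$ clang++ -Wconversion-null null.cpp -c -fno-caret-diagnostics
null.cpp:9:10: warning: implicit conversion of NULL constant to integer
[-Wconversion-null]
null.cpp:10:13: warning: implicit conversion of NULL constant to integer
[-Wconversion-null]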

The attached patch should implement the change. This is my first clang
contribution, so I don't know whether anything more is needed. I also
couldn't find whether/where the options are documented.

clang-conversion-null.patch (4.02 KB)

Am I missing the point entirely? If I do this:

#undef NULL
#define NULL 0

The code you presented is valid C++ and the proposed warnings are all
misplaced. If NULL is defined as

  (void*)0;

or somesuch, warnings should occur anyway on architectures that have
bigger pointers than ints.

--jkl

> the attached patch adds the option -Wconversion-null. It is pretty much
> the same as http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35669 .

> Am I missing the point entirely? If I do this:
>
> #undef NULL
> #define NULL 0

I believe/assume that would be non-conforming.

> The code you presented is valid C++ and the proposed warnings are all
> misplaced. If NULL is defined as

Actually if you do what you just did, the warnings won't trigger
because they're powered by the magic of how NULL is defined in the
glibc headers.

>   (void*)0;

> or somesuch, warnings should occur anyway on architectures that have
> bigger pointers than ints.

I believe that's entirely conforming - the size of the integer is not
important. All zero integer literals (including but not limited to
'\0' and false) are valid null pointer literals. We already have a
warning for 'false' as a null pointer literal, but we don't have a
warning for 0 as a null pointer literal (in part because it's /really/
common in existing code, so any such warning would probably be off by
default).
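
For what it's worth, a minimal sketch of the distinction (assuming a
g++/clang++ setup where NULL expands to the magic __null, and C++03 rules
under which any zero integral constant expression is a null pointer
constant):

#include <cstddef>

void take_ptr( int* )
    {
    }
int main()
    {
    take_ptr( 0 );     // plain 0 as a null pointer literal: no warning today
    take_ptr( '\0' );  // also a zero integer literal, so also a valid null
                       // pointer constant under C++03 rules
    take_ptr( false ); // valid in C++03, but clang already warns about this
    int n = NULL;      // __null converted to integer: the warning at issue
    (void)n;
    return 0;
    }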

- David

> the attached patch adds the option -Wconversion-null. It is pretty much
> the same as http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35669 .

> > Am I missing the point entirely? If I do this:
> >
> > #undef NULL
> > #define NULL 0
> >
> > The code you presented is valid C++ and the proposed warnings are all
> > misplaced.
>
> It is valid C++, but that is why it is a warning and not an error. All
> warnings are about code which is technically valid, yet is (more or less)
> likely a mistake.
>
> > If NULL is defined as
> >
> >   (void*)0;
> >
> > or somesuch, warnings should occur anyway on architectures that have
> > bigger pointers than ints.
>
> This, on the other hand, is invalid C++.

No, I can't agree. The purpose of a warning is to indicate when a
particular operation or construction might be unsafe. A warning about
something that cannot possibly be an error (as far as the machine is
concerned) is just noise.

You are diving here into the semantics of the name "NULL", and it's
hopeless. If I say

  #define NOPTR NULL
or
  typedef NULL NOPTR;

And then use NOPTR instead, have I not circumvented your warning and
produced identical machine code? Is the resulting source code any
"(more or less) likely a mistake"?

As the standard doesn't define NULL, the user is free to define it at
will. Since the compiler can't know what "NULL" means to the human,
it can't offer any advice about it.

--jkl

> No, I can't agree. The purpose of a warning is to indicate when a
> particular operation or construction might be unsafe. A warning about
> something that cannot possibly be an error (as far as the machine is
> concerned) is just noise.

Clang has a lot of noise then. Just one random example:

$ clang++ a.cpp
a.cpp:6:16: warning: '&&' within '||' [-Wlogical-op-parentheses]
    if( a == 1 && b == 2 || c == 3 )
        ~~~~~~~^~~~~~~~~ ~~
a.cpp:6:16: note: place parentheses around the '&&' expression to silence this
warning
    if( a == 1 && b == 2 || c == 3 )
               ^
        ( )
1 warning generated.

> As the standard doesn't define NULL

I think it actually does, to a certain extent, but just turn the warning off
if you like to be special.

> #undef NULL
> #define NULL 0
>
> The code you presented is valid C++ and the proposed warnings are
> all misplaced.

> > It is valid C++, but that is why it is a warning and not an error.
> > All warnings are about code which is technically valid, yet is (more
> > or less) likely a mistake.
>
> No, I can't agree. The purpose of a warning is to indicate when a
> particular operation or construction might be unsafe. A warning about
> something that cannot possibly be an error (as far as the machine is
> concerned) is just noise.

If by "error" you mean undefined behavior, then that doesn't seem to
be the philosophy that Clang (or any other compiler that I know of)
has taken in its warning (or even error) design. The vast majority of
warnings in compilers are for code with well defined but surprising
semantics.

> You are diving here into the semantics of the name "NULL", and it's
> hopeless. If I say
>
>   #define NOPTR NULL
> or
>   typedef NULL NOPTR;
>
> And then use NOPTR instead, have I not circumvented your warning and
> produced identical machine code?

The first example would not circumvent the warning (since it is based
on the magic __null implementation detail that NULL is defined as -
any indirection to it will still trigger the warning).
The second example would not compile in the presence of an existing
definition of NULL, and would not provide the same semantics as
standard NULL (since it's a type, not a value) in the absence of the
standard NULL.
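
For concreteness, a minimal sketch of both cases (assuming a g++/clang++
setup where NULL expands to the magic __null):

#include <cstddef>

#define NOPTR NULL        // an indirection, but it still expands to __null

int main()
    {
    int a = NOPTR;        // still triggers the NULL-to-integer warning
    // typedef NULL NOPTR;  // ill-formed: NULL is not a type name here
    (void)a;
    return 0;
    }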

> Is the resulting source code any "(more or less) likely a mistake"?

Nope - in the functioning (first) example I'd say it's no less likely
a mistake - NULL is defined in C++ as follows:

18.2[support.types]p3

"The macro NULL is an implementation-defined C++ null pointer constant
in this International Standard
(4.10). foot 193"

(where foot 193 (non-normatively) says: "Possible definitions include 0
and 0L, but not (void*)0.")

Indeed, in C++11 it's even conforming to have NULL defined by the
implementation as nullptr - so any use of NULL as anything other than
a null pointer literal is at best non-portable, and at worst represents
confusion/a mistake by the author.

Consider:

int *i;
...
*i = NULL;

Perhaps the user didn't intend to dereference the pointer?
Effectively, with good warnings, we should be able to make NULL as
typesafe as the new C++11 hotness, nullptr, which is a proper type-safe
null pointer.
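
A short, hypothetical illustration of that contrast (assuming the usual
__null definition of NULL; the commented line is C++11):

#include <cstddef>

int main()
    {
    int value = 0;
    int *i = &value;
    *i = NULL;       // compiles: stores zero into the int, likely a mistake
    // *i = nullptr; // C++11: ill-formed, nullptr_t does not convert to int
    return 0;
    }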

> As the standard doesn't define NULL,

The C++ standard does define NULL.

> the user is free to define it at will.

That doesn't seem to be the case, per 17.6.4.3.1[macro.names]p1: "A
translation unit that includes a standard library header shall not
#define or #undef names declared in any standard library header."

> Since the compiler can't know what "NULL" means to the human,
> it can't offer any advice about it.

For what it's worth, GCC has been providing this advice for a couple
of major versions now.

- David

"Saying a true thing twice doesn't make it truer."

Babysitting good code is really, really easy. It also distracts from
harder and more important work, never mind those of us who want to
use it as a *tool*, not an instructional video. And it's not good for
the community.

If you squint really hard at the gcc archives, you'll find my name
raising an objection to including that warning in -Wall, which many
projects use as the "gold standard".

Why do people think Boolean operations are hard? Why don't
they get their knickers in a twist over !, too? When I looked into it
back then, I could find only two kinds of languages: those that define
logical precedence and those that don't. Of those that do, all bind AND
before OR.

The code above is absolutely clear. If you don't know && comes
before ||, and has for the last 25 years, learn!

That's the real poverty of that warning: instead of allowing clear
code that uses the language correctly to instruct the less
knowledgeable, it discourages such code and such learning. In
assuming incompetence it begets incompetence. If we keep it up, we'll
eventually arrive at

  ADD 1 TO X GIVING Y

just as Grace Hopper always wanted.

When I see extra parentheses for Boolean operations, I treat the code
with skepticism. Why? Either something tricky is going on, or *not*
but the programmer was unsure. In such a place do bugs lurk of the
kind no compiler can warn.

--jkl

> No, I can't agree. The purpose of a warning is to indicate when a
> particular operation or construction might be unsafe. A warning
> about something that cannot possibly be an error (as far as the
> machine is concerned) is just noise.

> Clang has a lot of noise then. Just one random example:
>
> $ clang++ a.cpp
> a.cpp:6:16: warning: '&&' within '||' [-Wlogical-op-parentheses]
>     if( a == 1 && b == 2 || c == 3 )
>         ~~~~~~~^~~~~~~~~ ~~
> a.cpp:6:16: note: place parentheses around the '&&' expression to
> silence this warning
>     if( a == 1 && b == 2 || c == 3 )
>                ^
>         ( )
> 1 warning generated.

> "Saying a true thing twice doesn't make it truer."

> Babysitting good code is really, really easy. It also distracts from
> harder and more important work, never mind those of us who want to
> use it as a *tool*, not an instructional video.

You're welcome to disable the warnings - if that's your philosophy
you'll probably want to disable everything & then opt-in to warnings
that meet your bar.

> And it's not good for the community.

That's not so clear to me.

> If you squint really hard at the gcc archives, you'll find my name
> raising an objection to including that warning in -Wall, which many
> projects use as the "gold standard".

Personally I find the original assignment -Wparentheses warning a bit
curious - though once the extra parentheses were adopted as a common
convention, they became helpful for making the intent more obvious (to
both other developers and the compiler). But it is a fairly arbitrary
convention.

The bool parentheses warning is a bit different - not so much an
arbitrary convention, but just a recommendation to make the precedence
more clear.

> Why do people think Boolean operations are hard?

Hard enough, apparently.

> Why don't they get their knickers in a twist over !, too?

You seem to be misunderstanding the approach here. Rather than "people
complain about not understanding a language feature" => "compiler
engineers implement a warning for that feature" it's really more like
"people keep making the same mistakes, and those bugs cost companies
real money to fix" => "companies invest in tools that save them money
by avoiding that class of bugs".

> When I looked into it
> back then, I could find only two kinds of languages: those that define
> logical precedence and those that don't. Of those that do, all bind AND
> before OR.
>
> The code above is absolutely clear. If you don't know && comes
> before ||, and has for the last 25 years, learn!

It's not a matter of learning. People know this but when writing a lot
of code they eventually make mistakes. Even the best of us make these
mistakes - only a couple of days ago, see r152551 and r152559. I
really doubt I would've caught that on code review if it weren't for
the build break it caused.

> That's the real poverty of that warning: instead of allowing clear
> code that uses the language correctly to instruct the less
> knowledgeable, it discourages such code and such learning.

If you think it's cheaper to educate people than provide the compiler
warning (& incur the cost of developers writing all those extra
parentheses (or using NULL rather than 0)) then perhaps you can profit
by selling people such education & saving them money.

Another point that crosses my mind: your approach seems to imply that
any language feature that made it to standardization is "right", and
that any use of it, in any form, is equally advisable. The standards
committee/process really isn't that infallible.

- David

> Why do people think Boolean operations are hard? Why don't
> they get their knickers in a twist over !, too? When I looked into it
> back then, I could find only two kinds of languages: those that define
> logical precedence and those that don't. Of those that do, all bind AND
> before OR.

The only reason that this is the case is that AND and OR are equivalent to
multiplication and addition in Boolean algebra. If you don't know this fact
(or don't think about it), there really is no rationale for why it ought to
be one way or the other. When I'm coding, I generally don't give thought to
the underlying ring theory of computers, so it's not immediately evident to
me why this ought to be the case. Face it, most people don't memorize the
precedence charts and instantly plan for minimizing parenthesis usage,
especially when you consider that precedence does vary between languages and
not all programmers stick to one language their entire career.

Indeed, I would suppose most people have the following model of the
operators, in decreasing order of precedence:

1. Postfix unary operators and the call, array, arrow, and dot operators
2. Prefix unary operators
3. "Arithmetic operators" (note that + and - are lower than * / %, as we
   have had beaten into us since grade school)
4. Comparison operators (==, >, etc.)
5. Logical operators (&&, ||)
6. Ternary
7. Assignment

Most programmers probably can't deduce the correct parenthesizations of the
following expressions (the actual groupings are in the sketch below):

  a << b + c            (although a << b | c is more likely to be guessed
                        correctly)
  flag & mask == mask   (unless you've been bitten by this before; note that,
                        e.g., Python has a different answer than C/C++)
  a & b ^ c | d         (even if you rationalize why & has higher precedence
                        than |, where does ^ go?)
  *x.*ptr

The most logical reasoning about some of these expressions (the second and
fourth in particular) turns out to be wrong (&/== and the different
precedence of .*/->* and ./->), while the others don't have a strong,
intuitive argument for one precedence over another. I should be able to look
at code and not have to reason about the formal underlying algebraic model
to figure out what the intent is, and I would rather have a compiler that is
too quick to accuse my operations of not being what I intended than one
which would stand by and let me commit code (and pass review) because the
language's syntax is not intuitive.
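
For reference, the actual C++ groupings of the expressions above, written as
a minimal compile-time sketch (C++11 static_assert; the values are arbitrary,
and clang's parentheses warnings would flag some of these lines, which is
rather the point):

static_assert( (1 << 2 + 3) == (1 << (2 + 3)), "+ binds tighter than <<" );
static_assert( (6 & 4 == 4) == (6 & (4 == 4)), "== binds tighter than &" );
static_assert( (1 & 3 ^ 2 | 4) == (((1 & 3) ^ 2) | 4), "& before ^ before |" );
// And *x.*ptr parses as *(x.*ptr), because .* binds tighter than unary *.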

Finally, note that language syntax can have irreparable bugs in this regard: the most famous example is PHP's incorrect associativity of ?: which makes a ? b : c ? d : e equal to (a ? b : c) ? d : e instead of the arguably more natural a ? b : (c ? d : e). So it may be correct according to the language specification, but that doesn't mean it is correct according to intuition.
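
For contrast, C++ gets this right by making ?: right-associative; a minimal
compile-time check (C++11):

// a ? b : c ? d : e parses as a ? b : (c ? d : e) in C++.
static_assert( (true ? 1 : false ? 2 : 3) == 1, "?: is right-associative" );
// Under PHP's historical left grouping, ((true ? 1 : false) ? 2 : 3),
// the same expression would evaluate to 2 instead.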

> I should be
> able to look at code and not have to reason about the formal
> underlying algebraic model to figure out what the intent is

You don't really mean that, do you? What is programming except the use
of logical constructs, and how is it possible to program in any
language without knowing the meaning expressed through its syntax?

Regarding bit-wise operators, I didn't make that case. They're
relatively rarely used, and even Dennis Ritchie expressed some regret
over their precedence. Logical operators, on the other hand, appear in
all but the most trivial programs.

> most people don't memorize the precedence charts and instantly
> plan for minimizing parenthesis usage, especially when you consider
> that precedence does vary between languages

I'm not prepared to make any claim about what most programmers do or
think. I have a hard time believing, per above, that much useful work
can be accomplished in C++ without knowing the precedence of C++
operators.

If you know a language that 1) defines operator precedence and 2)
defines OR before AND, I'd be interested to hear. I was unable to find
one. For that reason I reject the "multi-language confusion"
hypothesis.

AFAIK the question of bugs stemming from logical operator confusion
hasn't been carefully studied. Rather than assume I'm smarter than the
poor schlump using my compiler, I prefer to think that whatever I know,
anyone else can learn, too.

Note you're positing a very peculiar kind of ignorance: the
programmer who isn't sure of the precedence *and* decides not to use
parentheses to make the logic clear to *himself*. I would not presume
to try to help such a person, and I don't think Clang should, either.

--jkl

> > I should be
> > able to look at code and not have to reason about the formal
> > underlying algebraic model to figure out what the intent is
>
> You don't really mean that, do you? What is programming except the use
> of logical constructs, and how is it possible to program in any
> language without knowing the meaning expressed through its syntax?
>
> Regarding bit-wise operators, I didn't make that case. They're
> relatively rarely used, and even Dennis Ritchie expressed some regret
> over their precedence. Logical operators, on the other hand, appear in
> all but the most trivial programs.
>
> > most people don't memorize the precedence charts and instantly
> > plan for minimizing parenthesis usage, especially when you consider
> > that precedence does vary between languages
>
> I'm not prepared to make any claim about what most programmers do or
> think. I have a hard time believing, per above, that much useful work
> can be accomplished in C++ without knowing the precedence of C++
> operators.
>
> If you know a language that 1) defines operator precedence and 2)
> defines OR before AND, I'd be interested to hear. I was unable to find
> one. For that reason I reject the "multi-language confusion"
> hypothesis.
>
> AFAIK the question of bugs stemming from logical operator confusion
> hasn't been carefully studied. Rather than assume I'm smarter than the
> poor schlump using my compiler, I prefer to think that whatever I know,
> anyone else can learn, too.

Turning these warnings on in our codebase yielded a significant number of real bugs with very few false positives.

> Note you're positing a very peculiar kind of ignorance: the
> programmer who isn't sure of the precedence *and* decides not to use
> parentheses to make the logic clear to *himself*. I would not presume
> to try to help such a person, and I don't think Clang should, either.

They exist, and I've returned their bug reports with an explanation of the precedence rules. Clang helped them already.

Programmers make mistakes, even when they know the precedence rules perfectly. Code gets refactored, and bugs get introduced.

You are following a well-trodden path with your line of argument against -Wparentheses and similar, and I personally am not interested in going down it again. Real data from real codebases supports the hypothesis that these warnings do more good than harm, and no "programmers should know better" argument will change that.

  - Doug

Not sure if you meant to CC the list or not, but I'll bring it back into the discussion.

> I appreciate your effort to educate me. "Any fool can learn from his
> own experience; a wise man learns from others'."
>
> > Turning these warnings on in our codebase yielded a significant
> > number of real bugs with very few false positives.
>
> Would "very few" false positives be, say, less than 1%? Are you then
> saying 99% of uses that lacked parentheses did so incorrectly?
> IOW most cases of
>
>   a && b || c
>
> should have been
>
>   a && (b || c)
>
> ?
>
> I would expect the ratio to be the reverse: 99% redundant parentheses
> and 1% error. Or so. But if you have hard data or could point me to
> published results, I'm interested. I'm not so in love with my opinion
> that I can't change it.

I can't give hard data from a proprietary code base. What I can say is that at least two major corporations vet Clang's warnings across their entire code bases when they go into Clang, and we tweak said warnings until the false positive rate is acceptable (or jettison the warning if it can't be made to be acceptable). We're strongly user-focused here, and we tune our diagnostics to what works out in the real world.

> Of course, denominators matter. If the code base of which you're
> speaking had previously been reviewed and had redundant parentheses
> added to reduce the warning count to zero, and then was checked again
> at some later date, I might expect a somewhat higher hit ratio. But
> then we'd be excluding redundant parentheses from the false-positive
> count.

At least one of the large codebases referenced above had never been run through a compiler with a "real" parentheses warning.

> I also recognize the value of verification. I've seen what e.g.
> Coverity can do, and I've read papers about the value of even partial
> verification. I'm all for it. (As usual, Dijkstra was right.)

Verification is out of scope for the compiler proper, of course.

> What I object to is the idea that "(a && b) || c" should become "best
> practice", that we should expect "good code" to use extra parentheses
> just in case. That conditional logic should be checked, yes. That
> what's been said already needs to be reinforced with extra
> parentheses, no.

One could say that this ship has already sailed, given GCC's warnings about parentheses.

> Unfortunately, as things stand, the only evidence that the code has
> been checked is the presence of extra parentheses. I can think of
> better ways, as I'm sure you can, too. A database of verified tests,
> for instance, would be more useful and not in the least harmful.

If there's a better way to find these issues within the constraints of the compiler, that would be very interesting.

> > They exist, and I've returned their bug reports with an explanation
> > of the precedence rules. Clang helped them already.
>
> Interesting. So the warning didn't help them at all! They got the
> logic wrong, wrote it wrong, tested it wrong (or not) and chose not to
> use or not to mind the warning.

They had a simple test that would only have ended up wrong in a failure case. It was unlikely they would hit such a case in normal testing.

> Instead, the warning helped *you* help them. It pointed out instances
> of reliance on Boolean logic precedence. (That doesn't make the
> warning pointless. It does support my point that Clang alone can't
> help the helpless.)

The warning pointed out an existing bug in the user's code. That the programmer assumed it was a compiler bug rather than his own misunderstanding of precedence rules proves nothing except that at least some programmers *haven't* fully internalized the precedence rules (or are dealing with code complex enough that they aren't clear).

> > Code gets refactored, and bugs get introduced.
>
> Granted. Can you explain to me, though, how warning about parentheses
> protects against that? If "a && b || c" is correct, "(a && b) || c" is
> correct. If not, not. If refactoring changes the correctness, either
> form becomes equally wrong. I'm afraid I don't see any benefit.

Parts of expressions get copied around as conditions get more complicated, functions get manually "inlined", etc. The indentation may even make it look like one has the right precedence when it is, indeed, wrong.
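
A hypothetical sketch of that kind of slip (names invented; the intended
logic is owner && (is_new || !locked)):

bool may_edit( bool owner, bool is_new, bool locked )
    {
    return owner && is_new    // after a manual merge of two conditions,
        || !locked;           // this parses as (owner && is_new) || !locked:
                              // any unlocked file is editable by anyone.
    }                         // -Wlogical-op-parentheses flags exactly this

The indentation suggests the intended grouping, but the compiler sees the
precedence, not the layout.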

> I have no wish to argue only to hear the sound of my own voice (as it
> were). I am trying to challenge what I see as the conventional wisdom
> that Boolean logical operator precedence is problematic. I find it
> hard to separate that line of thinking from smug egotism on the part
> of those writing the warnings. And I see nothing at all cavalier or
> unsafe in depending on the compiler or the *reader* to understand the
> intent of Boolean tests in the code.

Programmers, even good ones, make mistakes, and one of the roles of the compiler is to catch those classes of mistakes it can and alert the user to the problem before it manifests. Conventional wisdom holds that Boolean logical operator precedence is problematic, and I've seen that borne out in the code bases I work with. On purely philosophical grounds, or if I viewed this warning solely as something that may or may not help me as a programmer, I would probably tend to agree with you. But for me, data is king, and the data I've seen supports the warning for the vast majority of users.

  - Doug