discriminating explicit boolean expressions from implicit boolean expressions in the AST

When analyzing ternary operator and if statement conditional expressions,
I'm trying to determine whether or not the expression is directly a boolean
expression or an expression that is implicitly convertible to bool.

Example:

  int i = 1;
  return (i > 10) ? true : false;

I have a clang-tidy check that transforms this to:

  int i = 1;
  return i > 10;

That works great when the conditional expression is an explicit boolean
expression. When the conditional expression is implicitly converted
to bool, then things get a little more interesting. Here is some actual
code from llvm/lib/Support/APFloat.cpp b/lib/Support/APFloat.cpp:

  subtract ^= (sign ^ rhs.sign) ? true : false;

'sign' and 'rhs.sign' are both one bit fields of type 'unsigned int'.
'subtract' is a function parameter of type bool. The type of the
conditional expression is implicitly converted to bool. The type of
the right hand side of the xor-assign expression is bool.

With the simplistic replacement strategy shown above, this becomes:

  subtract ^= (sign ^ rhs.sign);

Now the type of the right hand side of the xor-assign expression is
unsigned int.

How can I identify these implicit boolean statements? When I enqure
the type of the conditional expression in these cases by using
Expr::getType(), they both tell me that they are bool.

When analyzing ternary operator and if statement conditional expressions,
I'm trying to determine whether or not the expression is directly a boolean
expression or an expression that is implicitly convertible to bool.

Example:

        int i = 1;
        return (i > 10) ? true : false;

I have a clang-tidy check that transforms this to:

        int i = 1;
        return i > 10;

That works great when the conditional expression is an explicit boolean
expression. When the conditional expression is implicitly converted
to bool, then things get a little more interesting. Here is some actual
code from llvm/lib/Support/APFloat.cpp b/lib/Support/APFloat.cpp:

  subtract ^= (sign ^ rhs.sign) ? true : false;

'sign' and 'rhs.sign' are both one bit fields of type 'unsigned int'.
'subtract' is a function parameter of type bool. The type of the
conditional expression is implicitly converted to bool. The type of
the right hand side of the xor-assign expression is bool.

With the simplistic replacement strategy shown above, this becomes:

  subtract ^= (sign ^ rhs.sign);

Now the type of the right hand side of the xor-assign expression is
unsigned int.

How can I identify these implicit boolean statements? When I enqure
the type of the conditional expression in these cases by using
Expr::getType(), they both tell me that they are bool.

If you ast-dump (or is it dump-ast? I forget) the clang invocation, you
should see the AST that builds these and I believe you'll see that the RHS
is an ImplicitConversionExpr or something like that, which is what's hiding
the underlying type. There are utility functions for stripping implicit
casts and parentheses (utilities are probably in ASTContext).

- David

In article <CAENS6EvT8Z0Y5Vn8bHh6XKHiBa8G1JSFLo3EL===wXExDVFKWA@mail.gmail.com>,
    David Blaikie <dblaikie@gmail.com> writes:

> [...]
> How can I identify these implicit boolean statements? When I enqure
> the type of the conditional expression in these cases by using
> Expr::getType(), they both tell me that they are bool.

If you ast-dump (or is it dump-ast? I forget) the clang invocation, you
should see the AST that builds these and I believe you'll see that the RHS
is an ImplicitConversionExpr or something like that, which is what's hiding
the underlying type. There are utility functions for stripping implicit
casts and parentheses (utilities are probably in ASTContext).

Thanks David, I've prototyped this and it works pretty well. I want
to make sure I cover all the different ways that an implicit bool
conversion can take place. The ones I can think of were the
arithmetic expressions that are implicitly converted to bool by
comparing them to zero (integral types, pointers, floating-point
types) and objects (class, struct, union) that provide an operator
bool() that does the conversion. For the former I am replacing
'if (e) return true; else return false;' with 'return e != 0;
and for the latter I am replacing with 'return bool(e);'.

Am I missing anything?

In article <CAENS6EvT8Z0Y5Vn8bHh6XKHiBa8G1JSFLo3EL===
wXExDVFKWA@mail.gmail.com>,
    David Blaikie <dblaikie@gmail.com> writes:

>
> > [...]
> > How can I identify these implicit boolean statements? When I enqure
> > the type of the conditional expression in these cases by using
> > Expr::getType(), they both tell me that they are bool.
>
> If you ast-dump (or is it dump-ast? I forget) the clang invocation, you
> should see the AST that builds these and I believe you'll see that the
RHS
> is an ImplicitConversionExpr or something like that, which is what's
hiding
> the underlying type. There are utility functions for stripping implicit
> casts and parentheses (utilities are probably in ASTContext).

Thanks David, I've prototyped this and it works pretty well. I want
to make sure I cover all the different ways that an implicit bool
conversion can take place. The ones I can think of were the
arithmetic expressions that are implicitly converted to bool by
comparing them to zero (integral types, pointers, floating-point
types) and objects (class, struct, union) that provide an operator
bool() that does the conversion. For the former I am replacing
'if (e) return true; else return false;' with 'return e != 0;

Maybe special case chars for "return e != '\0';" ? hard to tell if it's
chars being used as bytes or chars being used as textual characters,
though... - and hard to tell what the common codebase convention is as to
whether "return e;" is better than "return e != 0;" etc... - I'd be sort of
inclined to just use the expression directly if it's valid (eg: for builtin
types and implicit conversion operators).

and for the latter I am replacing with 'return bool(e);'.

If the type's conversion operator is implicit, it might be fine to "return
e;"
If the type's conversion operator is explicit, it might be more suitable to
use static_cast<bool>. There is "return !!e;" but that's lame... otherwise
there's probably more explicit type-specific ways to test the object, but
you won't be able to suggest those automatically. (eg: "return o !=
nullptr;" for a unique_ptr, etc)

In article <CAENS6EvLVQkc-_0Ajogg0QAvvEMCca8wX8N9prft891FEO37mg@mail.gmail.com>,
    David Blaikie <dblaikie@gmail.com> writes:

> Am I missing anything?

Maybe special case chars for "return e != '\0';" ? hard to tell if it's
chars being used as bytes or chars being used as textual characters,
though... - and hard to tell what the common codebase convention is as to
whether "return e;" is better than "return e != 0;" etc... - I'd be sort of
inclined to just use the expression directly if it's valid (eg: for builtin
types and implicit conversion operators).

When I looked this up in the standard, it has language that says it is
as if you had written 'bool(e)'.

Personally I never liked the "c != '\0'" style because '\0' is just a
more verbose way of writing 0.

If the type's conversion operator is implicit, it might be fine to "return
e;"

Well, for the case where it's a return value it's not as big a deal
because the implicit conversion will be done for you, but when it's a
ternary expression hidden inside another expression, then it can
subtly change the type of the expression and in that case you really
want the explicit comparison to the appropriate kind of zero or an
explicit conversion to bool.

If the type's conversion operator is explicit, it might be more suitable to
use static_cast<bool>.

I like the idea of checking to see if the conversion operator is
implicit or explicit. I will incorporate that.

As to whether it should be static_cast<bool>(e) or bool(e), I was
using the latter because that's the wording in the standard. However,
it's my understanding that the two are equivalent in the case of bool.

There is "return !!e;" but that's lame...

Yeah, that's my least favorite and I also agree it is lame :).

otherwise
there's probably more explicit type-specific ways to test the object, but
you won't be able to suggest those automatically. (eg: "return o !=
nullptr;" for a unique_ptr, etc)

AFAIK, all the smart pointer classes have bool conversion operators.

In article <
CAENS6EvLVQkc-_0Ajogg0QAvvEMCca8wX8N9prft891FEO37mg@mail.gmail.com>,
    David Blaikie <dblaikie@gmail.com> writes:

> > Am I missing anything?
>
> Maybe special case chars for "return e != '\0';" ? hard to tell if it's
> chars being used as bytes or chars being used as textual characters,
> though... - and hard to tell what the common codebase convention is as to
> whether "return e;" is better than "return e != 0;" etc... - I'd be sort
of
> inclined to just use the expression directly if it's valid (eg: for
builtin
> types and implicit conversion operators).

When I looked this up in the standard, it has language that says it is
as if you had written 'bool(e)'.

Personally I never liked the "c != '\0'" style because '\0' is just a
more verbose way of writing 0.

> If the type's conversion operator is implicit, it might be fine to
"return
> e;"

Well, for the case where it's a return value it's not as big a deal
because the implicit conversion will be done for you, but when it's a
ternary expression hidden inside another expression, then it can
subtly change the type of the expression and in that case you really
want the explicit comparison to the appropriate kind of zero or an
explicit conversion to bool.

> If the type's conversion operator is explicit, it might be more suitable
to
> use static_cast<bool>.

I like the idea of checking to see if the conversion operator is
implicit or explicit. I will incorporate that.

As to whether it should be static_cast<bool>(e) or bool(e), I was
using the latter because that's the wording in the standard. However,
it's my understanding that the two are equivalent in the case of bool.

>There is "return !!e;" but that's lame...

Yeah, that's my least favorite and I also agree it is lame :).

> otherwise
> there's probably more explicit type-specific ways to test the object, but
> you won't be able to suggest those automatically. (eg: "return o !=
> nullptr;" for a unique_ptr, etc)

AFAIK, all the smart pointer classes have bool conversion operators.

They do, but they're explicit and my point was "return bool(sp);" is less
good than "return sp != nullptr;" - so without knowing the domain-specific
bool test it's probably hard to pick/suggest the best fix.

(std::experimental::optional is different, for example, it's comparable to
"nullopt" not nullptr)

In article <CAENS6EsAsF0NYiZkb2x_cbLm2HWH_m73Jd5rkAGGNkZnCLv1tw@mail.gmail.com>,
    David Blaikie <dblaikie@gmail.com> writes:

> AFAIK, all the smart pointer classes have bool conversion operators.
>

They do, but they're explicit and my point was "return bool(sp);" is less
good than "return sp != nullptr;" - so without knowing the domain-specific
bool test it's probably hard to pick/suggest the best fix.

I think this is where the wording in the standard allows you to write:

if (sp) { ... } else { ... }

without having to write an explicit conversion to bool. I didn't look
up the language of how the ternary expression interprets the
conditional expression. I will do that and see what it says.

In article <E1YbBnb-00023F-6K@shell.xmission.com>,
    Richard <legalize@xmission.com> writes:

I think this is where the wording in the standard allows you to write:

if (sp) { ... } else { ... }

without having to write an explicit conversion to bool. I didn't look
up the language of how the ternary expression interprets the
conditional expression. I will do that and see what it says.

As I read the standard, the same interpretation is applied to
conditions of if statements as is applied to the condition in a
ternary operator:

4. [conv]

    "Certain language constructs require that an expression be converted
    to a Boolean value. An expression e appearing in such a context is
    said to be contextually converted to bool and is well-formed if and
    only if the declaration bool t(e); is well-formed, for some invented
    temporary variable t (8.5)."

5.16 [expr.cond]

    "The first expression is contextually converted to bool (Clause 4)."

6.4 [stmt.select]

    "The value of a condition that is an initialized declaration in
    a statement other than a switch statement is the value of the
    declared variable contextually converted to bool (Clause 4)."

So I would be in step with the standard if I replaced all expressions
e with bool(e), but I'd like to avoid the unnecessary conversion where
possible.

In article <E1YbBnb-00023F-6K@shell.xmission.com>,
    Richard <legalize@xmission.com> writes:

> I think this is where the wording in the standard allows you to write:
>
> if (sp) { ... } else { ... }
>
> without having to write an explicit conversion to bool. I didn't look
> up the language of how the ternary expression interprets the
> conditional expression. I will do that and see what it says.

As I read the standard, the same interpretation is applied to
conditions of if statements as is applied to the condition in a
ternary operator:

4. [conv]

    "Certain language constructs require that an expression be converted
    to a Boolean value. An expression e appearing in such a context is
    said to be contextually converted to bool and is well-formed if and
    only if the declaration bool t(e); is well-formed, for some invented
    temporary variable t (8.5)."

5.16 [expr.cond]

    "The first expression is contextually converted to bool (Clause 4)."

6.4 [stmt.select]

    "The value of a condition that is an initialized declaration in
    a statement other than a switch statement is the value of the
    declared variable contextually converted to bool (Clause 4)."

So I would be in step with the standard if I replaced all expressions
e with bool(e), but I'd like to avoid the unnecessary conversion where
possible.

Right - sorry, the issue I'm trying to address isn't that bool(e) isn't
valid for all the cases where you're doing this conversion (it is) but it's
also valid for a bunch of other cases & that can make code harder to read -
it's the usual reason for using C++ style casts (that can only do a few
specific things each) rather than the C style cast which can do many
things, sometimes not what you want. Yes, in this case, it'd do the same
thing, but when the developer comes back to read the code next
week/year/etc or refactors the code (changing the type of the variable),
using the least powerful, most legible, mechanism will help them avoid some
potential traps.

Does that make sense?

- David

In article <CAENS6EsAsF0NYiZkb2x_cbLm2HWH_m73Jd5rkAGGNkZnCLv1tw@mail.gmail.com>,
    David Blaikie <dblaikie@gmail.com> writes:

> AFAIK, all the smart pointer classes have bool conversion operators.
>

They do, but they're explicit and my point was "return bool(sp);" is less
good than "return sp != nullptr;" - so without knowing the domain-specific
bool test it's probably hard to pick/suggest the best fix.

Exactly.

It is hard to know the best idiom without coding in a giant table that
has to be updated all the time.

In the case of a return statement, you don't need to do anything.

If they wrote 'if (sp != nullptr) return true; else return false;',
then the tool would replace this with 'return sp != nullptr;'.

If they wrote 'if (sp) return true; else return false;', the tool
would replace this with 'return sp;'.

In both cases, the tool assumes that you wrote one or the other based
on your preferences of what is readable and it shouldn't be changed.

It's only in the case of a ternary operator embedded within a larger
expression that you have to do something. I don't want the tool to subtly
change the type of the expression from bool to something else that is
implicitly convertible to bool but might be interpreted as something
other than bool in the larger context.

However, in the case of a ternary operation, something like:

  foo ^= (p ? true : false);

changing that to:

  foo ^= p;

can introduce errors depending on the type of foo and p. In these
cases, at the very least the tool should do:

  foo ^= static_cast<bool>(p);

in order to leave the interpretation of the expression unchanged.

At a later date, more heuristics can be added so that the expression
is something more idiomatic than static_cast, but that isn't an issue
of correctness, it's an issue of style and hence subject to all kinds
of varying opinions.

At this point I am only concerned about correctness.

In article <
CAENS6EsAsF0NYiZkb2x_cbLm2HWH_m73Jd5rkAGGNkZnCLv1tw@mail.gmail.com>,
    David Blaikie <dblaikie@gmail.com> writes:

> > AFAIK, all the smart pointer classes have bool conversion operators.
> >
>
> They do, but they're explicit and my point was "return bool(sp);" is less
> good than "return sp != nullptr;" - so without knowing the
domain-specific
> bool test it's probably hard to pick/suggest the best fix.

Exactly.

It is hard to know the best idiom without coding in a giant table that
has to be updated all the time.

In the case of a return statement, you don't need to do anything.

If they wrote 'if (sp != nullptr) return true; else return false;',
then the tool would replace this with 'return sp != nullptr;'.

If they wrote 'if (sp) return true; else return false;', the tool
would replace this with 'return sp;'.

This ^ transformation is most likely not valid for any modern user defined
type (because they'll have an explicit operator bool, not an implicit one):

$ cat sp.cpp
#include <memory>

bool f1(std::unique_ptr<int> u) {
  if (u)
    return true;
  return false;
}

bool f2(std::unique_ptr<int> u) {
  return u;
}
$ clang++-tot sp.cpp -fsyntax-only -std=c++11
sp.cpp:10:10: error: no viable conversion from 'std::unique_ptr<int>' to
'bool'
  return u;
         ^
1 error generated.

In both cases, the tool assumes that you wrote one or the other based
on your preferences of what is readable and it shouldn't be changed.

It's only in the case of a ternary operator embedded within a larger
expression that you have to do something. I don't want the tool to subtly
change the type of the expression from bool to something else that is
implicitly convertible to bool but might be interpreted as something
other than bool in the larger context.

However, in the case of a ternary operation, something like:

        foo ^= (p ? true : false);

changing that to:

        foo ^= p;

can introduce errors depending on the type of foo and p. In these
cases, at the very least the tool should do:

        foo ^= static_cast<bool>(p);

in order to leave the interpretation of the expression unchanged.

At a later date, more heuristics can be added so that the expression
is something more idiomatic than static_cast, but that isn't an issue
of correctness, it's an issue of style and hence subject to all kinds
of varying opinions.

At this point I am only concerned about correctness.

Sure, but your transformation is stylistic in nature - so it's not just
important to preserve semantics, but also to improve readability. Perhaps
it's worth not offering a fixit hint in these cases, until good quality
ones can be provided? I'm not sure.

- David

In article <CAENS6EuerX1Fvf0NEo7fYvn3pR1RJHQxahGA+-C8ztZkgRi5yQ@mail.gmail.com>,
    David Blaikie <dblaikie@gmail.com> writes:

> In the case of a return statement, you don't need to do anything.
>
> If they wrote 'if (sp != nullptr) return true; else return false;',
> then the tool would replace this with 'return sp != nullptr;'.
>
> If they wrote 'if (sp) return true; else return false;', the tool
> would replace this with 'return sp;'.
>

This ^ transformation is most likely not valid for any modern user defined
type (because they'll have an explicit operator bool, not an implicit one):

$ cat sp.cpp
#include <memory>

bool f1(std::unique_ptr<int> u) {
  if (u)
    return true;
  return false;
}

bool f2(std::unique_ptr<int> u) {
  return u;
}
$ clang++-tot sp.cpp -fsyntax-only -std=c++11
sp.cpp:10:10: error: no viable conversion from 'std::unique_ptr<int>' to
'bool'
  return u;
         ^
1 error generated.

Good point. So I guess I need to handle these cases and it looks like
I can't avoid some type introspection in order to figure out what to
do.

> At this point I am only concerned about correctness.

Sure, but your transformation is stylistic in nature - so it's not just
important to preserve semantics, but also to improve readability. Perhaps
it's worth not offering a fixit hint in these cases, until good quality
ones can be provided? I'm not sure.

I think some type introspection can be done to deal with builtin types
and standard library types.

Yes, the transformation is stylistic, but it is also human invoked.
The user has asked for this transformation to be performed, so I'm ok
with transforming it in such a way that it leaves correctness intact
albeit possibly with an ugly static_cast<> lurking in there if I can't
figure out something better. (Personally I don't know why people are
against the 'functional cast' bool(e), particularly when that's the
wording in the standard.)

In article <
CAENS6EuerX1Fvf0NEo7fYvn3pR1RJHQxahGA+-C8ztZkgRi5yQ@mail.gmail.com>,
    David Blaikie <dblaikie@gmail.com> writes:

> > In the case of a return statement, you don't need to do anything.
> >
> > If they wrote 'if (sp != nullptr) return true; else return false;',
> > then the tool would replace this with 'return sp != nullptr;'.
> >
> > If they wrote 'if (sp) return true; else return false;', the tool
> > would replace this with 'return sp;'.
> >
>
> This ^ transformation is most likely not valid for any modern user
defined
> type (because they'll have an explicit operator bool, not an implicit
one):
>
> $ cat sp.cpp
> #include <memory>
>
> bool f1(std::unique_ptr<int> u) {
> if (u)
> return true;
> return false;
> }
>
> bool f2(std::unique_ptr<int> u) {
> return u;
> }
> $ clang++-tot sp.cpp -fsyntax-only -std=c++11
> sp.cpp:10:10: error: no viable conversion from 'std::unique_ptr<int>' to
> 'bool'
> return u;
> ^
> 1 error generated.

Good point. So I guess I need to handle these cases and it looks like
I can't avoid some type introspection in order to figure out what to
do.

> > At this point I am only concerned about correctness.
>
> Sure, but your transformation is stylistic in nature - so it's not just
> important to preserve semantics, but also to improve readability. Perhaps
> it's worth not offering a fixit hint in these cases, until good quality
> ones can be provided? I'm not sure.

I think some type introspection can be done to deal with builtin types
and standard library types.

Yes, the transformation is stylistic, but it is also human invoked.

clang-tidy isn't necessarily used for transformation in particular - I
think of it more as a linter tool: suggesting problems, sometimes being
able to provide likely solutions. So I wouldn't necessarily worry about
providing fixit hints in all cases, especially as the fixit gets harder to
provide with high quality, etc.

The user has asked for this transformation to be performed, so I'm ok
with transforming it in such a way that it leaves correctness intact
albeit possibly with an ugly static_cast<> lurking in there if I can't
figure out something better.

I don't fundamentally object to a fixit hint including static_cast<bool>()
or bool(), I'm just not sure it's necessarily better than not suggesting
one & leaving it up to the user in cases where it's not obvious *shrug*

(Personally I don't know why people are
against the 'functional cast' bool(e), particularly when that's the
wording in the standard.)

I tend to treat c-style casts with more care because I know they can do
more problematic transformations (I suppose when the target type is bool
this isn't an issue? I'm not sure - I'd have to go back & read the spec
about the sequence of conversions c-style casts can perform)

- David

In article <CAENS6Eu+=d2Wafd_psrFMECWaOxHUEvLwRNg31jP5dbDSFDozA@mail.gmail.com>,
    David Blaikie <dblaikie@gmail.com> writes:

> Yes, the transformation is stylistic, but it is also human invoked.
>

clang-tidy isn't necessarily used for transformation in particular - I
think of it more as a linter tool: suggesting problems, sometimes being
able to provide likely solutions.

IMO, clang-tidy wouldn't be interesting at all if it didn't provide
the transformations. YMMV.

I don't fundamentally object to a fixit hint including static_cast<bool>()
or bool(), I'm just not sure it's necessarily better than not suggesting
one & leaving it up to the user in cases where it's not obvious *shrug*

In the larger context, the user decides to accept the change or not.
In the short run, to me that means source control and rejecting or
accepting the changes proposed by the tool, possibly after manually
adjusting them.

In the long run, I'd like to provide options based on feedback from
using the tool.

However, no matter what, the tool should never propose a change that
results in changing the meaning of the code.

> (Personally I don't know why people are
> against the 'functional cast' bool(e), particularly when that's the
> wording in the standard.)
>

I tend to treat c-style casts with more care because I know they can do
more problematic transformations (I suppose when the target type is bool
this isn't an issue? I'm not sure - I'd have to go back & read the spec
about the sequence of conversions c-style casts can perform)

Hrm. Looking deeper into the standard, I see that 5.2.3 Explicit type
conversion (functional notation) [expr.type.conv] says that it treats
'bool(e)' as '(bool) e' when e is a single expression. This is a
little disappointing because the 'bool(e)' syntax was introduced in
C++ and I was under the impression that it was more like
static_cast<bool>(e), but apparently not. So I guess I should be
substituting static_cast<bool>(e) in ternary expressions when the type
of e is not directly a bool.