bitwise ops on booleans

Hi Language Lawyers!

In PR23827 ( https://llvm.org/bugs/show_bug.cgi?id=23827 ), a bitwise op on booleans is considered equivalent to a logical op:

if ((x < 3) & (y > 10))

effectively becomes:

if ((x < 3) && (y > 10))

where x and y are of type ‘int’. The second statement (&&) requires short-circuit evaluation to bypass the y comparison when the x comparison is false (creates an extra branch vs. the original code).

From N4296 -
4.5 Integral Promotions:
“A prvalue of type bool can be converted to a prvalue of type int, with false becoming zero and true becoming one.”

5.11 Bitwise AND operator:
“The usual arithmetic conversions are performed; the result is the bitwise AND function of the operands.”

Is the type promotion optional (“can be”)? Under what conditions would the promotion be performed?

Assuming the transform is correct, what is the recommended way to write this in C/C++ to achieve the desired effect: we want both comparisons to be evaluated (do not want short-circuiting)?

FWIW, the IR coming out of clang looks like what I expect: the i1 types are zexted and fed into an ‘and i32’. It’s the IR and backend optimizations that are surprising me.

Why do you want that? As long as the evaluation of x and y is
side-effect free, the transform is valid and how the backend is lowering
is doesn't matter. Keep in mind that materializing the 0/1 bits is often
as expensive as doing a short-circuite jump.

Joerg

For performance. The statement that materializing the condition bits is as
expensive as a branch is dependent on the arch, uarch, and data set. We're
currently only using arch for the DAG transform. So for example, x86
globally changes all of these while PPC does not. SimplifyCFG doesn't even
consider arch and does the inverse transform.

The programmer may have more knowledge about the predictability of a
specific branch and would like a way to specify that to the compiler.

Hi Language Lawyers!

In PR23827 ( 23827 – bad generated code for conditionals connected by (&)
<23827 – bad generated code for conditionals connected by (&);
), a bitwise op on booleans is considered equivalent to a logical op:

  if ((x < 3) & (y > 10))

effectively becomes:

  if ((x < 3) && (y > 10))

where x and y are of type 'int'. The second statement (&&) requires
short-circuit evaluation to bypass the y comparison when the x comparison
is false (creates an extra branch vs. the original code).

From N4296 -
4.5 Integral Promotions:
"A prvalue of type bool can be converted to a prvalue of type int, with
false becoming zero and true becoming one."

5.11 Bitwise AND operator:
"The usual arithmetic conversions are performed; the result is the bitwise
AND function of the operands."

Is the type promotion optional ("can be")? Under what conditions would the
promotion be performed?

"can be" here just means "this is one of the available conversions".

Assuming the transform is correct, what is the recommended way to write
this in C/C++ to achieve the desired effect: we want both comparisons to be
evaluated (do *not* want short-circuiting)?

This all falls under the as-if rule; there is nothing the user can do that
will *require* the compiler to do either thing.

-- Sean Silva

I don’t have any problem teaching codegen to use very specific information to weigh the costs of an additional branch. But that needs to be done in codegen, as it is rare on x86 at this point that a well predicted branch is more expensive than anything, and given the quality of branch predictors on x86, it is also relatively rare that branches are so unpredictable.

Again, we need to support both sides (lots of architectures outside of x86 with weaker branch prediction!) but it should be a detailed arch decision of how to lower this kind of code.

We could introduce an unpredictabie annotation to LLVM if there is sufficient demand for this. Specifically, this is different from a 50/50 probability, as it implies a lack of pattern which the processor can exploit to predict the behavior. This doesn’t seem like a bad construct for LLVM to have, although it seems like an expensive one to add and so I would want to see a lot of really compelling evidence that we have to let programmers make this annotation.

However, I’m really strongly opposed to tying this in any way to the bitwise and operator. That is madness. What about masking bits signifies anything about predictability? What about all the cases where I’m not using and but I still have unpredictable branches that should be avoided if possible?

If you want to add support to clang for this, you’d be much better off proposing something like __builtin_expect which instead annotates a lack of predictability. I’m much more willing to tolerate code that looks like:

if (UNPREDICTABLE(x < 3) && UNPREDICTABLE(y > 7)) { … }

That is, if we actually must support this. I’m still extremely skeptical of the entire value proposition here.

From: "Chandler Carruth" <chandlerc@google.com>
To: "Sanjay Patel" <spatel@rotateright.com>, "Clang" <cfe-dev@cs.uiuc.edu>, llvmdev@cs.uiuc.edu,
joerg@britannica.bec.de
Sent: Friday, June 26, 2015 8:55:22 PM
Subject: Re: [LLVMdev] [cfe-dev] bitwise ops on booleans

> Assuming the transform is correct, what is the recommended way to
> write
> this in C/C++ to achieve the desired effect: we want both
> comparisons to be
> evaluated (do *not* want short-circuiting)?

Why do you want that?

For performance. The statement that materializing the condition bits
is as expensive as a branch is dependent on the arch, uarch, and
data set. We're currently only using arch for the DAG transform. So
for example, x86 globally changes all of these while PPC does not.
SimplifyCFG doesn't even consider arch and does the inverse
transform.

I don't have any problem teaching codegen to use very specific
information to weigh the costs of an additional branch. But that
needs to be done in codegen, as it is rare on x86 at this point that
a well predicted branch is more expensive than anything, and given
the quality of branch predictors on x86, it is also relatively rare
that branches are so unpredictable.

Again, we need to support both sides (lots of architectures outside
of x86 with weaker branch prediction!) but it should be a detailed
arch decision of how to lower this kind of code.

The programmer may have more knowledge about the predictability of a
specific branch and would like a way to specify that to the
compiler.

We could introduce an *unpredictabie* annotation to LLVM if there is
sufficient demand for this. Specifically, this is different from a
50/50 probability, as it implies a lack of pattern which the
processor can exploit to predict the behavior. This doesn't seem
like a bad construct for LLVM to have, although it seems like an
expensive one to add and so I would want to see a lot of really
compelling evidence that we *have* to let programmers make this
annotation.

However, I'm really strongly opposed to tying this in any way to the
bitwise and operator. That is madness. What about masking bits
signifies anything about predictability? What about all the cases
where I'm not using and but I still have unpredictable branches that
should be avoided if possible?

I am also strongly opposed to this. Amusingly enough, I was taking part in a meeting over the last few days where, among other things, experienced developers from several labs were providing feedback on compiler development priorities for future systems. Their top general request was that we try harder to make sure that different ways of writing the same thing were optimized in equivalent ways. This seems like a nice example of the kind of thing to which they were referring. Implicit hints like this are a) not portable b) often not documented c) not stable across compiler upgrades and d) not obvious to readers of the code. We should try to fix these kinds of things when we find them, and not introduce them on purpose.

If you want to add support to *clang* for this, you'd be much better
off proposing something like __builtin_expect which instead
annotates a lack of predictability. I'm much more willing to
tolerate code that looks like:

+1

-Hal

Ok, I understand better why the ‘bitwise and’ hack looks insane to you guys. :slight_smile:
It would take a change in the C language spec to happen. Since that’s practically impossible, it actually makes this mission a little clearer, if not easier.

There’s widespread misunderstanding:
http://stackoverflow.com/questions/4014535/differences-in-boolean-operators-vs-and-vs
http://stackoverflow.com/questions/1279217/difference-between-and-or-and-for-comparison

I’ve personally used the & vs. && hack for optimization purposes on many occasions over the years, so I wonder if C optimizers have only recently (last 5 years or so?) gotten smart enough to transform those bitwise expressions back into logical.

In any case, I’ll come up with some perf data to better illustrate the motivation.

Thanks!

I agree that it's rare, but it's common enough that it makes some of our
very knowledgeable and perf-starved customers furious when LLVM
unilaterally declares that it knows best and turns all bit-logic into
branches.

At the least, I think this optimization needs a chicken switch because
there's no way we can know in advance that adding a branch will beat a
compound predicate in all cases. I attached a test program and some results
to PR23827. Feedback and more data points are certainly appreciated.

Also, we should be careful here: there's no provision for branch prediction
in the x86 arch. That's purely a micro-arch concept for x86 AFAIK. We don't
know what micro-arches will look like N years from now...even if they're
all excellent at prediction today.

From: "Sanjay Patel" <spatel@rotateright.com>
To: "Hal Finkel" <hfinkel@anl.gov>
Cc: "Chandler Carruth" <chandlerc@google.com>, "Clang" <cfe-dev@cs.uiuc.edu>, llvmdev@cs.uiuc.edu,
joerg@britannica.bec.de
Sent: Monday, June 29, 2015 6:26:53 PM
Subject: Re: [LLVMdev] [cfe-dev] bitwise ops on booleans

> From: "Chandler Carruth" < chandlerc@google.com >
> I don't have any problem teaching codegen to use very specific
> information to weigh the costs of an additional branch. But that
> needs to be done in codegen, as it is rare on x86 at this point
> that
> a well predicted branch is more expensive than anything, and given
> the quality of branch predictors on x86, it is also relatively rare
> that branches are so unpredictable.

I agree that it's rare, but it's common enough that it makes some of
our very knowledgeable and perf-starved customers furious when LLVM
unilaterally declares that it knows best and turns all bit-logic
into branches.

At the least, I think this optimization needs a chicken switch

Sure, at least to make benchmarking here easier.

because there's no way we can know in advance that adding a branch
will beat a compound predicate in all cases. I attached a test
program and some results to PR23827. Feedback and more data points
are certainly appreciated.

If using the bitwise ops are faster sometimes, I think we need a better cost model here. When the expressions don't have side effects, we should use the fastest method regardless of whether & or && was typed. I find it difficult to believe that we cannot come up with a reasonable cost model for evaluating (a && b), branches vs. bitops, if we try.

-Hal