Boolean floats and v4i1

Hello,

I'm working on support for the SIMD instruction set on our new BG/Q
supercomputer. Apart from some int <-> fp conversions, this instruction
set is floating-point only (v4f64). The vectorized comparisons, logical
operations and selects also take only floating-point inputs. For inputs
that are logically vectors of booleans, the system uses the following
convention: positive numbers are true, and everything else (including
NaNs) is false. Logical operations produce -1.0 (false) and 1.0 (true).
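
A minimal Python sketch of the lane-wise boolean convention described
above. It assumes that selects use the same "positive means true" rule
for their condition vector; the function names (is_true, v_and, v_or,
v_not, v_select) are hypothetical and only model the semantics, not the
actual BG/Q instructions.

```python
# Hypothetical model of the floating-point boolean convention:
# a lane is "true" iff it holds a positive number; zero, negative values
# and NaNs are all "false". Logical ops return 1.0 (true) or -1.0 (false).
TRUE, FALSE = 1.0, -1.0


def is_true(x: float) -> bool:
    # NaN > 0.0 is False under IEEE-754 comparison rules, so NaN is false.
    # -0.0 > 0.0 is also False, so both zeros are false.
    return x > 0.0


def to_fp(b: bool) -> float:
    return TRUE if b else FALSE


def v_and(a, b):
    return [to_fp(is_true(x) and is_true(y)) for x, y in zip(a, b)]


def v_or(a, b):
    return [to_fp(is_true(x) or is_true(y)) for x, y in zip(a, b)]


def v_not(a):
    return [to_fp(not is_true(x)) for x in a]


def v_select(cond, a, b):
    # Assumption: the select condition follows the same convention.
    return [x if is_true(c) else y for c, x, y in zip(cond, a, b)]


mask = [2.5, 0.0, float("nan"), -3.0]
print(v_and(mask, [1.0, 1.0, 1.0, 1.0]))  # [1.0, -1.0, -1.0, -1.0]
```

Note that the outputs are always valid inputs again, so results of
comparisons and logical operations can be chained directly.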

I am not sure how best to support this in LLVM, which does not have an
MVT::v4i1. One thing I can do (without modifying LLVM core) is to add
v4i64 to the vector registers and pretend that v4i1 is being promoted to
that type (I match loads and stores to pairs of memory operations and
fp <-> int conversions). This works to some extent: CodeGen will happily
generate vectorized selects, comparisons and logical operations on the
comparison results. But it leaves me with a broken v4i64 type. The
operations defined on that type essentially respect only the sign bit of
each element, so as long as they are used only for the promoted v4i1
operations almost everything is fine, but they are not true v4i64
operations.

What should I do here? Should I add MVT::v?i1 types so that they can be
directly used without promotion?

Thanks again,
Hal

Hi Hal,

Why do you say that the type v4i64 is broken? You can specify that this
type has no legal operations, and the codegen will lower ("legalize")
them to something that works on your platform.

Nadav

For example, the AND operation is really only an AND of the sign bits
of the underlying floating-point numbers; it does not AND all of the
bits, and it always rewrites them so that each element of the result is
-1.0 or 1.0. But I need this AND for the promoted v4i1 values, so I have
to mark it as legal and match it to the associated vector logical
operation.
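
To make the "not a true v4i64 AND" point concrete, here is a hypothetical
Python model of a single 64-bit lane. The floating-point logical AND
reinterprets each lane as a double, applies the positive-is-true rule and
writes back the bit pattern of 1.0 or -1.0, whereas a genuine v4i64 AND
simply combines the bits. The helper names are invented for illustration.

```python
import struct


def bits_to_double(bits: int) -> float:
    # Reinterpret an unsigned 64-bit pattern as an IEEE-754 double.
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def double_to_bits(x: float) -> int:
    # Reinterpret a double as its unsigned 64-bit pattern.
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def fp_logical_and_lane(a_bits: int, b_bits: int) -> int:
    # Hypothetical model of the vector logical AND on one lane: inputs are
    # read as doubles, "true" means positive, result is 1.0 or -1.0.
    a, b = bits_to_double(a_bits), bits_to_double(b_bits)
    return double_to_bits(1.0 if (a > 0.0 and b > 0.0) else -1.0)


def real_and_lane(a_bits: int, b_bits: int) -> int:
    # What a genuine v4i64 AND would compute on one lane.
    return a_bits & b_bits


# Small integers are tiny positive subnormals when read as doubles.
print(hex(real_and_lane(0x0F, 0x03)))        # 0x3
print(hex(fp_logical_and_lane(0x0F, 0x03)))  # 0x3ff0000000000000
```

Even on promoted booleans the bit patterns disagree: a bitwise AND of the
patterns of 1.0 and -1.0 yields the pattern of 1.0 (true), while the
floating-point AND yields -1.0 (false). That is why the operation can be
matched for promoted v4i1 values but not for real v4i64 data.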

Thanks again,
Hal

You could set the AND operation action to Custom. The problem is that
you would have no way of knowing whether a 'v4i64' value originated from
a promoted v4i1 or from a genuine v4i64. And I don't think you can use
SimplifyDemandedBits (to discover whether only the high bit is relevant)
during the legalizer, because the DAG is in a strange state, but I could
be mistaken on this one.

Okay, here is another idea. There are several DAGCombine invocations,
including one before the type legalizer. You can define a
target-specific DAGCombine optimization that converts the v4i1 AND into
a known intrinsic (actually a target-specific ISD node). You would have
to any-extend the operands before the operation and truncate the result
after it, because the node's type has to be legal. The extend and
truncate will go away after the type-legalizer and DAGCombine rounds. I
will try to think of a cleaner idea.
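
The combine described above can be sketched on a toy DAG. This is a
hypothetical Python model, not LLVM's SelectionDAG API: Node, TARGET_AND,
combine_v4i1_and and fold_ext_of_trunc are invented names. It rewrites a
v4i1 AND into truncate(TARGET_AND(any_extend a, any_extend b)) and then
applies the ordinary any_extend(truncate x) -> x fold, which removes the
glue between chained ANDs.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    # Toy stand-in for a DAG node: opcode, value type, operands.
    opcode: str
    vt: str
    operands: List["Node"] = field(default_factory=list)


def leaf(vt: str) -> Node:
    return Node("value", vt)


def combine_v4i1_and(n: Node) -> Node:
    # Pre-type-legalization combine (hypothetical):
    #   and:v4i1 (a, b)
    #     -> truncate:v4i1 (TARGET_AND:v4i64 (any_extend a, any_extend b))
    # so the target node itself only ever sees the legal v4i64 type.
    ops = [combine_v4i1_and(op) for op in n.operands]
    if n.opcode == "and" and n.vt == "v4i1":
        ext = [Node("any_extend", "v4i64", [op]) for op in ops]
        return Node("truncate", "v4i1", [Node("TARGET_AND", "v4i64", ext)])
    return Node(n.opcode, n.vt, ops)


def fold_ext_of_trunc(n: Node) -> Node:
    # Later combine: any_extend (truncate x) -> x when x already has the
    # extended type (the high bits of an any_extend are undefined anyway).
    ops = [fold_ext_of_trunc(op) for op in n.operands]
    if (n.opcode == "any_extend" and ops[0].opcode == "truncate"
            and ops[0].operands[0].vt == n.vt):
        return ops[0].operands[0]
    return Node(n.opcode, n.vt, ops)


def render(n: Node) -> str:
    if not n.operands:
        return n.opcode + ":" + n.vt
    return n.opcode + "(" + ", ".join(render(op) for op in n.operands) + ")"


a, b, c = leaf("v4i1"), leaf("v4i1"), leaf("v4i1")
dag = Node("and", "v4i1", [Node("and", "v4i1", [a, b]), c])
print(render(fold_ext_of_trunc(combine_v4i1_and(dag))))
```

In the real flow, the remaining extends of the original v4i1 inputs and
the final truncate are what type legalization (promoting v4i1 to v4i64)
and the following DAGCombine rounds are expected to clean up; this toy
model does not simulate that step.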

Interesting idea; it seems like that would work. If you think of
anything else, please let me know.

If I do it this way, should I declare v4i64 legal (by assigning it to
the associated register class) and then mark all of its operations as
Expand? Will that work for loads and stores too (meaning that an
unaltered v4i64 load would be scalarized into something that uses the
normal scalar registers)?

Thanks again,
Hal

Our backend (TCE) would also benefit from v?i1 types; the lack of them
is a serious bottleneck for us as well.

So I'd say the v2i1, v4i1, v8i1 and v16i1 types should be added.