Legalizing v32i1, v64i1 for Haswell pext/pdep instructions

I have a group of students working with me on some
LLVM projects related to our Parabix research.

One interesting issue that has come up for us is
code generation support for the Haswell new instructions
pext and pdep. These instructions shuffle bits within
a 64-bit word, either gathering all selected bits to
the beginning (pext) or scattering some initial bits
throughout (pdep).

A natural model for this is to use shufflevector
on v32i1 and v64i1 vectors. We've got some preliminary
notes here:
http://parabix.costar.sfu.ca/wiki/BitShuffle

Since we're quite new at this, I have some questions
about strategy.

(1) First, it seems that legalizing v32i1 and v64i1 types
for x86 would make sense. This will allow us to
retain the shufflemasks involving these vector types
into code generation.

(2) To legalize these types, we need to support all
the vector operations.

  This involves implementing all of the IR operations
on these vector types. For most of these it seems
that simple substitutions suffice. Adding two
32vi1 vectors just involves bitwise xor, while
mul of such vectors just involves bitwise and.

My questions are these.

(1) Is this strategy basically reasonable?

(2) Are there alternative mechanisms for generating
pext/pdep from v32i1 and v64i1 shufflevector invocations?

(3) The legalization method for most operations (other
than shufflevector) is generic, i.e, not processor-specific.
Is there a way to make types such as v64i1 legal for any processor
that supports i64?

Hi Rob,

One interesting issue that has come up for us is
code generation support for the Haswell new instructions
pext and pdep. These instructions shuffle bits within
a 64-bit word, either gathering all selected bits to
the beginning (pext) or scattering some initial bits
throughout (pdep).

A natural model for this is to use shufflevector
on v32i1 and v64i1 vectors. We've got some preliminary
notes here:
http://parabix.costar.sfu.ca/wiki/BitShuffle

I agree that one way to model these instructions is as shuffles but like you mentioned in your email I expect that implementing this would be non trivial. Are you expecting the pext/pdep instructions to interact with other instructions? If not, then going with intrinsics is probably the best option.

Thanks,
Nadav

Hi, Nadav.

We are expecting to do a lot with bit vector manipulation
at the IR level and then to target multiple back ends.

For example, one of our core operations is the Parabix
Transform - this can be implemented using bit shuffling
among other operations.
http://parabix.costar.sfu.ca/wiki/ParabixTransform

We're also trying to move away from the use of intrinsics
in our libraries and target LLVM IR instead. If we can
add capabilities to LLVM that give us the performance we
currently achieve with intrinsics-based libraries, we will
be happy.

This is part of an ongoing research effort; we realize
that we have much to learn and work to do.