Convert the result of a vector comparison into a scalar bit mask?

When LLVM does a comparison of two vectors, in this case with 16
elements, the result type of the setcc node is v16i1. The architecture I'm
targeting allows storing the result of a vector comparison as a bit
mask in a scalar register, but I'm having trouble converting the
result of setcc into a value that is usable there. For example, if I
try to AND together masks that are the results of two comparisons,
instruction selection fails because the operand types are v16i1 and no
instruction patterns can handle that type. I don't want to have to modify every
instruction to be aware of v16i1 as a data type (which doesn't seem
right anyway). Ideally, I could just tell the backend to treat the
result of a vector setcc as an i32. I've tried a number of things,
including:

- Using setOperationAction to mark SETCC as Promote and setting the
promoted type to i32. It asserts internally because the legalizer
tries to sign-extend the result, which is incompatible.

- Using a custom lowering action to wrap the setcc in a combination of
BITCAST/ZERO_EXTEND nodes (which I could then match and eliminate in the
instruction pattern). However, those DAG nodes get removed during one
of the passes and the result type is still v16i1. (Rough sketches of
both attempts are below.)
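
In simplified form (these are sketches rather than the exact code from
my target), the two attempts were:

// Attempt 1: in the VectorProcTargetLowering constructor, promote the
// v16i1 setcc result type to i32.
setOperationAction(ISD::SETCC, MVT::v16i1, Promote);
AddPromotedToType(ISD::SETCC, MVT::v16i1, MVT::i32);

// Attempt 2: in a custom SETCC lowering, re-emit the compare and wrap it.
// v16i1 is 16 bits wide, so bitcast to i16 and zero-extend to i32.
SDValue SetCC = DAG.getNode(ISD::SETCC, Op.getDebugLoc(), MVT::v16i1,
    Op.getOperand(0), Op.getOperand(1), Op.getOperand(2));
SDValue Bits = DAG.getNode(ISD::BITCAST, Op.getDebugLoc(), MVT::i16, SetCC);
return DAG.getNode(ISD::ZERO_EXTEND, Op.getDebugLoc(), MVT::i32, Bits);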

So, my question is: what is the proper way to convert the result of a
vector comparison into a scalar bitmask?

After some thought, I realize that the second approach doesn't work
because the operation would be applied to each element in the vector
(thus the result is still a vector). There doesn't appear to be a
promotion type that will pack a vector.

I tried adding a lowering that transforms SETCC into a custom
node that returns a scalar:

SDValue
VectorProcTargetLowering::LowerSETCC(SDValue Op, SelectionDAG &DAG) const
{
    return DAG.getNode(SPISD::VECTOR_COMPARE, Op.getDebugLoc(), MVT::i32,
        Op.getOperand(0), Op.getOperand(1), Op.getOperand(2));
}

def veccmp : SDNode<"SPISD::VECTOR_COMPARE", SDTypeProfile<1, 3, [SDTCisInt<0>,
    SDTCisSameAs<1, 2>, SDTCisVec<1>]>>;
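
For completeness, LowerSETCC is hooked in with the usual custom lowering
boilerplate, roughly like this (again just a sketch, with the
LowerOperation switch abbreviated):

// In the VectorProcTargetLowering constructor:
setOperationAction(ISD::SETCC, MVT::v16i1, Custom);

SDValue
VectorProcTargetLowering::LowerOperation(SDValue Op, SelectionDAG &DAG) const
{
    switch (Op.getOpcode()) {
    case ISD::SETCC:
        return LowerSETCC(Op, DAG);
    default:
        llvm_unreachable("Should not custom lower this!");
    }
}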

I also changed the pattern that matches vector comparisons to:

  [(set i32:$dst, (veccmp v16i32:$a, v16i32:$b, condition))]

Unfortunately, this ends up tripping an assert:

Assertion failed: (Op.getValueType().getScalarType().getSizeInBits() == BitWidth &&
"Mask size mismatches value type size!"), function SimplifyDemandedBits,
file llvm/lib/CodeGen/SelectionDAG/TargetLowering.cpp, line 357.

(Another variant of this would be to keep the setcc and wrap it in a
custom node, 'PACK_VECTOR', that takes the v16i1 value as a parameter and
returns an i32 result; a sketch of what I mean is below.)
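
To illustrate that variant, the lowering would look something like this
(just a sketch; the PACK_VECTOR node and the SPISD::PACK_VECTOR enum
value are hypothetical):

SDValue
VectorProcTargetLowering::LowerSETCC(SDValue Op, SelectionDAG &DAG) const
{
    // Keep the vector compare itself...
    SDValue SetCC = DAG.getNode(ISD::SETCC, Op.getDebugLoc(), MVT::v16i1,
        Op.getOperand(0), Op.getOperand(1), Op.getOperand(2));

    // ...and wrap it in a node that packs the 16 lanes into a scalar mask.
    return DAG.getNode(SPISD::PACK_VECTOR, Op.getDebugLoc(), MVT::i32, SetCC);
}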

I'm not sure if I'm on the right track with this approach or not.

Since I'm exposing this with a builtin in clang anyway (there is no
other way that I know of to do this in C), I could just punt
entirely and use an intrinsic to expose packed vector comparisons.
But that doesn't seem like the right thing to do.