passing vector of booleans to functions

Hi all,

I'm currently trying to figure out the best way to pass a vector of
booleans to other functions. Take this small example:

define <4 x float> @vcmp_add(<4 x float> %a, <4 x float> %b) {
entry:
  %cmp = fcmp olt <4 x float> %a, %b
  %add = fadd <4 x float> %a, %b
  %sel = select <4 x i1> %cmp, <4 x float> %add, <4 x float> %a
  ret <4 x float> %sel
}

I will get (on SSE):
  movaps %xmm0, %xmm2
  cmpltps %xmm1, %xmm0
  addps %xmm2, %xmm1
  blendvps %xmm1, %xmm2
  movaps %xmm2, %xmm0
  ret

Great :)
But now, let us try to pass a mask to a function.

define <4 x float> @masked_add_1(<4 x i1> %mask, <4 x float> %a, <4 x float> %b) {
entry:
  %add = fadd <4 x float> %a, %b
  %sel = select <4 x i1> %mask, <4 x float> %add, <4 x float> %a
  ret <4 x float> %sel
}

I will get:

addps %xmm1, %xmm2
pslld $31, %xmm0
blendvps %xmm2, %xmm1
movaps %xmm1, %xmm0
ret

While this is correct and works, I'm unhappy with the pslld. Apparently,
LLVM uses a <4 x i32> to hold the <4 x i1>, with the mask bit in the LSB
of each element. But blendvps expects the mask bit in the MSB, hence the
shift.
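
To make the LSB/MSB mismatch concrete, here is a small Python model of the two instructions. It is only a sketch: the function names are made up for illustration, and the model keeps just the detail that matters here (a per-lane select keyed on bit 31, and a per-lane left shift by 31), not the real operand order or register constraints of blendvps.

```python
def blendv_lanes(mask, if_set, if_clear):
    """Simplified blendvps: per lane, pick if_set when bit 31 of the mask lane is 1."""
    return [s if (m >> 31) & 1 else c for m, s, c in zip(mask, if_set, if_clear)]


def shift_left_31(mask):
    """Simplified pslld $31: shift each 32-bit lane left by 31, dropping overflow."""
    return [(m << 31) & 0xFFFFFFFF for m in mask]


a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]
add = [x + y for x, y in zip(a, b)]

# A <4 x i1> mask widened to i32 lanes, with the boolean in the LSB.
lsb_mask = [1, 0, 1, 0]

# Bit 31 is clear in every lane, so nothing is selected from add.
without_shift = blendv_lanes(lsb_mask, add, a)                # [1.0, 2.0, 3.0, 4.0]

# After the shift, the boolean sits in bit 31 and the select works.
with_shift = blendv_lanes(shift_left_31(lsb_mask), add, a)    # [11.0, 2.0, 33.0, 4.0]
```

In this model the shift is the only thing that moves the boolean to the bit the blend actually looks at, which matches the pslld in the generated code.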

OK, let's try to do better. This time, I will directly use <4 x i32>:

define <4 x float> @masked_add_32(<4 x i32> %mask, <4 x float> %a, <4 x float> %b)
{
entry:
  %add = fadd <4 x float> %a, %b
  %trunc = trunc <4 x i32> %mask to <4 x i1>
  %sel = select <4 x i1> %trunc, <4 x float> %add, <4 x float> %a
  ret <4 x float> %sel
}

But damn, I have to truncate the mask in order to use it in the select,
so in the end LLVM produces the same code as above. What code do I have
to write in order to get rid of the shift?

If only there were a way to tell LLVM that each element of %mask is
guaranteed to be either 0xFFFFFFFF or 0x0...
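
As a rough illustration of why that guarantee matters, the Python sketch below compares the bit that trunc to i1 keeps (the LSB) with the bit that blendvps tests (the MSB). The helper names are hypothetical and this only models the bit arithmetic, not what LLVM actually does.

```python
def trunc_to_i1(lane):
    """Bit kept by 'trunc i32 to i1': the least significant bit."""
    return lane & 1


def sign_bit(lane):
    """Bit tested per lane by blendvps: bit 31 of a 32-bit lane."""
    return (lane >> 31) & 1


# With all-ones / all-zeros lanes, both bits agree, so the shift is redundant.
canonical = [0xFFFFFFFF, 0x00000000]
agree_on_canonical = all(trunc_to_i1(m) == sign_bit(m) for m in canonical)  # True

# For arbitrary lane values they can differ, so without extra knowledge
# the shift cannot simply be dropped.
arbitrary = [0x00000001, 0x80000000]
mismatches = [m for m in arbitrary if trunc_to_i1(m) != sign_bit(m)]  # both lanes
```

So the shift is only removable if the compiler knows every lane is one of the two canonical values.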

Thanks,
Roland

Hi Roland,

> define <4 x float> @masked_add_1(<4 x i1> %mask, <4 x float> %a, <4 x float> %b) {
> entry:
>   %add = fadd <4 x float> %a, %b
>   %sel = select <4 x i1> %mask, <4 x float> %add, <4 x float> %a
>   ret <4 x float> %sel
> }
>
> I will get:
>
> addps %xmm1, %xmm2
> pslld $31, %xmm0
> blendvps %xmm2, %xmm1
> movaps %xmm1, %xmm0
> ret
>
> While this is correct and works, I'm unhappy with the pslld. Apparently,
> LLVM uses a <4 x i32> to hold the <4 x i1> while the LSB holds the mask
> bit. But blendvps expects the MSB as mask bit and therefore the shift.

try plunking a signext attribute on the mask parameter. That's supposed to tell
the code generators that the caller passed in an all-zero or all-one value.

Ciao, Duncan.

Hi Duncan,

thanks for the hint. I tried both variants:

define <4 x float> @masked_add_1(<4 x i1> signext %mask, <4 x float> %a, <4 x float> %b)

define <4 x float> @masked_add_32(<4 x i32> %mask, <4 x float> %a, <4 x float> %b)

Unfortunately, this will raise an assertion:
Wrong types for attribute: zeroext signext noalias nocapture sret byval nest

Should I file a bug report?

Hi Roland,