__builtin_shufflevector vs. llvm shufflevector

Hi

From the llvm documentation I gather that llvm's shufflevector allows indices to be set to <undef> and interprets that as "don't care". From the initial discussion on clang's __builtin_shufflevector (http://lists.cs.uiuc.edu/pipermail/cfe-dev/2008-February/001151.html) it sounds as if the plan was to make "-1" translate to llvm's <undef>. This code fragment from clang's lib/CodeGen/CGExprScalar.cpp supports that theory

  static llvm::Constant *getMaskElt(llvm::ShuffleVectorInst *SVI, unsigned Idx,
                                    unsigned Off, llvm::Type *I32Ty) {
    int MV = SVI->getMaskValue(Idx);
    if (MV == -1)
      return llvm::UndefValue::get(I32Ty);
    return llvm::ConstantInt::get(I32Ty, Off+MV);
  }

clang 3.0, however, responds to

  typedef signed char v2s8_t __attribute__ ((__ext_vector_type__ (2)));
  v2s8_t v = (v2s8_t){(signed char)1, (signed char)2};
  v = __builtin_shufflevector(v, v, 1, -1);

with

  error: index for __builtin_shufflevector must be less than the total number of vector elements

Does that mean that the initial plan was abolished and masks with undef element aren't supported in clang, or is this simply a signed vs. unsigned bug in Sema::SemaBuiltinShuffleVector (as the phrasing of the error message makes me suspect)?

Also, are there plans to add support for zeroed elements to llvm's shufflevector? At least SSE's pshufb instruction supports that, but currently the only way to access that functionality is via __builtin__pshufb().

best regards,
Florian Pflug

PS: Sorry if this is the wrong list to asks this question. I'd have posted on cfe-users, but such a thing does not seem to exist.

Hi

>From the llvm documentation I gather that llvm's shufflevector allows indices to be set to <undef> and interprets that as "don't care". From the initial discussion on clang's __builtin_shufflevector (http://lists.cs.uiuc.edu/pipermail/cfe-dev/2008-February/001151.html) it sounds as if the plan was to make "-1" translate to llvm's <undef>. This code fragment from clang's lib/CodeGen/CGExprScalar.cpp supports that theory

static llvm::Constant *getMaskElt(llvm::ShuffleVectorInst *SVI, unsigned Idx,
unsigned Off, llvm::Type *I32Ty) {
int MV = SVI->getMaskValue(Idx);
if (MV == -1)
return llvm::UndefValue::get(I32Ty);
return llvm::ConstantInt::get(I32Ty, Off+MV);
}

clang 3.0, however, responds to

typedef signed char v2s8_t __attribute__ ((__ext_vector_type__ (2)));
v2s8_t v = (v2s8_t){(signed char)1, (signed char)2};
v = __builtin_shufflevector(v, v, 1, -1);

with

error: index for __builtin_shufflevector must be less than the total number of vector elements

Does that mean that the initial plan was abolished and masks with undef element aren't supported in clang, or is this simply a signed vs. unsigned bug in Sema::SemaBuiltinShuffleVector (as the phrasing of the error message makes me suspect)?

I'm pretty sure this is just an accident... I wasn't really thinking
about undef shuffle indexes when I wrote the code in SemaChecking.cpp.

Also, are there plans to add support for zeroed elements to llvm's shufflevector? At least SSE's pshufb instruction supports that, but currently the only way to access that functionality is via __builtin__pshufb().

In theory, we should generate a pshufb if you shuffle a vector with
the zero vector and there isn't a more efficient choice. It wouldn't
surprise me if we don't always pick the best lowering, though.

best regards,
Florian Pflug

PS: Sorry if this is the wrong list to asks this question. I'd have posted on cfe-users, but such a thing does not seem to exist.

This list is fine.

-Eli

I tried relaxing the check during sema checking to allow -1, and fixed
codegen to consistently translate index value -1 to undef. And voila,
it works. __builtin_shufflevector(v, v, 1, -1) is now translated to

  shufflevector <2 x i8> %v, <2 x i8> undef, <2 x i32> <i32 1, i32 undef>

Patch is attached.

The patch also changes the error message for out-of-range index values
to be more specific about the allowed range (-1 to 2*N), changes the
diag code from err_shufflevector_argument_too_large to
err_shufflevector_argument_out_of_range since the former seemed strange
for signed values, and finally verifies that the indices fit into 32-bit
signed integers. The previous check for getActiveBits() <= 64 seemed
overly lax, since the indices are later treated as 32-bit quantities,
by both clang and llvm.

best regards,
Florian Pflug

clang.shufflevector.undef.v1.patch (2.53 KB)

Index: lib/CodeGen/CGExprScalar.cpp

Index: lib/CodeGen/CGExprScalar.cpp

--- lib/CodeGen/CGExprScalar.cpp (revision 149985)
+++ lib/CodeGen/CGExprScalar.cpp (working copy)
@@ -785,10 +785,13 @@
  llvm::VectorType *VTy = cast<llvm::VectorType>(V1->getType());
  SmallVector<llvm::Constant*, 32> indices;
  for (unsigned i = 2; i < E->getNumSubExprs(); i++) {
- unsigned Idx = E->getShuffleMaskIdx(CGF.getContext(), i-2);
+ int Idx = E->getShuffleMaskIdx(CGF.getContext(), i-2);

Is this patch missing some changes? This change doesn't match the
signature of getShuffleMaskIdx, and getShuffleMaskIdx has an assertion
which should break this... (although it might mostly work with
assertions disabled).

Ups, I missed that fact that getShuffleMaskIdx() needs to be changed too.
Funnily enough, it works despite this signature mismatch, even in my
Debug+Assert built. My guess is that the mask indices usually have a width
of at least 32 bits, and that makes getZExtValue() followed by a cast from
unsigned to int equivalent to getSExtValue().

I didn't find the assertion you expect to break, though - getShuffleMaskIdx
only seems to check the index into the Mask (i.e. the second parameter N),
not the mask's contents.

Anyway, I now changed the return value to int, and made it use getSExtValue()
instead of getZExtValue(). That should be safe, since Sema has already
verified that the index fits into a signed int, right?

Index: lib/Sema/SemaChecking.cpp

--- lib/Sema/SemaChecking.cpp (revision 149985)
+++ lib/Sema/SemaChecking.cpp (working copy)
@@ -1247,9 +1247,10 @@
                  diag::err_shufflevector_nonconstant_argument)
                << TheCall->getArg(i)->getSourceRange());

- if (Result.getActiveBits() > 64 || Result.getZExtValue() >= numElements*2)
+ if ((Result.getMinSignedBits() > 32) ||
+ (Result.getSExtValue() < -1) || (Result.getSExtValue() >=
numElements*2))
      return ExprError(Diag(TheCall->getLocStart(),
- diag::err_shufflevector_argument_too_large)
+ diag::err_shufflevector_argument_out_of_range)
               << TheCall->getArg(i)->getSourceRange());
  }

Ideally, this check would correctly handle both the case where Result
is signed and the case where it is unsigned... you're changing it from
assuming the value is unsigned to assuming the value is signed.

Hm, I don't see why this would break for unsigned integers. Isn't the
check for (getMinSignedBits > 32) sufficient to guarantee that getSExtValue()
won't mangle the value, regardless of Result's signed-ness?

New patch attached.

best regards,
Florian Pflug

clang.shufflevector.undef.v2.patch (3.14 KB)

Suppose that the value in question is "-1U"... getSExtValue will treat
it as a signed -1.

-Eli