Patch to add __builtin_shufflevector

Per subject, patch to add __builtin_shufflevector. This is needed to
implement mmintrin.h and friends efficiently on top of LLVM, because
LLVM isn't very smart about vectors, and vectors are hard(tm). The
diagnostics aren't quite right, but other than that, does this patch
look okay?

-Eli

tdiff.txt (11.2 KB)

Per subject, patch to add __builtin_shufflevector. This is needed to
implement mmintrin.h and friends efficiently on top of LLVM, because
LLVM isn't very smart about vectors, and vectors are hard(tm). The
diagnostics aren't quite right, but other than that, does this patch
look okay?

This is very nice! I have some minor stuff below, mostly little details.

+ ShuffleVectorExpr(Expr **args, unsigned nexpr,
+ QualType Type, SourceLocation BLoc,
+ SourceLocation RP) :
+ Expr(ShuffleVectorExprClass, Type), BuiltinLoc(BLoc),
+ RParenLoc(RP), SubExprs(args), NumExprs(nexpr) {}

This needs a comment that says that "args" must be new'd, and that ownership is transfered to the copy ctor. Better yet, it would be good if the ctor actually did the new + memcpy itself, instead of having this behavior.

+ ~ShuffleVectorExpr() {
+ delete SubExprs;
+ }

This should delete the children ASTs as well as the array.

+ /// getExpr - Return the Expr at the specified index.
+ Expr *getExpr(unsigned Index) {
+ assert((Index < NumExprs) && "Arg access out of range!");
+ return SubExprs[Index];
+ }

This should probably also come in const form. It would be nice to have an "int getShuffleMaskIdx(unsigned N)" method that converts the shuffle mask AST to a constant int and returns it, as a helper function.

--- Sema/SemaChecking.cpp (revision 47702)
+++ Sema/SemaChecking.cpp (working copy)
@@ -40,23 +40,36 @@
+ if (!CheckBuiltinCFStringArgument(TheCall->getArg(0))) {
+ delete TheCall;
+ return true;
+ }

SemaChecking should never delete the call on an error because ActOnCallExpr has an OwningPtr that will delete it if sema is not successful. I think just removing the delete is ok from the error paths. Actually, with this code in SemaExpr:

        if (FunctionDecl *FDecl = dyn_cast<FunctionDecl>(DRExpr- >getDecl()))
- if (CheckFunctionCall(FDecl, TheCall.get()))
- return true;
+ return CheckFunctionCall(FDecl, TheCall.take());

the call will always be deleted. I think this code should be:

        if (FunctionDecl *FDecl = dyn_cast<FunctionDecl>(DRExpr- >getDecl())) {
+ ExprResult R = CheckFunctionCall(FDecl, TheCall.take());
          if (R.error) return;
          TheCall = R.Val;
        }

+/// SemaBuiltinUnorderedCompare - Handle functions like __builtin_isgreater and
+/// friends. This is declared to take (...), so we have to check everything.
+Action::ExprResult Sema::SemaBuiltinShuffleVector(CallExpr *TheCall) {

plz update the comment.

+ if (!Context.typesAreCompatible(
+ TheCall->getArg(0)->getType().getUnqualifiedType(),
+ TheCall->getArg(1)->getType().getUnqualifiedType())) {
+ return Diag(TheCall->getLocEnd(), diag::err_typecheck_call_too_few_args);
+ }

I think the (canonical) types should be required to be exactly identical, not just compatible, what do you think?

+ if (!TheCall->getArg(i)->isIntegerConstantExpr(Result, Context))
+ return Diag(TheCall->getLocEnd(), diag::err_typecheck_call_too_few_args);
+ if (Result >= llvm::APSInt(llvm::APInt(32, numElements*2), false))
+ return Diag(TheCall->getLocEnd(), diag::err_typecheck_call_too_few_args);

These diags are good, but should be more descriptive. Do you want to support -1 as undef?

I think that "Result >= llvm::APSInt(llvm::APInt(32, numElements*2)" will segfault if the index is something like 1LL. Try something like:

   Result >= llvm::APSInt(llvm::APInt(Result.getBitWidth(), numElements*2))

oh except you also have to make the APSInt have the same sign as Result (try 1U as an index), and if it's signed, you want to reject things like -2.

+ Expr** exprs = new Expr*[TheCall->getNumArgs()];

This is very nice! I have some minor stuff below, mostly little
details.

+ ShuffleVectorExpr(Expr **args, unsigned nexpr,
+ QualType Type, SourceLocation BLoc,
+ SourceLocation RP) :
+ Expr(ShuffleVectorExprClass, Type), BuiltinLoc(BLoc),
+ RParenLoc(RP), SubExprs(args), NumExprs(nexpr) {}

This needs a comment that says that "args" must be new'd, and that
ownership is transfered to the copy ctor. Better yet, it would be
good if the ctor actually did the new + memcpy itself, instead of
having this behavior.

Okay.

+ ~ShuffleVectorExpr() {
+ delete SubExprs;
+ }

This should delete the children ASTs as well as the array.

Okay.

+ /// getExpr - Return the Expr at the specified index.
+ Expr *getExpr(unsigned Index) {
+ assert((Index < NumExprs) && "Arg access out of range!");
+ return SubExprs[Index];
+ }

This should probably also come in const form. It would be nice to
have an "int getShuffleMaskIdx(unsigned N)" method that converts the
shuffle mask AST to a constant int and returns it, as a helper function.

Fine. Although it's probably worth asking, how wide are vectors allowed to be?

--- Sema/SemaChecking.cpp (revision 47702)
+++ Sema/SemaChecking.cpp (working copy)
@@ -40,23 +40,36 @@
+ if (!CheckBuiltinCFStringArgument(TheCall->getArg(0))) {
+ delete TheCall;
+ return true;
+ }

SemaChecking should never delete the call on an error because
ActOnCallExpr has an OwningPtr that will delete it if sema is not
successful. I think just removing the delete is ok from the error
paths. Actually, with this code in SemaExpr:

        if (FunctionDecl *FDecl = dyn_cast<FunctionDecl>(DRExpr-
  >getDecl()))
- if (CheckFunctionCall(FDecl, TheCall.get()))
- return true;
+ return CheckFunctionCall(FDecl, TheCall.take());

the call will always be deleted. I think this code should be:

        if (FunctionDecl *FDecl = dyn_cast<FunctionDecl>(DRExpr-
  >getDecl())) {
+ ExprResult R = CheckFunctionCall(FDecl, TheCall.take());
          if (R.error) return;
          TheCall = R.Val;
        }

Hmm? I'm not quite following this bit... the current model is that if
we end up in CheckFunctionCall, it gets ownership of the call pointer
(because we call take), and then it either returns it or deletes it.
What exactly is the model you're suggesting?

+/// SemaBuiltinUnorderedCompare - Handle functions like
__builtin_isgreater and
+/// friends. This is declared to take (...), so we have to check
everything.
+Action::ExprResult Sema::SemaBuiltinShuffleVector(CallExpr *TheCall) {

plz update the comment.

Okay.

+ if (!Context.typesAreCompatible(
+ TheCall->getArg(0)->getType().getUnqualifiedType(),
+ TheCall->getArg(1)->getType().getUnqualifiedType())) {
+ return Diag(TheCall->getLocEnd(),
diag::err_typecheck_call_too_few_args);
+ }

I think the (canonical) types should be required to be exactly
identical, not just compatible, what do you think?

The two aren't different in any substantial way, as far as I can tell, but okay.

+ if (!TheCall->getArg(i)->isIntegerConstantExpr(Result, Context))
+ return Diag(TheCall->getLocEnd(),
diag::err_typecheck_call_too_few_args);
+ if (Result >= llvm::APSInt(llvm::APInt(32, numElements*2), false))
+ return Diag(TheCall->getLocEnd(),
diag::err_typecheck_call_too_few_args);

These diags are good, but should be more descriptive. Do you want to
support -1 as undef?

Ah, forgot the -1 == undef bit. Okay.

I think that "Result >= llvm::APSInt(llvm::APInt(32, numElements*2)"
will segfault if the index is something like 1LL. Try something like:

Okay.

   Result >= llvm::APSInt(llvm::APInt(Result.getBitWidth(),
numElements*2))

oh except you also have to make the APSInt have the same sign as
Result (try 1U as an index), and if it's signed, you want to reject
things like -2.

Ah, right.

+ Expr** exprs = new Expr*[TheCall->getNumArgs()];
+
+ for (unsigned i = 0; i < TheCall->getNumArgs(); i++) {
+ exprs[i] = TheCall->getArg(i);
+ TheCall->setArg(i, 0);
+ }
+
+ ShuffleVectorExpr* E = new ShuffleVectorExpr(exprs, numElements+2,
FAType,
+ TheCall->getCallee()-
  >getLocStart(),
+ TheCall-
  >getRParenLoc());

Instead of doing the new in Sema here, it would be nice if sema used a
SmallvEctor<32>, and then had the ctor do the new/memcpy just to keep
the ownership simple and clear. I think the getLocStart() line is too
long.

Okay.

+ for (unsigned i = 2; i < E->getNumSubExprs(); i++) {
+ indices.push_back(cast<llvm::Constant>(CGF.EmitScalarExpr(E-
  >getExpr(i))));
+ }

It would be nice to use a helper on ShuffleVectorExpr to get the index
as an unsigned (evaluating as an i-c-e), instead of relying on
EmitScalarExpr to do it. Also, it would be nice to handle the undef
case.

Fine.

As a random thought, it would be relatively easy to make
__builtin_shufflevector also support: __builtin_shufflevector(Vec,
idx1, idx2, idx3, idx4) - a shuffle with a single vector (implicitly
making the second vector be undef). I think this would be handy, but
is jsut syntactic sugar and may not be worth it. What do you think?

I don't think it's worth bothering with... you can always just do
__builtin_shufflevector(Vec, Vec, ...), and adding that would
significantly complicate the typechecking. If it turns out to have
significant uses, we can consider adding it.

Thanks for working on this!

No problem.

-Eli

This should probably also come in const form. It would be nice to
have an "int getShuffleMaskIdx(unsigned N)" method that converts the
shuffle mask AST to a constant int and returns it, as a helper function.

Fine. Although it's probably worth asking, how wide are vectors allowed to be?

Arbitrary in theory, in practice, vectors of up to 256 are interesting I guess, but 16 and less are used in practice.

       if (FunctionDecl *FDecl = dyn_cast<FunctionDecl>(DRExpr-

getDecl())) {

+ ExprResult R = CheckFunctionCall(FDecl, TheCall.take());
         if (R.error) return;
         TheCall = R.Val;
       }

Hmm? I'm not quite following this bit... the current model is that if
we end up in CheckFunctionCall, it gets ownership of the call pointer
(because we call take), and then it either returns it or deletes it.
What exactly is the model you're suggesting?

Ooooh, I completely missed the .take() there. If you think this is the best way to go then your code is exactly right, sorry for the oversight!

+ if (!Context.typesAreCompatible(
+ TheCall->getArg(0)->getType().getUnqualifiedType(),
+ TheCall->getArg(1)->getType().getUnqualifiedType())) {
+ return Diag(TheCall->getLocEnd(),
diag::err_typecheck_call_too_few_args);
+ }

I think the (canonical) types should be required to be exactly
identical, not just compatible, what do you think?

The two aren't different in any substantial way, as far as I can tell, but okay.

Fair enough, checking for type equality makes it more explicit what you mean though.

As a random thought, it would be relatively easy to make
__builtin_shufflevector also support: __builtin_shufflevector(Vec,
idx1, idx2, idx3, idx4) - a shuffle with a single vector (implicitly
making the second vector be undef). I think this would be handy, but
is jsut syntactic sugar and may not be worth it. What do you think?

I don't think it's worth bothering with... you can always just do
__builtin_shufflevector(Vec, Vec, ...), and adding that would
significantly complicate the typechecking. If it turns out to have
significant uses, we can consider adding it.

Great point, lets wait until there is a client for it.

Thanks again Eli,

-Chris

Updated version of __builtin_shufflevector patch; should address
review comments. (Sorry this took so long; you probably forgot what
your review comments were.)

-Eli

clangshufflevector.txt (13.3 KB)