[OpenCL patch] support for __builtin_astype, __builtin_convert, __builtin_vec_step

I realize there is already some patches to the mailing list regarding vecstep, astype, etc. I realize some discussion will need to be had to determine which is the right approach. I need to get caught up on the emails, but thought I would send this patch to the list first.

The attached patch adds support for builtins for astype, convert, and vec_step. I chose to group these together because they require very similar code modifications. I’ve included test cases for each.

Please consult the OpenCL spec for a full description but here is a brief summary of each:

__builtin_astype(): Used to reinterpreted as another data type of the same size using for both scalar and vector data types.
__builtin_vec_step(): The vec_step built-in function takes a built-in scalar or vector data type argument and returns an integer value representing the number of elements in the scalar or vector.
__builtin_convert(): provides a way to perform scalar and vector conversions with rounding and saturation.

Please review.

asType.patch (39.1 KB)

Hi Tanya,

Thanks for the patch; it’s great to get your contribution for the OpenCL support.

Creating operators that may get types as arguments for as_typen and convert is a very elegant solution; however it’s quite intrusive, as these features might be implemented as standard functions, and linked to the user’s code later. As_typen needs some special checks, so it could be a clang built-in, but conversions basically could be implemented without changing clang at all.

Vec_step is very similar to the sizeof and the alignof operators, so I think it’s OK to implement it using the same expression type, instead of code duplication.

Index: include/clang/Basic/DiagnosticSemaKinds.td

image001.png

Hi Tanya,
Thanks for the patch; it’s great to get your contribution for the OpenCL support.

Sorry for the delay in my response.

Creating operators that may get types as arguments for as_typen and convert is a very elegant solution; however it’s quite intrusive, as these features might be implemented as standard functions, and linked to the user’s code later. As_typen needs some special checks, so it could be a clang built-in, but conversions basically could be implemented without changing clang at all.

Having AsType or Convert as part of the IR is beneficial for error reporting and tools such as the static analyzer. There is no negative impact on the end user, only benefits.

Vec_step is very similar to the sizeof and the alignof operators, so I think it’s OK to implement it using the same expression type, instead of code duplication.

I agree that this is a good alternative to our implementation. I've looked at your patch and it looks like we are doing the same work but merging it with sizeof is a good idea to prevent code duplication.

Index: include/clang/Basic/DiagnosticSemaKinds.td

--- include/clang/Basic/DiagnosticSemaKinds.td (revision 125808)
+++ include/clang/Basic/DiagnosticSemaKinds.td (working copy)
@@ -3723,6 +3723,26 @@
   "%0 does not refer to the name of a parameter pack; did you mean %1?">;
def note_parameter_pack_here : Note<"parameter pack %0 declared here">;

+def err_cvt_arg_must_be_constant : Error<
+ "__builtin_convert requires constant 3rd and 4th argument">;
+def err_invalid_astype_of_different_size : Error<
+ "invalid astype between type '%0' and '%1' of different size">;
+def err_vec_step_bitfield : Error<
+ "invalid application of 'vec_step' to bitfield">;

Bitfields are not supported in OpenCL at all – is this error message really needed?

Probably not then.

Index: include/clang/Basic/TokenKinds.def

--- include/clang/Basic/TokenKinds.def (revision 125808)
+++ include/clang/Basic/TokenKinds.def (working copy)
@@ -356,6 +356,11 @@
KEYWORD(__vector , KEYALTIVEC)
KEYWORD(__pixel , KEYALTIVEC)

+// OpenCL Extensions.
+KEYWORD(__builtin_astype , KEYALL)
+KEYWORD(__builtin_convert , KEYALL)
+KEYWORD(__builtin_vec_step , KEYALL)
+

The latest version of CLANG already has KEYOPENCL flag, so keywords can be flagged as OpenCL only. This way vec_step could be a keyword, instead of using macros later to turn it to __builtin_vec_step. Also, it saves later checking of right target language.

Ok, I'll change this.

Index: lib/CodeGen/CGExprScalar.cpp

--- lib/CodeGen/CGExprScalar.cpp (revision 125808)
+++ lib/CodeGen/CGExprScalar.cpp (working copy)
@@ -2534,6 +2537,86 @@
   return CGF.EmitBlockLiteral(block);
}

+Value *ScalarExprEmitter::VisitAsTypeExpr(AsTypeExpr *E) {
+ Value *Src = CGF.EmitScalarExpr(E->getSrcExpr());
+ const llvm::Type * DstTy = ConvertType(E->getDstType());
+
+ // Going from vec4->vec3 or vec3->vec4 is a special case and requires
+ // a shuffle vector instead of a bitcast.
+ const llvm::Type *SrcTy = Src->getType();
+ if (isa<llvm::VectorType>(DstTy) && isa<llvm::VectorType>(SrcTy)) {
+ unsigned numElementsDst = cast<llvm::VectorType>(DstTy)->getNumElements();
+ unsigned numElementsSrc = cast<llvm::VectorType>(SrcTy)->getNumElements();
+
+ if ((numElementsDst == 3 && numElementsSrc == 4)
+ || (numElementsDst == 4 && numElementsSrc == 3)) {

The OpenCL spec defines the behavior of vec4->vec3, but about vec3->vec4 it writes:
  float3 f;
  // Error. Result and operand have different sizes
  float4 g = as_float4(f);

Also, shuffle vector is not enough – there might be a conversion from int4 to float3, which is legal, but will crash the compiler if it’s done only by shuffle. There must be a bitcast here.

Ok. I'll modify.

+ if (dst_fp && src_fp)
+ ID = llvm::Intrinsic::convertff;
+ else if (dst_fp && !src_fp)
+ ID = src_s ? llvm::Intrinsic::convertfsi : llvm::Intrinsic::convertfui;
+ else if (!dst_fp && src_fp)
+ ID = dst_s ? llvm::Intrinsic::convertsif : llvm::Intrinsic::convertuif;
+ else if (dst_s && src_s)
+ ID = llvm::Intrinsic::convertss;
+ else if (dst_s && !src_s)
+ ID = llvm::Intrinsic::convertsu;
+ else if (!dst_s && src_s)
+ ID = llvm::Intrinsic::convertus;
+ else
+ ID = llvm::Intrinsic::convertuu;
+

These intrinsics are not documented. Are they safe to use? Do they fulfill the OpenCL spec’s precision requirements?

I believe they meet OpenCL's spec precision requirements as they are working in our implementation. Do you have a specific example where you think they won't?

As for being safe to use, what do you mean?

Thanks,
Tanya

Hi Tanya,
Thanks for the patch; it’s great to get your contribution for the OpenCL support.

Sorry for the delay in my response.

Creating operators that may get types as arguments for as_typen and convert is a very elegant solution; however it’s quite intrusive, as these features might be implemented as standard functions, and linked to the user’s code later. As_typen needs some special checks, so it could be a clang built-in, but conversions basically could be implemented without changing clang at all.

Having AsType or Convert as part of the IR is beneficial for error reporting and tools such as the static analyzer. There is no negative impact on the end user, only benefits.

Vec_step is very similar to the sizeof and the alignof operators, so I think it’s OK to implement it using the same expression type, instead of code duplication.

I agree that this is a good alternative to our implementation. I've looked at your patch and it looks like we are doing the same work but merging it with sizeof is a good idea to prevent code duplication.

Index: include/clang/Basic/DiagnosticSemaKinds.td

--- include/clang/Basic/DiagnosticSemaKinds.td (revision 125808)
+++ include/clang/Basic/DiagnosticSemaKinds.td (working copy)
@@ -3723,6 +3723,26 @@
   "%0 does not refer to the name of a parameter pack; did you mean %1?">;
def note_parameter_pack_here : Note<"parameter pack %0 declared here">;

+def err_cvt_arg_must_be_constant : Error<
+ "__builtin_convert requires constant 3rd and 4th argument">;
+def err_invalid_astype_of_different_size : Error<
+ "invalid astype between type '%0' and '%1' of different size">;
+def err_vec_step_bitfield : Error<
+ "invalid application of 'vec_step' to bitfield">;

Bitfields are not supported in OpenCL at all – is this error message really needed?

Probably not then.

Index: include/clang/Basic/TokenKinds.def

--- include/clang/Basic/TokenKinds.def (revision 125808)
+++ include/clang/Basic/TokenKinds.def (working copy)
@@ -356,6 +356,11 @@
KEYWORD(__vector , KEYALTIVEC)
KEYWORD(__pixel , KEYALTIVEC)

+// OpenCL Extensions.
+KEYWORD(__builtin_astype , KEYALL)
+KEYWORD(__builtin_convert , KEYALL)
+KEYWORD(__builtin_vec_step , KEYALL)
+

The latest version of CLANG already has KEYOPENCL flag, so keywords can be flagged as OpenCL only. This way vec_step could be a keyword, instead of using macros later to turn it to __builtin_vec_step. Also, it saves later checking of right target language.

Ok, I'll change this.

Index: lib/CodeGen/CGExprScalar.cpp

--- lib/CodeGen/CGExprScalar.cpp (revision 125808)
+++ lib/CodeGen/CGExprScalar.cpp (working copy)
@@ -2534,6 +2537,86 @@
   return CGF.EmitBlockLiteral(block);
}

+Value *ScalarExprEmitter::VisitAsTypeExpr(AsTypeExpr *E) {
+ Value *Src = CGF.EmitScalarExpr(E->getSrcExpr());
+ const llvm::Type * DstTy = ConvertType(E->getDstType());
+
+ // Going from vec4->vec3 or vec3->vec4 is a special case and requires
+ // a shuffle vector instead of a bitcast.
+ const llvm::Type *SrcTy = Src->getType();
+ if (isa<llvm::VectorType>(DstTy) && isa<llvm::VectorType>(SrcTy)) {
+ unsigned numElementsDst = cast<llvm::VectorType>(DstTy)->getNumElements();
+ unsigned numElementsSrc = cast<llvm::VectorType>(SrcTy)->getNumElements();
+
+ if ((numElementsDst == 3 && numElementsSrc == 4)
+ || (numElementsDst == 4 && numElementsSrc == 3)) {

The OpenCL spec defines the behavior of vec4->vec3, but about vec3->vec4 it writes:
  float3 f;
  // Error. Result and operand have different sizes
  float4 g = as_float4(f);

Also, shuffle vector is not enough – there might be a conversion from int4 to float3, which is legal, but will crash the compiler if it’s done only by shuffle. There must be a bitcast here.

Ok. I'll modify.

+ if (dst_fp && src_fp)
+ ID = llvm::Intrinsic::convertff;
+ else if (dst_fp && !src_fp)
+ ID = src_s ? llvm::Intrinsic::convertfsi : llvm::Intrinsic::convertfui;
+ else if (!dst_fp && src_fp)
+ ID = dst_s ? llvm::Intrinsic::convertsif : llvm::Intrinsic::convertuif;
+ else if (dst_s && src_s)
+ ID = llvm::Intrinsic::convertss;
+ else if (dst_s && !src_s)
+ ID = llvm::Intrinsic::convertsu;
+ else if (!dst_s && src_s)
+ ID = llvm::Intrinsic::convertus;
+ else
+ ID = llvm::Intrinsic::convertuu;
+

These intrinsics are not documented. Are they safe to use? Do they fulfill the OpenCL spec’s precision requirements?

I believe they meet OpenCL's spec precision requirements as they are working in our implementation. Do you have a specific example where you think they won't?

As for being safe to use, what do you mean?

Thanks,
Tanya

The use of the conversions intrinsics depends on a llvm target backend that support those conversion operations. Backends are not required to support them. If a backend does support them, they are meant to fulfill OpenCL spec's precision requirements. One could support the OpenCL conversions through libraries calls, which would be invisible to clang.

  -- Mon Ping

Hi Tanya,

If I get it right, you want as_typen and convert implemented in LLVM already in the compiled kernel, and not as function calls, in order to enjoy the benefits of the LLVM static verifier. This can be achieved also by inlining the called function prior to the verification, but due to the strong-typedness of LLVM, you don’t even need that. If you have the function signature declared right, and the kernel calling it verifies, and also your run time functions module is verified, you’re on the safe side.

As for as_typen, defining it as a real clang built-in function instead of creating a new expression class for it doesn’t mean we can’t replace it in the built-in implementation by a single LLVM bitcast. That’s actually what we do in our implementation.

Also, I think the testing for these three features should be more exhaustive, the tests in your patch are really short.

For vec_step, you should also check the result, for convert you could check that the right intrinsic call is generated, and for as_typen you could check that the LLVM bitcast is generated right.

— test/Parser/vecstep.c (revision 0)

+++ test/Parser/vecstep.c (revision 0)

@@ -0,0 +1,13 @@

+// RUN: %clang_cc1 -fsyntax-only -verify %s

image001.png

Hi Tanya,

If I get it right, you want as_typen and convert implemented in LLVM already in the compiled kernel, and not as function calls, in order to enjoy the benefits of the LLVM static verifier. This can be achieved also by inlining the called function prior to the verification, but due to the strong-typedness of LLVM, you don’t even need that. If you have the function signature declared right, and the kernel calling it verifies, and also your run time functions module is verified, you’re on the safe side.

This is not the same as inlining. They would be actual nodes in the AST and mean something to the static analyzer and perhaps other tools. Why are you against having as_type as a node in the AST? Does this cause some problem? I don't understand your statement below.

As for as_typen, defining it as a real clang built-in function instead of creating a new expression class for it doesn’t mean we can’t replace it in the built-in implementation by a single LLVM bitcast. That’s actually what we do in our implementation.

Also, I think the testing for these three features should be more exhaustive, the tests in your patch are really short.
For vec_step, you should also check the result, for convert you could check that the right intrinsic call is generated, and for as_typen you could check that the LLVM bitcast is generated right.

I'm happy to improve the testing if this is the selected approach.

Thanks,
Tanya

Does anyone else have any comments on the patch? I'd like to move forward with it.

Thanks,
Tanya