[OpenCL patch] Half type as native when OpenCL's cl_khr_fp16 extension is enabled

I've reviewed the patch against trunk top-of-tree (revision 156617).
I have not compiled the code.

Generally the code changes look good. I would like to see some test
cases for code generation updates:
- half constant generation
- conversion to and from half, with other type being "float"
- ++ and --
I understand that the code generation cases may depend on inclusion of
your half-llvm.patch
(http://lists.cs.uiuc.edu/pipermail/llvmdev/2012-April/049117.html)

I have one detailed comment:
The following part of the patch seems unnecessary.
I think the existing code should work because the HalfTypeID should be
lower than the other floating types?

   @@ -641,7 +641,10 @@ Value
*ScalarExprEmitter::EmitScalarConversion(Value *Src, QualType SrcType,
      } else {
        assert(SrcTy->isFloatingPointTy() && DstTy->isFloatingPointTy() &&
               "Unknown real conversion");
   - if (DstTy->getTypeID() < SrcTy->getTypeID())
   + int DstId = DstTy->getTypeID();
   + int SrcId = Src->getType()->getTypeID();
   + if ((SrcId != llvm::Type::HalfTyID && DstId < SrcId) ||
   + DstId == llvm::Type::HalfTyID)
          Res = Builder.CreateFPTrunc(Src, DstTy, "conv");
        else
          Res = Builder.CreateFPExt(Src, DstTy, "conv");

thanks,
david

Hi David,

Many thanks for your comments.

Generally the code changes look good. I would like to see some test
cases for code generation updates:
- half constant generation
- conversion to and from half, with other type being "float"

Please see new test/CodeGenOpenCL/half.cl.

- ++ and --

OpenCL 1.1 forbids using the increment and decrement operators on
floating-point values. We haven't open-sourced the semantic check for this,
so it's hard for us to test. However, incrementing and decrementing a half
variable by 2.0f is done in the above test.

I have one detailed comment:
The following part of the patch seems unnecessary.
I think the existing code should work because the HalfTypeID should be
lower than the other floating types?

   @@ -641,7 +641,10 @@ Value
*ScalarExprEmitter::EmitScalarConversion(Value *Src, QualType SrcType,
      } else {
        assert(SrcTy->isFloatingPointTy() && DstTy->isFloatingPointTy()
&&
               "Unknown real conversion");
   - if (DstTy->getTypeID() < SrcTy->getTypeID())
   + int DstId = DstTy->getTypeID();
   + int SrcId = Src->getType()->getTypeID();
   + if ((SrcId != llvm::Type::HalfTyID && DstId < SrcId) ||
   + DstId == llvm::Type::HalfTyID)
          Res = Builder.CreateFPTrunc(Src, DstTy, "conv");
        else
          Res = Builder.CreateFPExt(Src, DstTy, "conv");

Good point! Removed.

Many thanks,
Anton.

half-clang.patch (16.2 KB)

Two small comments, if I may:

@@ -348,13 +351,18 @@ llvm::Type *CodeGenTypes::ConvertType(QualType T) {

       // FIXME: Ask target which kind of half FP it prefers (storage only vs
       // native).
- ResultType = llvm::Type::getInt16Ty(getLLVMContext());
+ if (Context.getLangOpts().OpenCL)
+ ResultType = getTypeForFormat(getLLVMContext(),
+ Context.getFloatTypeSemantics(T), /* isOpenCL = */ true);
+ else
+ ResultType = llvm::Type::getInt16Ty(getLLVMContext());
       break;
     case BuiltinType::Float:
     case BuiltinType::Double:

It seems like the if is unnecessary here, since the logic is already in getTypeForFormat.

@@ -1763,7 +1763,8 @@ bool Sema::IsFloatingPointPromotion(QualType FromType, QualType ToType) {
         return true;

       // Half can be promoted to float.
- if (FromBuiltin->getKind() == BuiltinType::Half &&
+ if (!(getLangOpts().OpenCL || getOpenCLOptions().cl_khr_fp16) &&
+ FromBuiltin->getKind() == BuiltinType::Half &&
           ToBuiltin->getKind() == BuiltinType::Float)
         return true;
     }

Did you mean the following? As written, it returns true in the OpenCL case whether or not cl_khr_fp16 is true.

if ((!getLangOpts().OpenCL || getOpenCLOptions().cl_khr_fp16) &&

John

Another comment: be sure to check for dereferences of half pointers using array subscripting as well as dereferences using the * operator. For example:

half *a;
half *b;
half *c;
...
c[10] = a[10] + b[10]; // this should be an error

John

Hi John,

Sorry I missed your original email. Yes, you are right that the previous
patch missed the case of array subscripting. Please review the updated
patch.

Many thanks,
Anton.

half-clang.patch (16.9 KB)

I'm rather unnerved by the prevalence of this pattern in IR generation:

@@ -1363,7 +1363,7 @@ ScalarExprEmitter::EmitScalarPrePostIncDec(const UnaryOperator *E, LValue LV,
     // Add the inc/dec to the real part.
     llvm::Value *amt;

- if (type->isHalfType()) {
+ if (type->isHalfType() && !CGF.getContext().getLangOpts().OpenCL) {
       // Another special case: half FP increment should be done via float
       value =
     Builder.CreateCall(CGF.CGM.getIntrinsic(llvm::Intrinsic::convert_from_fp16),

Using the OpenCL language option here seems wrong, especially given that we tend to permit OpenCL extensions in other language dialects. I think what you really want to be checking for here is whether the target wants to use native half types vs. faking them with a storage-only int16, per this FIXME you're zapping:

@@ -346,15 +349,15 @@ llvm::Type *CodeGenTypes::ConvertType(QualType T) {
       // Half is special: it might be lowered to i16 (and will be storage-only
       // type),. or can be represented as a set of native operations.

- // FIXME: Ask target which kind of half FP it prefers (storage only vs
- // native).
- ResultType = llvm::Type::getInt16Ty(getLLVMContext());
+ ResultType = getTypeForFormat(getLLVMContext(),
+ Context.getFloatTypeSemantics(T), Context.getLangOpts().OpenCL);

However, that seems like something that would also impact our semantic analysis. I don't have a solid understanding of the similarities and differences between the OpenCL and ARM NEON's half types, but from your patch it's starting to feel like they are conceptually different types.

diff --git a/lib/Sema/SemaDecl.cpp b/lib/Sema/SemaDecl.cpp
index 68f7469..b576624 100644
--- a/lib/Sema/SemaDecl.cpp
+++ b/lib/Sema/SemaDecl.cpp
@@ -4020,6 +4020,26 @@ Sema::ActOnVariableDeclarator(Scope *S, Declarator &D, DeclContext *DC,
   assert(SCSpec != DeclSpec::SCS_typedef &&
          "Parser allowed 'typedef' as storage class VarDecl.");
   VarDecl::StorageClass SC = StorageClassSpecToVarDeclStorageClass(SCSpec);

Hi Doug,

I'm rather unnerved by the prevalence of this pattern in IR generation:

- if (type->isHalfType()) {
+ if (type->isHalfType() && !CGF.getContext().getLangOpts().OpenCL)

Using the OpenCL language option here seems wrong, especially given
that we tend to permit OpenCL extensions in other language dialects.

Could you please clarify what you mean by OpenCL extensions being permitted
in other language dialects?

I think what you really want to be checking for here is whether the
target wants to use native half types vs. faking them with a storage-
only int16, per this FIXME you're zapping:
  
- // FIXME: Ask target which kind of half FP it prefers (storage
only vs
- // native).

However, that seems like something that would also impact our semantic
analysis. I don't have a solid understanding of the similarities and
differences between the OpenCL and ARM NEON's half types, but from your
patch it's starting to feel like they are conceptually different types.

I'm looking to Anton Korobeynikov here (who I believe left this comment), as
I'm only vaguely familiar with the NEON half semantics. Do you think we
should have a LangOpt such as HalfTypeNative, which would be always enabled
for OpenCL, with a typical pattern of:

- if (type->isHalfType()) {
+ if (type->isHalfType() &&

!CGF.getContext().getLangOpts().HalfTypeNative)

This would probably be a reasonable refactoring if any language dialect
other than OpenCL needs to use the half type natively.

There's no need to map down to the canonical type here; isHalfType()
will do it for you. This would be much cleaner as:

  if (R->isHalfType()) {
    // diagnose error
  } else if (const ArrayType *ArrayTy = R->getAs<ArrayType>()) {
    if (ArrayTy->getElementType()->isHalfType()) {
      // diagnose error
    }
  }

Sadly, getAs<ArrayType>() is explicitly forbidden for qualified types.
Please see the updated patch for other improvements.

Many thanks,
Anton.

half-clang.patch (17 KB)

However, that seems like something that would also impact our semantic
analysis. I don't have a solid understanding of the similarities and
differences between the OpenCL and ARM NEON's half types, but from your
patch it's starting to feel like they are conceptually different types.

I'm looking to Anton Korobeynikov here (who I believe left this comment), as
I'm only vaguely familiar with the NEON half semantics. Do you think we
should have a LangOpt such as HalfTypeNative, which would be always enabled
for OpenCL, with a typical pattern of:

- if (type->isHalfType()) {
+ if (type->isHalfType() &&

!CGF.getContext().getLangOpts().HalfTypeNative)

This would probably be a reasonable refactoring if any language dialect
other than OpenCL needs to use the half type natively.

+1

The semantics of half being storage only type vs native type is
definitely different. E.g. in the former case we should prohibit all
the possible usages which might end with bare 'half' type and not
half*.

This LangOpt seems like a good idea to me. I wonder if it should be expressed as HalfTypeNative (use convert.*.fp16 if false) or as something like PromoteHalfToFloat (use convert.*.fp16 if true). I lean slightly toward PromoteHalfToFloat, but I don't have a strong opinion.

It looks like we have four possibilities. Is this right?
1. half is completely unsupported (C, C++ default)
2. half is storage-only, support half* but no operations on half (OpenCL without cl_khr_fp16)
3. half is storage-only, operations on half automatically promoted to float (__fp16 NEON)
4. half is completely supported (OpenCL with cl_khr_fp16)

HalfTypeNative (or PromoteHalfToFloat) would differentiate between the last two cases in CodeGen; the first two cases are irrelevant in CodeGen because in those cases it should be impossible to pass a half type to EmitScalarConversion.

John

Hi John,

It looks like we have four possibilities. Is this right?
1. half is completely unsupported (C, C++ default)
2. half is storage-only, support half* but no operations on half
(OpenCL without cl_khr_fp16)
3. half is storage-only, operations on half automatically promoted to
float (__fp16 NEON)
4. half is completely supported (OpenCL with cl_khr_fp16)

HalfTypeNative (or PromoteHalfToFloat) would differentiate between the
last two cases in CodeGen; the first two cases are irrelevant in
CodeGen because in those cases it should be impossible to pass a half
type to EmitScalarConversion.

I agree.

We found a problem with our solution addressing your previous comment [1].
One of the OpenCL conformance tests attempts to do something like this:

__global half * buffer;
size_t index;
...
vstore_half(
    /* float data = */ 1.0f,
    /* size_t offset = */ 0,
    /* __global half *p = */ &buffer[index]
);

Disallowing 'buffer[index]' as in [2] results in:
error: dereferencing half pointer ('half *') is not allowed.

While the conformance tests are not sacred, it can be argued that the
composition of '&' and '' should result in half*, so should not be
rejected. Unfortunately, we found no obvious way to implement this in
Clang, so leave this case as TODO.

Please review the updated patch.

Many thanks,
Anton.

[1] http://lists.cs.uiuc.edu/pipermail/cfe-dev/2012-June/022001.html
[2] http://lists.cs.uiuc.edu/pipermail/cfe-dev/2012-June/022079.html

half_type.patch (18 KB)

One way to handle the "storage-only" version is for buffer[index] to produce a non-modifiable lvalue (so you can't write it) and to ban the lvalue-to-rvalue conversion for such a value (for the latter, see Sema::DefaultLValueConversion).

  - Doug

Hi Doug,

> While the conformance tests are not sacred, it can be argued that the
> composition of '&' and '' should result in half*, so should not be
> rejected. Unfortunately, we found no obvious way to implement this
> in Clang, so leave this case as TODO.

One way to handle the "storage-only" version is for buffer[index] to
produce a non-modifiable lvalue (so you can't write it) and to ban the
lvalue-to-rvalue conversion for such a value (for the latter, see
Sema::DefaultLValueConversion).

We've tried to implement your proposal, but ended up with putting
OpenCLOptions into ASTContext, which is not the cleanest of solutions. If
it's the only problem with this patch, would it be possible to commit it
as-is? Meanwhile, we'll try to come up with a better solution (fixing the
FIXMEs).

Many thanks,
Anton.

half_type.patch (18 KB)

Hi,

We are working on adding cl_khr_fp16 extension support to pocl
(https://launchpad.net/pocl) based on this patch. FWIW it would be very
helpful to us to have this patch committed to trunk. It has worked without
problems so far.