[PATCH] OpenCL half support

Hi Anton,

Does the spec force evaluation to happen in half mode, or does it specify that there is a promotion to float (or some other type), an operation, then truncation back to half?

-Chris

Hi Chris,

Does the spec force evaluation to happen in half mode, or does it
specify that there is a promotion to float (or some other type), an
operation, then truncation back to half?

The last paragraph in section 9.6 says: "NOTE: Implementations may perform
floating-point operations on half scalar or vector data types by converting
the half values to single precision floating-point values and performing the
operation in single precision floating-point. In this case, the
implementation will use the half scalar or vector data type as a storage
only format."

That is, an implementation may perform operations on half scalar and vector
values either using half-precision operations (if supported natively) or
using single-precision operations (always supported natively). In either
case, it's desirable to represent half operations in the IR, and let the
backend make the decision.
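
For concreteness, here is roughly what that could look like, assuming the
patch's syntax for a first-class half type (a sketch, not the patch itself):

    define half @scale(half %x, half %y) {
      ; one canonical IR form; the backend picks the lowering
      %r = fmul half %x, %y
      ret half %r
    }

    ; A target with native fp16 arithmetic selects an fp16 multiply directly.
    ; A target without it can expand the very same operation as:
    ;   %xf = fpext half %x to float
    ;   %yf = fpext half %y to float
    ;   %rf = fmul float %xf, %yf
    ;   %r  = fptrunc float %rf to half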

Cheers,
Anton.

Hi Chris,

So what do you think about this proposal? If you agree, it would be good
to include the patch in the 2.9 release (to avoid breaking compatibility
later).

Best regards,
Anton.

Hi Anton,

So what do you think about this proposal? If you agree, it would be good
to include the patch in the 2.9 release (to avoid breaking compatibility
later).

Regardless of the review, it's too late for 2.9; the release has already been branched.

PS: my 2 cents: do not forget to handle the existing half fp <-> float
conversion intrinsics.
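
For reference, these are the intrinsics I mean (going from memory of the
LangRef, so treat the exact signatures as a sketch); they pass the half
value around as an i16 bit pattern and are currently only lowered for ARM:

    declare i16   @llvm.convert.to.fp16(float)
    declare float @llvm.convert.from.fp16(i16)

    ; storage-only use: the half value lives in memory as an i16 bit pattern
    define float @load_half(i16* %p) {
      %bits = load i16* %p
      %f    = call float @llvm.convert.from.fp16(i16 %bits)
      ret float %f
    }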

Hi Chris,

So what do you think about this proposal? If you agree, it would be good
to include the patch in the 2.9 release (to avoid breaking compatibility
later).

Hi Anton, I'm sorry I don't have the patch anymore. Please resend. It is too late for new features in 2.9, though.

The last paragraph in section 9.6 says: "NOTE: Implementations may
perform floating-point operations on half scalar or vector data types
by converting the half values to single precision floating-point values
and performing the operation in single precision floating-point. In
this case, the implementation will use the half scalar or vector data
type as a storage only format."

Ok.

That is, an implementation may perform operations on half scalar and
vector values either using half-precision operations (if supported
natively) or using single-precision operations (always supported
natively). In either case, it's desirable to represent half operations
in the IR, and let the backend make the decision.

It doesn't impact the utility of your approach, but I could not disagree more here. It would be *absolutely* the wrong thing to do for backends to compile IR half-float operations into full float operations. Doing this would cause all sorts of problems, with constant folding becoming inconsistent, etc.
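
To make that concrete, a sketch (values chosen so the rounding difference
shows up; in fp16, adjacent values in [1024, 2048) are spaced 1.0 apart):

    define half @chain() {
      %t = fadd half 1024.0, 0.5   ; true fp16: 1024.5 ties-to-even -> 1024.0
      %r = fadd half %t, 0.5       ; true fp16: still 1024.0
      ret half %r
    }

    ; A backend that promoted the whole chain to float and truncated once
    ; at the end would return 1025.0, while a constant folder evaluating in
    ; real fp16 folds this to 1024.0: two different answers for the same IR.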

Adding half float to LLVM IR is *only* reasonable if you have hardware that supports half float, or if you want to add softfloat operations for these. If we have an fp16 datatype in the IR, code generation *must* codegen these to something that implements the correct fp16 semantics.

C is not a portable language, and trying to make LLVM IR magically fix this is a bad approach. Just like C compilers need to know sizeof(long), sizeof(void*), and many, many other target-specific details, an OpenCL compiler would need to know whether to generate fp16 or not.

-Chris

[Villmow, Micah] Chris, in OpenCL the user has to explicitly state that they want to use fp16, and it is illegal to use the half data type for computation if it isn't natively supported. I think it would be useful to have fp16 in the IR because we support loads and stores of the data type, but not operations on it. Right now we handle that by treating halfs like 16-bit ints, but it would be nice to be able to represent them correctly.
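
Something like the following, say. The first function is the i16 workaround
we use today; the second is what a native type would let us write (a
sketch, assuming the proposed syntax):

    ; today: half data smuggled through i16, losing the FP semantics
    define void @copy_halfs_as_i16(i16* %src, i16* %dst) {
      %v = load i16* %src
      store i16 %v, i16* %dst
      ret void
    }

    ; with a native type: the same storage-only use, typed honestly
    define void @copy_halfs(half* %src, half* %dst) {
      %v = load half* %src
      store half %v, half* %dst
      ret void
    }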

Maybe worth pointing out that there are architectures in LLVM that natively support 16-bit floating point. PTX, the new backend of which has just been added to 2.9, can handle fp16 -> fp32 conversion in hardware. I agree we should have support for fp16 in the IR; it's fiddly trying to make do without it, and fp16 gets used frequently in simulations and in graphics in particular.

Maybe worth pointing out that there are architectures in LLVM that natively
support 16-bit floating point. PTX, the new backend of which has just been
added to 2.9, can handle fp16 -> fp32 conversion in hardware.

FWIW: there are already intrinsics for such conversions (currently
only used in the ARM backend).
There is no need for a new type if you just want to convert stuff.

[Villmow, Micah] I've looked into this, but the problem with the intrinsics is that they only support scalar types and do not support any saturation or rounding modes. If there were a way in the current approach to handle these cases, then I would say use what is there, but what is currently there is very basic and doesn't cover even all of the load/store + conversion cases.
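
For example (a sketch): converting four halfs with the scalar intrinsic
means extracting, converting, and reinserting every lane by hand, and there
is still no way to request a saturating or specially-rounded conversion:

    declare float @llvm.convert.from.fp16(i16)

    define <4 x float> @cvt4(<4 x i16> %h) {
      %h0 = extractelement <4 x i16> %h, i32 0
      %f0 = call float @llvm.convert.from.fp16(i16 %h0)
      %h1 = extractelement <4 x i16> %h, i32 1
      %f1 = call float @llvm.convert.from.fp16(i16 %h1)
      %h2 = extractelement <4 x i16> %h, i32 2
      %f2 = call float @llvm.convert.from.fp16(i16 %h2)
      %h3 = extractelement <4 x i16> %h, i32 3
      %f3 = call float @llvm.convert.from.fp16(i16 %h3)
      %v0 = insertelement <4 x float> undef, float %f0, i32 0
      %v1 = insertelement <4 x float> %v0, float %f1, i32 1
      %v2 = insertelement <4 x float> %v1, float %f2, i32 2
      %v3 = insertelement <4 x float> %v2, float %f3, i32 3
      ret <4 x float> %v3
    }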

[Villmow, Micah] Chris, in OpenCL the user has to explicitly state that they want to use fp16, and it is illegal to use the half data type for computation if it isn't natively supported. I think it would be useful to have fp16 in the IR because we support loads and stores of the data type, but not operations on it. Right now we handle that by treating halfs like 16-bit ints, but it would be nice to be able to represent them correctly.

My understanding is that OpenCL allows promoting to float (32-bit) types. OpenCL doesn't (AFAIK) require support for fp16.

Maybe worth pointing out that there are architectures in LLVM that natively support 16-bit floating point. PTX, the new backend of which has just been added to 2.9, can handle fp16 -> fp32 conversion in hardware. I agree we should have support for fp16 in the IR; it's fiddly trying to make do without it, and fp16 gets used frequently in simulations and in graphics in particular.

LLVM already fully supports fp16 <-> fp32 conversions. If you want to add saturation support for these conversions, that is completely orthogonal to adding fp16 as a "native" LLVM IR type: making it a first-class type doesn't give you saturating conversions.

-Chris

Adding half float to LLVM IR is *only* reasonable if you have hardware
that supports half float, or if you want to add softfloat operations
for these.

Yes, our graphics hardware natively supports some fp16 arithmetic
operations.

Just like C compilers need to know sizeof(long), sizeof(void*), and
many, many other target-specific details, an OpenCL compiler would need
to know whether to generate fp16 or not.

Yes, it's just another example of LLVM IR non-portability. Basically, any
fp16 arithmetic code can be generated only if the cl_khr_fp16 extension is
supported (otherwise, the frontend would reject even declaring fp16
variables, let alone performing arithmetic on them).

Anton.

Adding half float to LLVM IR is *only* reasonable if you have hardware
that supports half float, or if you want to add softfloat operations
for these.

Yes, our graphics hardware natively supports some fp16 arithmetic
operations.

Ok.

Just like C compilers need to know sizeof(long), sizeof(void*), and
many, many other target-specific details, an OpenCL compiler would need
to know whether to generate fp16 or not.

Yes, it's just another example of LLVM IR non-portability. Basically, any
fp16 arithmetic code can be generated only if the cl_khr_fp16 extension is
supported (otherwise, the frontend would reject even declaring fp16
variables, let alone performing arithmetic on them).

If the backend generates softfloat (or some other expansion) for fp16, then a native fp16 type would be perfectly portable. This is just not the "portability" you're looking for (what you're describing is not behavior-preserving, so it isn't portability in the standard sense).

-Chris

Hi Chris,

It is important for embedded/mobile computation to have efficient fp16 support; otherwise those users will suffer the merge burden of carrying a locally added native fp16 type in their own LLVM trees. So we should either add full fp16 support as a basic floating-point type or enhance the LLVM infrastructure to make floating-point types as extensible as integer types.

-Chihong

Hi Chris,

It is important for embedded/mobile computation to have efficient fp16 support; otherwise those users will suffer the merge burden of carrying a locally added native fp16 type in their own LLVM trees. So we should either add full fp16 support as a basic floating-point type or enhance the LLVM infrastructure to make floating-point types as extensible as integer types.

As I've said several times now :), I'm OK with having fp16 as a native LLVM type so long as there is hardware that implements fp16 arithmetic operations like add and sub with correct fp16 rounding, etc.

-Chris

Sorry. I should have said this more clearly: "there are already quite a few embedded/mobile chips that provide fp16 ALU operations for performance".

Thanks,
Chihong

Chris Lattner wrote:

I'm sorry I don't have the patch anymore. Please resend.

Attached. (Copying to cfe-dev, as the patch is dual Clang/LLVM.)

Anton Korobeynikov wrote:

PS: my 2 cents: do not forget to handle the existing half fp <-> float
conversion intrinsics.

We are not quite sure what to do with them. Can anyone help?

Best wishes,
Anton.

00004-half-llvm.patch (11.7 KB)

00004-half-clang.patch (24.1 KB)

Any comments at all?

Many thanks,
Anton.

Hi Anton,

Sorry for dropping this. Can you resend your current patch? Let's just start and iterate on the LLVM patch first.

-Chris

Hi Chris,

Sorry for dropping this. Can you resend your current patch? Let's just
start and iterate on the LLVM patch first.

Please find the LLVM patch attached.

Many thanks,
Anton.

llvm00001.patch (11.6 KB)

Hi Anton,

This looks like a great start, but it needs a lot of testcases showing that these things constant fold, handle conversions correctly, and generally work. Also, there is no code generator support for these.
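
For instance, a constant-folding test might look something like this (a
sketch; I'm assuming the 0xH hex notation for half constants from the
patch):

    ; RUN: opt < %s -instcombine -S | FileCheck %s

    define half @fold_add() {
    ; CHECK: ret half 0xH4000
      %r = fadd half 0xH3C00, 0xH3C00   ; 1.0 + 1.0 in fp16
      ret half %r                       ; should fold to 2.0 (0xH4000)
    }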

-Chris