At Arm we are discussing a change to the semantics of the storage-only type
__fp16, and we are looking for feedback on this. The motivation is that in
A-profile, the FP16 architecture extension natively supports half-precision
arithmetic. SVE also supports it, and in M-profile MVE optionally supports it.
The problem is that float16_t is defined in the Arm C Language Extensions
(ACLE) specification as an alias for __fp16. Because __fp16 is a storage-only
type whose arithmetic is performed in single precision, code that uses
float16_t or __fp16 does not take advantage of the native half-precision FP16
instructions.
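To illustrate why the two semantics are observably different, here is a small sketch in portable C. It does not use __fp16 or any half-precision type directly, since their availability depends on compiler and target. Instead it models them with a hypothetical helper, round_to_half, which rounds a float to half-precision precision (11 significand bits, ties to even). The helper and the two sum functions are illustrative names only, and the model assumes IEEE binary32 floats and ignores half-precision overflow, subnormals and NaNs.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of ACLE): round a float to the nearest value
 * with an 11-bit significand, ties to even. Assumes IEEE binary32 floats and
 * ignores half-precision overflow, subnormals and NaNs. */
static float round_to_half(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);

    /* Drop the low 13 of the 23 fraction bits, rounding to nearest even. */
    uint32_t lsb = (bits >> 13) & 1u;
    bits += 0x0FFFu + lsb;
    bits &= ~UINT32_C(0x1FFF);

    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Models storage-only semantics: operands are promoted to single precision,
 * the whole expression is evaluated there, and only the final result is
 * rounded to half precision when it is stored. */
static float sum_storage_only(float a, float b, float c)
{
    return round_to_half(a + b + c);
}

/* Models native half-precision arithmetic: every intermediate result is
 * rounded to half precision. */
static float sum_native_half(float a, float b, float c)
{
    return round_to_half(round_to_half(a + b) + c);
}
```

With the inputs 2048, 1 and 1, all of which are exactly representable in half precision, sum_storage_only returns 2050 while sum_native_half returns 2048, because 2049 is not representable in half precision and rounds back to 2048 at each step. For a single operation the two models agree, so the difference only shows up in chained expressions like this one.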
One obvious solution is to change the float16_t typedef in the ACLE from this:
    typedef __fp16 float16_t;
to one that uses _Float16 instead of __fp16, where _Float16 is the type with
half-precision arithmetic semantics. An alternative is to change the semantics
of __fp16 itself. Both approaches have their pros and cons:
Changing the semantics of __fp16 (approach A):
Pros:
– There is no ABI break.
– Code that uses __fp16 also benefits from the more efficient implementation.
Cons:
– No type would retain the old __fp16 semantics.
– We’d need to change the compiler frontends (both Clang and GCC).
– Existing code could rely on the current __fp16 behaviour.
Keeping the semantics of __fp16 (approach B):
Pros:
– People who want the old behaviour can use __fp16 directly.
– We only need to change a typedef in a header file.
Cons:
– Changing float16_t requires an ABI break.
– Code that directly uses __fp16 would not benefit from the new float16_t optimisation.
Deciding between these approaches is difficult: people may be happy or unhappy
either way, and this is hard to quantify. That is why we welcome any feedback
on this, for example from users of __fp16. If the opinion is that breaking the
ABI is a last resort, then that would point in the direction of approach A,
changing the semantics of __fp16.