Set excess_precision_type to FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 to
round after each operation could keep semantics right.
And I'll document the behavior difference between soft-fp and
AVX512FP16 instruction for exceptions.
I got some feedback from my colleague who's working on supporting
_Float16 for llvm.
The LLVM side wants to set FLT_EVAL_METHOD_PROMOTE_TO_FLOAT for
soft-fp so that codes can be more efficient.
_Float16 a, b, c, d;
d = a + b + c;
would be transformed to
float tmp, tmp1, a1, b1, c1;
a1 = (float) a;
b1 = (float) b;
c1 = (float) c;
tmp = a1 + b1;
tmp1 = tmp + c1;
d = (_Float16) tmp;
so there's only 1 truncation in the end.
if users want to round back after every operation. codes should be
explicitly written as
_Float16 a, b, c, d, e;
e = a + b;
d = e + c;
That's what Clang does, quote from 
_Float16 arithmetic will be performed using native half-precision
support when available on the target (e.g. on ARMv8.2a); otherwise it
will be performed at a higher precision (currently always float) and
then truncated down to _Float16. Note that C and C++ allow
intermediate floating-point operands of an expression to be computed
with greater precision than is expressible in their type, so Clang may
avoid intermediate truncations in certain cases; this may lead to
results that are inconsistent with native arithmetic.
and so does arm gcc
quote from arm.c
/* We can calculate either in 16-bit range and precision or
32-bit range and precision. Make that decision based on whether
we have native support for the ARMv8.2-A 16-bit floating-point
instructions or not. */