ARMv7 float->int rounding

Hi everyone,
After years of trying and waiting, I still cannot get Clang to generate efficient code for float->int conversion on ARMv7, with either explicit round-to-nearest or the 'current'/'ambient' rounding mode:

std::int32_t round( float const floatingPointValue )
{
#if A
   return __builtin_lrintf( floatingPointValue );
#elif B
   std::int32_t integerValue;
   __asm__
   (
     "vcvtr.s32.f32 %0, %1" : "=w"( integerValue ) : "w"( floatingPointValue )
   );
   return integerValue;
#elif C
   return __builtin_arm_vcvtr_f( floatingPointValue, 0 );
#else // fallback
   return floatingPointValue + __builtin_copysignf( 0.5f, floatingPointValue );
#endif
}

A) is no good because https://llvm.org/bugs/show_bug.cgi?id=11544 ("Trivial math builtins not inlined") is still open

B) crashes with an assertion ('why on earth' is clang distributed with assertions turned on?):
"error: couldn't allocate output register for constraint 'w'"

C) crashes with:
"fatal error: error in backend: Cannot select: intrinsic %llvm.arm.vcvtr"

...tested with Clang 3.6 from Android NDK r10e (latest) and Apple Clang from Xcode 7.2.1 (latest)...

Is there anything I can do to make Clang emit the vcvtr instruction?

P.S. I stumbled on __builtin_arm_vcvtr_f by pure chance (it isn't documented anywhere, least of all its second parameter)...

> std::int32_t round( float const floatingPointValue )
> {
> #if A
>   return __builtin_lrintf( floatingPointValue );

lrint is not optimised on any platform AFAICT; x86 has a small note
about it.

> #elif B
>   std::int32_t integerValue;
>   __asm__
>   (
>     "vcvtr.s32.f32 %0, %1" : "=w"( integerValue ) : "w"( floatingPointValue )
>   );
>   return integerValue;

This doesn't work because the types are wrong: "w" asks for a VFP
register, but integerValue is an integer. You want to do something
like
      "vcvtr.s32.f32 %1, %1; vmov %0, %1"
or so, with =r for integerValue. Not sure why this is not caught in
the frontend already.
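
A rough sketch of another way to keep the types consistent, in case it
helps: give the asm a float-typed output (so "w" is satisfied) and copy
the bits into the integer afterwards. The name round_to_int, the
__ARM_FP guard and the lrintf fallback for non-ARM targets are my own
assumptions, not something from the original code, and I have not
checked this against the exact compiler versions mentioned above.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper: float -> int32 using the current (ambient) rounding mode.
inline std::int32_t round_to_int( float const value )
{
#if defined( __arm__ ) && defined( __ARM_FP )
    // Assumption: a 32-bit ARM target with VFP available.
    // The converted integer is left in a float-typed VFP register,
    // so both operands can use the "w" constraint.
    float rawBits;
    __asm__( "vcvtr.s32.f32 %0, %1" : "=w"( rawBits ) : "w"( value ) );

    // Reinterpret the register contents as an integer without changing the bits.
    std::int32_t result;
    std::memcpy( &result, &rawBits, sizeof( result ) );
    return result;
#else
    // Portable stand-in so the sketch also builds on other targets.
    return static_cast<std::int32_t>( std::lrintf( value ) );
#endif
}
```

With the default round-to-nearest mode, round_to_int( 2.5f ) should
give 2 and round_to_int( 3.5f ) should give 4 (ties go to even), which
is not what the +/-0.5 fallback in the original post produces. Note
that the asm statement does not tell the compiler it depends on the
FPSCR rounding mode, so this is only safe if the mode is not changed
around it.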

> #elif C
>   return __builtin_arm_vcvtr_f( floatingPointValue, 0 );

WFM

Joerg