ARM vectorized fp16 support

Hi,

I'm trying to compile a half-precision program for ARM, but LLVM seems
to fail to generate fused multiply-add instructions automatically for
c += a * b. Did I do something wrong? If not, is this a missing feature
that will be supported later? (I know there are fp16 FMLA intrinsics,
though.)

Test programs and outputs:

$ clang -O3 -march=armv8.2-a+fp16fml -ffast-math -S -o- vfp32.c
test_vfma_lane_f16: // @test_vfma_lane_f16
                fmla v2.4s, v1.4s, v0.4s // fp32 is GOOD
                mov v0.16b, v2.16b
                ret
$ cat vfp32.c
#include <arm_neon.h>
float32x4_t test_vfma_lane_f16(float32x4_t a, float32x4_t b, float32x4_t c) {
  c += a * b;
  return c;
}

$ clang -O3 -march=armv8.2-a+fp16fml -ffast-math -S -o- vfp16.c
test_vfma_lane_f16: // @test_vfma_lane_f16
                fmul v0.4h, v1.4h, v0.4h
                fadd v0.4h, v0.4h, v2.4h // fp16 does NOT use FMLA
                ret
$ cat vfp16.c
#include <arm_neon.h>
float16x4_t test_vfma_lane_f16(float16x4_t a, float16x4_t b, float16x4_t c) {
  c += a * b;
  return c;
}
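
For reference, here is a minimal sketch of the intrinsic-based workaround
mentioned above. It assumes the vfma_f16 intrinsic from arm_neon.h, which
takes the accumulator as its first argument and is only available when
__ARM_FEATURE_FP16_VECTOR_ARITHMETIC is defined. The names fma_f16x4 and
fma_demo_lane0 are hypothetical, and the scalar fallback is only there so
the file still builds on targets without fp16 vector arithmetic.

```c
/* Sketch only: fma_f16x4 and fma_demo_lane0 are hypothetical names. */
#if defined(__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
#include <arm_neon.h>

/* Computes c + a * b per lane. vfma_f16 takes the accumulator first. */
float16x4_t fma_f16x4(float16x4_t a, float16x4_t b, float16x4_t c) {
  return vfma_f16(c, a, b);
}
#endif

/* Returns lane 0 of 1 + 2 * 3, which is exactly representable in fp16. */
float fma_demo_lane0(void) {
#if defined(__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
  float16x4_t a = vdup_n_f16(2.0f);
  float16x4_t b = vdup_n_f16(3.0f);
  float16x4_t c = vdup_n_f16(1.0f);
  return (float)vget_lane_f16(fma_f16x4(a, b, c), 0);
#else
  /* Assumption: plain scalar fallback when fp16 vector arithmetic is absent. */
  return 1.0f + 2.0f * 3.0f;
#endif
}
```

Built for AArch64 with the same -march=armv8.2-a+fp16fml flags as above,
fma_f16x4 should lower to a single fmla on .4h operands, independent of
whether the compiler fuses the plain c += a * b form.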

Hi,
Which version of Clang are you using? I do get a “vfma.f16” with a recent trunk build. I haven’t checked older versions or when this landed, but we made an effort to plug the remaining fp16 holes not that long ago, so hopefully a newer version will just work for you.

Cheers,
Sjoerd.

Thanks for the reply. I was using LLVM 8.0. Let me try trunk, and I will
let you know if it works.

Hello again!
I got confused by the mention of compiling for ARM and assumed --target=arm because no target was given in your compile commands, but you’re targeting AArch64! Sorry about that; I didn’t look carefully enough at your assembly… Anyway, it looks like you’re right and we’re missing an opportunity here!

Usually this is a simple missing pattern. I am not promising anything, but I will see if I can do this on the side.

Feel free to open a bug report.

Cheers,
Sjoerd.

I posted a patch for review: https://reviews.llvm.org/D67297