RFC: Do we still need @llvm.convert.to.fp16 and the reverse?

Hi all,

What do people think of doing away with the @llvm.convert.to.fp16 and
@llvm.convert.from.fp16 intrinsics, in favour of using "half" and
fpext/fptrunc? [1]

It looks like those intrinsics originally date from before "half"
actually existed in LLVM, and of course the backends have grown up
assuming that's what Clang will produce, so we'd have to improve their
support first. But the benefit would be a more uniform interface to
this type, both for front-ends and backends.

I've been poking around a bit, and as far as I can see the following
accommodations would need to be made in CodeGen:

1. Generic support to Promote f16 operations narrowed by InstCombine
back to f32.

And in Targets:

1. If there's *no* native f16 support, they can probably remain unchanged.
2. Targets with scalar f16 conversion would need it to become a legal
type (in some register class). This seems like a reasonable
requirement if there actually are instructions acting on it. But it
adds the usual overheads of supporting bitcasts and loads/stores.
3. Targets with vector f16 conversion would similarly need to legalize
the type (and in this case would probably need argument-passing
support too, since Clang forbids passing a direct __fp16 but not a
vector).

I think it would be a worthwhile change, but want to make sure backend
owners with an interest in f16 don't object to the direction. That'd
seem to be: AArch64, ARM, NVPTX, R600 and X86.

Cheers.

Tim.

[1] I think this is an orthogonal issue to whether __fp16 is a
storage-only type or actually gets native arithmetic operations: we
can still have Clang promote any value to float before actually doing
anything.

Hi all,

What do people think of doing away with the @llvm.convert.to.fp16 and
@llvm.convert.from.fp16 intrinsics, in favour of using "half" and
fpext/fptrunc? [1]

I am in favor of using fpext/fptrunc instead of the intrinsics.

It looks like those intrinsics originally date from before "half"
actually existed in LLVM, and of course the backends have grown up
assuming that's what Clang will produce, so we'd have to improve their
support first. But the benefit would be a more uniform interface to
this type, both for front-ends and backends.

I've been poking around a bit, and as far as I can see the following
accommodations would need to be made in CodeGen:

1. Generic support to Promote f16 operations narrowed by InstCombine
back to f32.

Are there any in-tree targets that natively support fp16 operations?

-Tom

> Hi all,
>
> What do people think of doing away with the @llvm.convert.to.fp16 and
> @llvm.convert.from.fp16 intrinsics, in favour of using "half" and
> fpext/fptrunc? [1]
>

I am in favor of using fpext/fptrunc instead of the intrinsics.

I don't see any issues with NVPTX here. Since we have a 'half' type in
LLVM IR, it makes sense to go ahead and use it instead of having a half
value be either 'i16' or 'half' depending on what you're trying to do
with it.

Historically the only reason for these intrinsics was the lack of
native support for f16 on any target. On ARM f16 was storage only, so
stuff was promoted to / from float at every operation.

Looks like we need to be very careful here and not to change
storage-only semantics. Besides this I'm ok with dropping them.

If that's similar to named registers (@llvm.read_register and
@llvm.write_register), than staying as intrinsics is a good choice for
that very reason.

--renato

Hi all,

What do people think of doing away with the @llvm.convert.to.fp16 and
@llvm.convert.from.fp16 intrinsics, in favour of using “half” and
fpext/fptrunc? [1]

I am in favor of using fpext/fptrunc instead of the intrinsics.

The problem with this is the half type assumes you have half registers. I came up with some ugly hack that involved forcing custom lowering to make half loads/stores work for R600, but it was much easier to handle the intrinsic. The 16-bit integer type extload to the 32-bit register with the intrinsic for the half conversion just worked (see r212676)

> I am in favor of using fpext/fptrunc instead of the intrinsics.

The problem with this is the half type assumes you have half registers. I
came up with some ugly hack that involved forcing custom lowering to make
half loads/stores work for R600, but it was much easier to handle the
intrinsic.

Is that the code in performStoreCombine? It looks unrelated, so
possibly not, but it was the closest I could find in R600.

Anyway, I can see that there could well be targets that have
conversion operations available but don't want to mark f16 as a legal
type for whatever reason. I think using i16 globally is a hack to
achieve that, though.

I think the best approach would be to let generic softening code
handle it during type legalization (after some poking around, I think
part of your load/store problem was because f16 isn't marked for
promotion or softening in computeRegisterProperties even when there's
no register class around).

As a half-way house, the fptrunc/fpext operations could be softened to
the already existing FP16_FROM_FP32/FP16_TO_FP32 ISD nodes, and then
lowered to libcalls if even those operations aren't available.

The main difficulty is going to be preventing non-storage operations
from creeping in. InstCombine is already capable of forming them from
fptrunc/fadd/fpext and so on. That suggests we might want the default
handling to be something between Promote and Soften; or to disable
that InstCombine.

Cheers.

Tim.

I am in favor of using fpext/fptrunc instead of the intrinsics.

The problem with this is the half type assumes you have half registers. I
came up with some ugly hack that involved forcing custom lowering to make
half loads/stores work for R600, but it was much easier to handle the
intrinsic.

Is that the code in performStoreCombine? It looks unrelated, so
possibly not, but it was the closest I could find in R600.

No, I never committed that