Half Float fp16 Native Support

Hi all,

I am trying to implement native support for fp16 in llvm-3.1.

I have already used the OpenCL patch for clang, so the IR that is generated
is correct.

I tried to add some code so that the fp16 type is handled correctly, but no
luck.

We have a target that has native fp16 units, and I tried to run a simple
program:

int main()
{
        __fp16 a, b, c, d;

        a = 1.1;
        b = 2.2;
        c = 3.3;
        d = a + b + c;

        return 0;
}

and when I run llc on it, it produces this error:

LLVM ERROR: Cannot select: 0x234bab0: f16 = fadd 0x234b8b0, 0x234c2b0 [ORD=9] [ID=29]
  0x234b8b0: f16 = fadd 0x2349970, 0x2349a70 [ORD=7] [ID=28]
    0x2349970: f16,ch = load 0x234db10, 0x2349170, 0x2348e70<LD2[%a]> [ORD=5] [ID=26]
      0x2349170: i32 = FrameIndex<1> [ORD=2] [ID=4]
      0x2348e70: i32 = undef [ORD=1] [ID=3]
    0x2349a70: f16,ch = load 0x234db10, 0x2349470, 0x2348e70<LD2[%b]> [ORD=6] [ID=25]
      0x2349470: i32 = FrameIndex<2> [ORD=3] [ID=5]
      0x2348e70: i32 = undef [ORD=1] [ID=3]
  0x234c2b0: f16,ch = load 0x23207a0, 0x234dd10, 0x2348e70<LD2[ConstantPool]> [ID=24]
    0x234dd10: i32 = add 0x2349370, 0x234dc10 [ID=22]
      0x2349370: i32,ch = load 0x23207a0, 0x234c3b0, 0x2348e70<LD4[ConstantPool]> [ID=20]
        0x234c3b0: i32 = NemaCoreISD::Wrapper 0x2349870, 0x2349670 [ID=17]
          0x2349870: i32 = Register %GP [ID=14]
          0x2349670: i32 = TargetConstantPool<half 0x400A680000000000> 0 [TF=2] [ID=13]
        0x2348e70: i32 = undef [ORD=1] [ID=3]
      0x234dc10: i32 = NemaCoreISD::Lo 0x2349570 [ID=18]
        0x2349570: i32 = TargetConstantPool<half 0x400A680000000000> 0 [TF=6] [ID=15]
    0x2348e70: i32 = undef [ORD=1] [ID=3]

So my question is: as we are working on half-float (fp16) support in LLVM,
are there any plans to support it on the main trunk?

thanks

Nikos Stavropoulos

Hi Nikos

> and when i try to call llc produces this error
>
> LLVM ERROR: Cannot select: 0x234bab0: f16 = fadd 0x234b8b0, 0x234c2b0 [ORD=9] [ID=29]

This error suggests things are working on the generic LLVM side (as
I'd expect). It's what I'd expect to see for your code snippet if
there wasn't a target-specific pattern that could handle the addition
properly and select it to a valid instruction. What patterns do you
have for f16 addition so far?

If there's really nothing obviously wrong with them, the next step is
probably to delve into what the DAG matcher is doing behind the
scenes. If you give llc the option "-debug" it should tell you what
patterns it's tried to match against the fadd and where they failed.
If you cross-reference this with the table in
build/lib/Target/XXX/XXXGenDAGISel.inc you should be able to work out
where things are going wrong. (There are comments giving what each
original pattern was *below* the check that'll fail in each case).

> As we are working on half float fp16 support in LLVM are there any plans to
> support it on the main trunk ?

As I sort of implied, it's mostly down to the targets implementing
support now. The generic LLVM code doesn't need to do much with it. I
suppose there could be parts of the DAG combiner that assume
float/double or calls out to runtime library support functions that
aren't implemented, but that's not what you're hitting here. In fact
I'd expect most of the generic code to simply not care whether the
float it was considering is 16/32/64-bits wide and Just Work for
16-bit ones.

Hope this helps.

Tim.

After a long time I managed to make some progress with this problem. I can
store and load fp16 as i16 into some registers and do an add instruction. The
problem now is that this messes up the real i16 types (short, unsigned short).

I have

def FADD_H : NemaCorePseudo<(outs HGR16:$fd), (ins HGR16:$fs, HGR16:$ft),
    "add.h\t$fd, $fs, $ft",
    [(set (i16 HGR16:$fd),
          (i16 (f32_to_f16 (f32 (fadd (f32 (f16_to_f32 (i16 HGR16:$fs))),
                                      (f32 (f16_to_f32 (i16 HGR16:$ft))))))))]>;

so I can have a half-precision add of two half-precision variables, and it
seems to work fine.

This does not look right. Note that you're matching f16_to_f32
intrinsics and friends. They are used for storage-only half FP stuff
and you're trying to match them instead of native fp16.

So, in short: you need to generate IR with proper fp16 arithmetic,
not go via the storage-only wrappers.

I understand that it is not right, but this was the only way to avoid the f32
fadd ("add.s") and use "add.h" instead. Whatever I tried, LLVM moved everything
to the float registers and did add.s rather than the half add.h.

Is there any trick to do that? I tried a lot, but with no luck.

It seems you do not understand the issue.

Half floating-point operations can be done in two ways:

1. Storage-only (fp16 is used only to store the value; all operations are
performed on floats). The f32 <-> f16 conversions use special intrinsics
(which are lowered to native instructions on ARM NEON, for example).
2. Native fp16.

Note that in both modes the *frontend* is involved, because in case 1 it
has to generate the appropriate conversions where necessary.

It seems that you have IR from case 1, but you really want to work in
mode 2. So generate proper IR (with native fp16 operations, not the
storage-only stuff) and almost all of your problems will go away.
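
To make the difference concrete, here is a minimal C sketch of what mode 1
(storage-only) computes. It is an illustration only, not code from LLVM or
from this thread: the helper names half_to_float, float_to_half and
half_add_storage_only are hypothetical, and the sketch assumes IEEE 754
binary16/binary32 layouts with round-to-nearest-even. It emulates in software
the widen / f32 add / narrow sequence that the f16_to_f32 and f32_to_f16
conversions express in the IR. In mode 2 there are no conversions at all, and
the target is expected to select a single f16 fadd.

```c
#include <stdint.h>
#include <string.h>

/* Widen binary16 bits to float (software stand-in for f16_to_f32). */
float half_to_float(uint16_t h) {
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t man  = h & 0x3FFu;
    uint32_t bits;

    if (exp == 0x1Fu) {                    /* Inf or NaN */
        bits = sign | 0x7F800000u | (man << 13);
    } else if (exp != 0) {                 /* normal: rebias 15 -> 127 */
        bits = sign | ((exp + 112u) << 23) | (man << 13);
    } else if (man == 0) {                 /* signed zero */
        bits = sign;
    } else {                               /* subnormal: renormalize */
        uint32_t e = 113u;
        while ((man & 0x400u) == 0) { man <<= 1; e--; }
        bits = sign | (e << 23) | ((man & 0x3FFu) << 13);
    }
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Narrow float to binary16 bits, round-to-nearest-even (f32_to_f16). */
uint16_t float_to_half(float f) {
    uint32_t x;
    memcpy(&x, &f, sizeof x);
    uint32_t sign = (x >> 16) & 0x8000u;
    uint32_t absx = x & 0x7FFFFFFFu;
    uint32_t h, rem, half;

    if (absx > 0x7F800000u)  return (uint16_t)(sign | 0x7E00u); /* NaN */
    if (absx >= 0x47800000u) return (uint16_t)(sign | 0x7C00u); /* Inf or >= 65536 */
    if (absx < 0x33000000u)  return (uint16_t)sign;             /* below 2^-25: zero */

    if (absx >= 0x38800000u) {             /* result is a normal half */
        h    = (((absx >> 23) - 112u) << 10) | ((absx & 0x7FFFFFu) >> 13);
        rem  = absx & 0x1FFFu;
        half = 0x1000u;
    } else {                               /* result is a subnormal half */
        uint32_t shift = 126u - (absx >> 23);          /* 14..24 */
        uint32_t m = (absx & 0x7FFFFFu) | 0x800000u;   /* add implicit 1 */
        h    = m >> shift;
        rem  = m & ((1u << shift) - 1u);
        half = 1u << (shift - 1u);
    }
    /* Round to nearest, ties to even; a carry may bump the exponent. */
    if (rem > half || (rem == half && (h & 1u))) h++;
    return (uint16_t)(sign | h);
}

/* Mode 1: the values live as 16-bit patterns, the add happens in f32. */
uint16_t half_add_storage_only(uint16_t a, uint16_t b) {
    return float_to_half(half_to_float(a) + half_to_float(b));
}
```

For instance, half_add_storage_only(0x3C00, 0x4000) adds 1.0 and 2.0 and
returns 0x4200 (3.0). float_to_half(3.3f) returns 0x429A, which is
3.30078125; that appears to be the same value as the
"half 0x400A680000000000" constant-pool entry in the dump earlier in the
thread, which is printed as a double bit pattern.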

You were right.
I used the OpenCL patch for clang to generate IR with half, and after that
declared the fp16 instructions legal, added some defs, and it worked.
the with a load half fp with immediate instruction.
:)

P.S.
Then I did something I cannot remember and it stopped working :S