Math Library Intrinsics as native intrinsics

There seem to be some math library functions that are already built into LLVM as intrinsics (pow, exp, etc.), but there are many that are not built in yet. Is there currently work going on to implement these? I do not want to duplicate work, so I want to see what is out there.

The math functions that I will be adding in are from the following spec, section 6.

http://www.khronos.org/registry/cl/

Hi Micah,

There seem to be some math library functions that are already built
into LLVM as intrinsics (pow, exp, etc.), but there are many that are not
built in yet. Is there currently work going on to implement
these? I do not want to duplicate work, so I want to see what is out
there.

Another approach is to get rid of the llvm intrinsics, because they
don't buy you anything that you can't get with function attributes
and a list of libcalls.

The math functions that I will be adding in are from the following spec,
section 6.

http://www.khronos.org/registry/cl/

Ciao,

Duncan.

FWIW, the reason pow, exp, and others were added as intrinsics was to
allow them to be overloaded with vector types, and to allow them
to be legalized (split, scalarized, etc.).
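To make that concrete, here is a small sketch in LLVM IR of the overloading Dan describes (the `llvm.pow.*` mangled names are the standard scheme; exact syntax varies by LLVM version):

```llvm
; Scalar overload of the pow intrinsic.
declare float @llvm.pow.f32(float, float)

; The same intrinsic overloaded on a vector type. On a target without
; a native vector pow, legalization can split or scalarize this call
; into four scalar llvm.pow.f32 calls.
declare <4 x float> @llvm.pow.v4f32(<4 x float>, <4 x float>)

define <4 x float> @vpow(<4 x float> %x, <4 x float> %y) {
  %r = call <4 x float> @llvm.pow.v4f32(<4 x float> %x, <4 x float> %y)
  ret <4 x float> %r
}
```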

Dan

Dan,
I have a large list of functions (60+) that I want to be legalized. I
have currently been adding them in the same manner as pow/exp, etc.
These functions come in both scalar and vector versions of up to 16
elements, as the 1.0 spec requires. Is this something that I could
merge back into the tree, or is another approach required?
One thought we had, so as not to clutter the llvm namespace, was to
add a math namespace; the intrinsics would go there instead,
i.e. llvm.math.fpow, llvm.math.fpowi, llvm.math.fpowr, etc.
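As a purely illustrative sketch of that proposal (these `llvm.math.*` names are hypothetical and were never adopted into LLVM):

```llvm
; Hypothetical declarations under a separate math namespace,
; overloaded on scalar and vector types as the OpenCL spec requires.
declare float        @llvm.math.fpow.f32(float, float)
declare <16 x float> @llvm.math.fpow.v16f32(<16 x float>, <16 x float>)
declare float        @llvm.math.fpowi.f32(float, i32)
```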

Any ideas?

Thanks,
Micah

There's at least one other LLVM user which would find these
useful, and probably more, so it may be appropriate to merge
this into the main tree. I'm interested to hear if anyone
else has an opinion here.

An llvm.math namespace seems like a good idea. Instead of
using "fpow" though, I'd prefer to just use names like
"pow". For consistency, the ISD namespace operators could
be renamed to MATH_POW and similar.

The text in LangRef.html that describes the semantics of
llvm.pow needs improvement. Here's an attempt at an
improved description of error handling for LLVM intrinsic
math functions:

The @llvm.math intrinsics use floating-point exceptions
according to IEEE rules for the corresponding math functions.
LLVM IR does not currently define the initial state of the
floating-point status or control flags, or an interface for
querying or setting them. The value of errno after a call to
an @llvm.math intrinsic is undefined.

What do you think?

Dan

There's at least one other LLVM user which would find these
useful, and probably more, so it may be appropriate to merge
this into the main tree. I'm interested to hear if anyone
else has an opinion here.

I'd rather not see them in the main tree, since there's no real explanation of what the benefits would be vs. the current model of treating libm calls as actual function calls that pass and return arguments of a known type.

The text in LangRef.html that describes the semantics of
llvm.pow needs improvement. Here's an attempt at an
improved description of error handling for LLVM intrinsic
math functions:

The @llvm.math intrinsics use floating-point exceptions
according to IEEE rules for the corresponding math functions.
LLVM IR does not currently define the initial state of the
floating-point status or control flags, or an interface for
querying or setting them. The value of errno after a call to
an @llvm.math intrinsic is undefined.

What is gained by this vs. having a target compile a C libm with LLVM using target builtins for the handful of things it actually supports? I don't see any explanation of the actual problems people are trying to solve here, just "we'd like to make this change".

Nate

Fair enough.
The current issue that I am having with my backend and the language I
have to support via LLVM is that:
1) I need to support a large number of math functions as language
built-ins and not as a separate library. These functions are part of the
core language and thus must be supported on a wide variety of data types,
with very specific rules and definitions for each function, which in
some cases differ from the definition that llvm gives to the same function
name. There are 165 math/integer/relational/geometric specific functions
in section 6.11 of the OpenCL spec, http://www.khronos.org/registry/cl/,
when counting the signed/unsigned/floating-point variants of some
functions.
2) AMD needs to support these on both GPU and CPU backends so pushing
them to a uniform section is highly desired so we don't have to
duplicate work. Some of these functions are native instructions on the
GPU in either scalar or vector formats but not on the CPU, or vice
versa.
3) The OpenCL language requires scalar and vector versions up to 16
elements for 8/16/32-bit data types and 8 elements for 64-bit data types.
Implementing all of these combinations is an immense amount of work, and
this is greatly simplified by utilizing the Legalize/Combine
infrastructure already in place to reduce all the vector types to the
scalar versions.
4) GPUs do not have real support for loading libraries, so expanding
to a library-function approach would not be feasible, and that approach
loses the flexibility of the Legalize/Combine infrastructure, which as
mentioned earlier is highly desired.

Some of the benefits of doing this would be that LLVM would then have
the beginnings of a large built-in reference math library based on, but
not limited to, the OpenCL 1.0 spec. This would allow AMD and possibly
other vendors to utilize this work on various backends without having to
duplicate work. This is work that I am doing internally at AMD anyways,
so for LLVM it will hopefully require minimal work.

Some of the drawbacks are the large number of instructions that will be
added, which might require refactoring parts of the codebase and a large
number of initial changes to update all the code. Also, there are different
definitions for certain functions compared to the current intrinsics
(round being one, max/fmax being another), which might cause initial
instruction duplication.

Hope this helps clear up the problem I am approaching. This solution
does not remove the ability to use a math library, as the functions can
always be expanded to function calls, but it allows the math library to
make use of LLVM infrastructure more easily.

Micah

Fair enough.
The current issue that I am having with my backend and the language I
have to support via LLVM is that:
1) I need to support a large number of math functions as language
built-ins and not as a separate library. These functions are part of the
core language and thus must be supported on a wide variety of data types,
with very specific rules and definitions for each function, which in
some cases differ from the definition that llvm gives to the same function
name. There are 165 math/integer/relational/geometric specific functions
in section 6.11 of the OpenCL spec, http://www.khronos.org/registry/cl/,
when counting the signed/unsigned/floating-point variants of some
functions.

Having worked on an OpenCL implementation myself, there is no requirement for these functions to be part of LLVM in order for you to call them. The same argument could be logically extrapolated to any library in any language; someone could argue they needed to be part of LLVM. That argument doesn't hold any water.

2) AMD needs to support these on both GPU and CPU backends so pushing
them to a uniform section is highly desired so we don't have to
duplicate work. Some of these functions are native instructions on the
GPU in either scalar or vector formats but not on the CPU, or vice
versa.

I don't understand how you plan on avoiding work here: if they're in the compiler, the compiler is going to have to know how to generate code for the various libm routines that the GPU and CPU don't natively implement, which is most of them, so you're just pushing some particular implementation of libm into the compiler. I don't see how this is beneficial, or how it saves you work, since the implementations will not be the same.

3) The OpenCL language requires scalar and vector versions up to 16
elements for 8/16/32-bit data types and 8 elements for 64-bit data types.
Implementing all of these combinations is an immense amount of work, and
this is greatly simplified by utilizing the Legalize/Combine
infrastructure already in place to reduce all the vector types to the
scalar versions.

This is only relevant if you believe that every OpenCL function has to be represented by a first-class intrinsic node in the LLVM IR. I see no evidence that this is the case; indeed, since different platforms have different requirements for libm functions with respect to rounding and errno, I don't see why the OpenCL set should get special treatment and be enshrined in LLVM IR proper. It also seems at odds with the IR's current design goal of having a relatively small number of simple instructions.

4) GPUs do not have real support for loading libraries, so expanding
to a library-function approach would not be feasible, and that approach
loses the flexibility of the Legalize/Combine infrastructure, which as
mentioned earlier is highly desired.

Whether you can actually dynamically load a code segment is not relevant to the usage of llvm bitcode files as libraries. SimplifyLibCalls can already hack on "known" functions, so that's covered. As for Legalize, it seems questionable to me that Legalize should contain all the code necessary to produce a fully legal libm implementation for every target over a variety of vector widths, considering that "vector libm" isn't even something that exists outside of OpenCL.
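As an example of the kind of rewrite SimplifyLibCalls performs on "known" functions, sketched here in IR (syntax varies by LLVM version): a call to pow with a constant exponent of 2.0 can be recognized by name and signature and reduced to a multiply.

```llvm
; Before: a call to the known library function pow.
declare double @pow(double, double)

define double @square(double %x) {
  %r = call double @pow(double %x, double 2.000000e+00)
  ret double %r
}

; SimplifyLibCalls recognizes pow(x, 2.0) and rewrites the call to a
; single instruction:
;   %r = fmul double %x, %x
```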

Some of the benefits of doing this would be that LLVM would then have
the beginnings of a large built-in reference math library based on, but
not limited to, the OpenCL 1.0 spec. This would allow AMD and possibly
other vendors to utilize this work on various backends without having to
duplicate work. This is work that I am doing internally at AMD anyways,
so for LLVM it will hopefully require minimal work.

It would seem to me that if someone was interested in delivering a portable libm, they could do so through an LLVM IR bitcode file, rather than building the implementation into the compiler itself. This would also be platform agnostic, and a heck of a lot easier to maintain than many thousands of lines of C++ that generate that bitcode file.
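To illustrate the bitcode-library idea, one such function could be written as ordinary IR on top of the existing llvm.sqrt intrinsic (a sketch; the function name `cl_rsqrt` and calling convention are illustrative, not from any real library):

```llvm
declare float @llvm.sqrt.f32(float)

; A portable reciprocal square root, shipped as bitcode rather than
; baked into the code generator. A GPU backend with a native rsqrt
; instruction could pattern-match or substitute this; other targets
; simply compile the definition as-is.
define float @cl_rsqrt(float %x) {
  %s = call float @llvm.sqrt.f32(float %x)
  %r = fdiv float 1.000000e+00, %s
  ret float %r
}
```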

Hope this helps clear up the problem I am approaching. This solution
does not remove the ability to use a math library, as the functions can
always be expanded to function calls, but it allows the math library to
make use of LLVM infrastructure more easily.

I don't see what value you're adding here aside from essentially hard-coding a particular libm implementation into the code generator, not even LLVM proper. For targets with optimized libms, expanding to a function call is almost always the right idea; when it isn't, SimplifyLibCalls can pick up the slack. And if your platform doesn't have an optimized libm, a portable one in IR or C that you optimize as you have time seems like a far saner approach.

Nate

I apologize for bringing up an old thread but I couldn’t find anything more recent on this topic.

My target machine has hardware instructions for ldexp, frexp, atan2, asin, acos, atan, rsqrt.
I want some suggestions on how to generate them.
I suppose the options (e.g. for atan2) are:

  1. Match them in TargetLowering::LowerCall().
    How does one ensure that the name “atan2” came from math.h and not just some arbitrary function?
  2. Have the front end (clang) generate an intrinsic.
    a. Target-specific - is there a target that has done this for math calls?
    b. A new generic llvm.atan2.*. Why are some libm routines implemented with
    intrinsics, e.g., llvm.sin.*, llvm.cos.*, and some are not?
    c. Are these intrinsics dependent on __builtin_atan2?

Ideas please. Thank you.
brian