The current issue that I am having with my backend and the language I
have to support via LLVM is the following:
1) I need to support a large number of math functions as language
built-ins and not as a separate library. These functions are part of
the core language and thus must be supported on a wide variety of data
types, with very specific rules and definitions for each function,
which in some cases differ from the definition that LLVM gives to the
same function name. There are 165 math/integer/relational/geometric
functions in section 6.11 of the OpenCL spec,
http://www.khronos.org/registry/cl/, when counting the
signed/unsigned/floating-point variants of some functions.
Having worked on an OpenCL implementation myself, I can say there is no requirement for these functions to be part of LLVM in order for you to call them. The same argument could be logically extrapolated to any library in any language; someone could argue it needed to be part of LLVM. That argument doesn't hold any water.
2) AMD needs to support these on both GPU and CPU backends, so pushing
them into a common layer is highly desirable so that we don't have to
duplicate work. Some of these functions are native instructions on the
GPU, in either scalar or vector form, but not on the CPU, or vice
versa.
I don't understand how you plan on avoiding work here. If they're in the compiler, the compiler is going to have to know how to generate code for the various libm routines that the GPU and CPU don't natively implement, which is most of them, so you're just pushing one particular implementation of libm into the compiler. I don't see how this is beneficial, or how it saves you work, since the implementations will not be the same.
3) The OpenCL language requires scalar and vector versions up to 16
elements for 8/16/32-bit data types and 8 elements for 64-bit data
types. Implementing all of these combinations is an immense amount of
work, and it is greatly simplified by utilizing the Legalize/Combine
infrastructure already in place to reduce all the vector types to the
types each target natively supports.
This is only relevant if you believe that every OpenCL function has to be represented by a first-class intrinsic node in the LLVM IR. I see no evidence that this is the case; indeed, since different platforms have different requirements for libm functions with respect to rounding and errno, I don't see why the OpenCL set should get special treatment and be enshrined into LLVM IR proper. It also seems at odds with the IR's current design goal of having a relatively small number of simple instructions.
4) GPUs do not have real support for loading libraries, so expanding
to a library-function approach would not be feasible, and that approach
loses the flexibility of the Legalize/Combine infrastructure, which as
mentioned earlier is highly desired.
Whether you can actually dynamically load a code segment is not relevant to the use of LLVM bitcode files as libraries. SimplifyLibCalls can already hack on "known" functions, so that's covered. As for Legalize, it seems questionable to me that Legalize should contain all the code necessary to produce a fully legal libm implementation for every target over a variety of vector widths, considering that a "vector libm" isn't even something that exists outside of OpenCL.
Some of the benefits of doing this would be that LLVM would then have
the beginnings of a large built-in reference math library based on, but
not limited to, the OpenCL 1.0 spec. This would allow AMD and possibly
other vendors to utilize this work on various backends without having to
duplicate work. This is work that I am doing internally at AMD anyway,
so for LLVM it will hopefully require minimal work.
It would seem to me that if someone was interested in delivering a portable libm, they could do so through an LLVM IR bitcode file, rather than building the implementation into the compiler itself. This would also be platform-agnostic, and a heck of a lot easier to maintain than many thousands of lines of C++ generating that code inside the compiler.
I hope this helps clear up the problem I am approaching. This solution
does not remove the ability to use a math library, since the functions
can always be expanded to function calls, but it makes it easier to use
LLVM's infrastructure together with the math library.
I don't see what value you're adding here aside from essentially hard-coding a particular libm implementation into the code generator, not even LLVM proper. For targets with an optimized libm, expanding to a function call is almost always the right idea; when it isn't, SimplifyLibCalls can pick up the slack. And if your platform doesn't have an optimized libm, a portable one in IR or C that you optimize as you have time seems like a far more sane approach.