We would like to propose adding support for the OpenCL as_typen built-ins to Clang. If the direction presented in this email is accepted, we are ready to submit a patch with tests.
The as_typen built-ins are described in section 6.2.4.2 of the OpenCL 1.1 spec. Implementing them requires several steps:
1. Defining the built-ins in Builtins.def to make Clang aware of them. Since these built-ins are overloaded, they will be declared as returning “void”, and Sema will patch the actual return type once the type of the actual argument is known (similar to what is currently done for __sync_and_and_fetch and its kin).
2. Adding a new handler in Sema that checks that the invocation of the built-in is valid and patches its return type.
3. Section 6.2.4.2 states that “The usual type promotion for function arguments shall not be performed” for the as_typen built-ins. Support will be added to Sema to skip such promotions in this case.
4. Adding a new function in CodeGen/CGBuiltin.cpp that generates an LLVM bitcast from the built-in call, since a bitcast is exactly the built-in's semantics.
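To make the intended semantics concrete, here is a minimal sketch in plain C++ (not Clang or OpenCL code) that models as_typen as a pure bit reinterpretation. The names as_type, float4, int4 and roundTrips are hypothetical stand-ins chosen for illustration only. The size check mirrors the kind of validity check Sema would need, since the spec makes it an error to reinterpret data as a type with a different number of bytes.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical helper, not part of Clang: reinterprets the bits of `value`
// as type `To`, which is what an LLVM bitcast does for the real built-in.
template <typename To, typename From>
To as_type(const From &value) {
  // Models the Sema-side validity check: source and destination sizes must match.
  static_assert(sizeof(To) == sizeof(From),
                "as_type requires operands of equal size");
  static_assert(std::is_trivially_copyable<To>::value &&
                    std::is_trivially_copyable<From>::value,
                "as_type requires trivially copyable types");
  To result;
  std::memcpy(&result, &value, sizeof(To)); // copy the bits, no value conversion
  return result;
}

// Hypothetical stand-ins for the OpenCL vector types float4 and int4.
struct float4 { float s[4]; };
struct int4 { std::int32_t s[4]; };

// Reinterpreting to float and back must return the original bit pattern.
inline bool roundTrips(std::uint32_t bits) {
  return as_type<std::uint32_t>(as_type<float>(bits)) == bits;
}
```

On a platform with IEEE-754 single-precision floats, as_type<std::uint32_t>(1.0f) yields 0x3f800000, whereas an ordinary conversion would yield 1. Because the argument is taken at its own type, no promotion happens either: a one-byte argument stays one byte, which is the behaviour step 3 asks of Sema.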
We are unsure about the best way to implement step (1). There are almost 60 as_typen built-ins [6 vector sizes (1, i.e. scalar, 2, 3, 4, 8, 16) times 9 types (char, uchar, short, ushort, int, uint, long, ulong, float), plus some scalar types not valid as vectors]. Two approaches come to mind:
1. Add all these built-ins to Builtins.def (__builtin_as_float4, __builtin_as_int8, etc.). Then, in Sema (for checking and for patching the overloaded return type) and in CodeGen, use a lookup table keyed by Builtin::ID enumerators to fetch the type information and size for each built-in.
2. Add a single __builtin_as_typen built-in to Builtins.def, accepting 3 arguments: a numeric encoding of the type, the vector size, and the original argument. Use macros in the OpenCL-specific hidden include file (which is implicitly included into OpenCL kernels) to map specific built-ins like as_float4 to invocations of the generic built-in (__builtin_as_typen(3, 4, X), with 3 meaning “float”, for example). In Sema and CodeGen, use these passed arguments instead of a lookup table.
We prefer the first approach since it’s cleaner, but it does “pollute” Builtins.def with a large number of OpenCL-only built-ins, which may be seen as a negative. Any additional suggestions will be more than welcome.
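To illustrate the difference between the two approaches, here is a rough standalone C++ sketch. Every name in it (ElemKind, AsTypeInfo, the BI_* enumerators, lookupByID, decodeGeneric) is hypothetical and only a small subset of the types is listed; the numeric encoding simply follows the “3 means float” example above and is not a proposed ABI.

```cpp
// Hypothetical numeric encoding of element types (subset only).
enum ElemKind : unsigned { EK_Char = 0, EK_Short = 1, EK_Int = 2, EK_Float = 3 };

// What Sema and CodeGen need to know about one as_typen variant.
struct AsTypeInfo {
  ElemKind elem;
  unsigned lanes; // 1 means scalar
};

// Approach 1: one built-in ID per variant, plus a lookup table keyed by ID.
enum BuiltinID { BI_as_int4, BI_as_float4, BI_as_int8, BI_count };

constexpr AsTypeInfo kAsTypeTable[BI_count] = {
    {EK_Int, 4},   // BI_as_int4
    {EK_Float, 4}, // BI_as_float4
    {EK_Int, 8},   // BI_as_int8
};

constexpr AsTypeInfo lookupByID(BuiltinID id) { return kAsTypeTable[id]; }

// Approach 2: a single generic built-in; the header macros pass the encoding,
// e.g. as_float4(X) would expand to __builtin_as_typen(3, 4, X).
constexpr AsTypeInfo decodeGeneric(unsigned elemCode, unsigned lanes) {
  return AsTypeInfo{static_cast<ElemKind>(elemCode), lanes};
}

// Shared helpers used by both approaches once the info is known.
constexpr unsigned elemBytes(ElemKind k) {
  return k == EK_Char ? 1u : k == EK_Short ? 2u : 4u; // Int and Float are 4 bytes
}

constexpr unsigned totalBytes(AsTypeInfo info) {
  return elemBytes(info.elem) * info.lanes;
}
```

Both paths end up with the same AsTypeInfo, so the checking and code generation logic can be shared. The trade-off is where the per-variant data lives: in a table inside the compiler for the first approach, or in macros in the hidden header for the second.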
On the related topic of Builtins.def “pollution”: Basic/Builtins.h and Basic/Builtins.cpp already have infrastructure for language-specific built-ins (the LanguageID enumeration). However, it is currently exposed only through the LIBBUILTIN macro in Builtins.def, and is used for some Objective-C specific library built-ins (OBJC_LANG). A related macro could therefore be defined, named for example LANGBUILTIN, that allows adding built-ins for specific languages without specifying a header file, while leaving all code that uses the current BUILTIN macro untouched. It could then be used to define OpenCL-specific built-ins in a clean manner, and would allow a uniform check for a built-in being invoked from a language that does not support it. In that case, the OpenCL built-ins could perhaps be split into a separate .def file (similar to what is currently done for target-specific built-ins, as in BuiltinsX86.def), so that Builtins.def is not overwhelmed with language-specific definitions.
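As a rough illustration of the LANGBUILTIN idea, the standalone sketch below uses an X-macro list in place of a real .def file. All names here (LangMask and its enumerators, SKETCH_BUILTINS, BuiltinInfo, isBuiltinAvailable) are hypothetical, and the macro signatures are simplified; they are not the actual BUILTIN or LIBBUILTIN signatures from Builtins.def.

```cpp
#include <cstring>

// Hypothetical language mask, loosely modeled on the LanguageID idea.
enum LangMask : unsigned {
  LANG_C = 1u << 0,
  LANG_OBJC = 1u << 1,
  LANG_OPENCL = 1u << 2,
  LANG_ALL = LANG_C | LANG_OBJC | LANG_OPENCL
};

// Stand-in for a .def file: each entry is either a plain built-in or a
// language-restricted one. Existing BUILTIN entries need no changes.
#define SKETCH_BUILTINS(BUILTIN, LANGBUILTIN)   \
  BUILTIN(__builtin_abs)                        \
  LANGBUILTIN(__builtin_as_float4, LANG_OPENCL) \
  LANGBUILTIN(__builtin_as_int8, LANG_OPENCL)

struct BuiltinInfo {
  const char *name;
  unsigned langs; // bitmask of languages in which the built-in is valid
};

// Expand the list into a table; plain built-ins are valid everywhere.
#define PLAIN(ID) {#ID, LANG_ALL},
#define LANGONLY(ID, LANGS) {#ID, LANGS},
static const BuiltinInfo kBuiltins[] = {SKETCH_BUILTINS(PLAIN, LANGONLY)};
#undef PLAIN
#undef LANGONLY

// Uniform check: is `name` a known built-in that the current language supports?
bool isBuiltinAvailable(const char *name, unsigned currentLang) {
  for (const BuiltinInfo &b : kBuiltins)
    if (std::strcmp(b.name, name) == 0)
      return (b.langs & currentLang) != 0;
  return false; // unknown built-in
}
```

With a table like this, the language check lives in one place: isBuiltinAvailable("__builtin_as_float4", LANG_C) is false while the same query with LANG_OPENCL is true, and no per-built-in special casing is needed.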
What are your thoughts on these issues?
Thanks in advance,
SSG – MGP OpenCL Development Center