I would favor calling conventions over metadata for the simple
reason that this maps more cleanly to the device model. Device and
kernel functions are represented differently in PTX, including
(sometimes) the way parameters are passed.
For the record, marking the kernels with "calling conventions"
instead of metadata is also fine for the pocl use case. It's enough
if there is a way to differentiate OpenCL C kernels from the "device
functions", for the reason I discussed in the previous email. That is,
from the pocl point of view we just need a way to pick out the
"host-callable" kernel functions, as they need special treatment
before they can be called (like a C function).
Remember that OpenCL kernels are also callable from inside other
kernels. It is not a big deal though, as calling conventions in LLVM
IR are just markers for code generation; they do not have any
effect before that (AFAIK).
What is needed is a way to differentiate at the LLVM IR level between:
1) Normal functions
2) Functions callable from outside and inside (OpenCL kernels would fall
in this category).
3) Functions callable only from outside (if there is such a case; I am
not so familiar with CUDA so I do not know if such functions exist on
At least 1 and 2 are needed for OpenCL. Whether this is done with calling
conventions, metadata, or attributes does not make much
difference in practical terms. Code generation can apply different
calling conventions based on metadata/attributes, and can also detect
the kernels based on calling conventions, so the options are
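To make that equivalence concrete, here is a minimal sketch in plain C++. It does not use the real LLVM API: `CallConv`, `Func`, `isKernel` and `hostCallable` are hypothetical names invented for illustration. It only shows that a consumer such as pocl could pick out category 2 functions the same way whichever marker is used.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not LLVM types.
enum class CallConv { Default, Kernel };

struct Func {
    std::string name;
    CallConv cc;          // marker option A: a calling convention
    bool kernelMetadata;  // marker option B: metadata or an attribute
};

// A function counts as a kernel (category 2) if either marker says so.
bool isKernel(const Func &f) {
    return f.cc == CallConv::Kernel || f.kernelMetadata;
}

// Collect the names of the functions that need the "host-callable"
// treatment, preserving module order.
std::vector<std::string> hostCallable(const std::vector<Func> &module) {
    std::vector<std::string> names;
    for (const Func &f : module) {
        if (isKernel(f)) {
            names.push_back(f.name);
        }
    }
    return names;
}
```

For a module holding a helper `{"helper", CallConv::Default, false}` and two kernels marked in the two different ways, `hostCallable` returns only the two kernels, so the choice of marker does not change what the consumer sees.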
BTW what about the other OpenCL data like required_wg_size
affect the possible "kernel treatment" of pocl and can be converted
to some special instructions (I suppose) for the SIMT targets?
Currently only the TCE target in Clang adds metadata for the
required_wg_size kernel attribute (as we need it in "offline
compilation"), but IMHO that could be useful in general, as default
metadata (to enable its support in pocl for all targets, for
Ideally, we would need some standard way of representing this in
Clang. The back-end would then need to convert it to whatever form
the target OpenCL run-time expects.
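As a rough illustration of what a run-time could do with such information once it has been carried through, here is a minimal C++ sketch. `KernelInfo` and `localSizeAllowed` are hypothetical names, not part of Clang, LLVM or pocl. The only fact assumed from OpenCL itself is that a kernel declared with the `reqd_work_group_size(X, Y, Z)` attribute must be launched with exactly that local size.

```cpp
#include <array>
#include <cstddef>

// Hypothetical per-kernel record that a run-time might fill in from
// whatever form the front-end uses to carry the attribute.
struct KernelInfo {
    bool hasReqdWgSize;                     // was the attribute present?
    std::array<std::size_t, 3> reqdWgSize;  // X, Y, Z; meaningful only if present
};

// Returns true if the requested local size is acceptable for this kernel:
// any size when the attribute is absent, an exact match otherwise.
bool localSizeAllowed(const KernelInfo &k,
                      const std::array<std::size_t, 3> &local) {
    if (!k.hasReqdWgSize) {
        return true;
    }
    return k.reqdWgSize == local;
}
```

With the size known at compile time, an implementation could also specialize the kernel for it, which is why it matters that the value survives into the IR in some agreed form.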
This is an interesting point. And there might be more information
present in .cl files that needs to get transported into LLVM IR. While
the argument has been made that OpenCL "is C", so clang should
not need to generate extra stuff for OpenCL input files, the fact is
that it is not plain C. Basically there are two ways to go:
a) OpenCL is a C-based language (C plus additions) and clang can parse
it, so *all* the information on the .cl file has to be present in
b) OpenCL is just C, so clang does not need to care about extra things
and implementations should parse .cl files to get the extra
information, and potentially preprocess to transform the non-C
constructs into valid C code.
Just staying in between is good for nothing. And given that clang already
has a CL mode (-x cl) that recognizes the keywords and supports the non-C
parts of OpenCL (like vector swizzles), I think (b) can be discarded right
away. But then all the info should get into LLVM IR in a generic way.
This is a question for cfe-dev.
So adding cfe-dev in copy.