Example usage of LLVM/Clang/libclc

Hi,

My goal is to use Clang/LLVM/libclc to compile an OpenCL kernel and eventually generate PTX code. I have already done this, but I am not sure whether the PTX I am generating is correct (i.e. is what is supposed to be generated).

For example, currently,

In OpenCL: get_global_id(0) translates to

In LLVM: %call = tail call i32 @get_global_id(i32 0), which translates to

In PTX:

// .globl blur2d
.func (.param .b32 func_retval0) get_global_id
(
.param .b32 get_global_id_param_0
)
;

mov.u32 %r2, 0;
.param .b32 param0;
st.param.b32 [param0+0], %r2;
.param .b32 retval0;
call.uni (retval0),
get_global_id,
(
param0
);

Is this what is supposed to happen, or is there something wrong? I am asking because the get_global_id implementation does not make much sense to me, and I am not sure whether it used the libclc definitions at all.

If it is not, any idea what the correct conversion would look like?

Thanks,

Just as an update: I figured out that Clang was not properly linked against the libclc library, so I used a slightly modified version of the script in libclc/compile-test.sh:

clang -target nvptx-unknown-nvcl -S -emit-llvm -O4 -Iptx-nvidiacl/include -Igeneric/include -include clc/clc.h -Xclang -mlink-bitcode-file -Xclang nvptx--nvidiacl/lib/builtins.opt.bc -Dcl_clang_storage_class_specifiers -Dcl_khr_fp64 "$@"

This works, but it produces LLVM IR for all the OpenCL intrinsics implemented by libclc along with the kernel I am interested in. Is there a way to avoid this and only produce the LLVM IR for the required kernel?

Hi,

This works, but it produces LLVM IR for all the OpenCL intrinsics
implemented by libclc along with the kernel I am interested in. Is there a
way to avoid this and only produce the LLVM IR for the required kernel?

Mark all functions apart from the kernel entry points with the
internal attribute and then run global dead code elimination (it
should remove most of the unused functions).

You can use the opt tool to do this.

e.g. if you had kernel entry points foo and bar, you could run the following:

$ opt -internalize -internalize-public-api-list=foo,bar -globaldce your_program.bc -o transformed_program.bc

Hope that helps.

So, another "department for ignorance and stupid comments" comment:

Would it not just be a matter of removing
-Xclang -mlink-bitcode-file -Xclang nvptx--nvidiacl/lib/builtins.opt.bc
from your command? That seems to be the part that links in the builtin functions.

Thanks a lot for both your answers:

@mats:

Would it not just be a matter of removing
-Xclang -mlink-bitcode-file -Xclang nvptx--nvidiacl/lib/builtins.opt.bc
from your command? That seems to be the part that links in the builtin functions.

I cannot do this, as the code I am using calls some of these builtin functions, but not all of them. Currently, all of them get translated to LLVM IR, even those I don't use. I could manually include only those I am interested in for a specific kernel, but that would not be a neat solution.

@Dan:

Mark all functions apart from the kernel entry points with the internal attribute and then run global dead code elimination (it should remove most of the unused functions).
You can use the opt tool to do this.
e.g. if you had kernel entry points foo and bar you could run the following
$ opt -internalize -internalize-public-api-list=foo,bar -globaldce your_program.bc -o transformed_program.bc

That would probably help, but I was aiming for a solution that doesn't involve knowing the names of the kernel entry points. I can definitely write a small script that figures out the kernel entry points for me, though.

So the problem is really how builtins.opt.bc is produced: it is
one large file with all the functions, and the compiler is just
doing what you tell it to. It would work better if the process did
what the normal linking of ordinary libraries does, which is to resolve
only those symbols required from the library. But I don't believe that
is available here, because bitcode files don't contain the relevant
information to "split them apart" (or the "linker" part doesn't know how
to use such information).

Hi Ahmed,

@Dan:

Mark all functions apart from the kernel entry points with the internal
attribute and then run global dead code elimination (it should remove most
of the unused functions).
You can use the opt tool to do this.
e.g. if you had kernel entry points foo and bar you could run the
following
$ opt -internalize -internalize-public-api-list=foo,bar -globaldce your_program.bc -o transformed_program.bc

That would probably help, but I was aiming for a solution that doesn't
involve knowing the name of kernel entry points, but I can definitely write
a small script that would figure out the kernel entry points for me.

The kernel entry points are encoded in the LLVM IR as metadata when
clang generates the module. If you take a look you'll see there is a
named metadata entry called "opencl.kernels". This contains a list of
numbered metadata nodes, where each node is a kernel entry point,
e.g.

!opencl.kernels = !{!21}
...
!21 = metadata !{void (i32 addrspace(1)*)* @foo}

Using this, it should be fairly straightforward to write your own C++
application that uses the LLVM libraries to do the following:

1. Load the bitcode into memory.
2. Find the kernel entry points by reading the metadata.
3. Run the internalize pass, specifying the kernel entry points.
4. Run the globaldce pass to remove unused functions.
5. Output the resulting bitcode.

Here's example code that tries to find kernel entry points [1].
Writing a simple tool that builds against the LLVM libraries is fairly
straightforward, and I have an example here [2]. Note that the example
code here is written for LLVM 3.5 and I don't know about LLVM 3.6
compatibility. If you go down this route, I advise you to also read
[3] to help you understand how to use CMake to build against LLVM
easily.

[1] https://github.com/mc-imperial/bugle/blob/master/lib/Translator/TranslateModule.cpp#L552
[2] https://github.com/delcypher/srg-llvm-pass-tutorial
[3] http://llvm.org/docs/CMake.html#embedding-llvm-in-your-project

Hope that helps,

Dan.