Hi,
I have recently been using clang and libclc to compile OpenCL source code to PTX. For every “barrier(CLK_LOCAL_MEM_FENCE)” I write in the OpenCL source, the generated PTX contains a declaration of, and a call to, an external function named “barrier” instead of the expected “bar.sync 0” instruction, as shown below:
.func barrier
(
.reg .b32 barrier_param_0
)
;
// Callseq Start 0
{
.reg .b32 temp_param_reg;
// }
.reg .b32 param0;
mov.b32 param0, %r4;
call.uni
barrier,
(
param0
);
//{
}// Callseq End 0
At first I believed this was caused by an incorrect implementation of the barrier function in libclc. However, I then found that an implementation of barrier does exist in “libclc/ptx-nvidiacl/lib/synchronization/barrier.cl”, but this file is not listed in “libclc/ptx-nvidiacl/lib/SOURCES”, so it is presumably never compiled into the library. After I added a new line “synchronization/barrier.cl” to the SOURCES file, everything worked as expected. Could you please take a look and make the necessary change? Thank you!
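For reference, the local change described above can be sketched as a small shell helper. This is only an illustration: the function name add_source_entry and the relative checkout path in the comment are assumptions made for this sketch, not part of libclc.

```shell
# Hypothetical helper (not part of libclc): append an entry to a
# SOURCES-style list file, but only if that exact line is not present yet.
add_source_entry() {
    sources_file=$1   # path to the list file
    entry=$2          # line to add, e.g. synchronization/barrier.cl

    # -x matches whole lines, -F treats the entry as a fixed string.
    if ! grep -qxF -- "$entry" "$sources_file"; then
        printf '%s\n' "$entry" >> "$sources_file"
    fi
}

# Example call, assuming a libclc checkout in the current directory:
#   add_source_entry libclc/ptx-nvidiacl/lib/SOURCES synchronization/barrier.cl
```

After rebuilding libclc and recompiling the kernel, the PTX should contain a bar.sync 0 instruction in place of the call sequence shown earlier.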
Best,
Jin Wang