[PATCH] Implement mem_fence on ptx

PTX does not differentiate between read and write fences. Hence, read_mem_fence
and write_mem_fence are both lowered to a mem_fence call. The mem_fence function
compiles to the "membar.cta" instruction, which commits all outstanding reads
and writes of a thread such that they become visible to all other threads in
the same CTA (i.e., work-group). The instruction does not differentiate between
global and local memory. Hence, the flags parameter is ignored.
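The v1 lowering described above can be sketched as a small host-side C model. This is not the patch itself: emit_membar_cta() is a hypothetical stand-in that only counts calls where real PTX code would issue membar.cta (for example via clang's __nvvm_membar_cta() builtin), and the flag values are assumed to mirror the OpenCL C definitions.

```c
/* Assumed to mirror the OpenCL C flag values. */
#define CLK_LOCAL_MEM_FENCE  0x1u
#define CLK_GLOBAL_MEM_FENCE 0x2u

typedef unsigned int cl_mem_fence_flags;

/* Number of fences "emitted" so far (model only). */
static int membar_cta_count = 0;

/* Hypothetical stand-in for the single PTX "membar.cta" instruction. */
static void emit_membar_cta(void) {
    membar_cta_count++;
}

/* v1: membar.cta covers global and local memory alike,
 * so the flags are ignored and the fence is always emitted. */
void mem_fence(cl_mem_fence_flags flags) {
    (void)flags;
    emit_membar_cta();
}

/* PTX has no separate read or write fences: both forward to mem_fence. */
void read_mem_fence(cl_mem_fence_flags flags) {
    mem_fence(flags);
}

void write_mem_fence(cl_mem_fence_flags flags) {
    mem_fence(flags);
}
```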

Index: ptx-nvidiacl/lib/SOURCES

Updated version of the patch, which does not emit a fence if neither
CLK_GLOBAL_MEM_FENCE nor CLK_LOCAL_MEM_FENCE is
passed via the flags parameter.
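The v2 behaviour can be sketched in the same host-side C model, under the same assumptions (hypothetical emit_membar_cta() counter standing in for membar.cta, flag values assumed to mirror OpenCL C): the fence is emitted only when at least one of the two flags is set, and read/write fences still forward to mem_fence.

```c
/* Assumed to mirror the OpenCL C flag values. */
#define CLK_LOCAL_MEM_FENCE  0x1u
#define CLK_GLOBAL_MEM_FENCE 0x2u

typedef unsigned int cl_mem_fence_flags;

/* Number of fences "emitted" so far (model only). */
static int membar_cta_count = 0;

/* Hypothetical stand-in for the single PTX "membar.cta" instruction. */
static void emit_membar_cta(void) {
    membar_cta_count++;
}

/* v2: skip the fence entirely when neither address space is requested.
 * One membar.cta covers both, so either flag (or both) emits exactly one. */
void mem_fence(cl_mem_fence_flags flags) {
    if (flags & (CLK_GLOBAL_MEM_FENCE | CLK_LOCAL_MEM_FENCE))
        emit_membar_cta();
}

/* Read and write fences are still lowered to mem_fence. */
void read_mem_fence(cl_mem_fence_flags flags) {
    mem_fence(flags);
}

void write_mem_fence(cl_mem_fence_flags flags) {
    mem_fence(flags);
}
```

Calling mem_fence(0) therefore emits nothing, while passing CLK_LOCAL_MEM_FENCE, CLK_GLOBAL_MEM_FENCE, or both emits exactly one membar.cta.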

Can you include the explanation/description from v1 in the commit
message?
Reviewed-by: Jan Vesely <jan.vesely@rutgers.edu>

Jan

> Can you include the explanation/description from v1 in the commit
> message?

I will.

Jeroen