>> This patch introduces two new intrinsics and therefore must be used in
>> conjunction with the patches to the LLVM backend.
> Hi,
>
> It looks like your mail client has automatically wrapped long lines, which
> makes the commit comment hard to read. You should use git send-email to
> submit your patches.
>> It fixes the behaviour of barrier(0): previously no barrier was
>> generated; this is fixed
> I think the old behavior is correct. Does the specification say we need
> to emit a barrier when the argument is 0?
I believe it does. The spec says: "All work-items in a work-group executing
the kernel on a processor must execute this function before any are allowed
to continue execution beyond the barrier. This function must be encountered
by all work-items in a work-group executing the kernel."
That requirement holds without even considering the argument.
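To illustrate the point, here is a hypothetical kernel (not from the patch,
and whether 0 is even a valid flags value is exactly what is being
discussed here):

    __kernel void two_phases(__global int *out)
    {
        size_t id = get_global_id(0);

        /* phase 1: each work-item touches only its own element */
        out[id] = (int)id;

        /* No memory fence is requested, but per the quoted wording all
         * work-items in the work-group must still reach this point before
         * any of them continues, so a hardware barrier has to be emitted. */
        barrier(0);

        /* phase 2 */
        out[id] += 1;
    }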
>> by making a call to the new intrinsic barrier.nofence(). The patch also
>> changes the behaviour of
>> barrier(CLK_LOCAL_MEM_FENCE | CLK_GLOBAL_MEM_FENCE): previously two
>> barriers were generated, now just one with the new intrinsic
>> barrier.localglobal().
>
> I don't think this is the place to put this optimization. libclc shouldn't
> know that llvm.AMDGPU.barrier.local and llvm.AMDGPU.barrier.global are
> lowered to the same instruction. We should be doing this optimization in
> the LLVM backend.
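For reference, my understanding of the current libclc-side expansion is
roughly the following (the helper names here are placeholders, not the
actual libclc declarations):

    /* Sketch of the existing behaviour being discussed: with both flags
     * set, two intrinsic calls are emitted, one per address space. */
    void barrier(cl_mem_fence_flags flags)
    {
        if (flags & CLK_LOCAL_MEM_FENCE)
            __clc_barrier_local();  /* lowers to llvm.AMDGPU.barrier.local  */
        if (flags & CLK_GLOBAL_MEM_FENCE)
            __clc_barrier_global(); /* lowers to llvm.AMDGPU.barrier.global */
    }

Folding those two calls into a single barrier instruction is something the
backend can do during lowering, without libclc having to know about it.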
I also read the spec as saying that the parameter only refers to the
address space(s) that should be flushed (local/global/both/none). The spec
explicitly allows queuing (read/write) mem fences to achieve this. And as I
understand it, issuing multiple barrier instructions is not against the
spec either, if we choose to give the barrier intrinsic memory fence
semantics.
IMO having one barrier intrinsic, and implementing barrier as:

    barrier(flags)
    {
        mem_fence(flags);
        @llvm.r600.barrier();
    }

is the most straightforward way. This is what my old patch implemented. [0]
Unfortunately I never found time to address Matt's comments.
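For the barrier(CLK_LOCAL_MEM_FENCE | CLK_GLOBAL_MEM_FENCE) case that
scheme would expand to something like (just a sketch of the intent, not
actual generated code):

    /* one fence covering both address spaces, then one sync point */
    mem_fence(CLK_LOCAL_MEM_FENCE | CLK_GLOBAL_MEM_FENCE);
    @llvm.r600.barrier();

so only a single barrier instruction is emitted, and libclc never has to
know how the backend lowers the fence/barrier pair.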
jan
[0] http://www.pcc.me.uk/pipermail/libclc-dev/2014-April/000284.html