I was looking at the code for atomics that was committed in the last two days and I'm wondering about two things. First, is there a reason that atomic_sub is not introduced in the r600 code (atomic_add is there)? Second, why is atomic_add defined in terms of the generic @__clc_atomic_add_addr… only in the r600 code? Since @__clc_atomic_add_addr… occurs in the generic code, I expected the definition of atomic_add to be there too.
I actually just committed atomic_sub and atomic_dec support yesterday.
The libclc side was on hold because the LLVM R600 back-end was missing
support for the relevant instructions. I finally committed that support
to LLVM yesterday, and immediately afterwards I pushed the libclc
support.
With regard to the _addr* naming: the R600 back-end uses address
space 1 as global, 2 as constant, 3 as local, and private is either 0
or 4 (I obviously haven't used that one much).
By defining the generic functions in terms of address space 1/2/3/4,
we just have to write the assembly functions once, and then we just
have to map which named address space is which ID on various hardware
back-ends. Hence the split for the implementations in generic/ and
the mappings defined in r600/. Theoretically, if we wanted to then
add Nvidia/Intel/X86 support for a given function, we'd just have to
map the correct implementation to the numbered address space needed.
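
To make the split concrete, here is a rough sketch of the idea in plain
C11. It is not the actual libclc code: plain C has no address spaces, so
the _addrN suffix is only a naming convention here, and every sketch_*
and SKETCH_* name is made up for illustration. The only part taken from
this thread is the R600 numbering (1 = global, 3 = local).

```c
#include <assert.h>
#include <stdatomic.h>

/* "Generic" layer: one implementation per numbered address space.
   Hypothetical names; in plain C both bodies are identical because
   there is no real address-space distinction to model. */
static int sketch_atomic_add_addr1(atomic_int *p, int v) {
    return atomic_fetch_add(p, v); /* returns the value before the add */
}

static int sketch_atomic_add_addr3(atomic_int *p, int v) {
    return atomic_fetch_add(p, v);
}

/* "Target" layer: the only per-hardware part is which ID each named
   address space maps to (R600 numbering as described above). */
#define SKETCH_GLOBAL_ID 1
#define SKETCH_LOCAL_ID  3

/* Two-level paste so the ID macros are expanded before ## is applied. */
#define SKETCH_CAT_(a, b) a##b
#define SKETCH_CAT(a, b)  SKETCH_CAT_(a, b)

#define sketch_atomic_add_global SKETCH_CAT(sketch_atomic_add_addr, SKETCH_GLOBAL_ID)
#define sketch_atomic_add_local  SKETCH_CAT(sketch_atomic_add_addr, SKETCH_LOCAL_ID)

/* Callers only ever use the named wrappers, never the numbered ones. */
static int sketch_demo(void) {
    atomic_int counter = 10;
    int old = sketch_atomic_add_global(&counter, 5);
    assert(old == 10);
    old = sketch_atomic_add_local(&counter, 1);
    assert(old == 15);
    return atomic_load(&counter);
}
```

Under these assumptions, supporting another back-end would only mean
changing the two SKETCH_*_ID values; the numbered functions stay as
they are.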
Hope that helps,
Thanks for your answer; it helps. I was wondering what was going on, because having just a prototype of atomic_add in the generic part, without an implementation, was causing some problems for us.
This makes me wonder: Do you happen to know if there are any other prototypes in the generic part that do not have an implementation in the generic part?
I believe that there are a few more. Usually it's things that
explicitly require hardware knowledge/support that can't be
implemented in generic terms.
This will probably include anything that depends on intrinsics for a
specific hardware back-end or that deals explicitly with address
spaces. I believe that we've taken into account the fact that we also
have to handle differing pointer sizes (I believe the Radeon SI hardware
uses different pointer sizes for different address spaces, or maybe
it's just that R600 and SI have different pointer sizes). Tom
can probably clarify anything here that I've gotten wrong, since he's
much more knowledgeable about the hardware specifics than I am.
Anyway, functions without a generic implementation include:
- get_global/local_* (depends on hardware intrinsics for at least R600)
- write_mem_fence* (not yet implemented at all, but it will probably be similar)
- all of the atomic_* functions (which work with either global or
local address spaces)
- Probably others in the future