Hi Jeff, Hal,
Thanks for your feedback. My comments are inline below.
All of the usage of OpenACC outside of benchmarks/research that I know about is done in Fortran.
I agree that it’s easier to find real apps that use OpenACC in Fortran than those that use OpenACC in C/C++. However, the latter certainly exist. For example:
Two of the three examples you cite are primarily Fortran and use OpenACC exclusively in Fortran subroutines.
Are you saying that the occurrences of “pragma acc” in Nek5000 and NekCEM are unused?
The instances of “pragma acc” there (it’s the same code in both projects) either (1) only perform host-device data synchronization or (2) are commented out.
It’s unclear to me what actually happens in the code as currently written. The OpenACC C/C++ code does no more than copy data to/from the device. I didn’t trace the entire code execution, so I can’t tell whether any code touches the device data that OpenACC is updating. If that data is updated, it is updated by Fortran OpenACC code somewhere else in the source tree.
What does the OpenACC standard say about interoperability of compilers and runtimes, as would be required if one used Clang’s OpenACC for C/C++ alongside a Fortran OpenACC implementation from PGI, Cray, or GCC? OpenMP definitely does not support this, even though a subset of usage may happen to work when different compilers share the same runtime library.
/tmp/Nek5000$ git grep "pragma acc"
jl/gs.c:#pragma acc update host(sendbuf[0:unit_size*bufSize/2]) if(acc)
jl/gs.c:#pragma acc update device(buf[0:unit_size*bufSize/2]) if(acc)
jl/gs.c:#pragma acc update host(sendbuf[0:unit_size*bufSize/2]) if(acc)
jl/gs.c:#pragma acc update device(buf[0:unit_size*bufSize/2]) if(acc)
jl/gs.c:#pragma acc exit data delete(map0,map1)
jl/gs.c:#pragma acc update host(buf[0:unit_size*bufSize]) if(acc)
jl/gs.c:#pragma acc update device(buf[0:unit_size*bufSize]) if(acc)
jl/gs.c:#pragma acc update host(buf[0:unit_size*bufSize]) if(acc)
jl/gs.c:#pragma acc update device(buf[0:unit_size*bufSize]) if(acc)
jl/gs.c://#pragma acc enter data copyin(stage[0].scatter_map[0:stage[0].s_size],stage[0].scatter_mapf[0:stage[0].s_nt])
jl/gs.c://#pragma acc enter data copyin(stage[i].scatter_map[i:stage[i].s_size],stage[i].scatter_mapf[i:stage[i].s_nt])
jl/gs.c://#pragma acc enter data copyin(stage[i].gather_map[i:stage[i].g_size],stage[i].gather_mapf[i:stage[i].g_nt])
jl/gs.c://#pragma acc enter data copyin(stage[i].scatter_map[i:stage[i].s_size],stage[i].scatter_mapf[i:stage[i].s_nt])
jl/gs.c://#pragma acc enter data copyin(stage[i].gather_map[i:stage[i].g_size],stage[i].gather_mapf[i:stage[i].g_nt])
jl/gs.c://#pragma acc enter data copyin(stage2[0].scatter_map[0:stage2[0].s_size],stage2[0].scatter_mapf[0:stage2[0].s_nt])
jl/gs.c://#pragma acc enter data copyin(stage2[i].scatter_map[i:stage2[i].s_size],stage2[i].scatter_mapf[i:stage2[i].s_nt])
jl/gs.c://#pragma acc enter data copyin(stage2[i].gather_map[i:stage2[i].g_size],stage2[i].gather_mapf[i:stage2[i].g_nt])
jl/gs.c://#pragma acc enter data copyin(stage2[i].scatter_map[i:stage2[i].s_size],stage2[i].scatter_mapf[i:stage2[i].s_nt])
jl/gs.c://#pragma acc enter data copyin(stage2[i].gather_map[i:stage2[i].g_size],stage2[i].gather_mapf[i:stage2[i].g_nt])
jl/gs.c:#pragma acc exit data delete(map,mapf)
jl/gs.c:#pragma acc exit data delete(map,mapf)
jl/gs.c:#pragma acc exit data delete(map,mapf)
jl/gs.c:#pragma acc exit data delete(map,mapf)
jl/gs.c:#pragma acc update host(buf[0:vn*unit_size*bufSize]) if(acc)
jl/gs.c:#pragma acc update device(buf[0:vn*unit_size*bufSize]) if(acc)
jl/gs.c: //#pragma acc exit data delete(ard->map_to_buf[0],ard->map_to_buf[1],ard->map_from_buf[0],ard->map_from_buf[1])
jl/gs.c: //#pragma acc enter data copyin(ard->map_to_buf[0][0:ard->mt_size[0]],ard->map_from_buf[0][0:ard->mf_size[0]],ard->map_to_buf_f[0][0:ard->mt_nt[0]],ard->map_from_buf_f[0][0:ard->mf_nt[0]],ard->map_to_buf[1][0:ard->mt_size[1]],ard->map_from_buf[1][0:ard->mf_size[1]],ard->map_to_buf_f[1][0:ard->mt_nt[1]],ard->map_from_buf_f[1][0:ard->mf_nt[1]])
jl/gs.c:#pragma acc update host(a[0:n])
jl/gs.c:#pragma acc update host(a[0:n])
jl/gs.c:#pragma acc exit data delete(bufPtr)
jl/gs.c:#pragma acc enter data create(bufPtr[0:vn*gs_dom_size[dom]*gsh->r.buffer_size])
jl/gs.c:#pragma acc exit data delete(bufPtr)
jl/gs.c:#pragma acc enter data create(bufPtr[0:vn*gs_dom_size[dom]*gsh->r.buffer_size])
jl/gs.c:#pragma acc exit data delete(map_local0,map_local1,flagged_primaries)
jl/gs.c:#pragma acc enter data pcopyin(map[0:m_size],mapf2[0:2*mf_temp])
jl/gs_acc.c://#pragma acc data present(buf[0:l])
jl/gs_acc.c://#pragma acc host_data use_device(buf)
jl/gs_acc.c://#pragma acc data present(buf[0:l])
jl/gs_acc.c://#pragma acc host_data use_device(buf)
jl/gs_acc.c: //#pragma acc enter data copyin(t_mapf[0:t_m_nt*2],mapf[0:m_nt*2],snd_mapf[0:snd_m_nt*2],rcv_mapf[0:rcv_m_nt*2],fp_mapf[0:fp_m_nt*2], t_map[0:t_m_size],map[0:m_size],fp_map[0:fp_m_size],snd_map[0:snd_m_size],rcv_map[0:rcv_m_size])
jl/gs_acc.c: //#pragma acc enter data copyin(t_mapf[0:t_m_nt*2],mapf[0:m_nt*2],snd_mapf[0:snd_m_nt*2],rcv_mapf[0:rcv_m_nt*2],fp_mapf[0:fp_m_nt*2], t_map[0:t_m_size],map[0:m_size],fp_map[0:fp_m_size],snd_map[0:snd_m_size],rcv_map[0:rcv_m_size])
jl/gs_acc.c://#pragma acc enter data pcopyin(t_mapf[0:t_m_nt*2],mapf[0:m_nt*2],snd_mapf[0:snd_m_nt*2],rcv_mapf[0:rcv_m_nt*2],fp_mapf[0:fp_m_nt*2], t_map[0:t_m_size],map[0:m_size],fp_map[0:fp_m_size],snd_map[0:snd_m_size],rcv_map[0:rcv_m_size])
jl/gs_acc.c://#pragma acc data present(u[0:uds],mapf[0:m_nt*2],snd_mapf[0:snd_m_nt*2],rcv_mapf[0:rcv_m_nt*2],fp_mapf[0:fp_m_nt*2],t_map[0:t_m_size],map[0:m_size],fp_map[0:fp_m_size],snd_map[0:snd_m_size],rcv_map[0:rcv_m_size])
jl/gs_acc.c://#pragma acc data create(sbuf[0:bl],rbuf[0:bl]) if(bl!=0)
jl/gs_acc.c://#pragma acc parallel loop gang vector present(u[0:uds],map[0:m_size],mapf[0:m_nt2]) private(i,j,t) async(k+1)
jl/gs_acc.c://#pragma acc loop seq
jl/gs_acc.c://#pragma acc wait
jl/gs_acc.c://#pragma acc parallel loop gang vector present(u[0:uds],fp_map[0:fp_m_size],fp_mapf[0:fp_m_nt2]) private(i,j) async(k+1)
jl/gs_acc.c://#pragma acc loop seq
jl/gs_acc.c://#pragma acc wait
jl/gs_acc.c://#pragma acc parallel loop gang vector present(u[0:uds],fp_map[0:fp_m_size]) private(i,k)
jl/gs_acc.c://#pragma acc parallel loop gang vector present(u[0:uds],snd_map[0:snd_m_size],snd_mapf[0:snd_m_nt2],sbuf[0:bl]) private(i,j,t) async(k+1)
jl/gs_acc.c://#pragma acc loop seq
jl/gs_acc.c://#pragma acc wait
jl/gs_acc.c://#pragma acc parallel loop gang vector present(u[0:uds],snd_map[0:snd_m_size],sbuf[0:bl]) private(i,j,k)
jl/gs_acc.c://#pragma acc update host(sbuf[0:bl]) async(vn+2)
jl/gs_acc.c://#pragma acc wait
jl/gs_acc.c://#pragma acc update device(rbuf[0:bl]) async(vn+2)
jl/gs_acc.c://#pragma acc wait
jl/gs_acc.c://#pragma acc parallel loop gang vector present(u[0:uds],rcv_map[0:rcv_m_size],rcv_mapf[0:rcv_m_nt2],rbuf[0:bl]) private(i,j,t) async(k+1)
jl/gs_acc.c://#pragma acc loop seq
jl/gs_acc.c://#pragma acc wait
jl/gs_acc.c: //#pragma acc parallel loop gang vector present(u[0:uds],rcv_map[0:rcv_m_size],rbuf[0:bl]) private(i,j,k)
jl/gs_acc.c://#pragma acc parallel loop gang vector present(u[0:uds],t_map[0:t_m_size],t_mapf[0:t_m_nt2]) private(i,j,t) async(k+1)
jl/gs_acc.c://#pragma acc loop seq
jl/gs_acc.c://#pragma acc wait
http://mrfil.github.io/PowerGrid/
/tmp/PowerGrid$ git grep -il "pragma acc"
PowerGrid/Gfft.hpp
PowerGrid/Gnufft.hpp
PowerGrid/ftCpu.hpp
PowerGrid/gridding.hpp
PowerGrid/griddingSupport.hpp
From http://mrfil.github.io/PowerGrid/docs/Installation:
We have experience with PGC++ 15.7 from NVIDIA/The Portland Group as the version we have used most extensively. There is a free license available as part of the OpenACC Toolkit for academic users.
GCC 6.1 has OpenACC support but has not yet been tested by the developers, we welcome reports of anyone trying to compile with it. We hope to support it alongside PGI compilers in the near future.
For those lucky enough to have access to Cray supercomputers, the Cray compiler does support OpenACC, but we have not tried to build with it. Because the Cray compilers are not available on desktops, workstations, or non-Cray branded clusters, we have not dedicated resources to testing PowerGrid on it.
So these folks support OpenACC, but haven’t bothered to try the GCC implementation in the 1+ year that it’s been available. How likely are they to use Clang’s?
I cannot answer that. Perhaps they were waiting for GCC support to mature?
Or maybe they aren’t interested in using OpenACC compiler support outside of PGI.
What I’m really getting at here is: who is going to use OpenACC support in Clang, particularly if there is no compatible Fortran OpenACC compiler? Beyond justifying the code-maintenance effort, users who are not developers are essential for hardening an implementation.
Best,
Jeff