How can I use multiple GPUs in OpenMP?

BTW, now you can simply use clang++ -fopenmp --offload-arch=sm_80 to compile. No need to use the long -fopenmp-targets=nvptx64-nvidia-cuda --cuda-gpu-arch=sm_80 -Xopenmp-target -march=sm_80.

We only started supporting this fully in LLVM 15, see this talk. LLVM 14 still requires using -fopenmp-targets=nvptx64 -Xopenmp-target=nvptx64 -march=sm_80 to specify it. I would recommend using LLVM 15 because it fully supports -foffload-lto, which generally improves performance on Nvidia targets.
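
To put the two flag spellings side by side, here is a small shell sketch. The helper name offload_flags and the file name testapp.cpp are made up for illustration; the flags themselves are the ones quoted in this thread. It only prints the command line and does not invoke the compiler.

```shell
# Hypothetical helper: print the OpenMP offload flags for a given LLVM major
# version and GPU architecture, using the spellings quoted in this thread.
offload_flags() {
  major="$1"
  arch="$2"
  if [ "$major" -ge 15 ]; then
    # LLVM 15 and newer: short form
    echo "-fopenmp --offload-arch=$arch"
  else
    # LLVM 14: long form
    echo "-fopenmp -fopenmp-targets=nvptx64 -Xopenmp-target=nvptx64 -march=$arch"
  fi
}

# Dry run: show the command instead of running it.
# testapp.cpp is a placeholder source file name.
echo "clang++ $(offload_flags 15 sm_80) -foffload-lto testapp.cpp -o testapp"
```

Pass 14 as the first argument to see the long form that LLVM 14 still needs.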

I’d recommend updating to LLVM 15 or even a development version (built from the git repository). The FAQ has information on the latter.

I think it (nowadays) is. Though that’s not the point of the question.

FWIW, with LLVM 15 and newer you can drop --cuda-gpu-arch and replace -Xopenmp-target -march=sm_80 with --offload-arch=sm_80. If you like cuda mode, clang --help-hidden | grep fopenmp lists some more assume flags you might like. As @jhuber6 noted, -foffload-lto is also worth considering. (Side note: I’m not sure we already describe all of them on our webpage, @jhuber6.)

Devices 0 and 1 are what should work, so the example as shown looks OK. I’m assuming @jhuber6’s answer will fix your problem. LIBOMPTARGET_INFO=-1 ./testapp will give you more information about what’s going on, e.g., it might tell you that the image is not compatible with the device because sm_80 has not been passed through properly.
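
For anyone who wants to try this, here is a hedged reproducer sketch, not the original poster's code. It writes a small C++ test program that splits an array across all visible offload devices with device(d), then builds and runs it with LIBOMPTARGET_INFO=-1 if clang++ is on PATH. The file name testapp.cpp, the array size, and the use of one host thread per device are assumptions; sm_80 is the architecture used in this thread.

```shell
# Write a small multi-GPU test program to a scratch directory.
workdir="$(mktemp -d)"
src="$workdir/testapp.cpp"
cat > "$src" <<'EOF'
#include <cstdio>
#include <vector>
#include <omp.h>

int main() {
  const int n = 1 << 20;
  const int ndev = omp_get_num_devices();
  std::printf("offload devices: %d\n", ndev);
  if (ndev < 1) return 1;

  std::vector<double> x(n, 1.0);
  double *p = x.data();
  const int chunk = (n + ndev - 1) / ndev;

  // One host thread per device so the kernels can overlap.
  #pragma omp parallel for num_threads(ndev)
  for (int d = 0; d < ndev; ++d) {
    const int lo = d * chunk;
    const int hi = (lo + chunk < n) ? lo + chunk : n;
    if (lo >= hi) continue;
    // Each device maps and updates only its own slice of the array.
    #pragma omp target teams distribute parallel for device(d) map(tofrom: p[lo:hi-lo])
    for (int i = lo; i < hi; ++i) p[i] *= 2.0;
  }

  double sum = 0.0;
  for (int i = 0; i < n; ++i) sum += p[i];
  std::printf("sum = %.1f (expected %.1f)\n", sum, 2.0 * n);
  return sum == 2.0 * n ? 0 : 2;
}
EOF

# Build and run only if clang++ is on PATH.
if command -v clang++ >/dev/null 2>&1; then
  if clang++ -O2 -fopenmp --offload-arch=sm_80 "$src" -o "$workdir/testapp"; then
    # LIBOMPTARGET_INFO=-1 makes the offload runtime report what it is doing.
    LIBOMPTARGET_INFO=-1 "$workdir/testapp" || echo "testapp reported a problem"
  else
    echo "compilation failed; check the clang version and offload flags"
  fi
else
  echo "clang++ not found; source left in $src"
fi
```

If the runtime output mentions an incompatible image, the architecture flag is the first thing to check.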

Thank you for the clarification, Shilei! I now feel convinced that the code should indeed work with device(0) and device(1). As suggested by Johannes and Joseph I will install the latest version of Clang instead and see if the issue persists. :smiling_face:

Thank you, Joseph! I will revisit your slides and try to install a newer version of Clang

Thank you, Johannes! I definitely have a lot to learn about Clang. I will try out the development branch and if I get it to work, I will try out some of the assume flags to improve the performance.

I am very impressed that I got so many replies in such a short time!

I am facing some problems when installing LLVM from GitHub. I tried:

mkdir LLVM
git clone
module load cmake/3.23.2
module load gcc/11.3.0-binutils-2.38
export CC=`which gcc`
export CXX=`which g++`
cmake llvm-project/llvm -DLLVM_ENABLE_PROJECTS='clang;lld' -DLLVM_ENABLE_RUNTIMES='openmp' -DCMAKE_BUILD_TYPE=Release
make -j 16

But I got a lot of errors like the following

Do you know how I can fix those?

See below :wink:

That is because clang picks up the system GCC 4.8.5 in the second build, which doesn’t fully support C++14. In the past you could set GCC_INSTALL_PREFIX when building LLVM, but unfortunately it was removed recently (Add --gcc-install-dir=, deprecate --gcc-toolchain=, and remove GCC_INSTALL_PREFIX). CCC_OVERRIDE_OPTIONS seems to be the only way to tell clang to use your own GCC. You can check clang/tools/driver/driver.cpp line 105 to see how to use that environment variable.

Okay, I checked the wrong directory before. GCC_INSTALL_PREFIX is not removed yet, so you can still use it. I’m not sure if it will be removed soon, but at least it still works for now.
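
For reference, a minimal sketch of where GCC_INSTALL_PREFIX would go in a configure line like the one posted earlier in this thread. The prefix is derived from whichever gcc is first on PATH, assuming the usual <prefix>/bin/gcc layout, which may not match your module system. The command is only printed, not run.

```shell
# Assumption: the GCC install prefix is two levels above the gcc binary
# (<prefix>/bin/gcc). Set GCC_PREFIX yourself if your layout differs.
gcc_bin="$(command -v gcc || true)"
GCC_PREFIX="${GCC_PREFIX:-$(dirname "$(dirname "${gcc_bin:-/usr/bin/gcc}")")}"

# Dry run: print the configure command instead of running it.
cmake_cmd="cmake llvm-project/llvm \
  -DLLVM_ENABLE_PROJECTS='clang;lld' \
  -DLLVM_ENABLE_RUNTIMES='openmp' \
  -DCMAKE_BUILD_TYPE=Release \
  -DGCC_INSTALL_PREFIX=$GCC_PREFIX"
echo "$cmake_cmd"
```

Because the prefix is baked into clang at build time, the clang used for the runtimes stage should pick up that GCC without any extra flags.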

The CMake variable GCC_INSTALL_PREFIX is discouraged. Which value do you use? It’s recommended to specify --gcc-install-dir= (e.g. --gcc-install-dir=/usr/lib/gcc/x86_64-linux-gnu/10) if you have a specific GCC version requirement.

The problem is, users don’t have direct control of the second build (runtime build in LLVM_ENABLE_RUNTIMES). It is invoked by CMake directly. In this case, we can either tell clang by using CCC_OVERRIDE_OPTIONS, or clang can use the one specified in GCC_INSTALL_PREFIX when it is built.
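
A hedged illustration of the CCC_OVERRIDE_OPTIONS route: as far as I can tell from the comments in driver.cpp, the variable holds space-separated edits, and an entry starting with + appends that argument to every clang command line. Please verify this against your checkout. The GCC path is the example path from this thread, and --gcc-install-dir needs a clang that already has that option.

```shell
# Assumption: '+ARG' appends ARG to each clang command line.
gcc_dir="/usr/lib/gcc/x86_64-linux-gnu/10"   # example path; adjust to your GCC
export CCC_OVERRIDE_OPTIONS="+--gcc-install-dir=$gcc_dir"
echo "CCC_OVERRIDE_OPTIONS=$CCC_OVERRIDE_OPTIONS"

# Once exported, the variable should be inherited by the runtimes stage that
# CMake launches. If clang++ is available, -v shows the selected GCC.
if command -v clang++ >/dev/null 2>&1; then
  clang++ -v 2>&1 | grep -i "gcc installation" || true
fi
```

Export the variable in the same shell that runs the build, so the automatically invoked second stage sees it.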

I don’t understand the request. At which stage do you specify GCC_INSTALL_PREFIX? Have you checked "Clang in Gentoo now sets default runtimes via config file" by Michał Górny?

At which stage do you specify GCC_INSTALL_PREFIX?

Typical CMake arguments we recommend for building LLVM and OpenMP on an HPC system would be something like:


I omitted a couple of unrelated arguments. Here openmp is set in LLVM_ENABLE_RUNTIMES. In this mode, both clang and llvm are built first, and then openmp is configured (CMake) and built. During that configuration, the clang that was just built is used as the compiler. The second stage is invoked by CMake automatically because runtimes is a CMake target at the top level. So at that stage, clang will use the GCC specified in GCC_INSTALL_PREFIX directly.

I was trying to say that asking users to specify --gcc-install-dir is not feasible in this case.

Have you checked "Clang in Gentoo now sets default runtimes via config file" by Michał Górny?

I did a quick experiment on my system. So it generally requires two steps:

  1. Set CLANG_CONFIG_FILE_USER_DIR when building LLVM. Otherwise there will be no default config file search directory.
  2. Create a config file under CLANG_CONFIG_FILE_USER_DIR, and have --gcc-install-dir in that cfg file.
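
The two steps above can be sketched as follows. The file names clang.cfg and clang++.cfg are an assumption based on the driver-name lookup in recent clang releases; older releases may expect a different name. The GCC path is the example path from this thread, and a scratch directory stands in for the real config directory.

```shell
# Step 2: create the config files. cfg_dir must be the same directory that is
# passed as CLANG_CONFIG_FILE_USER_DIR when configuring LLVM (step 1).
cfg_dir="${CFG_DIR:-$(mktemp -d)}"            # CFG_DIR is a hypothetical override
gcc_dir="/usr/lib/gcc/x86_64-linux-gnu/10"    # example path; adjust to your GCC

for driver in clang clang++; do
  printf '%s\n' "--gcc-install-dir=$gcc_dir" > "$cfg_dir/$driver.cfg"
done

cat "$cfg_dir/clang++.cfg"

# Step 1 (dry run): the extra argument for the LLVM configure line.
echo "cmake ... -DCLANG_CONFIG_FILE_USER_DIR=$cfg_dir"
```

The directory has to exist and be readable when the runtimes stage starts, since that is when the freshly built clang first looks for the file.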

It generally works, but it is not as convenient as GCC_INSTALL_PREFIX.

According to the docs, there are supposed to be default search directories (system directory and clang binary directory).

Yes, but when using the runtime build (LLVM_ENABLE_RUNTIMES), either users don’t have write access to the system folder, or the clang binary folder has not been created yet.

I see, so you are still running clang out of the build directory, because it’s being used to build OpenMP and nothing has been installed yet.


Passing --config /path/to/config.cfg in CMAKE_CXX_FLAGS might be a way to solve this with fewer steps, but this would also load the config file when building clang/llvm (which may not be what you want). Is there a way to pass cxx flags only to the runtime builds?

AFAIK, only specific CMake arguments can be passed to the runtime builds, namely those of the form <runtime>_ABC. When the runtime is configured, <runtime>_ABC is passed through. In other words, there is no way to pass general CMake arguments to the runtime build. (That was my impression about one year ago when I enabled the runtime build for OpenMP. I’m not sure if it has been improved.) That’s also the main reason I never use the runtime build: my usual setup is a release LLVM plus a debug OpenMP, and there is no way to do that with the runtime build. However, for OpenMP users, the runtime build is the recommended method.

Thank you for all of the really good recommendations! I guess the issue of specifying the device number has been fixed in a more recent version of Clang. With the release version compiled from the main branch there is indeed no issue with multi-GPU offloading.

@shiltian Adding -DGCC_INSTALL_PREFIX made the second phase of the compilation work. Thank you so much for suggesting that. I would never have found out how to do that on my own. :smiley:

The following resulted in an installation that allows me to do multi-GPU target offloading in Clang++.

git clone --depth=1
module load cmake/3.23.2
module load gcc/11.3.0-binutils-2.38
export CC=`which gcc`
export CXX=`which g++`