Add --gcc-install-dir=, deprecate --gcc-toolchain=, and remove GCC_INSTALL_PREFIX

For target triples where GCC is the primary compiler, Clang detects a GCC installation and uses files from it (mainly libstdc++, crtbeginS.o and similar crt files). When GCC installations of multiple versions exist, there is no way to specify the desired version (other than hard-coding -isystem and similar options yourself).

I have a patch, D133329 [Driver] Add --gcc-install-dir=, which allows specifying the GCC version:

# Debian and many derivatives
clang++ --gcc-install-dir=/usr/lib/gcc/x86_64-linux-gnu/12 -m32 a.cc
clang++ --gcc-install-dir=/usr/lib/gcc/x86_64-linux-gnu/11 a.cc

# Gentoo
clang++ --gcc-install-dir=/usr/lib/gcc/x86_64-gentoo-linux-musl/11.2.0 a.cc
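In each example the value of --gcc-install-dir= ends in $triple/$version, so the GCC target triple and version can be read off the last two path components. A minimal Python sketch, assuming exactly that layout (the helper split_gcc_install_dir is hypothetical and only illustrates the directory structure; it is not Clang's code):

```python
from pathlib import PurePosixPath


def split_gcc_install_dir(install_dir: str) -> tuple[str, str]:
    """Return (triple, version) from a path like /usr/lib/gcc/<triple>/<version>.

    Hypothetical helper: it only inspects the path string and assumes the
    last two components are the GCC target triple and the GCC version.
    """
    path = PurePosixPath(install_dir)  # also drops a trailing slash
    if len(path.parts) < 3:
        raise ValueError(f"not a GCC installation directory: {install_dir!r}")
    return path.parent.name, path.name


for d in (
    "/usr/lib/gcc/x86_64-linux-gnu/12",
    "/usr/lib/gcc/x86_64-gentoo-linux-musl/11.2.0",
):
    triple, version = split_gcc_install_dir(d)
    print(f"{d}: triple={triple} version={version}")
```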

See https://github.com/llvm/llvm-project/issues/57570 for a use case simplifying Gentoo gcc-config.


--gcc-toolchain= was added to replace the CMake variable GCC_INSTALL_PREFIX (which has a long history in Clang).
The option specifies a directory under which lib/gcc{,-cross}/$triple/$version can be found.
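The directory layout behind --gcc-toolchain= can be illustrated with a short Python sketch that expands lib/gcc{,-cross}/$triple/$version under a prefix and lists every installation it finds. The function name find_gcc_installations is hypothetical, and the sketch only checks that directories exist; Clang's real detection may apply further checks (for example, for crt files).

```python
from pathlib import Path


def find_gcc_installations(prefix: str) -> list[tuple[str, str, str]]:
    """List (subdir, triple, version) for each $prefix/lib/<subdir>/<triple>/<version>.

    <subdir> is "gcc" or "gcc-cross", i.e. the lib/gcc{,-cross} pattern.
    Only directory existence is checked here.
    """
    found = []
    for subdir in ("gcc", "gcc-cross"):
        base = Path(prefix) / "lib" / subdir
        if not base.is_dir():
            continue
        for triple_dir in sorted(p for p in base.iterdir() if p.is_dir()):
            for version_dir in sorted(p for p in triple_dir.iterdir() if p.is_dir()):
                found.append((subdir, triple_dir.name, version_dir.name))
    return found
```

For instance, find_gcc_installations("/usr") on a Debian-like host would list entries such as ("gcc", "x86_64-linux-gnu", "12").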

I think --gcc-toolchain= and GCC_INSTALL_PREFIX are not useful and should be deprecated.

Most users don’t specify the option. When the option is specified as /usr, it behaves as if it were unspecified.

When the option specifies a GCC installation in a nonstandard place, that place is typically a sysroot.
When --sysroot= is specified, Clang auto-detects lib/gcc{,-cross}/$triple/$version in $sysroot/usr and $sysroot, so --gcc-toolchain= is not needed.
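The sysroot rule amounts to a short list of search prefixes. A minimal Python sketch, assuming an explicit --gcc-toolchain= replaces the list entirely and that an empty sysroot means the host root; gcc_search_prefixes is a hypothetical name:

```python
from typing import Optional


def gcc_search_prefixes(sysroot: str = "", gcc_toolchain: Optional[str] = None) -> list[str]:
    """Return the prefixes probed for lib/gcc{,-cross}/$triple/$version.

    An explicit --gcc-toolchain= wins; otherwise $sysroot/usr is tried
    before $sysroot itself.
    """
    if gcc_toolchain is not None:
        return [gcc_toolchain]
    root = sysroot.rstrip("/")  # "/opt/sysroot/" and "/opt/sysroot" behave the same
    return [f"{root}/usr", root or "/"]
```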

When the GCC installation is not contained in the sysroot, typically the user wants to fix the GCC version as well.
The new --gcc-install-dir= can be used instead.

--gcc-toolchain= has been around for some time and has some use cases, so it cannot be removed anytime soon.
But GCC_INSTALL_PREFIX seems unused and should be removed.

GCC_INSTALL_PREFIX used to be a powerful tool for HPC users who have to deal with the ancient system GCC 4.8.5 while they actually use something much newer. With GCC_INSTALL_PREFIX, they just need to specify it when building LLVM and that’s it. They don’t have to specify --gcc-toolchain or the new flag every time. It is also very useful for LLVM_ENABLE_RUNTIMES builds, because we can’t pass any extra compiler arguments to the second build.

Now GCC_INSTALL_PREFIX is going to be removed. It seems like the only way to set the GCC installation automatically is to use CCC_OVERRIDE_OPTIONS, which is definitely not a user-friendly setting.

@MaskRay Also see “How can I use multiple GPUs in OpenMP?” (post #10 by shiltian).

Was GCC_INSTALL_PREFIX actually removed? Which commit was it?

We also use GCC_INSTALL_PREFIX when building clang for RHEL in order to ensure that it uses a newer libstdc++ from gcc-toolset instead of the older system libstdc++. Is the alternative for GCC_INSTALL_PREFIX to use a clang.cfg and add --gcc-install-dir pointing to the correct path?

It’s not actually removed. My bad.

If you use --gcc-toolchain=/usr and rely on it picking up /usr/lib/gcc/x86_64-linux-gnu/4.8.5, you can now do --gcc-install-dir=/usr/lib/gcc/x86_64-linux-gnu/4.8.5.

I know that --gcc-toolchain= automatically picking the newest version of GCC is sometimes convenient, so it will definitely last longer. But the CMake GCC_INSTALL_PREFIX variable does not have to be retained. Its functionality can be replaced with --gcc-toolchain= in a default configuration file (Clang in Gentoo now sets default runtimes via config file – Michał Górny).

If I could suggest the following…

  1. Compile-time GCC_INSTALL_PREFIX. I’d hate to see this go. It allows me to specify the GCC location up front, once and for all, when building LLVM. This is useful for HPC systems that don’t have a suitable default GCC installation (e.g. RHEL 7, as mentioned above), and I can do it without end users having to deal with all the complications.

  2. --gcc-toolchain argument. This allows me, at runtime, to override anything specified as in #1 above. It also lets me deal with pre-compiled LLVMs (e.g. AOCC, oneAPI) on systems that don’t have standard GCC installs (or have multiple versions). The problem is that this command-line argument has to be explicitly specified by users (who don’t always understand the intricacies of LLVM/GCC), and it also means I may have to modify other software packages’ build tools that don’t take a non-default GCC install into account.

  3. Something new: a runtime environment variable (GCC_INSTALL_PREFIX would work, or something consistent with the above). If this existed, it would allow me to specify the GCC without having to explicitly add a command-line argument or change build software. The other big advantage is that, since this would exist in the user’s environment, I can incorporate it into our module/Lmod setup (specifically, my GCC modules could set up the GCC to be used by LLVM) and “hide” all this mess from casual users.
    (Extra credit for supporting a list à la /opt/gcc11:/opt/gcc12:… and letting LLVM continue to pick using its current algorithm.)

In terms of (increasing) priority: GCC_INSTALL_PREFIX (build) → GCC_INSTALL_PREFIX (runtime env) → --gcc-toolchain
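The proposed priority order could be modeled roughly as follows. This is a sketch of the proposal only: Clang does not read a GCC_INSTALL_PREFIX environment variable, and resolve_gcc_prefixes, its parameters, and the colon-separated list handling are all hypothetical.

```python
from typing import Mapping, Optional


def resolve_gcc_prefixes(
    build_default: Optional[str],
    env: Mapping[str, str],
    cli_gcc_toolchain: Optional[str],
) -> list[str]:
    """Resolve GCC prefixes with increasing priority:
    build-time default -> runtime env var -> --gcc-toolchain.

    A colon-separated env value yields several candidate prefixes, in the
    spirit of the "/opt/gcc11:/opt/gcc12" idea; the variable name is
    hypothetical.
    """
    if cli_gcc_toolchain:
        return [cli_gcc_toolchain]
    env_value = env.get("GCC_INSTALL_PREFIX", "")
    if env_value:
        return [p for p in env_value.split(":") if p]
    return [build_default] if build_default else []
```

Passing os.environ as env would let a module system influence the result, which is the convenience the proposal is after; the cross-compilation concern is that the same variable would then silently affect every target.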

My environment is multi-system, multi-vendor, and I need to support multiple versions of software over an extended period of time for a large number of users so my situation probably is more complicated than the typical user.

Just something to think about…

  1. You can add --gcc-toolchain= to a configuration file (see the Clang Compiler User’s Manual, Clang 16.0.0git documentation) to avoid GCC_INSTALL_PREFIX.
  2. --gcc-install-dir= is similar to --gcc-toolchain= but it includes the triple/version path components as well: --gcc-install-dir=/usr/lib/gcc/x86_64-linux-gnu/12. You don’t lose functionality.
  3. No, an environment variable would be too magical for the cross compilation use cases.

Can you add 2 more CMake variables to make the build easier? E.g.

GCC_PATH as a path to the custom gcc
GCC_LIBRARIES_PATH as paths to the custom C++ and other libraries.
With these, the headers and everything else for the custom GCC would be found easily. GCC_INSTALL_PREFIX has never worked well, and I’m not sure if it is a CMake or LLVM bug.

I use --gcc-toolchain to use Clang as my compiler toolchain but benefit from the libstdc++ that evolves and gets features faster.


I clone and build GCC locally, use --gcc-toolchain=/usr/local/gcc-dev, and sync from upstream every few weeks.


It’s really nice because you get the best of both worlds: the LLVM compiler toolchain and the newest features from libstdc++.

Please see the top message. Where you previously used --gcc-toolchain=/usr/local/gcc-dev, now use --gcc-install-dir=/path/to/gcc-dev/$triple/$version.

See my previous reply. You can specify --gcc-install-dir= in a configuration file and configure llvm-project build to use that configuration file by default.

We definitely don’t want to add more GCC_PATH style CMake variables to llvm-project. They complicate the build system and the Clang driver, and usually add configurations most contributors don’t test. The recent configuration file revamp was designed to subsume such customization needs.

Currently it is very complicated to build Clang this way: one has to pass link and compile flags, then copy the C++ library to use at runtime, etc.
See “Getting a Modern Host C++ Toolchain” in https://www.llvm.org/docs/GettingStarted.html
I’m asking for a solution similar to the way LLVM’s other build dependencies are added. For example, a custom zlib needs an include path and a library path, and CMake recognizes it. A GCC toolchain, by contrast, again requires special knowledge about flags and is complicated. I am looking for a simple solution that makes the Clang build easy, where CMake finds everything needed to build. With libc++ it is even more difficult, since shared and static libraries break the build when they should make it easier.

@MaskRay The problem with --gcc-install-dir is it requires hard-coding the gcc version into the config file. This is not ideal for OS distributors, since it means that clang must be updated every time gcc is. The solution we were using before for Fedora avoided this problem, by using the LLVM_DEFAULT_TARGET_TRIPLE option which caused the driver to pick the newest gcc version in /usr/lib/$LLVM_DEFAULT_TARGET_TRIPLE/

Encoding the version in --gcc-install-dir= is by design. In Gentoo, one may install multiple GCC versions. The latest one is not necessarily the selected one. gcc-config updates a Clang configuration file to specify the appropriate --gcc-install-dir=.
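A minimal sketch of what a gcc-config-style tool might do when the user selects a GCC: write a Clang configuration file containing --gcc-install-dir= for the selected version, which need not be the newest one. The helper name, its arguments, and the configuration file path are hypothetical; Gentoo’s actual gcc-config may use different paths and contents.

```python
from pathlib import Path


def write_gcc_install_dir_cfg(cfg_path: str, prefix: str, triple: str, version: str) -> str:
    """Write a Clang configuration file selecting one GCC installation.

    Returns the option line that was written. Raises FileNotFoundError if
    the selected installation directory does not exist, so a stale
    selection is not silently recorded.
    """
    install_dir = Path(prefix) / "lib" / "gcc" / triple / version
    if not install_dir.is_dir():
        raise FileNotFoundError(install_dir)
    line = f"--gcc-install-dir={install_dir}"
    Path(cfg_path).write_text(line + "\n")
    return line
```

Because the version is chosen explicitly rather than by "largest wins", installing a newer GCC does not change what Clang uses until the tool is rerun.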

With the new addition of --gcc-triple=, the detection logic now looks like this:

if (OPT_gcc_install_dir_EQ)
  return OPT_gcc_install_dir_EQ;

if (OPT_gcc_triple)
  candidate_gcc_triples = {OPT_gcc_triple};
else
  candidate_gcc_triples = collectCandidateTriples();
if (OPT_gcc_toolchain)
  prefixes = {OPT_gcc_toolchain};
else
  prefixes = {OPT_sysroot/usr, OPT_sysroot};
for (prefix : prefixes)
  if "$prefix/lib/gcc" exists // also tries $prefix/lib/gcc-cross
    for (triple : candidate_gcc_triples)
      if "$prefix/lib/gcc/$triple" exists
        return "$prefix/lib/gcc/$triple/$version"; // pick the largest version
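The pseudocode above can be turned into a runnable Python sketch. Assumptions: options arrive as plain function arguments, the caller supplies the default candidate triples (standing in for collectCandidateTriples()), and versions are compared by their leading numeric components, which is simpler than Clang’s own version parsing. detect_gcc_install_dir and version_key are hypothetical names.

```python
import re
from pathlib import Path
from typing import Optional, Sequence


def version_key(version: str) -> tuple[int, ...]:
    """Compare "4.8.5", "11", "11.2.0" numerically; non-numeric parts count as 0."""
    key = []
    for part in version.split("."):
        match = re.match(r"\d+", part)
        key.append(int(match.group()) if match else 0)
    return tuple(key)


def detect_gcc_install_dir(
    gcc_install_dir: Optional[str] = None,
    gcc_triple: Optional[str] = None,
    gcc_toolchain: Optional[str] = None,
    sysroot: str = "",
    default_triples: Sequence[str] = (),
) -> Optional[str]:
    # An explicit --gcc-install-dir= wins without any probing.
    if gcc_install_dir:
        return gcc_install_dir
    # --gcc-triple= narrows the candidate triples to exactly one.
    triples = [gcc_triple] if gcc_triple else list(default_triples)
    # --gcc-toolchain= replaces the sysroot-derived prefixes.
    if gcc_toolchain:
        prefixes = [gcc_toolchain]
    else:
        root = sysroot.rstrip("/")
        prefixes = [f"{root}/usr", root or "/"]
    for prefix in prefixes:
        for subdir in ("gcc", "gcc-cross"):
            base = Path(prefix) / "lib" / subdir
            if not base.is_dir():
                continue
            for triple in triples:
                triple_dir = base / triple
                if not triple_dir.is_dir():
                    continue
                versions = [p for p in triple_dir.iterdir() if p.is_dir()]
                if versions:
                    # Pick the largest version, as the pseudocode does.
                    return str(max(versions, key=lambda p: version_key(p.name)))
    return None
```

For a sysroot containing usr/lib/gcc/x86_64-linux-gnu/{4.8.5,11,12}, calling detect_gcc_install_dir(sysroot=..., default_triples=["x86_64-linux-gnu"]) returns the .../12 directory, matching the "pick the largest version" rule.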

On Debian and its derivatives, where the target triple omits the vendor part, the following invocations are roughly equivalent, except that --gcc-install-dir= pins a version as well:

clang --gcc-toolchain=/usr a.c
clang --gcc-install-dir=/usr/lib/gcc/x86_64-linux-gnu/11 a.c
clang --gcc-triple=x86_64-linux-gnu a.c

Pull Request #77537 ([CMake] Deprecate GCC_INSTALL_PREFIX, by MaskRay, llvm/llvm-project on GitHub) deprecated the CMake variable GCC_INSTALL_PREFIX.

In spirit I want to deprecate --gcc-toolchain=, but there are currently too many uses (even in clang/test/Driver).


Here is an example where a configuration file is placed beside the clang executable (here a symlink, so -no-canonical-prefixes is used):

ln -s /tmp/Rel/bin/clang-18 clang
echo '--gcc-toolchain=/usr' > clang.cfg
./clang -no-canonical-prefixes -c a.cc

The CMake variable used to be the only way to get LLVM to build on some HPC machines where the system GCC was too old to use. What’s the expected way to set this for the LLVM/Clang build itself now? Do we need to use CCC_OVERRIDE_OPTIONS?

I deeply do not understand the cross compilation library handling in clang. Just found this thread during another attempt to make sense of it and am confused further.

I note the enthusiasm for reading a text configuration file and do not like it much. There are extra failure modes there which are not present when setting configuration while building the driver.

I can’t tell whether this proposal makes things better or not.

I noticed the warning that GCC_INSTALL_PREFIX is deprecated and will be removed.

In my use case, my application uses C, C++, and HIP, all compiled with Clang. Without GCC_INSTALL_PREFIX, I need to set --gcc-toolchain for each compiler in order to use GCC 12, and my CMake command line becomes extremely long. With GCC_INSTALL_PREFIX, I only need to set it once when building LLVM. That removes the need to set --gcc-toolchain, but GCC_INSTALL_PREFIX is fixed at the CMake stage of building LLVM, which is still ugly. I feel a configuration file is more flexible than a compile-time GCC_INSTALL_PREFIX, but the locations of configuration files seem limited or need to be set at the CMake stage of building LLVM, which is still very problematic.

Can we have an environment variable that changes the configuration file search path, similar to how CUDA and HIP toolchain detection can be affected by environment variables? With configuration files and an environment variable for their search path, I could set the search path in a module, pointing to a configuration file that selects the specific GCC needed, without explicitly using --gcc-toolchain.

It seems that I would still need one configuration file per driver. That is much more work than a single GCC_INSTALL_PREFIX.