Cleaning up the clang::driver::toolchains::Linux mess?

clang 2.9 hardcodes a huge list of GCC versions and paths and distro
release files in /etc just to find where GCC keeps its libraries and
include files on Linux. This was introduced in
http://llvm.org/viewvc/llvm-project?view=rev&revision=118382 , and it’s
been patched up several dozen times since then with uglier and uglier
distro-specific hacks.

Lately this has been causing trouble for Debian and Ubuntu development
releases, which have been transitioning to multiarch paths for GCC, such
as /usr/lib/i386-linux-gnu/gcc/i486-linux-gnu/4.6.1; see
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=629861 . But more
fundamentally, it seems fragile, unmaintainable, and most other kinds of
wrong. It’s guaranteed to break again every time a new GCC version or a
new distro release comes out, and even if clang gets fixed promptly, not
every distro has the resources and energy to package a new version of
clang every time that happens.

Is there a sane way to find these paths? It seems GCC has an option to
just output them:

$ gcc -print-file-name=crtbegin.o
/usr/lib/i386-linux-gnu/gcc/i486-linux-gnu/4.6.1/crtbegin.o
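
A minimal sketch of that idea as shell functions. It assumes a POSIX sh and a GCC-compatible driver named by $CC (defaulting to gcc); the function names gcc_file and gcc_install_dir are hypothetical. It relies on GCC's documented behavior that -print-file-name echoes the bare name back when the file is not found, and it treats the directory holding crtbegin.o as GCC's private library directory:

```shell
#!/bin/sh
# Ask the installed compiler where its support files live instead of
# hard-coding GCC versions and distro-specific paths.
# Assumption: $CC names a GCC-compatible driver (default: gcc).

# Print the path GCC would use for file $1. If GCC cannot find the
# file, -print-file-name echoes the bare name back unchanged.
gcc_file() {
    "${CC:-gcc}" -print-file-name="$1"
}

# Print GCC's private library directory, i.e. the directory that
# holds crtbegin.o; fail if GCC does not know where crtbegin.o is.
gcc_install_dir() {
    crtbegin_path=$(gcc_file crtbegin.o) || return 1
    case $crtbegin_path in
        /*) dirname "$crtbegin_path" ;;
        *)  echo "crtbegin.o not found by ${CC:-gcc}" >&2
            return 1 ;;
    esac
}
```

With the compiler from the example above, gcc_install_dir would print /usr/lib/i386-linux-gnu/gcc/i486-linux-gnu/4.6.1, without any GCC version or distro layout being hard-coded.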

This would preferably happen at clang’s configure time so the packager can
override the autodetection if it fails for some reason. It might even be
possible to parse all the necessary distro-specific flags (-z relro,
--hash-style=gnu, etc.) out of `gcc -dumpspecs`.
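
Something along these lines could run at configure time. This is only a sketch: GCC_INSTALL_DIR, detect_gcc_install_dir and detect_link_flags are hypothetical names, not options that clang's configure script provides. The flag extraction just greps the text of gcc -dumpspecs, whose format is GCC-internal and differs between distros, so its output is a hint rather than an authoritative answer:

```shell
#!/bin/sh
# Configure-time probe with a packager override.
# Assumption: GCC_INSTALL_DIR and both function names are hypothetical.

detect_gcc_install_dir() {
    # An explicit setting from the packager always wins, e.g.
    #   GCC_INSTALL_DIR=/opt/gcc/lib/gcc/x86_64-linux-gnu/4.6 ./configure
    if [ -n "${GCC_INSTALL_DIR:-}" ]; then
        printf '%s\n' "$GCC_INSTALL_DIR"
        return 0
    fi
    # Otherwise fall back to asking the installed gcc.
    crtbegin_path=$("${CC:-gcc}" -print-file-name=crtbegin.o) || return 1
    case $crtbegin_path in
        /*) dirname "$crtbegin_path" ;;
        *)  return 1 ;;   # gcc echoed the bare name: not found
    esac
}

detect_link_flags() {
    # Grep a few distro-specific linker flags out of the GCC specs.
    # The specs format is GCC-internal, so the packager may still
    # have to correct the result.
    "${CC:-gcc}" -dumpspecs |
        grep -o -e '--hash-style=[a-z]*' -e '-z relro' |
        sort -u
}
```

A packager whose layout defeats the probe would set GCC_INSTALL_DIR explicitly; everyone else gets whatever the installed gcc reports.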

Anders

Hi,

If you're thinking of doing this sort of refactoring (and I agree it is definitely worthwhile), could you please not just limit it to toolchains::Linux?

I'm thinking more of toolchains::Unknown, which is used when cross-compiling for bare metal and currently calls GCC to find the assembler and linker (but LLVM/Clang is meant to be a GCC replacement; it shouldn't rely on GCC!).

I've been doing some work on improving the cross-compilation story in the Clang driver (see http://comments.gmane.org/gmane.comp.compilers.clang.scm/36110; it is currently stuck in review on cfe-commits), but haven't got around to doing anything about toolchain locations yet.

An option to bake in locations at configure time, à la GCC, would be excellent for picking up the correct cross-compilation linker, assembler and libraries.

Cheers,

James

Chris did a brief design for this a while ago (search for "Universal Driver"), but it was never implemented. I'd love to see the toolchain descriptions moved out into a config file. Then, when a Linux distro decides to randomly move all of its paths around again, users would just need to update a config file specifying the default locations, and they could create cross-compile toolchain descriptions just by editing a config file, without needing to hack on clang.

David

It's the only configuration where it really matters. Everyone else uses
a sane setup. For cross-compiling to unknown platforms, you can still
just put the files into a single location relative to the install path
of clang or so.

Joerg

Autodetecting paths from the default gcc also makes sense; e.g. in Gentoo one can easily switch the active gcc version, and Clang could follow it.

I do hope there was a better motivation for r118382 than that. One can’t
just switch from running

gcc -o foo foo.o

to running

"/usr/bin/ld" --eh-frame-hdr -m elf_x86_64 -dynamic-linker
/lib64/ld-linux-x86-64.so.2 -o foo
/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../crt1.o
/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../crti.o
/usr/lib/gcc/x86_64-linux-gnu/4.6/crtbegin.o
-L/usr/lib/gcc/x86_64-linux-gnu/4.6
-L/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../../lib64 -L/lib/../lib64
-L/usr/lib/../lib64 -L/usr/lib/gcc/x86_64-linux-gnu/4.6/../../.. foo.o
-lgcc --as-needed -lgcc_s --no-as-needed -lc -lgcc --as-needed -lgcc_s
--no-as-needed /usr/lib/gcc/x86_64-linux-gnu/4.6/crtend.o
/usr/lib/gcc/x86_64-linux-gnu/4.6/../../../crtn.o

and call that “not relying on GCC”. Not relying on GCC would mean that I
don’t need to have any parts of GCC installed to use clang. That might be
possible in the bare-metal case, but it doesn’t make sense to talk about
that in the Linux case until clang has its own replacements for
crtbegin.o, crtend.o, libgcc.a, libgcc_s.so, and knows how to find crt1.o,
crti.o, crtn.o, libc.so without backtracking from GCC’s internal paths.
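
To see which of the files on that link line a given driver can actually resolve, one can ask it for each file in turn. A rough sketch, assuming a GCC-compatible $CC; report_runtime_files is a hypothetical helper, and an absolute path in its output only means that gcc's search found the file, not which package owns it:

```shell
#!/bin/sh
# For each startup file and runtime library, show whether the
# GCC-compatible driver in $CC (default: gcc) can resolve it.
# -print-file-name prints an absolute path when the file is found and
# the bare name otherwise. Files may be given as arguments; by default
# the ones named above are checked.

report_runtime_files() {
    if [ "$#" -eq 0 ]; then
        set -- crt1.o crti.o crtn.o crtbegin.o crtend.o \
               libgcc.a libgcc_s.so libc.so
    fi
    for f in "$@"; do
        resolved=$("${CC:-gcc}" -print-file-name="$f")
        case $resolved in
            /*) printf '%-12s %s\n' "$f" "$resolved" ;;
            *)  printf '%-12s NOT FOUND\n' "$f" ;;
        esac
    done
}
```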

Anders

Hi Anders,

and call that “not relying on GCC”. Not relying on GCC would mean that I
don’t need to have any parts of GCC installed to use clang. That might be
possible in the bare-metal case, but it doesn’t make sense to talk about
that in the Linux case until clang has its own replacements for
crtbegin.o, crtend.o, libgcc.a, libgcc_s.so, and knows how to find crt1.o,
crti.o, crtn.o, libc.so without backtracking from GCC’s internal paths.

Because of Linux's heavy reliance on GCC anyway (coupled with compiler-rt's lack of maturity), this does indeed make sense for Linux.

But the default "Unknown" toolchain, used for bare-metal cross-compilation, also still uses this method. Instead, it could do what the *BSDs do and look up as/ld directly, or, even better, look for an as/ld with the same program prefix clang was invoked with (for example, "i686-pc-clang" would look for "i686-pc-as").
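
A rough sketch of that prefix-based lookup, written as shell functions purely for illustration (the clang driver itself is C++, and tool_prefix and find_tool are hypothetical names, not existing driver code). It derives the prefix from the name the compiler was invoked as, tries the prefixed tool on PATH first, and falls back to the unprefixed one:

```shell
#!/bin/sh
# Prefix-based tool lookup: when the compiler is invoked as
# "<prefix>-clang" (e.g. i686-pc-clang), look for "<prefix>-as",
# "<prefix>-ld", ... on PATH first, then fall back to the plain tool.
# Hypothetical helper names; this is not the clang driver's own code.

tool_prefix() {
    # "/usr/bin/i686-pc-clang" -> "i686-pc-"; plain "clang" -> ""
    invoked=${1##*/}
    case $invoked in
        *-clang|*-clang++) printf '%s-\n' "${invoked%-clang*}" ;;
        *)                 printf '\n' ;;
    esac
}

find_tool() {
    # $1: name the compiler was invoked as; $2: tool name (as, ld, ...)
    prefix=$(tool_prefix "$1")
    for candidate in "$prefix$2" "$2"; do
        if command -v "$candidate" >/dev/null 2>&1; then
            command -v "$candidate"
            return 0
        fi
    done
    return 1
}
```

For example, find_tool /usr/local/bin/i686-pc-clang ld would print the path of i686-pc-ld if it is on PATH, and otherwise that of ld.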

James

The problem with this is that GCC has a habit of assimilating vaguely related projects, and Linux distributions let it get away with this by not splitting them up again for packaging. For example, GCC's libstdc++ and libobjc do not support being built outside the GCC tree, but they are not specific to GCC; they can be used by other compilers.

The BSDs (including OS X) provide their own build system for components like these, so that they can be built and upgraded independently of GCC. Linux distributions tend to go the lazy route and just have one big fat gcc package that includes a huge mess of stuff that has nothing to do with the compiler (yet, oddly, they do tend to package the various GCC front ends separately).

If you use a system maintained by people who understand the difference between a compiler, a header, and a library, then you can use clang without GCC. If you use a system maintained by people who only install compiler-independent libraries and headers when you happen to install a compiler, and place these headers and libraries in a package-version-dependent location that is hard-coded into the compiler, then you are going to have problems.

David