RFC: How should Clang/LLVM runtime libraries be installed and found during link steps?

While reviewing Kostya’s patch to add the necessary support for linking in the Address Sanitizer runtime library, it became clear that we need a proper scheme and plan for deploying runtime libraries along with Clang.

I’ve CC’ed llvmdev on this for compiler-rt developers’ input.

The key issues I see when locating runtime libraries are the following:

  • These libraries should be shipped with Clang, not installed separately on the system.
  • They are likely to be somewhat tied to specific versions of Clang/LLVM.
  • To support cross-compiling, we should be able to install multiple copies of these libraries in a single Clang installation, and use the appropriate one when targeting a particular platform.
  • The above cross-compiling concern extends to ABI compatibility – many of the ABIs are already fixed in this space.
  • These are different from builtin header files which can use the preprocessor to internally differentiate their contents based on the target platform.

My proposed solution:

  • Base the path on the shared “resource directory” concept which already exists in Clang.
  • Builtin header files are at <resource-dir>/include already.
  • Append a “base” triple directory name.
  • Append a “lib” directory name for runtime libraries.
  • Place the runtime libraries as “libcompiler_rt_<component>.a” for each sub-library component “<component>”.

Example for “x/bin/clang” installation:
x/bin/…/lib/clang/3.1/x86_64-linux-gnu/lib/libcompiler_rt_asan.a
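To make the proposed layout concrete, here is a minimal C++ sketch of how a driver might assemble the library path. The function name runtimeLibraryPath and its parameters are hypothetical, not existing Clang API, and the resource directory is assumed to be computed elsewhere and passed in as-is.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not part of Clang): builds the path to a runtime
// library under the proposed layout
//   <resource-dir>/<base-triple>/lib/libcompiler_rt_<component>.a
std::string runtimeLibraryPath(const std::string &resourceDir,
                               const std::string &baseTriple,
                               const std::string &component) {
  // All three pieces are required; an empty piece would yield a bogus path.
  assert(!resourceDir.empty() && !baseTriple.empty() && !component.empty());
  return resourceDir + "/" + baseTriple + "/lib/libcompiler_rt_" +
         component + ".a";
}
```

Given a resource directory R, the base triple x86_64-linux-gnu, and the component asan, this yields R/x86_64-linux-gnu/lib/libcompiler_rt_asan.a, which has the same shape as the example path above.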

What is a “base” triple? It is the simplest form we can reduce a triple to while guaranteeing compatibility. The concept is best explained by an example. All of the following triples reduce to the base triple of “x86_64-linux-gnu”:
x86_64-linux-gnu
x86_64-pc-linux-gnu
x86_64-unknown-linux-gnu
x86_64-redhat-linux
x86_64-redhat-linux6E

A different ABI could be expressed much as ARM’s is, via the last element: “*-gnueabi”. This would not reduce to the triple ending in ‘-gnu’. Due to the ad hoc and poorly spec’ed nature of existing triples, this will largely be a fixed mapping much like the one that already exists in the Clang driver, but consolidated into LLVM’s core triple logic.
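As a rough illustration of the fixed mapping described above, the following C++ sketch reduces a triple to its base triple with a lookup table. The function name baseTriple and the table are hypothetical; the entries are only the five examples listed in this mail, and a real table in LLVM’s triple logic would need to cover far more cases.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical fixed mapping from a full triple to its "base" triple.
// Only the example triples from the RFC are listed here.
std::string baseTriple(const std::string &triple) {
  assert(!triple.empty());
  static const std::map<std::string, std::string> kBaseTriples = {
      {"x86_64-linux-gnu", "x86_64-linux-gnu"},
      {"x86_64-pc-linux-gnu", "x86_64-linux-gnu"},
      {"x86_64-unknown-linux-gnu", "x86_64-linux-gnu"},
      {"x86_64-redhat-linux", "x86_64-linux-gnu"},
      {"x86_64-redhat-linux6E", "x86_64-linux-gnu"},
  };
  auto it = kBaseTriples.find(triple);
  // Triples missing from the table (for example a "*-gnueabi" triple)
  // are returned unchanged, so they never collapse to a "-gnu" base.
  return it != kBaseTriples.end() ? it->second : triple;
}
```

Returning unknown triples unchanged is just one possible fallback chosen for this sketch; a driver could equally well reject them.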

Open Questions:

How do we handle ‘i686’ vs ‘i386’? Is it useful to have separate installed libraries for i386 and i686 in order to get better performance for the latter and maintain compatibility for the former? We could collapse to either i386 or i686, and define either to mean whatever we want (for example, Debian uses i386-linux-gnu for its “base” triple in multiarch, and does not support i386 processors).

This scheme closely mirrors GCC’s existing scheme with one exception: GCC puts the triple first, and the version number second. I don’t think this matters greatly either way, but currently Clang puts its version number first. I think this reflects the inherent design of Clang to be cross-compiling by default, but it would be good to consider (even if we reject) matching GCC’s behavior here.

Should runtime libraries be installed as archives? .o files? .so files? (gasp) bitcode? Some mixture of these? What mixture, and how do we decide? I lean toward .o files as bitcode where the linker supports it, normal .o files where it supports those, and .a files only as a fallback. Not very confident of these preferences though.
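The tentative preference order above could be sketched in C++ roughly as follows. The LinkerCapabilities struct and the preferredRuntimeForm function are hypothetical names made up for illustration; how the driver would actually detect linker support is left open here.

```cpp
#include <cassert>
#include <string>

// Hypothetical description of what the target linker can consume.
struct LinkerCapabilities {
  bool bitcodeObjects; // linker can link bitcode .o files
  bool nativeObjects;  // linker can link plain native .o files
};

// Picks the installed form of a runtime library using the tentative
// order from the RFC: bitcode .o, then native .o, then .a as a fallback.
std::string preferredRuntimeForm(const LinkerCapabilities &caps) {
  if (caps.bitcodeObjects)
    return "bitcode .o";
  if (caps.nativeObjects)
    return "native .o";
  return ".a archive";
}
```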

2011/11/23 Chandler Carruth <chandlerc@google.com>

[snip]

A different ABI could be expressed much as ARM’s is, via the last element: “*-gnueabi”. This would not reduce to the triple ending in ‘-gnu’. Due to the ad hoc and poorly spec’ed nature of existing triples, this will largely be a fixed mapping much like the one that already exists in the Clang driver, but consolidated into LLVM’s core triple logic.

This does kind of push the GNU triple language on LLVM/Clang internally, even though the jury is still out on the true specification for cross-compiling platform names. Although it starts out internal, it really will keep spreading like a “disease” to more exposed parts of Clang.

Open Questions:

How do we handle ‘i686’ vs ‘i386’? Is it useful to have separate installed libraries for i386 and i686 in order to get better performance for the latter and maintain compatibility for the former? We could collapse to either i386 or i686, and define either to mean whatever we want (for example, Debian uses i386-linux-gnu for its “base” triple in multiarch, and does not support i386 processors).

This difference is so 1990s; Debian is stubborn and they even use completely different triple names by default, just to be different. It’s far from an example to follow…

This scheme closely mirrors GCC’s existing scheme with one exception: GCC puts the triple first, and the version number second. I don’t think this matters greatly either way, but currently Clang puts its version number first. I think this reflects the inherent design of Clang to be cross-compiling by default, but it would be good to consider (even if we reject) matching GCC’s behavior here.

I don’t see what influence this has on anything but internal design: LLVM/Clang should really decide for itself. Version/Target sounds like the “right” choice for the reason you say.

Should runtime libraries be installed as archives? .o files? .so files? (gasp) bitcode? Some mixture of these? What mixture, and how do we decide? I lean toward .o files as bitcode where the linker supports it, normal .o files where it supports those, and .a files only as a fallback. Not very confident of these preferences though.

libraries → .a/so files… They’re target specific anyways, why even consider bitcode?

Ruben

And it does not play well with universal binaries, which are actually used for darwin compiler_rt libraries. Not that I think this is a major issue, but it probably should be considered too.

– Jean-Daniel

Absolutely! Unfortunately I know nothing about Darwin, so hopefully others can educate me here. I proposed a solution primarily centered around what would work well on Linux.

[snip]

Should runtime libraries be installed as archives? .o files? .so files? (gasp) bitcode? Some mixture of these? What mixture, and how do we decide? I lean toward .o files as bitcode where the linker supports it, normal .o files where it supports those, and .a files only as a fallback. Not very confident of these preferences though.

libraries → .a/so files… They’re target specific anyways, why even consider bitcode?

Bitcode libraries allow their code to be inter-procedurally optimized by libLTO. For example, we used to compile libstdc++ in llvm-gcc to bitcode; this allowed us to inline C++ standard library functions into the main program and perform optimizations such as dead code elimination, inter-procedural constant propagation, etc, etc.

– John T.

For some languages (e.g. OpenCL), the runtime library is a set of
functions which must be inlined, and therefore must be stored as
bitcode. More generally, storing libraries as bitcode would expose
inlining/LTO opportunities.

Thanks,

Hey Chandler,

We already have a certain precedent for how we do this on Darwin. The
current library set is:

So, is there an agreement now?
In particular, is it fine to have the asan run-time for linux x86/x86_64 at
lib/clang/linux/TC.getArchName())/libclang_rt.asan.a ?

Thanks,

–kcc

I am convinced that this is not sufficient for Linux. Daniel doesn’t want to do a more rigorous design of the system at this time, but I completely disagree.

However, feel free to start here, and we can fix it with a better design later. I hate building up this kind of technical debt, but I’m quite busy this week and unlikely to get time to look at how to properly fix this until some time later.