LLVM/Clang and getting rid of the system linker (GNU ld or MSVC link.exe)

Hi,

I've been thinking recently about how Clang can solve its system
linker dependency problem. I've seen some proposals and thoughts on
the mailing list about this issue, so I think this will be interesting
to explore. Please excuse my non-professional language, I am not a
Computer Science master, just a very interested hobbyist.

Concretely, Clang now uses the system linker (GNU ld or MSVC link.exe)
to turn the series of object files into an executable/library. This is
not very good, because Clang depends on the whims of a completely
different project to work correctly, and this is very platform
dependent on too high a level.

So the current setup is (how I see it):
1. Clang compiles C/C++ to object files (either GNU *.o or MSVC *.obj
files). This happens through LLVM IR and accompanying optimization
runs of the LLVM toolchain.
2. The system linker is responsible for all the usual link stuff:
turning the object files into native binaries, doing some
link-time-optimizations while it works its magic.

How I would propose to have it in an ideal world:

1. Clang compiles C/C++ into LLVM IR.
2. LLVM toolchain stuff optimizes everything as well as it can.
3. LLVM linker (+Clang?) links together the IR files into one object
file (perhaps existing GNU .o or MSVC .obj files), executing its
link-time-optimizations in the process.
4. A *simple* tool turns the complete object file into native
executable format, adding the platform-dependent parts that are
missing from the semi-platform-agnostic file created in the previous
step. This tool can in the first steps of the implementation be the
system linker, but all it would do is do the object->executable
conversion.

The "linker" may be the assembler or something else, this is what I
don't know. What I do know is that this setup (if possible) provides a
way to integrate more LLVM optimizations in a C/C++/<other language of
your choice> toolchain, remove any complicated linker applications,
and be easily extensible to new platforms.

I understand the hand-wavingness of this whole story, but any comments
or thoughts on what is wrong in my reasoning or not as simple as it
seems are very welcome.

Thanks!

Ruben

Hi,

I've been thinking recently about how Clang can solve its system
linker dependency problem. I've seen some proposals and thoughts on
the mailing list about this issue, so I think this will be interesting
to explore. Please excuse my non-professional language, I am not a
Computer Science master, just a very interested hobbyist.

I would not call depending on the system linker a problem. The system
linker saves us from worrying about the nitty gritty details of how
the system works.

Concretely, Clang now uses the system linker (GNU ld or MSVC link.exe)
to turn the series of object files into an executable/library. This is
not very good, because Clang depends on the whims of a completely
different project to work correctly, and this is very platform
dependent on too high a level.

What do you mean by on too high a level? Also, linkers don't really
change much, and in fact are kinda standardized, so I wouldn't use the
term whim. Dealing with linkers is a rather small part of the Clang
codebase.

So the current setup is (how I see it):
1. Clang compiles C/C++ to object files (either GNU *.o or MSVC *.obj
files). This happens through LLVM IR and accompanying optimization
runs of the LLVM toolchain.
2. The system linker is responsible for all the usual link stuff:
turning the object files into native binaries, doing some
link-time-optimizations while it works its magic.

How I would propose to have it in an ideal world:

1. Clang compiles C/C++ into LLVM IR.
2. LLVM toolchain stuff optimizes everything as well as it can.
3. LLVM linker (+Clang?) links together the IR files into one object
file (perhaps existing GNU .o or MSVC .obj files), executing its
link-time-optimizations in the process.
4. A *simple* tool turns the complete object file into native
executable format, adding the platform-dependent parts that are
missing from the semi-platform-agnostic file created in the previous
step. This tool can in the first steps of the implementation be the
system linker, but all it would do is do the object->executable
conversion.

Here's where the main problem is. There is no *simple* tool to do
this. You need a full linker to turn an object file into an
executable. You have to link to the C and system libraries for any
real program, even "puts("hello world");". There's a lot of code that
gets run before main is entered.
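
To illustrate why even a hello-world program drags in more than its own object file, here is a toy Python sketch of symbol resolution. It is not how any real linker is implemented; the object names (crt1.o, puts.o, write.o, libc_start.o) and their symbol lists are made up for illustration, and real startup code and libc members look different on every platform.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectFile:
    """A toy object file: only its name and symbol tables."""
    name: str
    defined: set = field(default_factory=set)
    undefined: set = field(default_factory=set)


def resolve(inputs, libraries):
    """Pull in library members until no undefined symbol can be satisfied.

    Returns (names of linked objects in link order, still-unresolved symbols).
    """
    linked = list(inputs)
    defined = set()
    pending = set()
    for obj in linked:
        defined |= obj.defined
        pending |= obj.undefined
    pending -= defined

    progress = True
    while pending and progress:
        progress = False
        for member in libraries:
            if member in linked:
                continue
            # Only pull in a member if it defines something we still need.
            if member.defined & pending:
                linked.append(member)
                defined |= member.defined
                pending |= member.undefined
                pending -= defined
                progress = True
    return [obj.name for obj in linked], pending


# Hypothetical inputs: startup code plus the user's hello.o.
crt1 = ObjectFile("crt1.o", {"_start"}, {"main", "__libc_start_main"})
hello = ObjectFile("hello.o", {"main"}, {"puts"})
# Hypothetical libc members.
libc = [
    ObjectFile("puts.o", {"puts"}, {"write"}),
    ObjectFile("write.o", {"write"}),
    ObjectFile("libc_start.o", {"__libc_start_main"}),
]

names, unresolved = resolve([crt1, hello], libc)
print(names, unresolved)
```

Linking hello.o on its own leaves puts unresolved, and the entry point comes from startup code rather than from main, which is the part that a "simple" object->executable converter would still have to handle.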

The "linker" may be the assembler or something else, this is what I
don't know. What I do know is that this setup (if possible) provides a
way to integrate more LLVM optimizations in a C/C++/<other language of
your choice> toolchain, remove any complicated linker applications,
and be easily extensible to new platforms.

I agree that LLVM should have a linker, and I am currently writing one
(see Object Files in LLVM from last year's dev meeting). I intend for
it to replace the system linker on the major platforms (Win, Linux,
FBSD, Mac).

- Michael Spencer

Concretely, Clang now uses the system linker (GNU ld or MSVC link.exe)
to turn the series of object files into an executable/library. This is
not very good, because Clang depends on the whims of a completely
different project to work correctly, and this is very platform
dependent on too high a level.

What do you mean by on too high a level? Also, linkers don't really
change much, and in fact are kinda standardized, so I wouldn't use the
term whim. Dealing with linkers is a rather small part of the Clang
codebase.

Let's talk Windows, because that's where I'm mostly concerned with this.
Cross-compiling projects (like VLC, and any GNU project) have to use a
GNU toolchain, because GNU ld is the only linker capable of producing
a win32 executable on a non-Windows OS (you are not legally allowed to
use Visual Studio or the Windows SDK on a non-Windows machine). On top
of that, it has its problems. For one, its architecture command-line
arguments are different from the GCC arguments that mean the same
thing. Secondly, patching it requires a lot of work, as it's not a
simple project and it carries a lot of legacy code.
Thirdly, it requires a UNIX shell to build (which is really my biggest
gripe), which means that either a toolchain needs to be cross-compiled
or you need to use slow MSYS/Cygwin to handle that (which I currently
do), and those are not without their caveats. I agree the mingw
runtime also requires a Bash shell, but I don't think this is the
biggest hurdle to overcome. Finally, there's the "whim" part. GNU ld
is tightly bound to GCC, meaning that any change in the latter will be
reflected by a change in the former. This is fine as long as Clang
doesn't decide to do things differently. Again, I don't know the
details, but both projects may conflict someday, and then you're
stuck with either forking the "old" ld or starting to replace it
then. I'd rather see a replacement now.

On a sidenote, the license of GNU ld is, well, GPL. I'm sure that's a
showstopper for people trying to bundle integrated linker
functionality in their commercial project.

So the current setup is (how I see it):
1. Clang compiles C/C++ to object files (either GNU *.o or MSVC *.obj
files). This happens through LLVM IR and accompanying optimization
runs of the LLVM toolchain.
2. The system linker is responsible for all the usual link stuff:
turning the object files into native binaries, doing some
link-time-optimizations while it works its magic.

How I would propose to have it in an ideal world:

1. Clang compiles C/C++ into LLVM IR.
2. LLVM toolchain stuff optimizes everything as well as it can.
3. LLVM linker (+Clang?) links together the IR files into one object
file (perhaps existing GNU .o or MSVC .obj files), executing its
link-time-optimizations in the process.
4. A *simple* tool turns the complete object file into native
executable format, adding the platform-dependent parts that are
missing from the semi-platform-agnostic file created in the previous
step. This tool can in the first steps of the implementation be the
system linker, but all it would do is do the object->executable
conversion.

Here's where the main problem is. There is no *simple* tool to do
this. You need a full linker to turn an object file into an
executable. You have to link to the C and system libraries for any
real program, even "puts("hello world");". There's a lot of code that
gets run before main is entered.

Agreed, there is more to this than I let on. But one can always
hope and be naive :)

Would my LTO story work though? Unfortunately, it would have to
exclude external (static) libraries.

The "linker" may be the assembler or something else, this is what I
don't know. What I do know is that this setup (if possible) provides a
way to integrate more LLVM optimizations in a C/C++/<other language of
your choice> toolchain, remove any complicated linker applications,
and be easily extensible to new platforms.

I agree that LLVM should have a linker, and I am currently writing one
(see Object Files in LLVM from last year's dev meeting). I intend for
it to replace the system linker on the major platforms (Win, Linux,
FBSD, Mac).

Aha, so that presentation did turn out to be a WIP project! That's
great news! Would you have a timeline on a usable form of that project
;)? I would offer to help, but I fear that my knowledge is nowhere near
enough to be useful.

Thanks!

3. LLVM linker (+Clang?) links together the IR files into one object
file (perhaps existing GNU .o or MSVC .obj files), executing its
link-time-optimizations in the process.

You can already do this by calling opt etc by hand.
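
As a rough sketch of what "by hand" could look like, the Python below only builds the command lines for such a pipeline; it does not run them. The intermediate file names (merged.bc, merged.opt.bc, merged.o) are invented for the example, and the exact spelling of opt's optimization flag differs between LLVM releases, so treat "-O2" as an assumption to check against your installed version.

```python
def manual_lto_commands(sources, output="prog"):
    """Return the command lines for a hand-driven whole-program pipeline.

    Steps: C -> bitcode per file, merge bitcode, optimize, emit one
    native object, then let the clang driver call the system linker.
    """
    commands = []
    bitcode_files = []
    for src in sources:
        bc = src.rsplit(".", 1)[0] + ".bc"
        # Compile each source to LLVM bitcode instead of a native object.
        commands.append(["clang", "-c", "-emit-llvm", src, "-o", bc])
        bitcode_files.append(bc)
    # Merge all bitcode modules into one module.
    commands.append(["llvm-link", *bitcode_files, "-o", "merged.bc"])
    # Optimize the merged module (flag spelling may vary by LLVM version).
    commands.append(["opt", "-O2", "merged.bc", "-o", "merged.opt.bc"])
    # Lower the optimized module to a single native object file.
    commands.append(["llc", "-filetype=obj", "merged.opt.bc", "-o", "merged.o"])
    # The final step still goes through the system linker via the driver.
    commands.append(["clang", "merged.o", "-o", output])
    return commands


for cmd in manual_lto_commands(["main.c", "util.c"]):
    print(" ".join(cmd))
```

Note that the last command still needs a real linker, which is the point made about step 4.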

4. A *simple* tool turns the complete object file into native
executable format, adding the platform-dependent parts that are
missing from the semi-platform-agnostic file created in the previous
step. This tool can in the first steps of the implementation be the
system linker, but all it would do is do the object->executable
conversion.

A linker is not a simple tool. Depending on the platform, a lot of
processing has to be done. Dropping common sections, building
constructor tables, dealing with all the different relocation types etc.
It is good old rocket science -- not exactly extremely advanced stuff,
but a good lot of things to get right.
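
To give a flavour of just one of those pieces, here is a minimal Python sketch of applying a single 32-bit PC-relative relocation (value = S + A - P, the shape used by x86-64 call instructions). The function name, addresses and section contents are assumptions made up for the example; a real linker handles many relocation types per target, plus overflow checks that are left out here.

```python
import struct


def apply_pc32(section, offset, section_addr, symbol_addr, addend):
    """Patch a 32-bit PC-relative relocation in place.

    section      -- bytearray holding the section contents
    offset       -- offset of the 4-byte field inside the section
    section_addr -- final address assigned to the section
    symbol_addr  -- final address of the target symbol (S)
    addend       -- relocation addend (A)
    """
    place = section_addr + offset          # P: address of the patched field
    value = symbol_addr + addend - place   # S + A - P
    struct.pack_into("<i", section, offset, value)


# A call instruction (opcode 0xE8) with a zeroed 4-byte displacement.
text = bytearray(b"\xe8\x00\x00\x00\x00")
# Hypothetical layout: section placed at 0x1000, callee at 0x2000.
# The addend of -4 accounts for the displacement being relative to the
# end of the 4-byte field.
apply_pc32(text, offset=1, section_addr=0x1000, symbol_addr=0x2000, addend=-4)
print(text.hex())
```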

Joerg

I would not call depending on the system linker a problem. The system
linker saves us from worrying about the nitty gritty details of how
the system works.

It also means that you can build shared libraries/DLLs with Clang and
they can be linked into programs built with the system's standard
toolchain, and vice-versa. This is going to be crucial to my potential
adoption of Clang: I supply software components, not applications, and
not all of my customers will be able to switch to Clang at the same time
as me.

best,

It would be nice if we could have a toolchain description in clang that would do this, so when you invoke clang with a list of .o files, it links the ones that contain LLVM IR, optimises them, and then passes them off to the system linker.
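
A minimal sketch of the sorting step such a driver would need, in Python and under a simplifying assumption: an input counts as LLVM IR if it starts with the raw bitcode magic bytes 'B', 'C', 0xC0, 0xDE. Wrapped bitcode and bitcode embedded in native object files are not handled, and the function names and file names are invented for the example.

```python
# Magic number at the start of a raw LLVM bitcode file.
BITCODE_MAGIC = b"BC\xc0\xde"


def is_llvm_bitcode(header):
    """Return True if the leading bytes look like raw LLVM bitcode."""
    return header[:4] == BITCODE_MAGIC


def partition_inputs(files):
    """Split inputs into (bitcode, native) lists.

    files maps a file name to its leading bytes, so the sketch can be
    tried without touching the filesystem.
    """
    bitcode, native = [], []
    for name, header in files.items():
        if is_llvm_bitcode(header):
            bitcode.append(name)
        else:
            native.append(name)
    return bitcode, native


inputs = {
    "a.o": b"BC\xc0\xde" + b"\x00" * 12,  # bitcode stored under a .o name
    "b.o": b"\x7fELF" + b"\x00" * 12,     # native ELF object
}
print(partition_inputs(inputs))
```

The bitcode list would then be linked and optimised together, and the resulting object handed to the system linker along with the native list.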

David

-- Sent from my Apple II