Hi,
I've been thinking recently about how Clang can solve its system
linker dependency problem. I've seen some proposals and thoughts on
the mailing list about this issue, so I think this will be interesting
to explore. Please excuse my non-professional language; I am not a
Computer Science master, just a very interested hobbyist.
I would not call depending on the system linker a problem. The system
linker saves us from worrying about the nitty gritty details of how
the system works.
Concretely, Clang now uses the system linker (GNU ld or MSVC link.exe)
to turn the series of object files into an executable/library. This is
not very good, because Clang depends on the whims of a completely
different project to work correctly, and this is very platform
dependent on too high a level.
What do you mean by on too high a level? Also, linkers don't really
change much, and in fact are kinda standardized, so I wouldn't use the
term whim. Dealing with linkers is a rather small part of the Clang
codebase.
Let's talk Windows, because that's where this concerns me most.
Cross-compiling projects (like VLC, and any GNU project) have to use a
GNU toolchain, because GNU ld is the only linker capable of producing
a win32 executable on a non-Windows OS (You are not legally allowed to
use Visual Studio or the Windows SDK on a non-Windows machine). On top
of that, it has its problems. For one, its architecture command-line
arguments differ from the GCC arguments that mean the same thing.
Secondly, patching it requires a lot of work, as it's not a simple
project and carries a lot of legacy code.
Thirdly, it requires a UNIX shell to build (which is really my biggest
gripe), which either means a toolchain needs to be cross-compiled, or
you need to use slow MSYS/Cygwin to handle that (which I currently
do), and those are not without their caveats. I agree the mingw
runtime also requires a Bash shell, but I don't think this is the
biggest hurdle to overcome. Finally, there's the "whim" part. GNU ld
is tightly bound to GCC, meaning that any change in the latter will be
reflected by a change in the former. This is fine as long as Clang
doesn't decide to do things differently. Again, I don't know the
details, but both projects may conflict someday, and then you're
stuck with either forking the "old" ld or starting to replace it at
that point. I'd rather see a replacement now.
On a side note, the license of GNU ld is, well, GPL. I'm sure that's a
showstopper for people trying to bundle integrated linker
functionality in their commercial project.
So the current setup is (how I see it):
1. Clang compiles C/C++ to object files (either GNU *.o or MSVC *.obj
files). This happens through LLVM IR and accompanying optimization
runs of the LLVM toolchain.
2. The system linker is responsible for all the usual link stuff:
turning the object files into native binaries, doing some link-time
optimizations while it works its magic.
How I would propose to have it in an ideal world:
1. Clang compiles C/C++ into LLVM IR.
2. LLVM toolchain stuff optimizes everything as well as it can.
3. LLVM linker (+Clang?) links together the IR files into one object
file (perhaps existing GNU .o or MSVC .obj files), executing its
link-time-optimizations in the process.
4. A *simple* tool turns the complete object file into native
executable format, adding the platform-dependent parts that are
missing from the semi-platform-agnostic file created in the previous
step. This tool can in the first steps of the implementation be the
system linker, but all it would do is do the object->executable
conversion.
Here's where the main problem is. There is no *simple* tool to do
this. You need a full linker to turn an object file into an
executable. You have to link to the c and system libraries for any
real program, even "puts("hello world");". There's a lot of code that
gets run before main is entered.
Agreed, there is more to this than I let on. But one can always
hope and be naive.
Would my LTO story work, though? Unfortunately, it would have to
exclude external (static) libraries.
The "linker" may be the assembler or something else, this is what I
don't know. What I do know is that this setup (if possible) provides a
way to integrate more LLVM optimizations in a C/C++/<other language of
your choice> toolchain, remove any complicated linker applications,
and be easily extensible to new platforms.
I understand the hand-wavingness of this whole story, but any comments
or thoughts on what is wrong in my reasoning or not as simple as it
seems are very welcome.
Thanks!
Ruben
I agree that LLVM should have a linker, and I am currently writing one
(see Object Files in LLVM from last year's dev meeting). I intend for
it to replace the system linker on the major platforms (Win, Linux,
FBSD, Mac).
Aha, so that presentation did turn out to be a WIP project! That's
great news! Would you have a timeline for a usable form of that
project ;)? I would offer to help, but I fear that my knowledge is
nowhere near enough to be useful.
Thanks!