Clang builds a working Linux Kernel (Boots to RL5 with SMP, networking and X, self hosts)

Clang can now compile a functional Linux Kernel (version 2.6.36, SMP).

General Details

This is pretty awesome.

-eric

Very cool!

I'm going to try to clean up some of my modifications to Clang (mostly hacks in
CodeGen stuff, local labels (not 100% done yet), explicit register variables,
a more complete implementation of GNU inline assembly constraints). Some of my
changes implement the sort of cryptic GNUtensions that I sense most Clang devs
would find distasteful (I haven't added support for anything explicitly stated
as unsupported on the clang website).
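
For readers unfamiliar with the extensions named above, here is a minimal sketch of three of them: a local label inside a statement expression, an explicit register variable, and inline assembly constraints. The function and macro names are hypothetical and are not taken from the kernel or from the Clang patches. The register example assumes x86-64 with a 64-bit long and falls back to plain C elsewhere.

```c
#include <assert.h>
#include <stddef.h>

/* GNU local label inside a statement expression: every expansion of the
 * macro gets its own private `found` label, so the macro can be used more
 * than once in the same function without a duplicate-label error. */
#define INDEX_OF(arr, len, key)                      \
    ({                                               \
        __label__ found;                             \
        long idx_ = -1;                              \
        size_t i_;                                   \
        for (i_ = 0; i_ < (len); i_++) {             \
            if ((arr)[i_] == (key)) {                \
                idx_ = (long)i_;                     \
                goto found;                          \
            }                                        \
        }                                            \
    found:                                           \
        idx_;                                        \
    })

/* Returns 1 if both x and y occur in a[0..n), otherwise 0. */
int both_present(const int *a, size_t n, int x, int y)
{
    assert(a != NULL);
    long ix = INDEX_OF(a, n, x);
    long iy = INDEX_OF(a, n, y);
    return ix >= 0 && iy >= 0;
}

/* Adds two values, using an explicit register variable on x86-64. */
long add_in_register(long a, long b)
{
#if defined(__x86_64__) && defined(__LP64__)
    /* Explicit register variable: `sum` is placed in rbx whenever it is
     * used as an inline asm operand. */
    register long sum __asm__("rbx") = a;
    /* "+r" is a read-write register operand, "r" an input in any register,
     * and "cc" tells the compiler the flags are clobbered. */
    __asm__("addq %1, %0" : "+r"(sum) : "r"(b) : "cc");
    return sum;
#else
    /* Assumption: other targets simply use plain C. */
    return a + b;
#endif
}
```

Calling both_present() on an array expands INDEX_OF twice in one function, which is the case a plain (non-local) label could not handle.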

We do have the -fheinous-gnu-extensions option, if we are forced to implement something in Clang that is exceedingly tasteless.

I don't know much about the Linux kernel community, but I've found that many other projects have been very receptive to fixes that get their code building with Clang.

  - Doug

  * The kernel can successfully boot to runlevel 3 on a secondary test
    machine, a microATX desktop box (Intel Atom). I haven't tried to start X on
    this box yet.

Tested this box with the latest changes, everything's working well :).

  * SELinux, POSIX ACLs, IPSec, eCrypt, anything that uses the crypto API -
    None of these will compile, due to either an ICE or variable-length arrays in
    structures (don't remember which, it's in my notes somewhere). If it's
    variable-length arrays or another intentionally unsupported GNUtension,
    I'm hoping it's just used in some isolated implementation detail (or details),
    and not a fundamental part of the crypto API (honestly just haven't had a
    chance to dive into the crypto source yet). I'm really hoping it's an
    issue in Clang, though, as it's easier for me to hack Clang and I'm trying to
    avoid kernel patches as much as possible.
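
The message above does not settle which construct actually broke the build. For context, this is a sketch of what a variable-length array in a structure looks like and one portable way to rewrite it. All type and function names here are hypothetical and do not come from the crypto API, and malloc stands in for whatever allocator kernel code would use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fixed-size header that precedes a run-time-sized context. */
struct desc_header {
    unsigned int flags;
    size_t ctx_len;
};

/*
 * GCC-only form, shown as a comment because Clang rejects it:
 *
 *     struct {
 *         struct desc_header hdr;
 *         char ctx[ctx_len];        // size known only at run time
 *     } desc;
 */

/* Portable rewrite: a C99 flexible array member plus a single allocation. */
struct desc {
    struct desc_header hdr;
    unsigned char ctx[];             /* flexible array member */
};

/* Zero the context bytes of an existing descriptor. */
void desc_reset(struct desc *d)
{
    assert(d != NULL);
    memset(d->ctx, 0, d->hdr.ctx_len);
}

/* Allocate a descriptor with ctx_len context bytes; NULL on failure. */
struct desc *desc_alloc(size_t ctx_len)
{
    struct desc *d = malloc(sizeof(*d) + ctx_len);
    if (d == NULL)
        return NULL;
    d->hdr.flags = 0;
    d->hdr.ctx_len = ctx_len;
    desc_reset(d);
    return d;
}
```

The trade-off is that the rewrite moves the storage from the stack to the heap, so the caller has to free() the descriptor.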

Most of SELinux compiles. POSIX ACLs, IPSec and eCrypt are all fully functional.

  * IPv6 and Netfilters/Router stuff - Some of this is tied to the above
    issues with the crypto API, but IPv6 and Netfilters each have their own fatal
    errors.

Clang can now compile the complete Linux network stack.

  * VDSO - VDSO breaks in strange ways with clang; at least, it did a week ago
    when I put some time into investigating this. ATM, building VDSO with GCC
    works, but I believe that this is still causing issues. I think the issue
    here is similar in nature to the issue with LKMs.

VDSO compiles with Clang.

  * Boot - The very early kernel boot code breaks with clang, because of
    obscure inline assembly GNUtensions (.code16gcc stuff). I have no clue what
    needs to be done to fix this, but as I actually know where this problem is,
    it should be (relatively) easy to fix.

The boot subsystem can be built with Clang.

  * Modules - Module loading is totally broken. I'm pretty sure I just figured
    out why, though (I'd elaborate, but this is pretty lengthy, and I might as
    well just go implement it).

Loadable kernel modules are functional.

Also, SMP issues have been dealt with. I've found a number of suitable unit-/
conformance-/functionality-/performance-/stress- tests which I intend to run on
the latest Clang-compiled kernel, and GCC-compiled kernel with the same
configuration.

GCC is no longer needed at all during the build process. GNU ld is still needed;
GNU's assembler is not. Linux successfully compiles with the integrated assembler.

Question: Are there any open source linkers out there aside from GNU ld, that
aren't derived from GNU ld? I don't have anything against GNU ld, I'm just
curious about other options.

--
Bryce Lelbach aka wash
http://groups.google.com/group/ariel_devel

My apologies for double posting. I would be remiss if I neglected to mention
that I have done very little work here, relative to the efforts of the community.
In particular, Alp Toker and his team at Nuanti were instrumental in getting us
to this stage.

I should have performance-/functionality-/stress- test results (run on two
kernels, each built with the same configuration, one with the GNU toolchain,
one with Clang).

Unfortunately, my web server is currently not externally accessible due to
networking issues at my school (not likely to be cleared up for a week or so).
Any suggestions for a respectable hosting location? I would use github (one of
the few hosts I'm comfortable with), but it doesn't seem ideal for hosting web
pages.

I am not going to be making the build process and necessary source code openly
available for a few days. I need to run regressions and unit-tests on
clang-built Linux kernels, which I'm sure will uncover subtler issues that need
to be fixed.

--
Bryce Lelbach aka wash
http://groups.google.com/group/ariel_devel

Bryce Lelbach wrote:

Question: Are there any open source linkers out there aside from GNU ld, that
aren't derived from GNU ld?

Yes - Sun's ld is open source and works *very* well. It supports x86/amd64/sparc natively and I believe has also been ported to ARM and ppc. If anyone is serious about this (maybe some of the BSD folks) I'd maybe be happy to find the cycles and create a standalone tarball.

Technically the new gold linker is written mostly by Google people. I'd not put it in the same category as most "GNU" tools, but it is bundled with binutils and GPL.

I think the BSD folks may be interested. Which license covers SUN ld?

FreeBSD use GNU ld currently but don't like the GPLv3 license that binutils has switched to, so we're at binutils v 2.15 (soon 2.17). gold has always been GPLv3. LLVM has llvm-ld, but there were problems with it last time I asked here (with native code generation, I think). There's a linker planned for the elftoolchain project, but it's still only a Trac ticket as far as I know.

I'd like to play with LTO for FreeBSD builds using Clang, but the options are limited as far as I can see.

Erik

Sun ld is CDDL, and I believe that the FreeBSD policy is still that CDDL stuff should only be used for optional parts of the base system. Whether this includes developer tools is not clear. OpenBSD has less strict requirements for developer tools than for the rest of the base system, but I'm not sure that FreeBSD is quite as relaxed.

That said, the version of GNU ld in the base system is quite old, and the alternatives are either GPLv3 or CDDL, so I'm not sure which is less unattractive to the FreeBSD team. The third alternative is to write a new linker, which is a possibility if someone's willing to pay for it, but probably not something anyone (even me) would do entirely for fun.

David

-- Sent from my Difference Engine

Now that LLVM is supposedly able to emit ELF object files without assistance from external tools, what is holding back llvm-ld from being used to produce the final executable? Is llvm-ld a priority of the MC subproject?

Erik

sorry forgot to send to list.

LLVM should be able to produce .o files now. To produce .so and
executable files it needs to be able to implement the relocations instead
of just listing them. To replace the system linker you will also need
to be able to read ELF and implement support for linker scripts.
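
To make "implement the relocations instead of just listing them" concrete, here is a minimal sketch of what a linker does with each relocation entry once final addresses are known. It is not LLVM code. The struct, enum, and function names are hypothetical, and only two x86-64 style relocation kinds are modelled: absolute 64-bit (S + A) and PC-relative 32-bit (S + A - P).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Two relocation kinds, modelled on R_X86_64_64 and R_X86_64_PC32. */
enum reloc_type {
    RELOC_ABS64 = 1,   /* patch 8 bytes with S + A     */
    RELOC_PC32  = 2    /* patch 4 bytes with S + A - P */
};

/* Hypothetical, already-resolved relocation entry. */
struct reloc {
    uint64_t offset;        /* where in the section to patch        */
    enum reloc_type type;
    uint64_t sym_addr;      /* S: final address of the symbol       */
    int64_t  addend;        /* A: constant stored with the entry    */
};

/* Store the low nbytes of v at p in little-endian order. */
static void put_le(uint8_t *p, uint64_t v, size_t nbytes)
{
    for (size_t i = 0; i < nbytes; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}

/* Patch one relocation into `section`, which will be loaded at `base`.
 * Returns 0 on success, -1 if the patch is out of bounds, the value does
 * not fit, or the relocation type is unknown. */
int apply_reloc(uint8_t *section, size_t size, uint64_t base,
                const struct reloc *r)
{
    assert(section != NULL && r != NULL);

    uint64_t place = base + r->offset;                     /* P     */
    uint64_t value = r->sym_addr + (uint64_t)r->addend;    /* S + A */

    switch (r->type) {
    case RELOC_ABS64:
        if (r->offset > size || size - r->offset < 8)
            return -1;
        put_le(section + r->offset, value, 8);
        return 0;
    case RELOC_PC32: {
        int64_t rel = (int64_t)(value - place);            /* S + A - P */
        if (r->offset > size || size - r->offset < 4)
            return -1;
        if (rel < INT32_MIN || rel > INT32_MAX)
            return -1;                                     /* overflow */
        put_le(section + r->offset, (uint64_t)rel, 4);
        return 0;
    }
    }
    return -1;
}
```

A real linker also has to read the object files, lay out sections and resolve symbols before it can reach this step; the sketch assumes all of that has already happened.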

Cheers,
Rafael

As far as I remember, llvm-ld is a script: it either produces a script
to run the bitcode file with a stub executable or, in native mode,
uses llc and the system assembler and linker to produce an executable. It
was a convenience tool, not a linker.

Thanks. I just realized it's also stated in the manual for llvm-ld (http://llvm.org/cmds/llvm-ld.html): "Native code generation is performed by converting the linked bitcode into native assembly (.s) or C code and running the system compiler (typically gcc) on the result."

I wish we would develop a linker in this project. The guy who goes by
the name Bigcheese on the IRC channel is beginning work on one, AFAIK.

Indeed. Here's the announcement of the Object File Library: http://lists.cs.uiuc.edu/pipermail/llvmdev/2010-September/thread.html#34412

I'm looking forward to this!

Erik

Isn't Apple's ld64 open source?

Kind of. This is Apple's Open Source: release some core component sources, but keep dependencies closed to prevent users from rebuilding the software.

They never released libunwind, for instance. This is pretty silly, as the latter is part of lldb (until lldb switches to its own unwinder), and was released with it.

-- Jean-Daniel

It may be (I've not checked), but it's not useful. Darwin uses Mach-O, but everyone else in *NIX-land uses ELF, so a Mach-O linker would not be much use for *BSD (or for Linux or Windows).

David

-- Sent from my Apple II

Yes! And before writing a linker you need a low-level library that
manipulates and abstracts binary object formats.
For example, in GNU ld it's called "libbfd".

An alternative to "libbfd" for the LLVM project is proposed by Bigcheese here:
http://lists.cs.uiuc.edu/pipermail/llvmdev/2010-September/034412.html

Quick update on this, for those interested:

Clang can now compile the majority of drivers and all of the kernel (save Xen)
without problem. The Debian default kernel configuration compiles and boots. We
are still having some intermittent issues (functions omitted by clang because they
are only called in inline asm), but I think these will be worked out in a few
days.
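
The "omitted function" problem can be illustrated with a small sketch. A static function that is named only inside assembly text has no reference the compiler can see, so it may be dropped. One common way to keep it (not necessarily the fix used in this project) is the `used` attribute. The names below are hypothetical, and the assembly path assumes x86-64 ELF; other targets fall back to plain C.

```c
#include <assert.h>
#include <stddef.h>

/* No C code names this function; only the asm string below does. The
 * `used` attribute asks GCC/Clang to emit it even though it looks
 * unreferenced. Without the attribute, the asm path below could fail
 * to link. */
static int __attribute__((used)) asm_only_helper(void)
{
    return 42;
}

typedef int (*helper_fn)(void);

/* Obtain the helper's address purely from assembly text. */
helper_fn find_helper(void)
{
#if defined(__x86_64__) && defined(__ELF__)
    helper_fn fn;
    /* The symbol appears only inside the string, invisible to the
     * compiler's reference tracking. */
    __asm__("leaq asm_only_helper(%%rip), %0" : "=r"(fn));
    return fn;
#else
    /* Assumption: on other targets, take the address in plain C. */
    return asm_only_helper;
#endif
}

/* Call the helper through the address recovered above. */
int call_helper(void)
{
    helper_fn fn = find_helper();
    assert(fn != NULL);
    return fn();
}
```

The sketch takes the address in assembly and makes the call from C, which avoids the stack and clobber bookkeeping that a direct call from inline asm would need.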

A Debian package is available on github, with a Clang compiled kernel. The kernel
is a no-preemption build, with a complete network stack, crypto API, SELinux,
SMP, KVM, etc (the only thing missing is Xen, basically). The package includes
about 80% of the drivers shipped with Linux, mostly compiled as loadable kernel
modules (2390 loadable kernel modules in total in this build). Basically, once we
get the remaining omitted function issues worked out, you'd be able to use a
binary Clang Linux kernel such as this one as a stand-in replacement for just
about any typical Linux distro's shipped kernel.

http://github.com/brycelelbach/lll/downloads

Install at your own risk. If you get any kernel oops (you'll know if you do),
write down/record the RIP address and the kernel symbol associated with the oops.

I had to hack the following drivers to get them to compile, so take extra caution
if you're considering trying this out on a box with any of these:

  * pmcraid
  * et131x
  * Thinkpad laptops
  * wimax i2400m

Instructions for building from source are available at the github referenced
above. I would advise against it; the process takes two to two and a half hours
on relatively modern machines. You need to do "git submodule init" and then
"git submodule update" after checking out the repository (or something like that).

The Clang and LLVM forks merge changes from the Clang and LLVM git mirrors every
three days (well, I merge the changes by hand, but the point is, it happens every
three days, more or less). The Clang and LLVM git mirrors are updated daily.
I merge changes from upstream Linux whenever I feel like having a migraine.

ATM, extensive benchmarking/regression testing is not really an option; I simply
don't have the CPU power to be able to work on this and run benchmarks. The fastest
of my three machines compiles a full Kernel + drivers in about an hour with Clang,
and it's a good bit longer with GCC.

Big thanks to PaX team and Alp Toker/Nuanti for all the work they've put into this
and their willingness to share it with me.