given the recent interest both on the list and elsewhere in building a working
linux kernel with clang, here's my 2 cents. i began this work some half a year ago
when clang 2.7 came out but got held up by other projects so i could only finish it
recently.
my approach is different from others who have been working on this in that i
went for patching linux itself in order to compile and link with clang properly.
it turns out that with a hundred or so lines patched in linux and a recent clang
(read: use svn HEAD) it's very easy to build a working kernel now. obviously some
of these patches are workarounds for features lacking in clang so the right
approach there is to change clang. other patches are needed for linux bugs, and
there's nothing clang can (or should) do about them i think. here's a summary of the issues
i ran into in no particular order:
1. early boot code and .code16gcc/mregparm
i'm not sure if it's .code16gcc or not, but something makes clang ignore
-mregparm when compiling the early linux boot code so there'll be a mismatch
between how arguments are passed from C code and how assembly code expects
them. the workaround is to explicitly annotate some functions with the regparm
attribute.
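for illustration, here's a minimal sketch of what such an annotation can look like. the function name and the BOOT_REGPARM macro are made up for this example (they're not from the actual patch), and regparm only means something on 32-bit x86, so the macro expands to nothing elsewhere:

```c
/* regparm only exists on 32-bit x86; elsewhere the macro expands to nothing */
#if defined(__i386__)
#define BOOT_REGPARM __attribute__((regparm(3)))
#else
#define BOOT_REGPARM
#endif

/*
 * hypothetical helper that assembly code would call: with regparm(3) on i386
 * the three arguments are passed in %eax, %edx and %ecx instead of on the
 * stack, whether or not -mregparm on the command line was honoured.
 */
BOOT_REGPARM unsigned int boot_checksum(const unsigned char *buf,
                                        unsigned int len,
                                        unsigned int seed)
{
    unsigned int sum = seed;
    unsigned int i;

    for (i = 0; i < len; i++)
        sum += buf[i];
    return sum;
}
```

the point of putting the attribute on the function itself is that the calling convention no longer depends on a compiler flag.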
2. probably related to the above, __builtin_memcpy and __builtin_memset also
ignore -mregparm and cause the same kind of trouble at runtime so i worked
around it by using explicit inline asm.
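a rough sketch of the idea for memcpy, assuming x86 and the usual ABI guarantee that the direction flag is clear. asm_memcpy is a made-up name, not what the patch uses, and the plain C loop is only there so the example builds on other architectures:

```c
#include <stddef.h>

/* hypothetical stand-in for __builtin_memcpy */
static inline void *asm_memcpy(void *dst, const void *src, size_t len)
{
#if defined(__i386__) || defined(__x86_64__)
    void *d = dst;

    /* rep movsb copies len bytes from the address in si to the one in di */
    __asm__ volatile("rep movsb"
                     : "+D" (d), "+S" (src), "+c" (len)
                     :
                     : "memory");
#else
    /* not x86: fall back to a byte loop */
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (len--)
        *d++ = *s++;
#endif
    return dst;
}
```

since the copy is spelled out in asm, the compiler has no opportunity to emit a call with the wrong argument passing convention.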
3. sse code in kernel
in general linux is already built with -mno-sse and others but some Makefiles
such as the x86 boot code forget to use it with bad consequences for early boot
(read: the kernel doesn't even decompress ;).
4. unused variable/function elimination
it seems that clang is more aggressive than gcc and eliminates more actually
required data/code than desired. earliest casualty is the boot code as usual
but there're also some module parameter related structures affected. the fix
is needed on the linux side of course.
5. asm 'p' constraint
this was fixed last week in subversion, so i'm omitting the patch for it, but
if someone really wants to use an earlier clang (such as the 2.8 release), then
just duplicate the percpu_read macro into percpu_read_stable.
6. .gnu.linkonce.d.* section usage
it seems that clang can emit code/data into sections that the linux linker
scripts were not aware of.
7. extern and __attribute__((visibility("hidden"))) usage in the vdso
it seems that this construct doesn't work with clang so i worked around it for
now by abusing the weak attribute and the linker's ability to merge such symbols.
8. const merging in the vdso
possibly related to the above, the linker(?) merges const variables when their
value is the same which, while technically correct, defeats some self-checking
code in the vdso so i had to deconstify the affected variables.
9. lack of __label__ support
linux needs this for implementing an arch-independent way to acquire the current
program counter or something close to it at least. for now the workaround is an
arch specific inline asm block.
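a sketch of what such an arch specific block can look like on x86. current_ip is a made-up name for this example, and the last branch is only a placeholder so that the code builds elsewhere:

```c
/* hypothetical helper returning an address close to the current instruction */
static inline unsigned long current_ip(void)
{
    unsigned long ip;

#if defined(__x86_64__)
    /* a rip-relative lea with zero displacement yields the address of the
     * instruction that follows it */
    __asm__ volatile("lea 0(%%rip), %0" : "=r" (ip));
#elif defined(__i386__)
    /* no rip-relative addressing on i386: call the next instruction and pop
     * the return address that the call pushed */
    __asm__ volatile("call 1f\n1:\tpop %0" : "=r" (ip));
#else
    /* placeholder: other architectures would need their own asm block */
    ip = (unsigned long)&current_ip;
#endif
    return ip;
}
```

the drawback compared to the __label__ based version is obvious: every architecture needs its own variant.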
10. clang crash on __verify_pcpu_ptr use
when compiling i think init/main.c, clang crashes on the above macro. i tried to
extract a minimal example but that failed to produce any errors, so probably there
is more context needed to trigger the segfault. interestingly, the workaround for
getting this compiled was to turn the body of the macro into a statement expression
but otherwise it's the same code inside.
11. excessive inlining and stack usage
while apparently gcc and clang make different inlining decisions, they're both
bad at reusing the stack for the local variables of the inlined functions and
sometimes produce high stack usage. linux already has an explicit way to prevent
such undesired inlining, i just had to annotate a few more functions (but it's
not meant to be exhaustive, it's based on my own config only).
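to show the kind of annotation i mean, here's a small sketch using the compiler attribute directly. demo_noinline, sum_scratch and caller are made-up names for this example:

```c
#include <stddef.h>

/* spelled out here so the example is self-contained */
#define demo_noinline __attribute__((noinline))

/* hypothetical helper with a large local buffer */
static demo_noinline unsigned int sum_scratch(unsigned char seed)
{
    unsigned char scratch[512];
    unsigned int sum = 0;
    size_t i;

    for (i = 0; i < sizeof(scratch); i++)
        scratch[i] = (unsigned char)(seed + i);
    for (i = 0; i < sizeof(scratch); i++)
        sum += scratch[i];
    return sum;
}

/* the attribute asks the compiler to keep sum_scratch() out of line, so the
 * buffer is meant to live in the callee's frame for the duration of each
 * call rather than being folded (possibly twice) into this function's frame */
unsigned int caller(void)
{
    return sum_scratch(0) + sum_scratch(1);
}
```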
11. uninitialized variable handling
this one was a fun one to debug (no :P). apparently the getdents code computes
a structure offset by computing a pointer difference - where the pointer in
question is uninitialized. gcc seemingly manages to produce the desired offset
whereas clang produces a 0 for the uninitialized pointers and hence for their
difference as well, resulting in getdents not returning any entries in this
particular case. very funny when you enter a directory but cannot list its
content, although initramfs scripts tend not to appreciate it :). fortunately
clang --analyze warns about such problems but then it crashes on a few more
constructs so it's not an entirely painless exercise to go through the whole
tree looking for such uninitialized variable usage (i checked most things but
drivers/ and the non-x86 arch subtrees).
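for reference, the well-defined way to get such an offset is offsetof(), which needs no pointer at all. the sketch below uses a simplified stand-in struct (demo_dirent, not the real kernel definition) and made-up helper names, and i'm not claiming the record length formula matches the kernel's exactly:

```c
#include <stddef.h>

/* simplified stand-in for a dirent layout, not the real kernel struct */
struct demo_dirent {
    unsigned long d_ino;
    unsigned long d_off;
    unsigned short d_reclen;
    char d_name[1];
};

/* round x up to a multiple of a (a must be a power of two) */
#define DEMO_ALIGN(x, a) (((x) + ((a) - 1)) & ~((size_t)(a) - 1))

/* record length for a name of namlen bytes: the offset of d_name comes from
 * offsetof(), so no (possibly uninitialized) pointer is involved */
size_t demo_reclen(size_t namlen)
{
    return DEMO_ALIGN(offsetof(struct demo_dirent, d_name) + namlen + 2,
                      sizeof(long));
}
```

with this form the result cannot depend on what the compiler decides to do with an uninitialized value.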
12. variable length arrays in crypto/netfilter/crc
this is an already known issue (in that clang is not going to support variable
length arrays inside structs, which is a gcc extension), so the workaround/fix
was to rewrite the linux code.
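one portable way to rewrite a struct that ends in a runtime-sized array is a C99 flexible array member plus an explicit allocation. the sketch below is userland code with made-up names (demo_desc, demo_desc_alloc); kernel code would use its own allocator or a suitably sized buffer instead of malloc, and i'm not claiming this is how the patch does it:

```c
#include <stdlib.h>
#include <string.h>

/* hypothetical descriptor followed by a context whose size is only known at
 * run time */
struct demo_desc {
    unsigned int flags;
    unsigned char ctx[];    /* C99 flexible array member */
};

/* allocate the descriptor and its context in one block and zero the context;
 * returns NULL on allocation failure, the caller frees the result */
struct demo_desc *demo_desc_alloc(size_t ctx_size)
{
    struct demo_desc *desc = malloc(sizeof(*desc) + ctx_size);

    if (!desc)
        return NULL;
    desc->flags = 0;
    memset(desc->ctx, 0, ctx_size);
    return desc;
}
```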
13. ignoring -fcall-saved-xxx
it seems that clang for some reason ignores -fcall-saved-xxx and miscompiles some
code relying on it (lib/hweight.c) so as a workaround i removed this optimization
from linux but obviously clang should be fixed instead.
beyond the above fixes here and there, there're some opportunities to make better
use of clang specific features as well, so if anyone feels inclined...
14. clang's address_space attribute extension
this would probably allow simplifying all the x86 per-cpu accessors (ditto
for userland btw).
15. fix analyzer crashes
as i mentioned above, there're a few constructs that make the analyzer crash
on the linux tree, it'd probably be easy to fix them for someone familiar with
the internals. the easiest way to run the analyzer (and to reproduce the problems)
is to issue make CC=.../clang C=2 CHECK="clang --analyze" .
16. fix issues found by clang --analyze
this is a bigger undertaking as the false positive ratio is quite low in my
experience and there're many issues it finds (mostly unused variables or useless
variable writes that sometimes can point at deeper issues such as not doing
anything with error return values but i also saw potential NULL derefs).
17. extend the analyzer to understand the sparse defines
sparse is a standalone static analyzer built for linux and several important
subsystems have already been properly marked up for sparse analysis so it'd be
nice if clang could make use of this information (in fact, some analysis could
probably be done at normal compile time already since the checks are cheap).