clang building linux

hello folks,

given the recent interest both on the list and elsewhere in building a working
linux kernel, here's my 2 cents. i began this work some half a year ago when
2.7 came out but got held up by other projects so i could only finish it recently.

my approach is different from others who have been working on this in that i
went for patching linux itself in order to compile and link with clang properly.
it turns out that with a hundred or so lines patched in linux and a recent clang
(read: use svn HEAD) it's very easy to build a working kernel now. obviously some
of these patches are workarounds for features lacking in clang so the right
approach there is to change clang. some patches are needed for linux bugs, there's
nothing clang can (or should) do about them i think. here's a summary of the issues
i ran into in no particular order:

1. early boot code and .code16gcc/-mregparm

   i'm not sure if it's .code16gcc or not, but something makes clang ignore
   -mregparm when compiling the early linux boot code, so there'll be a mismatch
   between how arguments are passed from C code and how the assembly code expects
   them. the workaround is to explicitly annotate some functions with the regparm
   attribute.
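A minimal sketch of such an annotation, assuming a 32-bit x86 target: regparm(3) makes the compiler pass the first three integer arguments in EAX, EDX and ECX, which is what assembly written for -mregparm=3 expects, whether or not the command-line flag was honored. BOOT_REGPARM and boot_add3 are hypothetical names, not taken from the patch; on other targets the macro expands to nothing so the sketch still builds.

```c
#include <assert.h>

/* Hypothetical helper macro: spell out the calling convention on the
 * function itself instead of relying on -mregparm=3 being honored.
 * regparm only applies to 32-bit x86, so it expands to nothing elsewhere. */
#if defined(__i386__)
#define BOOT_REGPARM __attribute__((regparm(3)))
#else
#define BOOT_REGPARM
#endif

/* On i386, a, b and c arrive in EAX, EDX and ECX, matching assembly
 * callers that were written for -mregparm=3. */
int BOOT_REGPARM boot_add3(int a, int b, int c);

int BOOT_REGPARM boot_add3(int a, int b, int c)
{
    return a + b + c;
}
```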

2. probably related to the above, __builtin_memcpy and __builtin_memset also
   ignore -mregparm and cause the same kind of trouble at runtime, so i worked
   around it by using explicit inline asm.

3. sse code in kernel

   in general linux is already built with -mno-sse and related flags, but some
   Makefiles, such as the one for the x86 boot code, forget to use them, with bad
   consequences for early boot (read: the kernel doesn't even decompress ;).

4. unused variable/function elimination

   it seems that clang is more aggressive than gcc and eliminates data/code that
   is actually required. the earliest casualty is the boot code as usual, but
   some module parameter related structures are affected as well. the fix is
   needed on the linux side of course.

5. asm 'p' constraint

   this was fixed last week in subversion, so i'm omitting the patch for it, but
   if someone really wants to use an earlier clang (such as the 2.8 release), then
   just duplicate the percpu_read macro into percpu_read_stable.

6. .gnu.linkonce.d.* section usage

   it seems that clang can emit code/data into sections that the linux linker
   scripts were not aware of.

7. extern and __attribute__((visibility("hidden"))) usage in the vdso

   it seems that this construct doesn't work with clang, so for now i worked
   around it by abusing the weak attribute and the linker's ability to merge
   such symbols.

8. const merging in the vdso

   possibly related to the above, the linker(?) merges const variables when their
   value is the same which, while technically correct, defeats some self-checking
   code in the vdso so i had to deconstify the affected variables.

9. lack of __label__ support

   linux needs this for implementing an arch-independent way to acquire the
   current program counter (or at least something close to it); for now the
   workaround is an arch-specific inline asm block.

10. clang crash on __verify_pcpu_ptr use

    when compiling (i think) init/main.c, clang crashes on the above macro. i
    tried to extract a minimal example but it didn't reproduce the crash, so more
    context is probably needed to trigger the segfault. interestingly, the
    workaround for getting this compiled was to turn the body of the macro into a
    statement expression, but otherwise it's the same code inside.
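A sketch of the two shapes involved, modeled loosely on __verify_pcpu_ptr(): initializing a const void * from a value of the argument's type only compiles when the argument is a pointer, so the body is a pure compile-time type check. The macro names and checked_read are hypothetical, and the kernel's real macro also carries the __percpu address space annotation, which is omitted here.

```c
#include <assert.h>
#include <stddef.h>

/* do { } while (0) form: compiles only if 'ptr' has pointer type. */
#define verify_ptr_dowhile(ptr) do {                        \
        const void *__vpp_verify = (__typeof__(ptr))NULL;   \
        (void)__vpp_verify;                                 \
    } while (0)

/* Workaround form: the same body wrapped in a GNU statement expression. */
#define verify_ptr_stmtexpr(ptr) ({                         \
        const void *__vpp_verify = (__typeof__(ptr))NULL;   \
        (void)__vpp_verify;                                 \
    })

/* Hypothetical user of the checks; neither macro generates runtime code. */
static int checked_read(const int *p)
{
    verify_ptr_dowhile(p);
    verify_ptr_stmtexpr(p);
    return *p;
}
```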

11. excessive inlining and stack usage

    while apparently gcc and clang make different inlining decisions, they're both
    bad at reusing the stack for the local variables of the inlined functions and
    sometimes produce high stack usage. linux already has an explicit way to prevent
    such undesired inlining, i just had to annotate a few more functions (but it's
    not meant to be exhaustive, it's based on my own config only).
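The kernel's annotation for this is noinline_for_stack (an alias for noinline); below is a minimal sketch of how it is applied. The macros are redefined locally so the example stands alone, and checksum_block is a hypothetical helper with a deliberately large local buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Kernel-style annotations, redefined here so the sketch is standalone. */
#define noinline            __attribute__((noinline))
#define noinline_for_stack  noinline

/* If this were inlined into a caller that also has large locals, the
 * compiler might not overlap the two sets of locals and the caller's frame
 * would grow. Kept out of line, the buffer only exists during the call. */
static noinline_for_stack int checksum_block(const unsigned char *data, size_t len)
{
    unsigned char scratch[512];
    size_t n = len < sizeof(scratch) ? len : sizeof(scratch);
    int sum = 0;

    memcpy(scratch, data, n);
    for (size_t i = 0; i < n; i++)
        sum += scratch[i];
    return sum;
}
```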

11. uninitialized variable handling

    this one was a fun one to debug (no :P). apparently the getdents code
    computes a structure offset as a pointer difference where the pointer in
    question is uninitialized. gcc seemingly manages to produce the desired
    offset, whereas clang produces a 0 for the uninitialized pointers and hence
    for their difference as well, resulting in getdents not returning any entries
    in this particular case. very funny when you enter a directory but cannot
    list its content, although initramfs scripts tend not to appreciate it :).
    fortunately clang --analyze warns about such problems, but then it crashes on
    a few more constructs, so it's not an entirely painless exercise to go through
    the whole tree looking for such uninitialized variable usage (i checked most
    things except drivers/ and the non-x86 arch subtrees).
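A sketch of the pattern and of a pointer-free alternative. The struct is a hypothetical stand-in loosely modeled on the getdents record layout; the point is that offsetof() yields the same offset as a compile-time constant, while the pointer-difference form is only defined when the pointer refers to a real object.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record layout, loosely modeled on a getdents record. */
struct dirent_rec {
    unsigned long  d_ino;
    unsigned long  d_off;
    unsigned short d_reclen;
    char           d_name[1];
};

/* Offset derived from a pointer difference: fine when 'de' points at a real
 * object, undefined behavior when 'de' is uninitialized (a compiler may then
 * produce 0 or anything else). */
#define NAME_OFFSET_PTRDIFF(de) ((int)((de)->d_name - (char *)(de)))

/* Pointer-free form: a compile-time constant. */
#define NAME_OFFSET ((int)offsetof(struct dirent_rec, d_name))

#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((a) - 1))

/* Record length for a name of 'namlen' bytes plus room for the NUL
 * terminator and one trailing byte, rounded up to a multiple of long. */
static int record_length(int namlen)
{
    return ALIGN_UP(NAME_OFFSET + namlen + 2, (int)sizeof(long));
}
```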

12. variable length arrays in crypto/netfilter/crc

    this is an already known issue (clang is not going to support variable
    length arrays inside structures, a gcc extension), so the workaround/fix was
    to rewrite the linux code.
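One possible standard C rewrite, sketched under assumptions: the type and function names are hypothetical, and a heap allocation with a flexible array member is used here for simplicity, whereas kernel code in these paths may prefer a stack-based or fixed-size buffer. The gcc-only form (a variable length array inside a struct) appears in a comment only, since clang rejects it.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct hash_desc_hdr {          /* hypothetical header type */
    unsigned int flags;
};

/* gcc-only form, not accepted by clang (shown for reference only):
 *
 *     struct {
 *         struct hash_desc_hdr hdr;
 *         char ctx[ctx_size];      // runtime-sized array inside a struct
 *     } desc;
 *
 * Standard C rewrite: a flexible array member, sized at allocation time. */
struct hash_desc {
    struct hash_desc_hdr hdr;
    unsigned char ctx[];        /* flexible array member */
};

static struct hash_desc *hash_desc_alloc(size_t ctx_size)
{
    struct hash_desc *d = malloc(sizeof(*d) + ctx_size);

    if (d) {
        d->hdr.flags = 0;
        memset(d->ctx, 0, ctx_size);
    }
    return d;
}
```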

13. ignoring -fcall-saved-xxx

    it seems that clang for some reason ignores -fcall-saved-xxx and miscompiles some
    code relying on it (lib/hweight.c) so as a workaround i removed this optimization
    from linux but obviously clang should be fixed instead.
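For context, a software population count of the kind found in lib/hweight.c looks like the sketch below (the name sw_hweight32 is hypothetical). As I understand the x86 setup, the inline asm call sites declare only the result register as an output and call this routine directly, which is safe only if it is compiled with -fcall-saved-<reg> for the other registers it might touch; a compiler that ignores those flags can silently clobber the caller's registers.

```c
#include <assert.h>
#include <stdint.h>

/* Classic SWAR bit count: sum bits pairwise, then in nibbles, then in
 * bytes, and finally add the four byte counts via a multiply. */
static unsigned int sw_hweight32(uint32_t w)
{
    w -= (w >> 1) & 0x55555555u;                        /* 2-bit sums */
    w  = (w & 0x33333333u) + ((w >> 2) & 0x33333333u);  /* 4-bit sums */
    w  = (w + (w >> 4)) & 0x0f0f0f0fu;                  /* 8-bit sums */
    return (w * 0x01010101u) >> 24;                     /* add the 4 bytes */
}
```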

beyond the above fixes here and there, there're some opportunities to make better
use of clang specific features as well, so if anyone feels inclined... ;)

14. clang's address_space attribute extension

    this would probably allow simplifying all the x86 per-cpu accessors (ditto
    for userland btw).

15. fix analyzer crashes

    as i mentioned above, there're a few constructs that make the analyzer crash
    on the linux tree, it'd probably be easy to fix them for someone familiar with
    the internals. the easiest way to run the analyzer (and to reproduce the
    problems) is to issue:

        make CC=.../clang C=2 CHECK="clang --analyze"

16. fix issues found by clang --analyze

    this is a bigger undertaking, as the false positive ratio is quite low in my
    experience and it finds many issues (mostly unused variables or useless
    variable writes, which sometimes point at deeper issues such as ignored error
    return values, but i also saw potential NULL derefs).
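A hypothetical example of the kind of finding meant here: the analyzer's dead store check reports that the first value stored to err is never read, and the underlying bug is that a failure of the first step is silently ignored. All function names are made up for illustration.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical setup steps; each returns 0 or a negative errno. */
static int setup_a(int fail) { return fail ? -EIO : 0; }
static int setup_b(void)     { return 0; }

/* Flagged: the result of setup_a() is overwritten before it is read,
 * so a setup_a() failure never reaches the caller. */
static int init_buggy(int fail_a)
{
    int err;

    err = setup_a(fail_a);   /* dead store */
    err = setup_b();
    return err;
}

/* Fixed: check each result before moving on. */
static int init_fixed(int fail_a)
{
    int err = setup_a(fail_a);

    if (err)
        return err;
    return setup_b();
}
```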

17. extend the analyzer to understand the sparse defines

    sparse is a standalone static analyzer built for linux and several important
    subsystems have already been properly marked up for sparse analysis so it'd be
    nice if clang could make use of this information (in fact, some analysis could
    probably be done at normal compile time already since the checks are cheap).
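For reference, the sparse markup follows this general pattern (modeled on the kernel's compiler.h; fake_copy_from_user is a hypothetical stand-in for copy_from_user()): when sparse runs, __CHECKER__ is defined and __user becomes an address space attribute, while a normal compile sees nothing. clang already understands address_space, which is what would make reusing these definitions plausible.

```c
#include <assert.h>
#include <stddef.h>

/* Sparse-style annotations: attributes under the checker, empty otherwise. */
#ifdef __CHECKER__
# define __user   __attribute__((noderef, address_space(1)))
# define __force  __attribute__((force))
#else
# define __user
# define __force
#endif

/* Hypothetical stand-in for copy_from_user(): the sanctioned place where a
 * __user pointer is dereferenced (after a __force cast). Returns the number
 * of bytes NOT copied, following the kernel convention. */
static size_t fake_copy_from_user(void *to, const void __user *from, size_t n)
{
    const unsigned char *src = (const unsigned char __force *)from;
    unsigned char *dst = to;

    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
    return 0;
}
```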

cheers,
  PaX Team

pax-linux-2.6.35.7-test25-clang-only.patch (29 KB)

> hello folks,

Hello :). I've heard many good things about you guys.

> given the recent interest both on the list and elsewhere in building a working
> linux kernel, here's my 2 cents. i began this work some half a year ago when
> 2.7 came out but got held up by other projects so i could only finish it
> recently.
>
> my approach is different from others who have been working on this in that i
> went for patching linux itself in order to compile and link with clang
> properly. it turns out that with a hundred or so lines patched in linux and a
> recent clang (read: use svn HEAD) it's very easy to build a working kernel
> now. obviously some of these patches are workarounds for features lacking in
> clang so the right approach there is to change clang. some patches are needed
> for linux bugs, there's nothing clang can (or should) do about them i think.
> here's a summary of the issues i ran into in no particular order:

> 1. early boot code and .code16gcc/mregparm
>
> i'm not sure if it's .code16gcc or not, but something makes clang ignore
> -mregparm when compiling the early linux boot code so there'll be a
> mismatch between how arguments are passed from C code and how assembly code
> expects them. the workaround is to explicitly annotate some functions with
> the attribute.

I have added support for mregparm; it's crude, and needs to be cleaned up, but
the early boot code now works.

> 2. probably related to the above, __builtin_memcpy and __builtin_memset also
> ignore -mregparm and cause the same kind of trouble at runtime so i worked
> it around by using explicit inline asm.

I'm curious as to where this was causing problems?

> 3. sse code in kernel
>
> in general linux is already built with -mno-sse and others but some
> Makefiles such as the x86 boot code forget to use it with bad consequences
> for early boot (read: the kernel doesn't even decompress ;).

Yah, that one took me a little while to figure out :x.

> 4. unused variable/function elimination
>
> it seems that clang is more aggressive than gcc and eliminates more
> actually required data/code than desired. earliest casualty is the boot code
> as usual but there're also some module parameter related structures affected.
> the fix is needed on the linux side of course.

I'm working on a static analysis to show us where this is.

> 5. asm 'p' constraint
>
> this was fixed last week in subversion, so i'm omitting the patch for it,
> but if someone really wants to use an earlier clang (such as the 2.8
> release), then just duplicate the percpu_read macro into percpu_read_stable.

I have this fixed locally, not sure if I pulled this change in (I'm thinking I
didn't).

> 6. .gnu.linkonce.d.* section usage
>
> it seems that clang can emit code/data into sections that the linux linker
> scripts were not aware of.

I've been stumped by this one. I'm surprised no one's run into it before.

> 7. extern and __attribute__((visibility("hidden"))) usage in the vdso
>
> it seems that this construct doesn't work with clang so i worked it around
> for now by abusing the weak attribute and the linker's ability to merge such
> symbols.

I also took this course of action, though I'm hoping to try and make this work in
clang properly.

> 8. const merging in the vdso
>
> possibly related to the above, the linker(?) merges const variables when
> their value is the same which, while technically correct, defeats some
> self-checking code in the vdso so i had to deconstify the affected variables.

Thanks! This was very helpful.

> 9. lack of __label__ support
>
> linux needs this for implementing an arch-independent way to acquire the
> current program counter or something close to it at least, for now the
> workaround is an arch specific inline asm block.

Alp Toker has a patch for this (patch to clang).

> 10. clang crash on __verify_pcpu_ptr use
>
> when compiling i think init/main.c, clang crashes on the above macro. i
> tried to extract a minimal example but that failed to produce any errors, so
> probably there is more context needed to trigger the segfault. interestingly,
> the workaround for getting this compiled was to turn the body of the macro
> into a statement expression but otherwise it's the same code inside.

Huh. I've never run into this. Would you be willing to pass along the build
environment (configuration, platform, compiler version, etc)?

> 11. excessive inlining and stack usage
>
> while apparently gcc and clang make different inlining decisions, they're
> both bad at reusing the stack for the local variables of the inlined
> functions and sometimes produce high stack usage. linux already has an
> explicit way to prevent such undesired inlining, i just had to annotate a few
> more functions (but it's not meant to be exhaustive, it's based on my own
> config only).

> 11. uninitialized variable handling
>
> this one was a fun one to debug (no :P). apparently the getdents code
> computes a structure offset by computing a pointer difference - where the
> pointer in question is uninitialized. gcc seemingly manages to produce the
> desired offset whereas clang produces a 0 for the uninitialized pointers and
> hence for their difference as well, resulting in getdents not returning any
> entries in this particular case. very funny when you enter a directory but
> cannot list its content, although initramfs scripts tend not to appreciate
> it :). fortunately clang --analyze warns about such problems but then it
> crashes on a few more constructs so it's not an entirely painless exercise to
> go through the whole tree looking for such uninitialized variable usage (i
> checked most things but drivers/ and the non-x86 arch subtrees).

Nice!

> 12. variable length arrays in crypto/netfilter/crc
>
> this is an already known issue (in that clang is not going to support this
> gcc extension), so the workaround/fix was to rewrite the linux code.

Yah, I did the same thing, pretty straightforward.

> 13. ignoring -fcall-saved-xxx
>
> it seems that clang for some reason ignores -fcall-saved-xxx and
> miscompiles some code relying on it (lib/hweight.c) so as a workaround i
> removed this optimization from linux but obviously clang should be fixed
> instead.

I have a workaround for hweight.c, but I'm also working on adding support for
-fcall-saved-xxx.

> beyond the above fixes here and there, there're some opportunities to make
> better use of clang specific features as well, so if anyone feels
> inclined... ;)

> 14. clang's address_space attribute extension
>
> this would probably allow to simplify all the x86 per-cpu accessors (ditto
> for userland btw).

I don't think we'd get upstream to accept these, but +1 from me.

> 15. fix analyzer crashes
>
> as i mentioned above, there're a few constructs that make the analyzer
> crash on the linux tree, it'd probably be easy to fix them for someone
> familiar with the internals. the easiest way to run the analyzer (and to
> reproduce the problems) is to issue
> make CC=.../clang C=2 CHECK="clang --analyze" .

I'm somewhat familiar with the Clang internals, I'll look into this sometime
this weekend, hopefully.

> 16. fix issues found by clang --analyze
>
> this is a bigger undertaking as the false positive ratio is quite low in
> my experience and there're many issues it finds (mostly unused variables or
> useless variable writes that sometimes can point at deeper issues such as not
> doing anything with error return values but i saw also potential NULL derefs).

> 17. extend the analyzer to understand the sparse defines
>
> sparse is a standalone static analyzer built for linux and several
> important subsystems have already been properly marked up for sparse analysis
> so it'd be nice if clang could make use of this information (in fact, some
> analysis could probably be done at normal compile time already since the
> checks are cheap).

It sounds like we (whoever's working on this at PaX and myself) could benefit
from trading information and code here. I'm pretty comfortable with the Clang
internals and less familiar with the Linux Kernel, and it sounds like you guys
are more familiar with the Kernel and less familiar with Clang's internals.

Thanks for posting this information. You've pointed out a number of subtle
things that I missed!

--
Bryce Lelbach aka wash
http://groups.google.com/group/ariel_devel

hi,

>> 2. probably related to the above, __builtin_memcpy and __builtin_memset also
>> ignore -mregparm and cause the same kind of trouble at runtime so i worked
>> it around by using explicit inline asm.
>
> I'm curious as to where this was causing problems?

there's a structure copy in arch/x86/boot/memory.c (see the patch) that in clang
ends up calling the builtin (vs. an inlined 'rep movs' in gcc) and since the calling
convention was wrong on it, i had to make it an explicit memcpy (that'd call my
own quick hack in turn).

>> 4. unused variable/function elimination
>>
>> it seems that clang is more aggressive than gcc and eliminates more
>> actually required data/code than desired. earliest casualty is the boot code
>> as usual but there're also some module parameter related structures affected.
>> the fix is needed on the linux side of course.
>
> I'm working on a static analysis to show us where this is.

how are you going to do it? in the boot code the problem is that linux (ab)uses
the linker script to build an otherwise C level array (some video mode setting
stuff) where each element is static in its respective compilation unit (and all
the functions referenced from there are static too) so all a static analyzer
could notice is that there's not much useful code/data emitted into the object
file, it won't know that at a higher level (and in a separate compilation unit)
those omitted structures/etc would actually be used.
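A small user-space sketch of that mechanism, assuming GNU ld or lld (which define __start_<sec>/__stop_<sec> for sections whose names are valid C identifiers; the kernel uses its own linker script for this instead). The struct, section and function names are hypothetical. Each entry is file-static and never referenced by name, so without __attribute__((used)) the compiler is free to drop it; adding that attribute is one way to express the fix on the linux side.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical registration record standing in for the boot code's
 * per-file video mode descriptors. */
struct card_info {
    const char *name;
    int (*probe)(void);
};

static int vga_probe(void)  { return 1; }
static int vesa_probe(void) { return 2; }

/* 'used' forces emission of an otherwise unreferenced static object;
 * 'section' collects all entries into one contiguous table. */
#define REGISTER_CARD(var, nm, fn)                                     \
    static const struct card_info var                                  \
    __attribute__((used, section("cardtab"), aligned(sizeof(void *)))) \
        = { nm, fn }

REGISTER_CARD(card_vga, "vga", vga_probe);
REGISTER_CARD(card_vesa, "vesa", vesa_probe);

/* Table bounds provided by the linker. */
extern const struct card_info __start_cardtab[];
extern const struct card_info __stop_cardtab[];

static int count_cards(void)
{
    return (int)(__stop_cardtab - __start_cardtab);
}

static const struct card_info *find_card(const char *name)
{
    for (const struct card_info *c = __start_cardtab; c < __stop_cardtab; c++)
        if (strcmp(c->name, name) == 0)
            return c;
    return NULL;
}
```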

>> 5. asm 'p' constraint
>>
>> this was fixed last week in subversion, so i'm omitting the patch for it,
>> but if someone really wants to use an earlier clang (such as the 2.8
>> release), then just duplicate the percpu_read macro into percpu_read_stable.
>
> I have this fixed locally, not sure if I pulled this change in (I'm thinking I
> didn't).

it was fixed last week in svn.

>> 10. clang crash on __verify_pcpu_ptr use
>>
>> when compiling i think init/main.c, clang crashes on the above macro. i
>> tried to extract a minimal example but that failed to produce any errors, so
>> probably there is more context needed to trigger the segfault. interestingly,
>> the workaround for getting this compiled was to turn the body of the macro
>> into a statement expression but otherwise it's the same code inside.
>
> Huh. I've never run into this. Would you be willing to pass along the build
> environment (configuration, platform, compiler version, etc)?

i just use svn head of llvm/clang and whatever linux/pax version is the latest.
i'll mail you a config but i don't think there's anything special in it, it's
a basic amd64/smp setup and in general, the linux percpu code doesn't depend on
.config much.

>> 14. clang's address_space attribute extension
>>
>> this would probably allow to simplify all the x86 per-cpu accessors (ditto
>> for userland btw).
>
> I don't think we'd get upstream to accept these, but +1 from me.

given that there're already compiler specific headers to support various
compiler features, i'd say this is no different and has a fair chance, although
the changes will require a good abstraction of the x86 percpu accessor macros.

> It sounds like we (whoever's working on this at PaX and myself) could benefit
> from trading information and code here. I'm pretty comfortable with the Clang
> internals and less familiar with the Linux Kernel, and it sounds like you guys
> are more familiar with the Kernel and less familiar with Clang's internals.

sure, no problem, although my goal is to eventually catch up with the llvm/clang
side of things as well ;).

Please do file bug reports on compiler problems. Sometimes they get fixed.
