linux/i386 and mregparm

Hi folks,

continuing the effort of getting clang to compile the linux kernel, i recently
got it to work on i386 as well. in the following i'll describe some issues i
ran into.

1. patching linux

the first attached patch is against linux 2.6.36.4 (well, it's against the
PaX patch but should mostly apply to vanilla as well). it should also work
on amd64 but i didn't try it this time. compared to a few months ago, there
are some changes needed on both sides again.

on the linux side, clang (albeit inadvertently) caught a bug where a function
declaration had a different section attribute than its definition. apparently
clang takes the former while gcc takes the latter into account in the end (and
neither warned about the fact ;). other changes in linux were needed due to clang
bugs and features that i'll elaborate on in the next sections.

2. patching clang

the second attached patch is against r126479 and is needed to fix a bug that
was introduced, or more precisely, unearthed by the recent -mregparm support
patch. in particular, before -mregparm was supported, the only way to change
a function's parameter passing convention was to use a function attribute, so
in lieu of such an attribute the default regparm=0 was used.

consequently, specifying an explicit regparm=0 attribute had no real effect
and clang had no special treatment for this case. now with -mregparm the default
regparm value can be changed for a given compilation unit, so the old
assumption that the default regparm value is 0 no longer holds. this becomes
a problem when one uses a non-0 mregparm and wants to override it to 0 for some
functions.
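
to make the override case concrete, here is a minimal sketch of a per-function regparm=0 override. all names (DEMO_REGPARM, demo_stack_args, demo_default_args, demo_regparm_check) are made up for illustration and are not taken from the kernel or the patch. since regparm is only meaningful on 32-bit x86, the sketch assumes it is fine to drop the attribute on other targets.

```c
#include <assert.h>

/* assumption: regparm only exists on 32-bit x86, so the attribute is used
   only there; on other targets the macro expands to nothing */
#if defined(__i386__)
#define DEMO_REGPARM(n) __attribute__((regparm(n)))
#else
#define DEMO_REGPARM(n)
#endif

/* hypothetical function that must take its arguments on the stack even if
   the file is built with -mregparm=3 */
DEMO_REGPARM(0) int demo_stack_args(int a, int b, int c);

/* no attribute: follows whatever default -mregparm selects */
int demo_default_args(int a, int b, int c)
{
    return demo_stack_args(a, b, c);
}

DEMO_REGPARM(0) int demo_stack_args(int a, int b, int c)
{
    return a + b + c;
}

/* usage: the result must not depend on which convention each call uses */
void demo_regparm_check(void)
{
    assert(demo_default_args(1, 2, 3) == 6);
}
```

built for i386 with -mregparm=3, the explicit regparm(0) should win for demo_stack_args only if the compiler distinguishes "attribute absent" from "attribute present with value 0", which is the distinction discussed above.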

just such a case arises with linux (due to another clang/llvm bug/feature, see
below) so this needed fixing to get a proper kernel. i hope i got the implementation
right, and would appreciate help with writing a test case as i'm not familiar
with the test system.

3. builtin functions and regparm interaction

the above mentioned bug/feature is documented in bug #3997 already. basically
the issue is that when llvm emits builtin function calls, it has to use some
calling convention and arguably the desired regparm value may not be available
or even well defined at that stage. to this i'd like to add two observations:

- with mregparm support it'd be possible to pass this value down to the llvm
   layer and take it into account when generating said function calls,

- gcc does this already, although it gives a warning when in hosted mode, but not
   in freestanding mode

since for now builtins are always emitted with regparm=0, normal callers must
follow the same convention, so in linux these functions need an explicit attribute to
override the mregparm=3 used for everything else in the kernel. this is manageable but
was somewhat annoying to debug one by one ;) and also i don't think it'd be accepted
upstream.

4. unimplemented gcc command line switches

there're some switches that are used by the linux Makefiles but not yet implemented
in clang. beyond the noise (unless suppressed with -Qunused-arguments) there's actually
a problem caused by one of them: -fno-optimize-sibling-calls. it appears that this
optimization cannot be controlled in llvm and as a result some assumptions that the
kernel makes about the call chain depth at certain points will be invalid (some
tracing/logging code wants to look up the callers 2-3 levels up and due to this
optimization it can result in the code dereferencing userland frame pointer values).
for now i worked around this at the few places i ran into so far, but this should be
fixed in llvm and clang i think.
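
a minimal sketch of the call shape that is affected, assuming a tracer that records its caller. the names (demo_trace, demo_wrapper, demo_last_caller, demo_sibcall_check) are made up for illustration and this is not kernel code. __builtin_return_address is a gcc/clang builtin.

```c
#include <assert.h>
#include <stddef.h>

/* hypothetical record of who called the tracer last */
static void *demo_last_caller;

/* tracer that looks one level up the call chain */
__attribute__((noinline)) int demo_trace(int value)
{
    demo_last_caller = __builtin_return_address(0);
    return value + 1;
}

/* the call below is in tail position: with sibling call optimization the
   compiler may turn it into a jump, so demo_wrapper leaves no frame behind
   and demo_trace sees demo_wrapper's caller as its own */
__attribute__((noinline)) int demo_wrapper(int value)
{
    return demo_trace(value);
}

/* usage: the computed value is the same either way, only the observed
   call chain differs */
void demo_sibcall_check(void)
{
    assert(demo_wrapper(41) == 42);
    assert(demo_last_caller != NULL);
}
```

code that walks a fixed number of frames up from a function like demo_trace ends up one level further out than expected once the wrapper's frame is gone, which is the kind of broken call depth assumption described above.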

5. -Wformat false positives

there's some recent change here which makes clang complain about a lot of format
strings that are valid in linux (being a freestanding environment and sporting its
own format string parser, not to mention extensions). i don't know what the right
solution here would be, but this causes lots of noise during compilation
(and turning the warning off may miss real problems).

6. weak functions and optimization

it appears to be yet another bug in that when a weak function with an empty body
is encountered in a compilation unit, the optimizer assumes that that's all there is
to this function and omits passing arguments to calls to the weak function (and
presumably non-empty bodies would trigger other kinds of optimizations not necessarily
valid for the overrides).

obviously this is incorrect since the whole point of a weak function is that it
can be overridden in another compilation unit and therefore no assumptions can be
made about it in the optimizer. this particular problem arises in linux in a few
places as weak functions are sometimes used to implement arch specific overrides.
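
a minimal sketch of the pattern, with made-up names (demo_arch_setup, demo_boot) rather than actual kernel symbols: a weak default that does nothing, and a caller that must still pass the argument because another compilation unit may supply a strong definition.

```c
#include <assert.h>

/* weak default with an effectively empty body; an arch specific file may
   provide a non-weak demo_arch_setup that really uses cpu */
__attribute__((weak)) void demo_arch_setup(int cpu)
{
    (void)cpu;
}

/* the call has to be emitted with cpu passed properly: the body seen in
   this file says nothing about the definition that wins at link time */
int demo_boot(int cpu)
{
    demo_arch_setup(cpu);
    return cpu;
}
```

if only this file is linked, the weak default is used and the call has no visible effect, so the bug only shows up once an override exists and receives garbage arguments.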

7. bounds checking false positives

this came up in the signal handling code, in particular there's a _NSIG_WORDS define
that's used like this:

   switch (_NSIG_WORDS) {
   case 4: /*...*/
   case 2: /*...*/
   case 1: /*...*/
   }

and the different cases index into an array. now the problem is that _NSIG_WORDS
is 2 for i386 but clang still evaluates case 4 and warns about the out-of-bounds
array accesses in there. a similar false positive arises in expressions like this:

  sizeof(long) == 8 ? /*...*/ : /*...*/ ;

where the code meant for the 64 bit archs gets evaluated even on 32 bit archs
and usually gives some warning, depending on the exact statements used. it'd be
nice to fix this somehow.
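
for reference, a self-contained sketch of the switch pattern. the names (DEMO_NSIG_WORDS, demo_sigset_t, demo_sigisemptyset) are stand-ins and the body is only modeled on the description above, not copied from the kernel.

```c
#define DEMO_NSIG_WORDS 2   /* stand-in for _NSIG_WORDS on i386 */

typedef struct {
    unsigned long sig[DEMO_NSIG_WORDS];
} demo_sigset_t;

/* returns 1 if no bit is set in the signal set, 0 otherwise */
int demo_sigisemptyset(const demo_sigset_t *set)
{
    switch (DEMO_NSIG_WORDS) {
    case 4:
        /* never executed when DEMO_NSIG_WORDS is 2, yet sig[3] and sig[2]
           are what an out-of-bounds warning would point at */
        return (set->sig[3] | set->sig[2] |
                set->sig[1] | set->sig[0]) == 0;
    case 2:
        return (set->sig[1] | set->sig[0]) == 0;
    case 1:
        return set->sig[0] == 0;
    default:
        return 0;
    }
}
```

only the case 2 branch is reachable here, so a diagnostic for the case 4 branch is a false positive in the sense described above.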

8. rip relative addressing in mcmodel=kernel

this is amd64 related but i thought i'd mention it here. the third attached patch
allows llvm to generate rip relative accesses for kernel mode code as well (this
is what gcc does too), and this in turn reduces the size of a relocatable kernel.

this may be a linux specific feature, basically as its name says, this allows the
kernel image to be loaded at a suitably aligned but otherwise arbitrary address in
memory where the kernel will relocate itself (some post-link processing collects
and creates a special section with relocation info, its size can be reduced with
rip relative addressing).

about the commented out chunk in X86::isOffsetSuitableForCodeModel, i'm not
sure if such checking makes sense for kernel mode, so as a quick hack i just got
rid of it, but i don't know what the right solution there would be.

also, i was too lazy to separate it out: the llvm Makefile patch simply makes the svn
update process more verbose so that one can actually see which module is being
updated. feel free to ignore it but i found it useful for myself ;).

9. integrated-as support and linux

last but not least, it'd be nice one day to allow the use of integrated-as with
linux as well, but that requires implementing support for a few directives, i
recall pushsection/popsection at least but there was also some issue with the
assembler not detecting the proper size for bit operations (even though it could
have deduced them from the arguments like it does for some insns already).

so that's all in a nutshell, i'm now wondering if someone could tell me
  - if the mregparm fix is acceptable and can make it into 2.9
  - which problems should get a bugzilla entry (i.e., they'll be fixed one day)
  - what to do with the issues that are perhaps considered as features ;)

cheers,

PaX Team

pax-linux-2.6.36.4-test21-clang-only.patch (48.7 KB)

clang-mregparm-fix.patch (11 KB)

llvm-amd64-kernel-pic.patch (5.59 KB)

> Hi folks,
>
> continuing the effort of getting clang to compile the linux kernel, i recently
> got it to work on i386 as well. in the following i'll describe some issues i
> ran into.

Have you seen http://llvm.org/bugs/show_bug.cgi?id=4068 ? Please file
bugs related to the kernel as "blocking" that bug.

> 1. patching linux
>
> the first attached patch is against linux 2.6.36.4 (well, it's against the
> PaX patch but should mostly apply to vanilla as well). it should also work
> on amd64 but i didn't try it this time. compared to a few months ago, there
> are some changes needed on both sides again.
>
> on the linux side, clang (albeit inadvertently) caught a bug where a function
> declaration had a different section attribute than its definition. apparently
> clang takes the former while gcc takes the latter into account in the end (and
> neither warned about the fact ;). other changes in linux were needed due to clang
> bugs and features that i'll elaborate on in the next sections.

Please file a bug for the section attribute thing.

> 2. patching clang
>
> the second attached patch is against r126479 and is needed to fix a bug that
> was introduced, or more precisely, unearthed by the recent -mregparm support
> patch. in particular, before -mregparm was supported, the only way to change
> a function's parameter passing convention was to use a function attribute, so
> in lieu of such an attribute the default regparm=0 was used.
>
> consequently, specifying an explicit regparm=0 attribute had no real effect
> and clang had no special treatment for this case. now with -mregparm the default
> regparm value can be changed for a given compilation unit, so the old
> assumption that the default regparm value is 0 no longer holds. this becomes
> a problem when one uses a non-0 mregparm and wants to override it to 0 for some
> functions.
>
> just such a case arises with linux (due to another clang/llvm bug/feature, see
> below) so this needed fixing to get a proper kernel. i hope i got the implementation
> right, and would appreciate help with writing a test case as i'm not familiar
> with the test system.

Please put this into a separate email; it's likely to get lost in
here. (Also, I remember seeing a reference to a similar patch, but
I'm not sure where.)

> 3. builtin functions and regparm interaction
>
> the above mentioned bug/feature is documented in bug #3997 already. basically
> the issue is that when llvm emits builtin function calls, it has to use some
> calling convention and arguably the desired regparm value may not be available
> or even well defined at that stage. to this i'd like to add two observations:
>
> - with mregparm support it'd be possible to pass this value down to the llvm
>    layer and take it into account when generating said function calls,
>
> - gcc does this already, although it gives a warning when in hosted mode, but not
>    in freestanding mode
>
> since for now builtins are always emitted with regparm=0, normal callers must
> follow the same convention, so in linux these functions need an explicit attribute to
> override the mregparm=3 used for everything else in the kernel. this is manageable but
> was somewhat annoying to debug one by one ;) and also i don't think it'd be accepted
> upstream.

Bug 3997 covers part of it, but part of it looks like a separate
issue; please file a bug on the issue with strcpy etc.

> 4. unimplemented gcc command line switches
>
> there're some switches that are used by the linux Makefiles but not yet implemented
> in clang. beyond the noise (unless suppressed with -Qunused-arguments) there's actually
> a problem caused by one of them: -fno-optimize-sibling-calls. it appears that this
> optimization cannot be controlled in llvm and as a result some assumptions that the
> kernel makes about the call chain depth at certain points will be invalid (some
> tracing/logging code wants to look up the callers 2-3 levels up and due to this
> optimization it can result in the code dereferencing userland frame pointer values).
> for now i worked around this at the few places i ran into so far, but this should be
> fixed in llvm and clang i think.

Please file a bug for fno-optimize-sibling-calls. Please file bugs
for any other useful flags that are unimplemented. Note that clang
will sometimes complain that supported arguments are irrelevant to the
requested operation.

> 5. -Wformat false positives
>
> there's some recent change here which makes clang complain about a lot of format
> strings that are valid in linux (being a freestanding environment and sporting its
> own format string parser, not to mention extensions). i don't know what the right
> solution here would be, but this causes lots of noise during compilation
> (and turning the warning off may miss real problems).

Your description doesn't make it clear why clang is warning but gcc isn't.

> 6. weak functions and optimization
>
> it appears to be yet another bug in that when a weak function with an empty body
> is encountered in a compilation unit, the optimizer assumes that that's all there is
> to this function and omits passing arguments to calls to the weak function (and
> presumably non-empty bodies would trigger other kinds of optimizations not necessarily
> valid for the overrides).
>
> obviously this is incorrect since the whole point of a weak function is that it
> can be overridden in another compilation unit and therefore no assumptions can be
> made about it in the optimizer. this particular problem arises in linux in a few
> places as weak functions are sometimes used to implement arch specific overrides.

Fixed in r126720.

> 7. bounds checking false positives
>
> this came up in the signal handling code, in particular there's a _NSIG_WORDS define
> that's used like this:
>
>    switch (_NSIG_WORDS) {
>    case 4: /*...*/
>    case 2: /*...*/
>    case 1: /*...*/
>    }
>
> and the different cases index into an array. now the problem is that _NSIG_WORDS
> is 2 for i386 but clang still evaluates case 4 and warns about the out-of-bounds
> array accesses in there. a similar false positive arises in expressions like this:
>
>   sizeof(long) == 8 ? /*...*/ : /*...*/ ;
>
> where the code meant for the 64 bit archs gets evaluated even on 32 bit archs
> and usually gives some warning, depending on the exact statements used. it'd be
> nice to fix this somehow.

Please file a bug. Although, there have been very recent changes
here, so please update to trunk first.

> 8. rip relative addressing in mcmodel=kernel
>
> this is amd64 related but i thought i'd mention it here. the third attached patch
> allows llvm to generate rip relative accesses for kernel mode code as well (this
> is what gcc does too), and this in turn reduces the size of a relocatable kernel.
>
> this may be a linux specific feature, basically as its name says, this allows the
> kernel image to be loaded at a suitably aligned but otherwise arbitrary address in
> memory where the kernel will relocate itself (some post-link processing collects
> and creates a special section with relocation info, its size can be reduced with
> rip relative addressing).
>
> about the commented out chunk in X86::isOffsetSuitableForCodeModel, i'm not
> sure if such checking makes sense for kernel mode, so as a quick hack i just got
> rid of it, but i don't know what the right solution there would be.
>
> also, i was too lazy to separate it out: the llvm Makefile patch simply makes the svn
> update process more verbose so that one can actually see which module is being
> updated. feel free to ignore it but i found it useful for myself ;).

I don't think mcmodel=kernel has gotten much use; that said, please
send the patch separately to the llvm-commits list.

> 9. integrated-as support and linux
>
> last but not least, it'd be nice one day to allow the use of integrated-as with
> linux as well, but that requires implementing support for a few directives, i
> recall pushsection/popsection at least but there was also some issue with the
> assembler not detecting the proper size for bit operations (even though it could
> have deduced them from the arguments like it does for some insns already).

There are a couple of issues attached to
http://llvm.org/bugs/show_bug.cgi?id=4068 related to assembly; if you
have others, please file. Note that there are some issues with gas
assuming the size of an operation when it really shouldn't, and we
don't want to mimic those bugs in LLVM.

-Eli

Hi Eli,

> I don't think mcmodel=kernel has gotten much use; that said, please
> send the patch separately to the llvm-commits list.

I believe it's used by FreeBSD (actually this was the first "testcase"
for kernel cmodel)

Hello

> about the commented out chunk in X86::isOffsetSuitableForCodeModel, i'm not
> sure if such checking makes sense for kernel mode, so as a quick hack i just got
> rid of it, but i don't know what the right solution there would be.

The right solution is to leave the check there. There is a comment
describing what's going on.
The check was made specifically for the kernel code model.

with rip relative addressing (which is what my patch enables even for kernel
mode) you'll get negative offsets for symbols below the current rip so this
check is wrong in that case (i learned it the hard way when the rest of the
patch still didn't produce the desired asm and had to debug it down to this
check ;).

for the non-rip relative case i don't see why accepting negative offsets is
wrong whereas it's considered correct for the small model. i.e., what is the
programming construct that
1. produces such negative offsets and
2. is correct for small mode and
3. is wrong for the kernel?

now in case there is a reason to keep this check then i don't know how to
determine at this point whether the offset to be checked is going to be used
for rip relative addressing or not.

yes, we use mcmodel=kernel for our amd64 kernel. it works just fine for us

> for the non-rip relative case i don't see why accepting negative offsets is
> wrong whereas it's considered correct for the small model. i.e., what is the
> programming construct that
> 1. produces such negative offsets and
> 2. is correct for small mode and
> 3. is wrong for the kernel?

According to x86-64 ABI (see
http://www.x86-64.org/documentation/abi.pdf, section 3.5.1 for more
information), the kernel resides in the negative 32 bit space.
This means that if we have a negative offset here, it might just wrap
around: we won't be able to fit the address into 32 bits and will end up
with very big positive addresses. See the note in that section about the offsets
wrt symbolic references.

PS: gcc behaves the same way wrt negative offsets for kernel code model.

> > for the non-rip relative case i don't see why accepting negative offsets is
> > wrong whereas it's considered correct for the small model. i.e., what is the
> > programming construct that
> > 1. produces such negative offsets and
> > 2. is correct for small mode and
> > 3. is wrong for the kernel?
> According to x86-64 ABI (see
> http://www.x86-64.org/documentation/abi.pdf, section 3.5.1 for more
> information), the kernel resides in the negative 32 bit space.
> This means that if we have a negative offset here, it might just wrap
> around: we won't be able to fit the address into 32 bits and will end up
> with very big positive addresses. See the note in that section about the offsets
> wrt symbolic references.

the same is true for the positive 32 (well, 31) bit address space where
the main executable resides yet clang doesn't forbid *all* positive offsets
on grounds that they may wrap and address the negative address space.

> PS: gcc behaves the same way wrt negative offsets for kernel code model.

example from an amd64 linux kernel:

ffffffff8305dd7e: 48 8b 35 eb 05 e5 ff mov -0x1afa15(%rip),%rsi # ffffffff82eae370 <key_type_dns_resolver>

this negative offset is what is caught by the existing check. now whether
it should end up in there or not, i can't tell (it may be a bug), but as
i said, i needed a quick hack to get rip relative addressing to work so i
just went with disabling this range check. if there's a better way, i'm all
ears ;).

> with rip relative addressing (which is what my patch enables even for kernel
> mode) you'll get negative offsets for symbols below the current rip so this
> check is wrong in that case (i learned it the hard way when the rest of the
> patch still didn't produce the desired asm and had to debug it down to this
> check ;).

Are you sure you're not mixing the code model and PIC relocation model?
For it seems you're looking for PIC codegen.

i'm not mixing them up in that i know they're different things, but i do
(want to) mix them in the sense that i'd like clang/llvm to generate PIC
for the kernel code model as well, as this is what gcc does.

its practical use is that the size of the relocatable kernel (maybe it's
a linux specific feature that no one else really cares about ;) is much
reduced.

> the same is true for the positive 32 (well, 31) bit address space where
> the main executable resides yet clang doesn't forbid *all* positive offsets
> on grounds that they may wrap and address the negative address space.

Not at all! Everything is exactly as allowed by the ABI (this is the
code just a few lines above the code you commented out):

  // For small code model we assume that latest object is 16MB before end of 31
  // bits boundary. We may also accept pretty large negative constants knowing
  // that all objects are in the positive half of address space.
  if (M == CodeModel::Small && Offset < 16*1024*1024)
    return true;

> ffffffff8305dd7e: 48 8b 35 eb 05 e5 ff mov -0x1afa15(%rip),%rsi # ffffffff82eae370 <key_type_dns_resolver>

This is a completely different thing. The hook is used to fill the
signed offset field of the global symbol.
We *have* to make sure that the address is still valid after actual linking.

> i said, i needed a quick hack to get rip relative addressing to work so i
> just went with disabling this range check. if there's a better way, i'm all
> ears ;).

Leave it as-is, the current code does correct things per x86-64 ABI.
Check test/CodeGen/X86/codemodel.ll which tests various different
aspects of small vs kernel code models.
You can translate the IR to C code for better readability, but
basically the code just does something like this:

int *foo;

int bar(void) {
   return foo[offset];
}

for different values of the offset (small positive, big positive, small
negative, big negative).

> i'm not mixing them up in that i know they're different things, but i do
> (want to) mix them in the sense that i'd like clang/llvm to generate PIC
> for the kernel code model as well, as this is what gcc does.

Then you should rely on the relocation model switch, not hack around on
random stuff.
Also, note that according to the ABI specification I already linked
to, there is no PIC for the kernel code model (see the same section
3.5.1).

when i said PIC, i didn't really mean the full-blown userland PIC infrastructure,
but only the use of rip relative addressing which gcc already employs for the
kernel code model as well. and to get this to work i needed a hack but if you
know where to do it properly, i'm all ears ;).

speaking of the ABI though, on the small model it says:

    This allows the compiler to encode symbolic references with offsets in
    the range from -(2^31) to 2^24 or from 0x80000000 to 0x01000000 directly
    in the sign extended immediate operands, with offsets in the range from
    0 to 2^31-2^24 or from 0x00000000 to 0x7f000000 in the zero extended
    immediate operands [...]

as you may note, the range is different for offsets that get zero extended
vs. those that get sign extended, and isOffsetSuitableForCodeModel checks
only the sign extended range (i assume it's only called for insns that do
sign extension). now if you look at the kernel code model, you'll find this:

     This code model has advantages similar to those of the small model, but
     allows encoding of zero extended symbolic references only for offsets from
     2^31 to 2^31 + 2^24 or from 0x80000000 to 0x81000000. The range offsets
     for sign extended reference changes to 0 to 2^31 + 2^24 or 0x00000000 to
     0x81000000.

once again, there're two different ranges allowed for the offset yet llvm
allows neither fully, not even the sign extended case.

> once again, there're two different ranges allowed for the offset yet llvm
> allows neither fully, not even the sign extended case.

Assembler treats imm field as signed int. Thus we check only sext case.
Note that both cases are handled correctly, because (in kernel code
model) too large positive offsets cannot be represented with 32-bit
signed offset (the range is -2^31 ... 2^31-1) .

In any case, currently negative offsets are properly excluded for
the kernel code model and big positive ones for the small code model.

> > once again, there're two different ranges allowed for the offset yet llvm
> > allows neither fully, not even the sign extended case.
> Assembler treats imm field as signed int. Thus we check only sext case.

it depends on the insn, some sign extend, some zero extend so the callers
of this function had better be for sign extending insns ;).

> Note that both cases are handled correctly, because (in kernel code
> model) too large positive offsets cannot be represented with 32-bit
> signed offset (the range is -2^31 ... 2^31-1) .

i don't understand what you mean by 'too large positive offsets' in that
the offset range that a 32 bit signed int can represent is the same regardless
of the code model (which is an artificial choice made by a given ABI)...

in any case, the ABI does allow a certain negative range for the kernel
code model that llvm doesn't, this omission may very well be the source of
my problem in fact, i'll check it out next time i boot my amd64 env.

> In any case, currently negative offsets are properly excluded for
> the kernel code model [...]

if you want to follow the ABI then it's not correct to exclude all
of them.

> if you want to follow the ABI then it's not correct to exclude all
> of them.

As you cited from the ABI document yourself:
"The range offsets for sign extended reference changes to 0 to
  2^31 + 2^24 or 0x00000000 to 0x81000000."

No negative offsets here, sorry.

0x80000000-0x81000000 is negative when treated as a 32 bit signed
value and isOffsetSuitableForCodeModel checks for >0 only. this is
clear from the language used for the small code model as well:

    This allows the compiler to encode symbolic references with offsets in
    the range from -(2^31) to 2^24 or from 0x80000000 to 0x01000000 directly
    in the sign extended immediate operands,

if 0x80000000 was supposed to be treated as an unsigned value there
then the current check against <16MB would be wrong too.
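
the arithmetic behind this can be sketched as follows. this is a simplified model of the check as described in this thread (small model: offset < 16MB, kernel model: offset > 0, both within a signed 32 bit field), not llvm's actual code, and all names (demo_as_signed32, demo_offset_ok, the DEMO_ enum) are made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

enum demo_code_model { DEMO_SMALL, DEMO_KERNEL };

/* interpret a 32 bit pattern as a signed value, done in 64 bit arithmetic
   to avoid implementation-defined narrowing conversions */
int64_t demo_as_signed32(uint32_t bits)
{
    return bits >= 0x80000000u ? (int64_t)bits - 4294967296LL
                               : (int64_t)bits;
}

/* simplified model of the range check discussed in the thread */
bool demo_offset_ok(enum demo_code_model model, int64_t offset)
{
    /* the offset has to fit the signed 32 bit field: -2^31 ... 2^31-1 */
    if (offset < INT32_MIN || offset > INT32_MAX)
        return false;
    if (model == DEMO_SMALL)
        return offset < 16 * 1024 * 1024;
    if (model == DEMO_KERNEL)
        return offset > 0;
    return false;
}
```

under this model 0x80000000 reads as -2^31, which the small model check accepts (it is below 16MB) and the kernel model check rejects (it is not above 0), so whether the abi's 0x80000000-0x81000000 range for the kernel model is covered depends on reading those bounds as signed or unsigned.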

I had hoped r126720 would also fix the problem I've reported in PR 9352,
but unfortunately it did not. But it seems to be an issue of the same
type, maybe this is just as easy to fix? :)

Completely different layer.

-Eli